The latency of wespeaker model is to large #225

Open
SheenChi opened this issue Dec 21, 2023 · 1 comment
Labels
question Further information is requested

Comments

@SheenChi

Hello @juanmc2005,
I use the hbredin/wespeaker-voxceleb-resnet34-LM (ONNX) model to extract speaker embeddings in the diarization pipeline, but the latency is too large (1300 ms per chunk) with the default parameters (chunk=5s, step=0.5s, latency=0.5), so it cannot meet the real-time requirement.
I saw that you reported a latency of 48 ms on CPU and 15 ms on GPU. Is there anything I need to pay attention to in order to reproduce that performance?
Thank you very much for any suggestions.
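To make the real-time requirement concrete: with step=0.5s, a new chunk is due every 500 ms, so per-chunk processing has to finish within that budget. The sketch below computes that ratio for the figures quoted in this thread. It assumes the quoted times are paid once per step; `real_time_factor` is an illustrative helper, not part of diart.

```python
def real_time_factor(processing_ms: float, step_s: float) -> float:
    """Ratio of per-chunk processing time to the time between chunks.

    Values above 1.0 mean the pipeline falls behind the audio stream.
    """
    return processing_ms / (step_s * 1000.0)


STEP_S = 0.5  # default step quoted in the question

# Per-chunk times quoted in this thread, in milliseconds.
quoted = [
    ("reported in the question", 1300.0),
    ("maintainer, CPU", 48.0),
    ("maintainer, GPU", 15.0),
]

for label, ms in quoted:
    rtf = real_time_factor(ms, STEP_S)
    verdict = "keeps up" if rtf <= 1.0 else "falls behind"
    print(f"{label}: {ms:.0f} ms per chunk -> factor {rtf:.2f} ({verdict})")
```

Under that assumption, 1300 ms per chunk is 2.6 times the 500 ms budget, while the 48 ms and 15 ms figures are well inside it.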

@SheenChi changed the title from "The delacy of wespeaker model is to large" to "The delatency of wespeaker model is to large" on Dec 21, 2023
@SheenChi changed the title from "The delatency of wespeaker model is to large" to "The latency of wespeaker model is to large" on Dec 21, 2023
@juanmc2005 added the question (Further information is requested) label on Dec 27, 2023
@juanmc2005
Owner

Hi @SheenChi, the values I reported were obtained from the output of diart.stream on my hardware: an AMD Ryzen 9 CPU and an Nvidia RTX 4060 Max-Q GPU.
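When comparing numbers across machines, it can help to time the embedding step in isolation. The following is a minimal sketch of such a harness using only the standard library; it is not how diart.stream produces its figures. `time_per_chunk` and `dummy_embedding` are hypothetical names, the 16 kHz sample rate is an assumption, and the placeholder function should be swapped for a call into the real model.

```python
import statistics
import time
from typing import Callable, List


def time_per_chunk(process: Callable[[List[float]], object],
                   chunk: List[float],
                   warmup: int = 3,
                   runs: int = 20) -> float:
    """Return the median wall-clock time, in milliseconds, of process(chunk)."""
    # Warm-up calls are discarded so one-off costs (lazy initialisation,
    # cache misses) do not distort the measurement.
    for _ in range(warmup):
        process(chunk)

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        process(chunk)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings_ms)


def dummy_embedding(chunk: List[float]) -> float:
    # Hypothetical placeholder: replace with the real embedding call.
    return sum(x * x for x in chunk) / len(chunk)


SAMPLE_RATE = 16000  # assumed sample rate
chunk = [0.0] * (5 * SAMPLE_RATE)  # one 5 s chunk of silence

print(f"median per chunk: {time_per_chunk(dummy_embedding, chunk):.2f} ms")
```

The median is used rather than the mean so that an occasional slow run (for example, from other processes on the machine) has less effect on the reported value.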

If you find the model too slow on your hardware, you can try pyannote/embedding, which is the fastest one. If that's still not enough, you could try quantizing a model you like or distilling it into a smaller model. Depending on your hardware, I think distillation would be my preferred first step, but it requires training.

For training, I recommend pyannote.audio, as it's very reliable for this use case and would give you instant compatibility with diart.
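For the quantization option mentioned above, here is a minimal sketch using ONNX Runtime's dynamic quantization on an exported ONNX file. The file paths are hypothetical, and this is only one possible approach: the ONNX Runtime documentation generally recommends static quantization (with calibration data) for convolutional models such as a ResNet, so any speed or accuracy change should be measured rather than assumed.

```python
from pathlib import Path


def quantize_embedding_model(src: str, dst: str) -> Path:
    """Write an int8 weight-quantized copy of the ONNX model at src to dst."""
    # Imported lazily so this module can be loaded without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=src,
        model_output=dst,
        weight_type=QuantType.QInt8,
    )
    return Path(dst)


# Example call (hypothetical paths, not run here):
# quantize_embedding_model("wespeaker_resnet34.onnx", "wespeaker_resnet34.int8.onnx")
```

After quantizing, it is worth re-checking both per-chunk latency and diarization quality on your own data before adopting the smaller model.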
