The latency of wespeaker model is too large #225

SheenChi · 2023-12-21T02:22:35Z

Hello @juanmc2005,
I use the hbredin/wespeaker-voxceleb-resnet34-LM (ONNX) model to extract speaker embeddings in the diarization pipeline, but the latency is too large: about 1300 ms per chunk with the default parameters (chunk=5s, step=0.5s, latency=0.5s). This cannot meet the real-time requirement.
I saw that you reported a delay of 48 ms on CPU and 15 ms on GPU. Is there anything I need to pay attention to when reproducing your results?
Thank you very much for any suggestions.
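With step=0.5s a new chunk arrives every 0.5 s, so, ignoring buffering and I/O overhead, the per-chunk processing time has to stay below 0.5 s for the stream to keep up; at 1300 ms per chunk the backlog grows with every step. A minimal sketch of that check as a real-time factor (compute time divided by time between chunks). The helper names `real_time_factor` and `keeps_up` are hypothetical, not part of diart:

```python
def real_time_factor(processing_ms: float, step_s: float) -> float:
    """Ratio of compute time per chunk to the time between consecutive chunks.

    Below 1.0: each chunk finishes before the next one arrives.
    Above 1.0: a backlog builds up and the stream falls further behind.
    """
    return (processing_ms / 1000.0) / step_s


def keeps_up(processing_ms: float, step_s: float) -> bool:
    # Hypothetical helper: True if per-chunk processing fits within one step.
    return real_time_factor(processing_ms, step_s) < 1.0


if __name__ == "__main__":
    step = 0.5  # seconds between consecutive chunks (the default step)
    # Per-chunk timings quoted in this thread.
    for label, ms in [("reported CPU", 48.0), ("reported GPU", 15.0), ("observed", 1300.0)]:
        rtf = real_time_factor(ms, step)
        print(f"{label}: {ms:.0f} ms -> RTF {rtf:.2f}, keeps up: {keeps_up(ms, step)}")
```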

juanmc2005 · 2023-12-27T14:47:44Z

Hi @SheenChi, the values I reported were obtained from the output of diart.stream on my hardware: an AMD Ryzen 9 CPU and an Nvidia RTX 4060 Max-Q GPU.
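Because these figures depend on the hardware, numbers measured on another machine are only comparable if they are measured carefully. A minimal timing sketch, assuming the embedding call can be wrapped in a Python callable: it discards a few warm-up calls (first calls to an inference session often include one-off setup costs) and reports the median rather than the mean. `benchmark_ms` and `dummy_embedding` are hypothetical names, not diart's API, and the warm-up and run counts are arbitrary choices:

```python
import statistics
import time
from typing import Callable, List, Sequence


def benchmark_ms(
    infer: Callable[[Sequence[float]], object],
    chunk: Sequence[float],
    warmup: int = 5,
    runs: int = 50,
) -> dict:
    """Time `infer` on one chunk in milliseconds, excluding warm-up calls."""
    for _ in range(warmup):
        infer(chunk)  # not timed: absorbs one-off initialization costs
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(chunk)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return {
        "median_ms": statistics.median(timings),
        "p90_ms": timings[int(0.9 * (len(timings) - 1))],
        "max_ms": timings[-1],
    }


def dummy_embedding(chunk: Sequence[float]) -> List[float]:
    # Hypothetical placeholder for the real model call
    # (e.g. an ONNX Runtime session.run); replace with your own.
    mean = sum(chunk) / len(chunk)
    return [mean] * 256


if __name__ == "__main__":
    sample_rate = 16000
    chunk = [0.0] * (5 * sample_rate)  # one 5 s chunk of silence at 16 kHz
    print(benchmark_ms(dummy_embedding, chunk))
```

Using the median keeps a single slow outlier (for example a garbage-collection pause) from dominating the result.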

If you find the model too slow on your hardware, you can try using pyannote/embedding, which is the fastest one. If that's still not enough, you could try quantizing a model you like or distilling it into a smaller model. Depending on your hardware, I think distillation would be my preferred choice as a first step, but it requires training.
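To illustrate what quantization trades off: weights are stored as 8-bit integers plus a scale factor, so each weight is reconstructed with an error of at most half a quantization step. Below is a minimal pure-Python sketch of symmetric per-tensor int8 quantization; it shows the general technique, not what any particular toolkit does internally. For an ONNX model, ONNX Runtime provides `onnxruntime.quantization.quantize_dynamic` for this, though whether it actually speeds up this model on a given CPU has to be measured.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: w approx. scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    # Map integers back to approximate float weights.
    return [scale * v for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.013, 1.0]
    q, scale = quantize_int8(w)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(w, restored))
    print(q, scale, max_err)
```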

For training, I recommend you use pyannote.audio, as it's very reliable for this use case and would give you instant compatibility with diart.
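Distillation here would mean training a smaller student network to reproduce the embeddings that the larger model (the teacher) produces for the same audio. One common objective, sketched below in plain Python as an assumption rather than pyannote.audio's training API, is the mean cosine distance between teacher and student embeddings. It assumes the student's output has the same dimension as the teacher's (for example via a final linear projection); `embedding_distillation_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)  # eps guards against zero vectors


def embedding_distillation_loss(
    teacher: List[Sequence[float]], student: List[Sequence[float]]
) -> float:
    """Mean cosine distance between paired teacher and student embeddings.

    0.0 when every student embedding points in the same direction as the
    teacher's embedding of the same audio; 2.0 when they are all opposite.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student batches must have the same size")
    distances = [1.0 - cosine_similarity(t, s) for t, s in zip(teacher, student)]
    return sum(distances) / len(distances)
```

In an actual training loop this loss would be computed on batches with an autodiff framework such as PyTorch, keeping the teacher's weights frozen and updating only the student.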

SheenChi changed the title ~~The delacy of wespeaker model is to large~~ The delatency of wespeaker model is to large Dec 21, 2023

SheenChi changed the title ~~The delatency of wespeaker model is to large~~ The latency of wespeaker model is to large Dec 21, 2023

juanmc2005 added the question Further information is requested label Dec 27, 2023
