Performance Bug when calculating speaker embeddings #1614

Open
asusdisciple opened this issue Jan 12, 2024 · 0 comments

Tested versions

Appears in 3.1.0

System information

Ubuntu 22, Lenovo P1 Gen 5 Workstation A4500

Issue description

I stumbled upon something that I found very strange. I used the pyannote 3.1.0 speaker diarization pipeline to diarize a 15-minute sample audio file, which took about 24 seconds.

For my use case I need speaker embeddings, so a few months ago I implemented my own method for extracting an embedding per speaker from audio cropped with audio.crop() from pyannote. Extracting the embeddings with WeSpeaker ResNet293 takes about 8 seconds.

So far so good.

I noticed that you now provide the option to return speaker_embeddings, which is great and works fine with the small ResNet34 model you use. However, when I switched the embedding model to ResNet293 using your conversion script (I just replaced the numbers and pointed the path to the right PyTorch model from WeSpeaker), diarizing the file took 240 seconds.

I wondered how this is possible, given that I need only 8 seconds to extract the speaker embeddings manually with the same model. My approach is the same as yours, just far more basic and without batching: I aggregate all VAD timestamps for a speaker, cut the audio accordingly, merge the pieces back together, and extract the embedding with the model. The cosine similarity between your vectors and mine was about 0.99. If there happens to be a bug in your code (which I actually do not think is the case), fixing it could be a good performance boost.
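
For illustration, the manual approach described above can be sketched roughly as follows. This is not the original code: `collect_speaker_audio`, `speaker_embeddings`, and `embed_fn` are hypothetical names, the `(start, end, speaker)` tuples stand in for the diarization output, and `embed_fn` is a placeholder for whatever call runs the ResNet293 model on a chunk of samples.

```python
import math
from collections import defaultdict


def collect_speaker_audio(waveform, sample_rate, segments):
    """Concatenate all samples that belong to each speaker.

    waveform: sequence of mono samples
    segments: iterable of (start_sec, end_sec, speaker_label) tuples
    """
    per_speaker = defaultdict(list)
    for start, end, speaker in segments:
        # Convert seconds to sample indices and clamp to the waveform bounds.
        lo = max(0, int(round(start * sample_rate)))
        hi = min(len(waveform), int(round(end * sample_rate)))
        if hi > lo:
            per_speaker[speaker].extend(waveform[lo:hi])
    return dict(per_speaker)


def speaker_embeddings(waveform, sample_rate, segments, embed_fn):
    """One embedding per speaker, with a single model call per speaker.

    embed_fn is a hypothetical placeholder for the embedding model.
    """
    merged = collect_speaker_audio(waveform, sample_rate, segments)
    return {speaker: embed_fn(samples) for speaker, samples in merged.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The point of the sketch is that the model runs only once per speaker on the merged audio, which is one plausible reason the manual path is cheap compared with a pipeline that embeds many short chunks.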

Maybe you have a clue whether the problem lies with the model conversion script or somewhere else. I can provide a model file if you want, or the code snippets showing how I extract the embeddings, but like I said, it's pretty basic.

Minimal reproduction example (MRE)

Unfortunately, this is hard to turn into a reproducer with all the files involved.
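
In the absence of a full reproducer, a small timing helper along these lines could at least show where the time goes. It is only a sketch: `time_call` and `slowdown` are hypothetical helper names, not part of pyannote, and the callables passed in (for example the full pipeline versus the manual extraction) are left to the reader. GPU work is asynchronous, so on CUDA the callable should synchronize before returning, otherwise the measured times may be misleading.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(baseline_seconds, candidate_seconds):
    """How many times slower the candidate is than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return candidate_seconds / baseline_seconds
```

With the numbers reported in this issue, `slowdown(24, 240)` gives 10.0, that is, a tenfold increase in diarization time after swapping the embedding model, even though the manual extraction with the same model takes about 8 seconds.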
