Tested versions
Appears in 3.1.0
System information
Ubuntu 22, Lenovo P1 Gen 5 workstation (NVIDIA RTX A4500)
Issue description
I stumbled upon something I found very strange: I used the pyannote 3.1.0 speaker-diarization pipeline to diarize a 15-minute sample audio file, and it took about 24 seconds.
For my use case I need speaker embeddings, so a few months ago I implemented my own method for extracting a feature embedding per speaker from audio cropped with pyannote's `audio.crop()`. Extracting the embeddings with the WeSpeaker ResNet293 model takes about 8 seconds. So far so good.
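As a rough sketch of what that per-speaker extraction looks like (with plain NumPy slicing standing in for `audio.crop()`, and a caller-supplied `embed` function as a placeholder for the real ResNet293 forward pass — both hypothetical simplifications, not the actual pipeline code):

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate of the input file


def crop(waveform: np.ndarray, start: float, end: float) -> np.ndarray:
    """Cut one (start, end) speech segment out of the waveform,
    mirroring what pyannote's audio.crop() returns for a Segment."""
    return waveform[int(start * SAMPLE_RATE):int(end * SAMPLE_RATE)]


def speaker_embedding(waveform, segments, embed):
    """Aggregate all VAD segments of one speaker, merge them back
    together, and run the embedding model once on the merged audio."""
    merged = np.concatenate([crop(waveform, s, e) for s, e in segments])
    return embed(merged)
```

The point is that the embedding model runs once per speaker on the merged audio, rather than once per window, which is why this basic approach is fast even without batching.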
I noticed that you now provide the option to return `speaker_embeddings`, which is great and works fine with the small ResNet34 model you use. However, when I switched the embedding model to ResNet293 using your conversion script (I just replaced the numbers and pointed the path at the matching PyTorch model from WeSpeaker), diarizing the same file took 240 seconds.

I wondered how this is possible, given that I only need 8 seconds to extract the speaker embeddings manually with the same model. My approach is the same as yours, just much more basic and without batching: I aggregate all VAD timestamps for a speaker, cut the audio accordingly, merge the pieces back together, and extract the embedding with the model. The cosine similarity between your vectors and mine was ~0.99. If there happens to be a bug in your code (which I actually don't think there is), fixing it could be a good performance boost.
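For reference, the ~0.99 figure above is a plain cosine similarity between the two embedding vectors, which can be checked with a few lines of NumPy:

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors:
    dot(a, b) / (|a| * |b|), in [-1, 1] for real vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A value this close to 1.0 suggests both extraction paths produce essentially the same embedding, so the 30x runtime gap is unlikely to come from doing different work on the audio itself.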
Maybe you have a clue whether the problem lies in the model-conversion script or somewhere else. I can provide a converted model file if you want, or the code snippets showing how I extract the embeddings, but as I said, they are pretty basic.
Minimal reproduction example (MRE)
Unfortunately this is hard to package into a reproducer, since it involves all the model files.