how to use it for speaker verification #34
Comments
Hi Willy, I think the task you are mentioning would be open-set speaker identification. Assuming you're using a neural network to do it, there are two approaches.
On the other hand, in speaker verification, you normally have a claimed (or target) speaker and an input utterance. You compare the two and make a binary decision about whether they come from the same speaker.
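To make the binary decision concrete, here is a minimal sketch, assuming the model has already produced fixed-size speaker embeddings and that cosine similarity is used as the score. The function names, the toy embeddings, and the `threshold=0.5` default are illustrative assumptions, not part of this repository; in practice the threshold would be tuned on a development set.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(claimed_embedding, test_embedding, threshold=0.5):
    """Return (accepted, score) for a claimed speaker and a test utterance.

    threshold=0.5 is a placeholder value chosen for illustration only.
    """
    score = cosine_similarity(claimed_embedding, test_embedding)
    return score >= threshold, score


# Toy 3-dimensional embeddings, made up for illustration.
claimed = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]
orthogonal = [0.0, 1.0, 0.0]

print(verify(claimed, same_direction))  # accepted: the vectors point the same way
print(verify(claimed, orthogonal))      # rejected: the similarity is zero
```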
Thanks JungJee! You are right, my task is open-set speaker verification, which is the case you describe in item 2. More questions:
Could you please share your opinion on these questions?
Maybe I'm the one who's confused. In my understanding, if what you want is not "open-set speaker identification" but "speaker verification", you can simply compare the speaker embeddings of the enrollment and test utterances. If there are multiple enrollment utterances (from your explanation, I think this is "M"), you can average their speaker embeddings to derive a single embedding that represents the speaker.
Hi Jungjee, Thanks for your reply. In my understanding, "open-set" means the models can be used to recognize any speaker, and that speaker will certainly not be included in the models' training set. Thanks,
Hi @wwyl2000,
Yes, you can, and that is what many of us do.
Also yes, for this one. The former would be averaging at the embedding level and the latter (this one) would be averaging at the score level. Both work and are used in various research papers.
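The difference between the two averaging strategies can be sketched as follows. This is a toy illustration under stated assumptions: cosine similarity as the score, plain arithmetic means, and hypothetical helper names (`embedding_level_score`, `score_level_score`) that do not come from this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings):
    """Element-wise mean of a list of equal-length embeddings."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def embedding_level_score(enroll_embeddings, test_embedding):
    """Average the M enrollment embeddings first, then score once."""
    return cosine_similarity(mean_embedding(enroll_embeddings), test_embedding)


def score_level_score(enroll_embeddings, test_embedding):
    """Score the test utterance against each enrollment utterance, then average."""
    scores = [cosine_similarity(e, test_embedding) for e in enroll_embeddings]
    return sum(scores) / len(scores)


# Two made-up enrollment embeddings (M = 2) and one test embedding.
enroll = [[1.0, 0.0], [0.0, 1.0]]
test = [1.0, 0.0]

print(embedding_level_score(enroll, test))
print(score_level_score(enroll, test))
```

The two strategies generally give different numbers for the same inputs, so a decision threshold tuned for one should not be assumed to carry over to the other.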
Thanks Jungjee!
Hi JungJee,
After training the models, I want to use them for speaker verification. I have a test set of N speakers, each with M enrollment utterances and L test utterances.
Should I just enroll the N speakers using their M utterances, and then, for each of a speaker's L test utterances, calculate the scores against all N enrolled speakers and select the speaker whose score is the highest?
Thanks,
Willy
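The procedure described in this question (enroll N speakers, score a test utterance against all of them, pick the best) can be sketched as below. This is an illustration under assumptions, not the repository's evaluation code: embeddings are taken as given, cosine similarity is the score, each speaker model is the mean of that speaker's enrollment embeddings, and the speaker names, toy vectors, and `threshold=0.5` are made up. Without the threshold this is closed-set identification; rejecting low-scoring utterances is what makes it open-set. For verification, only the score against the claimed speaker would be compared with the threshold.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings):
    """Element-wise mean of a list of equal-length embeddings."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def enroll(enrollment_embeddings):
    """Build one speaker model per speaker from its M enrollment embeddings."""
    return {spk: mean_embedding(embs) for spk, embs in enrollment_embeddings.items()}


def identify(test_embedding, speaker_models, threshold=0.5):
    """Return (best_speaker or None, best_score).

    None means the best score fell below the (placeholder) threshold,
    i.e. the utterance is treated as coming from an unknown speaker.
    """
    best_speaker, best_score = None, -math.inf
    for speaker, model in speaker_models.items():
        score = cosine_similarity(model, test_embedding)
        if score > best_score:
            best_speaker, best_score = speaker, score
    if best_score < threshold:
        return None, best_score
    return best_speaker, best_score


# Made-up data: N = 2 speakers with M = 2 enrollment embeddings each.
models = enroll({
    "spk_a": [[1.0, 0.0], [0.9, 0.1]],
    "spk_b": [[0.0, 1.0], [0.1, 0.9]],
})

print(identify([1.0, 0.05], models))   # closest to spk_a
print(identify([-1.0, -1.0], models))  # far from both models, so rejected
```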