
Things to check when testing with different DB #55

Open

hash2430 opened this issue Sep 16, 2019 · 3 comments
@hash2430
hash2430 commented Sep 16, 2019

This might be a silly question, so I will begin with an apology.
I am new to speaker verification, and I am trying to apply this repo to VoxCeleb1.
Data loading and the other pieces seem straightforward, but I have a question regarding the EER calculation:

```python
for thres in [0.01*i+0.5 for i in range(50)]:
```

The similarity threshold in this case ranges from 0.5 to 0.99.
Does this range need calibration when I am using a different DB?
In another repo that uses VoxCeleb (DeepSpeaker), the range appears to be different:
```python
# Calculate evaluation metrics
thresholds = np.arange(0, 30, 0.01)
tpr, fpr, accuracy = calculate_roc(thresholds, distances, labels)
thresholds = np.arange(0, 30, 0.001)
val, far = calculate_val(thresholds, distances, labels, 1e-3)
```
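For context, my understanding is that the EER itself should not depend on a hand-picked grid if the candidate thresholds are derived from the scores themselves. A minimal numpy sketch of what I mean (my own helper, not from either repo); for distance-based systems like DeepSpeaker's you would pass in negated distances so that higher still means more similar:

```python
import numpy as np

def compute_eer(scores, labels):
    """scores: higher = more similar; labels: 1 = same speaker, 0 = different."""
    order = np.argsort(scores)
    scores = np.asarray(scores, dtype=float)[order]
    labels = np.asarray(labels)[order]
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Thresholding at each sorted score: FNR counts positives at or below it,
    # FPR counts negatives strictly above it.
    fnr = np.cumsum(labels) / n_pos
    fpr = 1.0 - np.cumsum(1 - labels) / n_neg
    i = np.argmin(np.abs(fnr - fpr))   # crossing point of the two error rates
    return (fnr[i] + fpr[i]) / 2.0
```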

I got a 16% EER on VoxCeleb1.
Can anybody give advice on which points I should tune?
Or has anyone obtained a different EER on VoxCeleb1?

@BarCodeReader
I also have quite a high EER on VoxCeleb1, and my GE2E loss is also quite high, around 20.
I think we only obtain very good results on TIMIT because the dataset is simple: 630 people each repeating 10 sentences, which gives 6,300 utterances. VoxCeleb1, on the other hand, has 1,250 speakers, each with 15–30 unique sentences...

Have you continued your experiment on UIS-RNN using the 16% EER model?

@hash2430

Thanks for your kind answer.
My purpose in training speaker verification is to use it as an objective evaluation for speaker mimicking (generating speech for a new person who is unseen during training).
So UIS-RNN is outside my interest. That is for speaker diarization, right?

Plus, I obtained 16% in the following manner:

  1. Increase the number of epochs to 1,800 => 18% EER
  2. Do not use centroids from the validation set at test time (instead, use only enrollment embeddings to calculate the centroids; see the sketch below) => 16% EER

I did not expect the second approach to give a better EER. I just thought it made more sense, and I cannot explain why it improved performance.
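Concretely, point 2 looks something like this (variable names are mine for illustration, not from this repo): build each speaker's centroid from the enrollment d-vectors only, then score test d-vectors against it by cosine similarity.

```python
import numpy as np

def enrollment_centroid(enroll_emb):
    """enroll_emb: (n_enroll, dim) L2-normalized enrollment d-vectors."""
    c = enroll_emb.mean(axis=0)
    return c / np.linalg.norm(c)   # re-normalize the averaged embedding

def similarity_scores(test_emb, centroid):
    """test_emb: (n_test, dim) L2-normalized test d-vectors."""
    return test_emb @ centroid     # one cosine similarity per test utterance
```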

I might try using this repository trained on TIMIT to evaluate speaker verification on synthesized speech, since synthesized speech has a higher SNR than VoxCeleb1 and is more like TIMIT.
Thanks :D

@BarCodeReader
Oh, I see. I only trained for 300 epochs, and the loss decreased very slowly and almost stopped around 20... Actually, this is also my question about GE2E loss training: how do I know if I have trained too long and the model is overfitting?
For TIMIT the loss is very low and converges very fast... actually, after 300 epochs you can already reach a loss around 1.0.
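One generic sanity check (not something this repo implements, just a common recipe): track EER on a held-out validation split during training and stop once it plateaus, regardless of how low the training loss goes. A rough sketch, where `model`, `train_one_epoch`, and `validation_eer` are hypothetical placeholders:

```python
# Early stopping on validation EER; the helpers below are placeholders,
# not functions from this repo.
best_eer, bad_checks, patience = float("inf"), 0, 5

for epoch in range(1800):
    train_one_epoch(model)
    if epoch % 10 == 0:                    # evaluate every 10 epochs
        eer = validation_eer(model)
        if eer < best_eer:
            best_eer, bad_checks = eer, 0  # still improving
        else:
            bad_checks += 1
            if bad_checks >= patience:     # EER flat or rising: likely overfitting
                break
```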

Also, one reminder for you: if you also use this model to create d-vectors, you need to change the numbers in the yaml... you can refer to the question I asked in this repo... I think there are some mistakes... but for training the LSTM, the yaml file settings are correct.
