The validation loss is rising #54

Open
3139725181 opened this issue Oct 1, 2021 · 10 comments

@3139725181

My training loss looks normal, but the validation loss keeps rising. The structure of my validation set is:
[speaker, speaker_onehot, (spmel, raptf0, len, chapter)],
where spmel and raptf0 were extracted directly by make_spect_f0.py.
Is there any problem with this?

I have tried several times, and the validation loss keeps rising.
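For illustration, here is a minimal sketch (not the repository's actual code) of how one validation entry with the structure above could be assembled from the .npy files that make_spect_f0.py writes out; the file paths, speaker count, and chapter label are placeholders:

```python
# Hypothetical sketch: build one entry of the form
# [speaker, speaker_onehot, (spmel, raptf0, len, chapter)]
# from features saved by make_spect_f0.py. Paths and sizes are assumptions.
import numpy as np

def make_val_entry(speaker_id, speaker_index, num_speakers, spmel_path, f0_path, chapter):
    speaker_onehot = np.zeros(num_speakers, dtype=np.float32)
    speaker_onehot[speaker_index] = 1.0

    spmel = np.load(spmel_path)   # (T, n_mels) mel spectrogram
    raptf0 = np.load(f0_path)     # (T,) normalized F0 contour

    assert len(spmel) == len(raptf0), "mel and F0 should be frame-aligned"
    return [speaker_id, speaker_onehot, (spmel, raptf0, len(spmel), chapter)]

entry = make_val_entry('p226', 1, 82, 'assets/spmel/p226/001.npy',
                       'assets/raptf0/p226/001.npy', '001')
```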

@auspicious3000
Owner

Sounds like overfitting.

@3139725181
Author

[attached image: training and validation loss curves]

This doesn't look like overfitting. Is the structure of my validation set correct?

[speaker, speaker_onehot, (spmel, raptf0, len, chapter)],
where spmel and raptf0 were extracted directly by make_spect_f0.py.

@auspicious3000
Owner

The structure of the validation set does not matter. Just make sure the input to the model is correct.

@3139725181
Author

Yes. I want to confirm: are both spmel and raptf0 extracted directly with make_spect_f0.py and used as input, or did you do some additional processing?

@auspicious3000
Owner

Yes, the same as training.
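For concreteness, a rough sketch of that kind of extraction follows (this is not make_spect_f0.py itself; the sample rate, FFT/hop sizes, F0 range, and the use of librosa/pysptk here are assumptions):

```python
# Hedged sketch: extract a mel spectrogram and a RAPT F0 track with the same hop,
# using the same procedure for training and validation data.
import numpy as np
import soundfile as sf
import librosa
import pysptk

wav, fs = sf.read('example.wav')            # mono speech, values in [-1, 1]

# Mel spectrogram, frames on the rows.
mel = librosa.feature.melspectrogram(y=wav.astype(np.float32), sr=fs,
                                     n_fft=1024, hop_length=256, n_mels=80)
spmel = np.log(np.maximum(mel, 1e-5)).T.astype(np.float32)

# RAPT F0 with the same hop size; RAPT expects 16-bit-range amplitudes.
raptf0 = pysptk.sptk.rapt((wav * 32768).astype(np.float32), fs=fs, hopsize=256,
                          min=60, max=400, otype=2)

# The two feature tracks may differ by a frame or two; trim to the shorter one.
T = min(len(spmel), len(raptf0))
spmel, raptf0 = spmel[:T], raptf0[:T]
```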

@AShoydokova

Hello. Did you figure out the issue? I have a similar issue where my validation loss isn't getting smaller; it fluctuates around 180-200. Only my training loss for the generator (G) keeps getting smaller, while the training loss for P fluctuates around 0.01-0.02.

I've trained the model on the Speech Commands dataset, which consists of single-word utterances about one second long. Could it be that SpeechSplit won't perform well on such data?

@auspicious3000
Owner

@AShoydokova 1. Your training set may be too small. 2. The validation setup should be consistent with training.

@AShoydokova

@AShoydokova 2. The validation setup should be consistent with training

Gotcha. Yes, my data is small, and I trained on an even smaller subset of it to get quick results.

Could you elaborate on point 2? I created the validation set as 10% of the total data and only marked a data point as validation if its speaker was already in the training data. Should I consider anything else? Thank you again for the model and the quick responses!
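For reference, a hedged sketch of that kind of speaker-consistent split (the tuple format and the 10% ratio are assumptions for illustration):

```python
# Hypothetical sketch: hold out ~10% of each speaker's utterances for validation,
# so every validation speaker also appears in the training set.
import random
from collections import defaultdict

def split_by_speaker(utterances, val_ratio=0.10, seed=0):
    """utterances: list of (speaker_id, path) tuples."""
    rng = random.Random(seed)
    per_speaker = defaultdict(list)
    for spk, path in utterances:
        per_speaker[spk].append(path)

    train, val = [], []
    for spk, paths in per_speaker.items():
        rng.shuffle(paths)
        n_val = int(len(paths) * val_ratio) if len(paths) > 1 else 0
        val += [(spk, p) for p in paths[:n_val]]
        train += [(spk, p) for p in paths[n_val:]]
    return train, val
```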

@ZZdozeoff


@AShoydokova I also used the VCTK dataset, the same as the paper, and I got a rising validation loss at first as well. I found that simply concatenating each speaker's multiple wavs into one longer wav solves this problem, so that in the training set each speaker ends up with a single long wav, like the demo training data. I think some operation in the data loader causes this behavior; see data_loader.py. So maybe you can try concatenating your training data into longer files and training again.
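A minimal sketch of that workaround, assuming one subdirectory of wav files per speaker (directory names and layout are placeholders); run it before make_spect_f0.py:

```python
# Hypothetical sketch: merge each speaker's wav files into a single long wav,
# similar to the demo training data layout.
import os
import numpy as np
import soundfile as sf

def concat_speaker_wavs(in_dir, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    for speaker in sorted(os.listdir(in_dir)):
        spk_dir = os.path.join(in_dir, speaker)
        if not os.path.isdir(spk_dir):
            continue
        chunks, sr = [], None
        for fname in sorted(os.listdir(spk_dir)):
            if not fname.endswith('.wav'):
                continue
            wav, fs = sf.read(os.path.join(spk_dir, fname))
            assert sr is None or sr == fs, "all wavs of a speaker must share a sample rate"
            sr = fs
            chunks.append(wav)
        if chunks:
            os.makedirs(os.path.join(out_dir, speaker), exist_ok=True)
            sf.write(os.path.join(out_dir, speaker, speaker + '.wav'),
                     np.concatenate(chunks), sr)

concat_speaker_wavs('wavs', 'wavs_concat')
```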

@9527950

9527950 commented Mar 3, 2023


@ZZdozeoff Hello, I am using the demo.pkl file provided with the code as my validation set. It has only 2 voices in it, and the loss keeps going up during training. Could you please tell me whether you changed this part? Could you share your code, or the code and hyperparameter settings of the solver part? Thank you very much!
