[Feature request] Add Recipe for all 3 Training stages - XTTS V2 #3704
Comments
OK, so here you go. I took the training code from this repo.
Wrote a custom
This trains the DVAE to encode and decode mel-spectrograms. A few things:
The next step would be to fine-tune on a larger dataset. @erogol @eginhard, if this is in the right direction, I can convert it into a training recipe. PS: The code is a bit dirty, since I have just re-used whatever was available as long as it doesn't harm my training.
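To make the DVAE stage concrete, here is a minimal toy sketch of the idea: mel-spectrogram frames are quantized to their nearest codebook vector, "decoded" by looking that vector back up, and the codebook is updated to reduce reconstruction error. All shapes, names, and the k-means-style update are illustrative assumptions standing in for the real gradient-trained XTTS DVAE, not its actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, NOT the real XTTS DVAE hyperparameters.
n_mels, n_frames, codebook_size = 80, 64, 16

mel = rng.standard_normal((n_frames, n_mels))            # fake mel-spectrogram
codebook = rng.standard_normal((codebook_size, n_mels))  # "learnable" codes

def quantize(frames, book):
    # Nearest-neighbour lookup: each frame -> index of the closest code vector.
    d = ((frames[:, None, :] - book[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def recon_loss(frames, book):
    # "Decode" = look the selected code vector back up; loss = MSE to the input.
    return ((frames - book[quantize(frames, book)]) ** 2).mean()

initial_loss = recon_loss(mel, codebook)
for _ in range(10):
    idx = quantize(mel, codebook)
    for k in range(codebook_size):
        mask = idx == k
        if mask.any():
            # k-means-style code update, standing in for the gradient step
            # a real DVAE would take on the reconstruction loss.
            codebook[k] = mel[mask].mean(axis=0)
final_loss = recon_loss(mel, codebook)
```

The important output of this stage for XTTS is not the reconstruction itself but the discrete code indices (`quantize(...)`), which become the token vocabulary the GPT model is later trained on.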
I also now understand that the DVAE decoder is not used; instead, an LM head on the GPT-2 recomputes the mel from the audio codes. I need to understand this a bit better before writing the training code for the next stage.
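For the GPT stage, the training example layout can be sketched as follows: the transcript tokens and the DVAE code indices are concatenated into one sequence, and the model is trained with next-token prediction, taking the loss only on the audio-code positions. The token ids and special markers below are made up for the sketch and are not the actual XTTS vocabulary.

```python
# Hypothetical special-token ids marking the audio segment of the sequence.
START_AUDIO, STOP_AUDIO = 1000, 1001

text_tokens = [12, 7, 42]   # tokenised transcript (made-up ids)
audio_codes = [3, 9, 9, 1]  # DVAE codebook indices for the matching clip

# One training example: text conditioning followed by the audio codes.
sequence = text_tokens + [START_AUDIO] + audio_codes + [STOP_AUDIO]

# Standard next-token prediction: shift by one position.
inputs, targets = sequence[:-1], sequence[1:]

# Loss is only taken where the target is an audio code (or the stop marker);
# the text positions merely condition the prediction.
loss_mask = [t in audio_codes or t == STOP_AUDIO for t in targets]
```

Under this setup the LM head predicts the next audio code, which is why the DVAE decoder itself never runs at this stage.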
🚀 Feature Description
Hey, we saw that there is no training code for fine-tuning all parts of XTTS V2. We would like to contribute if it adds value.
The aim would be to make it work very reliably for a particular accent (e.g., Indian), in a particular language (English), and in a particular speaking style with very little variability. We tried simple fine-tuning: it learns the accent and speaking style somewhat, but it is not very robust and mispronounces quite a lot.
Solution
We are not sure if the perceiver needs any fine-tuning.
If licenses permit, we will also share the data.
Does this make sense?