Issues with preserving the speaker identity #16

Open
justinjohn0306 opened this issue Aug 3, 2023 · 3 comments

Comments

@justinjohn0306

Okay, so I've been testing the demo Colab notebook and tried synthesizing a few characters, but it seems to have a hard time preserving the speaker identity. The resulting audio doesn't sound like my reference audio at all.

@adelacvg
Owner

adelacvg commented Aug 3, 2023

The pre-trained model is trained on the VCTK dataset, which is not large enough, so the model may not work well on in-the-wild data. I am working on improving the model's generalization by modifying the network structure. For better results, you can fine-tune the model or train it yourself.

@justinjohn0306
Author

alright, gotcha :)

@rishikksh20

@adelacvg, do you have any thoughts on using Encodec's features rather than mel-spectrograms, and then using Vocos to convert them into waveforms? Maybe that would lead to better generalization.
