Fairseq voice cloning #3142
Comments
Can you give us code to reproduce the problem?
Just running with any Fairseq model normally, the same way as with XTTS (which clones just fine, version 2 included):
Can you try this? https://tts.readthedocs.io/en/latest/inference.html#example-voice-cloning-by-a-single-speaker-tts-model-combining-with-the-voice-conversion-model TTS with VC is not supported in the terminal, AFAIR.
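The linked docs page combines a single-speaker TTS model with the voice-conversion model through the Python API. A rough sketch of that approach (model name, text, and paths are illustrative placeholders; it assumes a recent `TTS` install where `tts_with_vc_to_file` is available):

```python
from TTS.api import TTS

# Any single-speaker TTS model; the voice-conversion model is loaded
# internally to transfer the target speaker's voice onto the output.
tts = TTS("tts_models/de/thorsten/tacotron2-DDC")

tts.tts_with_vc_to_file(
    "Wie sage ich auf Italienisch, dass ich dich liebe?",
    speaker_wav="target/speaker.wav",  # placeholder: voice to clone
    file_path="output.wav",            # placeholder: where to write audio
)
```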
The thing is that running
@Poccapx XTTS is a voice-cloning model which does this on its own. (And it actually can't run without a cloning audio file.)
That’s very important, thank you. Is there a list for models regarding
Pretty sure there is currently only one official one, and that is (not sure if you have to leave the first part "voice_conversion_models" out of the --model_name argument, as I am not using the CLI). You can find a list of all models here: https://github.com/coqui-ai/TTS/blob/dev/TTS/.models.json#L924
Right! In the string
--source_wav is the speech audio you want to convert.
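Putting the flags together, a voice-conversion invocation might look like the sketch below. The file names are placeholders, and the model name is the FreeVC entry from the .models.json list linked above; check `tts --list_models` on your install before relying on it:

```shell
# List every available model, including voice_conversion_models entries:
tts --list_models

# Convert the speech in source.wav so it sounds like the voice in target.wav.
tts --model_name "voice_conversion_models/multilingual/vctk/freevc24" \
    --source_wav "source.wav" \
    --target_wav "target.wav" \
    --out_path "converted.wav"
```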
Using
Sorry for the late reply. For the first example link, it's because the About your second example, I actually have no idea. I would guess it has to do with the encoder model and not with the TTS model, but that's just a guess. So maybe I was wrong and you can somehow convert speakers using some vocoder models. I haven't found anything in the documentation about it, so maybe ask about it in the discussions: https://github.com/coqui-ai/TTS/discussions. I hate to advertise, but if you want, you can give my application Whispering Tiger a try. It has multiple TTS plugins (including Coqui TTS), and together with the RVC plugin and an RVCv2 model, you can have probably the best voice conversion currently available. (It's currently Windows-only, though.)
should be fixed by now. |
UnboundLocalError: cannot access local variable 'dataset' where it is not associated with a value |
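For context, an `UnboundLocalError` like this usually means a local variable was only assigned inside a conditional branch that was never taken, so the later read finds it unbound. A minimal, self-contained sketch of the pattern (names are made up and unrelated to the actual TTS code):

```python
def load(name):
    # 'dataset' is only bound when the branch matches; any other
    # input reaches the return statement before the variable exists.
    if name == "known":
        dataset = {"name": name}
    return dataset  # raises UnboundLocalError for any other 'name'

load("known")  # fine: the branch ran, so 'dataset' is bound
try:
    load("unknown")
except UnboundLocalError as e:
    print(e)   # the variable was never assigned on this path
```

The usual fix is to initialize the variable before the conditional (e.g. `dataset = None`) or to raise a clear error in the unhandled branch.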
Describe the bug
There seems to be an issue of activating voice conversion in Coqui when using Fairseq models. Argument
--speaker_wav
works fine on identical text with the XTTS model, but with Fairseq it seems to be ignored. I have tried both .wav and .mp3, different lengths, file locations/names, with and without CUDA, and several languages. There are no errors, just always the same generic male voice. Is this a known issue with voice cloning and Fairseq on Windows' command line, or is something wrong with my setup?
To Reproduce
No response
Expected behavior
No response
Logs
No response
Environment
Additional context
No response