Catalan Text to Speech

Based on Microsoft's FastSpeech

The model has the following advantages:

Robustness: No repeats or failed attention modes for challenging sentences.
Speed: The generation of a mel spectrogram takes about 0.04s on a GeForce RTX 2080.
Controllability: It is possible to control the speed of the generated utterance.
Efficiency: In contrast to FastSpeech and Tacotron, the model of ForwardTacotron does not use any attention. Hence, the required memory grows linearly with text size, which makes it possible to synthesize large articles at once.
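
To illustrate the controllability point: duration-based models such as FastSpeech predict how many mel frames each input symbol should occupy, then repeat each symbol's encoding that many times before decoding. Scaling the predicted durations therefore changes the speaking rate. The sketch below is a minimal, framework-free illustration of that idea and is not taken from this repository. The function names and the convention that `alpha > 1` means faster speech are assumptions made for the example.

```python
from typing import List, Sequence


def scale_durations(durations: Sequence[float], alpha: float = 1.0) -> List[int]:
    """Divide per-symbol durations (in mel frames) by alpha and round half up.

    Assumed convention: alpha > 1 gives fewer frames (faster speech),
    alpha < 1 gives more frames (slower speech).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Durations are assumed non-negative, so int(x + 0.5) rounds half up.
    return [int(d / alpha + 0.5) for d in durations]


def length_regulate(
    encodings: Sequence[Sequence[float]], frame_counts: Sequence[int]
) -> List[List[float]]:
    """Repeat the i-th symbol encoding frame_counts[i] times, in order."""
    if len(encodings) != len(frame_counts):
        raise ValueError("need exactly one frame count per symbol encoding")
    expanded: List[List[float]] = []
    for vector, count in zip(encodings, frame_counts):
        # Copy the vector for each frame so rows do not share storage.
        expanded.extend(list(vector) for _ in range(count))
    return expanded


# Hypothetical example: three symbols with 2-dimensional encodings.
symbol_encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
predicted_durations = [2.0, 4.0, 6.0]

normal = length_regulate(symbol_encodings, scale_durations(predicted_durations, 1.0))
faster = length_regulate(symbol_encodings, scale_durations(predicted_durations, 2.0))
slower = length_regulate(symbol_encodings, scale_durations(predicted_durations, 0.5))

print(len(normal), len(faster), len(slower))
```

The number of output frames is the sum of the scaled durations, so the expanded sequence grows linearly with the number of input symbols, which is consistent with the efficiency point above.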

Check out the latest audio samples (ForwardTacotron + WaveRNN)!

🔈 Samples

The samples are generated with a model trained on 2 hours of data from Catalan Common Voice and vocoded with WaveRNN, MelGAN, or HiFiGAN. You can try out the latest pretrained model with the following notebook:

mehdihosseinimoghadam/Catalan-Text-to-Speech
