Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic-Speech-Recognition

UnsupTTS is an unsupervised text-to-speech (TTS) system trained on non-parallel speech and text data.

If you find this project useful, please consider citing our paper.

@inproceedings{Ni-etal-2022-unsup-tts,
  author={Junrui Ni and Liming Wang and Heting Gao and Kaizhi Qian and Yang Zhang and Shiyu Chang and Mark Hasegawa-Johnson},
  title={Unsupervised text-to-speech synthesis by unsupervised automatic speech recognition},
  booktitle={arXiv},
  year={2022},
  url={https://arxiv.org/pdf/2203.15796.pdf}
}

Speech demo

Speech samples can be found here

Dependencies

fairseq >= 1.0.0 with dependencies for wav2vec-u
ESPnet at commit 010f483e7661019761b169563ee622464125e56f or earlier
ParallelWaveGAN
LanguageNet G2Ps (only needed for models that use phoneme transcripts)

How to run it?

Download the LJSpeech and CSS10 datasets, then modify the paths and settings in source_code/unsupervised/run_css10_cpy2.slurm and tts1/css10_nl/run.sh. The default language is Dutch (nl) with phoneme transcripts; change the $lang variable to select another language and the $trans_type variable to select the transcript type.
Run bash run_css10_cpy2.slurm
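
Before launching a run, it can help to check that the chosen language and transcript type match one of the combinations listed in the CSS10 pretrained-model table. The snippet below is a minimal sketch and is not part of the repository: the helper name check_combo is hypothetical, the list of pairs is copied from that table, and it assumes that $lang and $trans_type take the same values the table uses (for example nl and phn). A pair reported as "not listed" only means no pretrained model is listed for it here, not that training on it is unsupported.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UnsupTTS): reports whether a
# language / transcript-type pair appears in the CSS10 pretrained-model table.
check_combo() {
  local lang="$1" trans_type="$2"
  case "${lang}:${trans_type}" in
    # Character-based models listed in the table.
    ja:char|hu:char|nl:char|fi:char|es:char|de:char) echo "listed" ;;
    # Phoneme-based models listed in the table.
    hu:phn|nl:phn|fi:phn) echo "listed" ;;
    *) echo "not listed" ;;
  esac
}

# Defaults described above: Dutch with phoneme transcripts.
lang=nl
trans_type=phn
echo "lang=${lang} trans_type=${trans_type}: $(check_combo "${lang}" "${trans_type}")"
```

If the pair is not listed, either pick a listed pair or plan to train both the ASR and TTS models from scratch.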

Pretrained models

LJSpeech	ASR	TTS
en	link	link

CSS10	Unit	ASR	TTS
ja	char	link	link
hu	char	link	link
nl	char	link	link
fi	char	link	link
es	char	link	link
de	char	link	link
hu	phn	link	link
nl	phn	link	link
fi	phn	link	link
