zho-tts

Web app, command-line interface and Python library for synthesizing Chinese text into speech.

Installation

pip install zho-tts --user
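To confirm the installation from Python, the standard library's importlib.metadata can report the installed version of the zho-tts distribution. This is a minimal sketch. The helper name installed_version is hypothetical and not part of zho-tts.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "zho-tts") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised by importlib.metadata when no such distribution is installed.
        return None


if __name__ == "__main__":
    print(installed_version() or "zho-tts is not installed")
```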

Usage as web app

Visit 🤗 Hugging Face for a live demo.

You can also run it locally by executing zho-tts-web in a terminal and opening http://127.0.0.1:7860 in your browser.

Usage as CLI

zho-tts-cli synthesize "长江 航务 管理局 和 长江 轮船 总公司 最近 决定 安排 一百三十三 艘 客轮 迎接 长江 干线 春运。"

The output can be listened to here.
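If you call the CLI from a Python script rather than a shell, the standard subprocess module can invoke it. This is a minimal sketch. It assumes that zho-tts-cli is on PATH and that it takes the text as a single positional argument, as in the example above (or with the synthesize-ipa subcommand used in the next example). No other CLI options are assumed. The helper names build_cli_args and run_cli are hypothetical.

```python
import shutil
import subprocess
from typing import List


def build_cli_args(text: str, ipa: bool = False) -> List[str]:
    """Assemble the zho-tts-cli argument list for plain-text or IPA input."""
    subcommand = "synthesize-ipa" if ipa else "synthesize"
    # Passing a list (not a shell string) avoids quoting problems with Chinese text.
    return ["zho-tts-cli", subcommand, text]


def run_cli(text: str, ipa: bool = False) -> int:
    """Run zho-tts-cli and return its exit code; raise if the CLI is not on PATH."""
    if shutil.which("zho-tts-cli") is None:
        raise FileNotFoundError("zho-tts-cli not found on PATH; install zho-tts first")
    # check=True raises subprocess.CalledProcessError on a non-zero exit status.
    completed = subprocess.run(build_cli_args(text, ipa), check=True)
    return completed.returncode
```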

# Same example using IPA input
zho-tts-cli synthesize-ipa "ʈʂː|a˧˩˧˘|ŋ|tɕ˘|j|a˥˘|ŋ˘|SIL0|x|a˧˥˘|ŋ|u˥˩|SIL0|k|w|a˧˩˧|n|l˘|i˧˩˧|tɕː|y˧˥ˑ|SIL0|x|ɤ˧˥|SIL0|ʈʂː|a˧˩˧˘|ŋ|tɕ˘|j|a˥˘|ŋ|SIL0|l|w|ə˧˥|n|ʈʂʰ˘|w|a˧˥|n|SIL0|ts˘|ʊ˧˩˧|ŋ˘|kː|ʊ˥|ŋ|s|ɹ̩˥ˑ|SIL0|ts|w˘|ei̯˥˩|tɕ|i˥˩˘|n|SIL0|tɕ|ɥ|e˧˥|t|i˥˩|ŋ|SIL3|a˥|n|pʰ|ai̯˧˥|SIL0|i˥ˑ|p|ai̯˧˩˧|s|a˥˘|n|ʂ˘|ɻ̩˧˥|s|a˥|n|SIL0|s˘|ou̯˥|SIL0|kʰˑ|ɤ˥˩|lː|wˑ|ə˧˥ˑ|n|SIL0|i˧˥ː|ŋ|tɕ˘|j˘|e˥|SIL0|ʈʂː|a˧˩˧|ŋ|tɕ˘|j|a˥˘|ŋ|SIL0|k˘|a˥˩|n|ɕ|j˘|ɛ˥˩|n˘|SIL0|ʈʂʰˑ|w˘|ə˥˘|nː|y˥˩ˑ|nː|。"

The output can be listened to here.
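The IPA input is a single string of phoneme symbols separated by |. This includes breaks such as SIL0 and the final 。. When you assemble such input programmatically, a small helper avoids separator mistakes. This is a minimal sketch. The function names join_ipa and split_ipa are hypothetical and not part of zho-tts.

```python
from typing import List, Sequence

SEPARATOR = "|"  # symbol separator used in the synthesize-ipa example


def join_ipa(symbols: Sequence[str]) -> str:
    """Join phoneme symbols into a single synthesize-ipa input string."""
    for symbol in symbols:
        # An empty symbol or one containing "|" would corrupt the token boundaries.
        if not symbol or SEPARATOR in symbol:
            raise ValueError(f"invalid symbol: {symbol!r}")
    return SEPARATOR.join(symbols)


def split_ipa(ipa_input: str) -> List[str]:
    """Split a synthesize-ipa input string back into its phoneme symbols."""
    return ipa_input.split(SEPARATOR)
```

For example, join_ipa(["x", "a˧˥˘", "ŋ", "SIL0", "。"]) yields the string x|a˧˥˘|ŋ|SIL0|。.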

Usage as library

from pathlib import Path
from tempfile import gettempdir

from zho_tts import Synthesizer, Transcriber, normalize_audio, save_audio

text = "长江 航务 管理局 和 长江 轮船 总公司 最近 决定 安排 一百三十三 艘 客轮 迎接 长江 干线 春运。"

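transcriber = Transcriber()
synthesizer = Synthesizer()

# Convert the Chinese text to IPA, then synthesize audio from the IPA string
text_ipa = transcriber.transcribe_to_ipa(text)
audio = synthesizer.synthesize(text_ipa)

# Write the audio to output.wav in the system's temporary directory
tmp_dir = Path(gettempdir())
save_audio(audio, tmp_dir / "output.wav")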

# Optional: normalize output
normalize_audio(tmp_dir / "output.wav", tmp_dir / "output_norm.wav")
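To synthesize several sentences, you can presumably reuse the same Transcriber and Synthesizer objects rather than recreating them for each sentence. This is a minimal sketch built only on the calls shown above (transcribe_to_ipa, synthesize, save_audio). The function synthesize_many and the output file naming are hypothetical. The objects are passed in as parameters, so the loop does not depend on how they are constructed.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, List


def synthesize_many(
    texts: Iterable[str],
    transcriber: Any,
    synthesizer: Any,
    save: Callable[[Any, Path], Any],
    out_dir: Path,
) -> List[Path]:
    """Transcribe, synthesize and save each text; return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for index, text in enumerate(texts):
        text_ipa = transcriber.transcribe_to_ipa(text)  # Chinese text -> IPA
        audio = synthesizer.synthesize(text_ipa)        # IPA -> audio
        path = out_dir / f"output_{index:03d}.wav"      # hypothetical naming scheme
        save(audio, path)
        written.append(path)
    return written
```

With the library, this would be called as synthesize_many(sentences, Transcriber(), Synthesizer(), save_audio, tmp_dir).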

Model info

The TTS model used is published here.

Phoneme set

Vowels: a ɛ e ə ɚ ɤ i o u ʊ y
Diphthongs: ai̯ au̯ ei̯ ou̯
Consonants: f j k kʰ l m n p pʰ ɹ̩¹ ɻ¹ ɻ̩¹ s t ts tsʰ tɕ tɕʰ tʰ w x ŋ ɕ ɥ ʂ ʈʂ ʈʂʰ
Breaks:
- SIL0 (no break)
- SIL1 (short break)
- SIL2 (break)
- SIL3 (long break)
Special characters: 。 ?
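You can validate hand-written IPA input before passing it to synthesize-ipa by encoding the phoneme set as plain Python sets. This is a minimal sketch. The constants and function names are hypothetical. The inventory is copied from the lists above rather than read from the package. The consonants marked ¹ are treated as tone-bearing, per the footnote below.

```python
# Phoneme inventory transcribed from the lists above (space-separated for readability).
VOWELS = frozenset("a ɛ e ə ɚ ɤ i o u ʊ y".split())
DIPHTHONGS = frozenset("ai̯ au̯ ei̯ ou̯".split())
CONSONANTS = frozenset(
    "f j k kʰ l m n p pʰ ɹ̩ ɻ ɻ̩ s t ts tsʰ tɕ tɕʰ tʰ w x ŋ ɕ ɥ ʂ ʈʂ ʈʂʰ".split()
)
# Consonants marked with ¹, which may also carry a tone.
TONE_BEARING_CONSONANTS = frozenset("ɹ̩ ɻ ɻ̩".split())
BREAKS = frozenset({"SIL0", "SIL1", "SIL2", "SIL3"})
SPECIAL = frozenset({"。", "?"})


def symbol_class(base: str) -> str:
    """Classify a bare symbol (without tone or duration markers)."""
    for name, members in (
        ("vowel", VOWELS),
        ("diphthong", DIPHTHONGS),
        ("consonant", CONSONANTS),
        ("break", BREAKS),
        ("special", SPECIAL),
    ):
        if base in members:
            return name
    raise ValueError(f"not in the phoneme set: {base!r}")


def may_carry_tone(base: str) -> bool:
    """Vowels, diphthongs and the ¹ consonants are the tone-bearing symbols."""
    return base in VOWELS or base in DIPHTHONGS or base in TONE_BEARING_CONSONANTS
```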

Vowels and diphthongs carry one of these tones:

˥ (first tone)
˧˥ (second tone)
˧˩˧ (third tone)
˥˩ (fourth tone)
(none)

¹ These consonants also carry tones.

Vowels, diphthongs and consonants carry one of these duration markers:

˘ -> very short, e.g., ou̯˘
nothing -> normal, e.g., ou̯
ˑ -> half long, e.g., ou̯ˑ
ː -> long, e.g., ou̯ː

Tones and duration markers can be combined, e.g., ə˧˥ː
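The examples show a fixed marker order: base symbol, then tone letters, then an optional duration marker, as in a˧˩˧˘ or ə˧˥ː. That order suggests a simple way to decompose a symbol. This is a minimal sketch under that assumption. Symbol and parse_symbol are hypothetical names. Tone numbers follow the list above: 1 to 4, or None when no tone is marked.

```python
from typing import NamedTuple, Optional

TONES = {"˥": 1, "˧˥": 2, "˧˩˧": 3, "˥˩": 4}
DURATIONS = {"˘": "very short", "ˑ": "half long", "ː": "long"}
TONE_LETTERS = frozenset("˥˧˩")


class Symbol(NamedTuple):
    base: str
    tone: Optional[int]  # 1-4, or None if no tone marker is present
    duration: str        # "very short", "normal", "half long" or "long"


def parse_symbol(symbol: str) -> Symbol:
    """Split e.g. 'ə˧˥ː' into base 'ə', tone 2 and duration 'long'."""
    duration = "normal"  # no marker means normal duration
    # Assumption: the duration marker, if any, is the last character.
    if symbol and symbol[-1] in DURATIONS:
        duration = DURATIONS[symbol[-1]]
        symbol = symbol[:-1]
    # Walk back over the trailing run of tone letters.
    start = len(symbol)
    while start > 0 and symbol[start - 1] in TONE_LETTERS:
        start -= 1
    tone_letters = symbol[start:]
    if tone_letters and tone_letters not in TONES:
        raise ValueError(f"unknown tone: {tone_letters!r}")
    tone = TONES[tone_letters] if tone_letters else None
    return Symbol(symbol[:start], tone, duration)
```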

Speakers

Citation

If you want to cite this repo, you can use the BibTeX entry generated by GitHub (see About => Cite this repository).

Taubert, S. (2024). zho-tts (Version 0.0.2) [Computer software]. https://doi.org/10.5281/zenodo.11048515

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 416228727 – CRC 1410

The authors gratefully acknowledge the GWK support for funding this project by providing computing time through the Center for Information Services and HPC (ZIH) at TU Dresden.

The authors are grateful to the Center for Information Services and High Performance Computing [Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)] at TU Dresden for providing its facilities for high throughput calculations.
