Shanghainese TTS

Dartmouth LING 48 Final Project: Improving TTS for Shanghainese
Yuanhao Chen yuanhao.chen.25@dartmouth.edu Spring 2023

Goal

To build a text-to-speech (TTS) system for Shanghainese from scratch, seeking to improve the production of tone sandhi compared to existing models by paying special attention to preprocessing of text.

Description

See writeup/main.pdf.

Dependencies

pip install -r phonemisation/requirements.txt
pip install -r speech_synthesis/requirements.txt
pip install -r comparison_questionnaire/requirements.txt  # for analysis of questionnaire results

Usage

See speech_synthesis/README.md.

Structure

phonemisation/: contains the phonemisation module
- See explanation of output in phonemisation/__init__.py
- Usage: python -m phonemisation "text to phonemise"
- Mechanism: Chinese sentence — word segmentation ⟶ Chinese words — romanisation ⟶ Shanghainese pinyin — phonemisation ⟶ Shanghainese phonemes
  - jieba is used for word segmentation
  - A Shanghainese dictionary I previously made is used for romanisation
    - Uses Qieyun module to add the tone number 1 to syllables of 陰平 yinping/inbin tone; other tones are phonologically unmarked
  - The romanisation_to_ipa function in romanisation.py contains the phonemisation function
make_metadata.py: uses the phonemisation module to convert transcription into IPA and generate metadata for training
- See below in data/
data/: contains the dataset used for training
- The transcriptions and audio files are adapted from this repo
  - Downsampled to 16kHz for training
  - Currently, only shh.dict.cn/ is used for training
- The */metadata.txt files are generated by make_metadata.py
training/
- Juptyer notebook for training the model
- Intended to be uploaded and run in Google Colab environment; needs to be modified for local use
- Uses the coqui-ai/TTS repo, which contains an implementation of VITS
writeup/: the write-up
speech_synthesis/: contains the speech synthesis model
- See speech_synthesis/README.md for more details
comparison_questionnaire/: contains the questionnaire and audio files used to compare speech produced by this model, the Apple model, and a human speaker
- *-1.wav: produced by this model
- *-2.wav: produced by Apple VoiceOver (MacBook Pro 14-inch, 2021; MacOS Ventura 13.0.1)
- *-3.wav: spoken by myself
- stats.ipynb: Jupyter notebook for analysing the questionnaire results

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

comparison_questionnaire

comparison_questionnaire

data

data

phonemisation

phonemisation

speech_synthesis

speech_synthesis

training

training

writeup

writeup

.gitignore

.gitignore

README.md

README.md

make_metadata.py

make_metadata.py

Repository files navigation

Shanghainese TTS

Goal

Description

Dependencies

Usage

Structure

About

Releases 1

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
comparison_questionnaire		comparison_questionnaire
data		data
phonemisation		phonemisation
speech_synthesis		speech_synthesis
training		training
writeup		writeup
.gitignore		.gitignore
README.md		README.md
make_metadata.py		make_metadata.py

edward-martyr/shanghainese-tts

Folders and files

Latest commit

History

Repository files navigation

Shanghainese TTS

Goal

Description

Dependencies

Usage

Structure

About

Topics

Resources

Stars

Watchers

Forks

Languages