edward-martyr/shanghainese-tts

Shanghainese TTS

Goal

To build a text-to-speech (TTS) system for Shanghainese from scratch, seeking to improve the production of tone sandhi compared to existing models by paying special attention to preprocessing of text.

Description

See writeup/main.pdf.

Dependencies

pip install -r phonemisation/requirements.txt
pip install -r speech_synthesis/requirements.txt
pip install -r comparison_questionnaire/requirements.txt  # for analysis of questionnaire results

Usage

See speech_synthesis/README.md.

Structure

  • phonemisation/: contains the phonemisation module
    • See explanation of output in phonemisation/__init__.py
    • Usage: python -m phonemisation "text to phonemise"
    • Mechanism: Chinese sentence → (word segmentation) → Chinese words → (romanisation) → Shanghainese pinyin → (phonemisation) → Shanghainese phonemes
      • jieba is used for word segmentation
      • A Shanghainese dictionary I previously made is used for romanisation
        • Uses the Qieyun module to add the tone number 1 to syllables of the 陰平 yinping/inbin tone; other tones are phonologically unmarked
      • The romanisation_to_ipa function in romanisation.py performs the phonemisation
  • make_metadata.py: uses the phonemisation module to convert transcription into IPA and generate metadata for training
    • See below in data/
  • data/: contains the dataset used for training
    • The transcriptions and audio files are adapted from this repo
      • Downsampled to 16kHz for training
      • Currently, only shh.dict.cn/ is used for training
    • The */metadata.txt files are generated by make_metadata.py
  • training/
    • Jupyter notebook for training the model
    • Intended to be uploaded and run in Google Colab environment; needs to be modified for local use
    • Uses the coqui-ai/TTS repo, which contains an implementation of VITS
  • writeup/: the write-up
  • speech_synthesis/: contains the speech synthesis model
  • comparison_questionnaire/: contains the questionnaire and audio files used to compare speech produced by this model, the Apple model, and a human speaker
    • *-1.wav: produced by this model
    • *-2.wav: produced by Apple VoiceOver (MacBook Pro 14-inch, 2021; macOS Ventura 13.0.1)
    • *-3.wav: spoken by myself
    • stats.ipynb: Jupyter notebook for analysing the questionnaire results
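
The three-stage mechanism listed under phonemisation/ (word segmentation, romanisation, phonemisation) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's actual code. A greedy longest-match segmenter stands in for jieba. The lexicon, the initial/final tables and all function names (`segment`, `romanise`, `syllable_to_ipa`, `phonemise`) are hypothetical, and the entries are illustrative placeholders rather than data from the project's dictionary. Tone marking is not modelled here.

```python
from typing import Dict, List

# Hypothetical word -> romanised syllables lexicon (placeholder entries,
# NOT taken from the project's Shanghainese dictionary).
LEXICON: Dict[str, List[str]] = {
    "上海": ["zaon", "he"],
    "闲话": ["ghe", "gho"],
    "上": ["zaon"],
    "海": ["he"],
}

# Hypothetical romanisation -> IPA tables, split into initials and finals.
INITIALS: Dict[str, str] = {"gh": "ɦ", "z": "z", "h": "h"}
FINALS: Dict[str, str] = {"aon": "ɑ̃", "e": "ɛ", "o": "o"}


def segment(sentence: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Greedy longest-match segmentation (a simple stand-in for jieba)."""
    max_len = max(len(word) for word in lexicon)
    words: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in lexicon or length == 1:
                words.append(candidate)
                i += length
                break
    return words


def romanise(words: List[str], lexicon: Dict[str, List[str]]) -> List[List[str]]:
    """Look up each word; syllables stay grouped by word.

    Raises KeyError for words missing from the lexicon.
    """
    return [lexicon[word] for word in words]


def syllable_to_ipa(syllable: str) -> str:
    """Split a syllable into initial + final and map each part to IPA."""
    for initial in sorted(INITIALS, key=len, reverse=True):
        if syllable.startswith(initial):
            return INITIALS[initial] + FINALS[syllable[len(initial):]]
    return FINALS[syllable]  # syllable with no initial consonant


def phonemise(sentence: str) -> List[List[str]]:
    """Chinese sentence -> words -> romanised syllables -> IPA syllables."""
    words = segment(sentence, LEXICON)
    romanised = romanise(words, LEXICON)
    return [[syllable_to_ipa(s) for s in word] for word in romanised]


if __name__ == "__main__":
    print(phonemise("上海闲话"))
```

In this sketch the output keeps one inner list per segmented word, so later stages can still see word boundaries. To try it with a real segmenter, `jieba.lcut(sentence)` could replace the `segment` call, provided the lexicon covers the words jieba produces.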