Zero-Shot Foreign Accent Conversion without a Native Reference

Code for the paper "Zero-Shot Foreign Accent Conversion without a Native Reference".

Waris Quamer, Anurag Das, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna

Note: This code is no longer maintained, but you can find an alternative implementation here (see Latent Space Conversion Method).

This is a TensorFlow + PyTorch implementation, adapted from the Real-Time Voice Cloning implementation at https://github.com/CorentinJ/Real-Time-Voice-Cloning.

Installation

Python 3.8

Install PyTorch (>=1.0.1).
Install the NVIDIA build of TensorFlow 1.15.
Install ffmpeg.
Install Kaldi.
Install PyKaldi.
Run pip install -r requirements.txt to install the remaining necessary packages.
Download pretrained TDNN-F model, extract it, and set PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh to the pretrained model directory.
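
Before running anything, it can help to confirm that the prerequisites listed above are visible from the current environment. The following is a minimal sketch, not part of this repository: the function name check_prerequisites and its default arguments are assumptions made for illustration. It only checks that ffmpeg is on PATH and that the torch and tensorflow packages can be located. It does not verify versions, Kaldi, or PyKaldi.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(tools=("ffmpeg",),
                        modules=("torch", "tensorflow"),
                        min_python=(3, 8)):
    """Return a list of problems found; an empty list means nothing was flagged.

    Hypothetical helper: the names and defaults here are illustrative only.
    """
    problems = []

    # The README asks for Python 3.8.
    if sys.version_info[:2] < tuple(min_python):
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")

    # Command-line tools must be reachable through PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # Python packages are only located, not imported, to keep the check fast.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package {name} is not installed")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

An empty result only means these basic checks passed. KALDI_ROOT and PRETRAIN_ROOT still have to be set by hand in kaldi_scripts/extract_features_kaldi.sh.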

Dataset

Acoustic Model: LibriSpeech. Download the pretrained TDNN-F acoustic model here.
- You also need to set KALDI_ROOT and PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh accordingly.
Speaker Encoder: LibriSpeech, see here for detailed training process.
Accent Encoder: Speech Accent Archive. You can use the subset that I collected here.
Synthesizer and Translator (i.e., Seq2seq model): ARCTIC and L2-ARCTIC. Please see here for a merged version.
Vocoder: LibriSpeech, see here for detailed training process.

All the pretrained models are available here.

Quick Start

See the inference script (inference_script.ipynb).

Training

Use Kaldi to extract bottleneck features (BNF) for the reference L1 speaker

./kaldi_scripts/extract_features_kaldi.sh /path/to/L2-ARCTIC/BDL

Preprocessing

python synthesizer_preprocess_audio.py /path/to/L2-ARCTIC BDL /path/to/L2-ARCTIC/BDL/kaldi --out_dir=your_preprocess_output_dir
python synthesizer_preprocess_embeds.py your_preprocess_output_dir

python translator_preprocess_audio.py /path/to/L2-ARCTIC BDL /path/to/L2-ARCTIC/BDL/kaldi --out_dir=your_preprocess_output_dir
python translator_preprocess_embeds.py your_preprocess_output_dir

Training

python translator_train.py PPG2PPG_train your_preprocess_output_dir
python synthesizer_train.py Accetron_train your_preprocess_output_dir
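
The commands above have to run in a fixed order: feature extraction, then preprocessing, then training. As a rough sketch of how they could be assembled from one set of paths, the hypothetical helper below builds the same command lines and prints them as a dry run. The function name build_pipeline and its parameters are assumptions for illustration and are not part of this repository. It also assumes the Kaldi output lives in a kaldi directory under the speaker folder, as in the example paths above.

```python
import shlex


def build_pipeline(data_root, speaker, out_dir,
                   translator_run="PPG2PPG_train",
                   synthesizer_run="Accetron_train"):
    """Return the README's commands, in order, as argument lists.

    Hypothetical helper: it only assembles commands and runs nothing.
    """
    speaker_dir = f"{data_root}/{speaker}"
    # Assumption: Kaldi features are stored under <speaker_dir>/kaldi.
    kaldi_dir = f"{speaker_dir}/kaldi"
    out_flag = f"--out_dir={out_dir}"

    return [
        # 1. Feature extraction for the reference L1 speaker.
        ["./kaldi_scripts/extract_features_kaldi.sh", speaker_dir],
        # 2. Preprocessing for the synthesizer and the translator.
        ["python", "synthesizer_preprocess_audio.py",
         data_root, speaker, kaldi_dir, out_flag],
        ["python", "synthesizer_preprocess_embeds.py", out_dir],
        ["python", "translator_preprocess_audio.py",
         data_root, speaker, kaldi_dir, out_flag],
        ["python", "translator_preprocess_embeds.py", out_dir],
        # 3. Training, using the run names from the README.
        ["python", "translator_train.py", translator_run, out_dir],
        ["python", "synthesizer_train.py", synthesizer_run, out_dir],
    ]


if __name__ == "__main__":
    commands = build_pipeline("/path/to/L2-ARCTIC", "BDL",
                              "your_preprocess_output_dir")
    for command in commands:
        # Dry run: print each command instead of executing it.
        print(shlex.join(command))
```

To execute the steps instead of printing them, each argument list could be passed to subprocess.run(command, check=True) from the repository root, so that a failing step stops the sequence.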
