Code for paper "Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion"

guanlongzhao/ppg-gmm

PPG-GMM

This repository hosts an open source implementation for the accent conversion system described in our paper in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), titled "Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion."

System requirements

  • OS: Ubuntu 16.04 (tested and recommended) or CentOS 7.5 (tested but you may run into some issues)
  • Matlab: R2019a (tested and recommended) or R2016a (tested); any version between these two should work but has not been tested
  • Essentially, as long as you can install Kaldi, the Montreal Forced Aligner, and Matlab on your OS, this package should work just fine
  • A fast CPU and at least 16 GB of RAM are preferred

Data

  • The L2-ARCTIC corpus is an excellent dataset for accent conversion tasks, and it is freely available
  • The CMU-ARCTIC corpus has four high-quality native English speakers

Install

Download the source code

git clone https://github.com/guanlongzhao/ppg-gmm.git
cd ppg-gmm/test/data
mkdir temp

Install dependencies

  • Install kaldi-5.3
  • Install Montreal Forced Aligner (v1.0)
    • Make sure the aligner binary file is executable on your machine
  • Install mcep-sptk-matlab
    • Run script/installMcepSptkMatlab.m in Matlab
    • Note that you need a working C/C++ compiler installed, and Matlab has to be configured to use that compiler
    • See the documentation for the mex function in Matlab for more details
  • Configure kaldi-posteriorgram
    • Set KALDI_ROOT in dependency/kaldi-posteriorgram/path.sh to the root directory of your Kaldi installation (e.g., /home/kaldi)
    • Give execute permission to all .sh files. For example, chmod u+x *.sh
  • Configure function/dataPrep.m
    • Set aligner to the absolute path of the Montreal Forced Aligner binary (the mfa_align file, e.g., /home/mfa/mfa_align)
    • Set dictionary to the absolute path of the Montreal Forced Aligner dictionary file. If you do not have one, you can download it here
    • Set acousticModel to the absolute path of the Montreal Forced Aligner pre-trained model (the english.zip file, e.g., /home/mfa/english.zip)
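
The execute-permission step above can also be applied recursively, in case some scripts live in subfolders that a plain chmod u+x *.sh would miss. The following is a minimal sketch; the function name make_scripts_executable is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repository): give the current user
# execute permission on every .sh file found under the given directory.
make_scripts_executable() {
  find "$1" -type f -name '*.sh' -exec chmod u+x {} +
}
```

For example, from the repository root you could call make_scripts_executable dependency/kaldi-posteriorgram.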

Add to the search path

Add all dependencies (the packages under dependency) and the function folder to the Matlab search path

  • Use the Matlab GUI tool Set Path, or
  • Run script/addDependencies.m in Matlab. Note that this only adds the dependencies to the search path of the current Matlab session

Run tests

  • Prepare test data [important]: in Matlab, run the script script/prepareFixturesForTests.m
  • Run all tests: go to the test folder in Matlab, and type runtests
  • It takes about 30 min to finish all tests, depending on your machine specifications
  • To run a particular test (e.g., TEST_NAME), type runtests('TEST_NAME')
  • ppgGmmEndToEndTest is the end-to-end system test and can also serve as a reference for how the system works; it takes about 10 min to finish

Run demo script

  • In Matlab, run script/demo.m
    • This script generates a voice that sounds like the speaker in test/data/tgt but with the accent of the speaker in test/data/src
    • You can find the accent conversion syntheses under test/data/temp/demo/ac_syntheses
    • The resulting syntheses may have low acoustic quality because the demo only uses 30 utterances for training
    • Some higher-quality samples we used in the paper can be found at https://guanlongzhao.github.io/demo/ppg-gmm
  • To apply the system to your own data, read and modify script/demo.m

Notes

  • In the paper, we used the TANDEM-STRAIGHT vocoder (TandemSTRAIGHTmonolithicPackage012), and it is not open-source. Therefore, we cannot include that package here
    • Instead, we used WORLD in this implementation
    • We kept the TANDEM-STRAIGHT related code in this repo in case you have access to TANDEM-STRAIGHT. Note that we used MulticueF0v14 from Legacy-STRAIGHT as the pitch tracker to improve the performance of TANDEM-STRAIGHT, as noted in the paper
  • We used the standard mean-and-variance normalization approach to convert the F0 curve in the paper. In this implementation, we used the histogram equalization post-filtering method proposed by Wu et al. in the paper "Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion" (Interspeech'10) as the default F0 conversion method
  • This implementation is not the original experimental code used for the experiments in the paper, but it is a close re-implementation of the original system
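
For reference, the mean-and-variance normalization mentioned above is commonly written as follows. This is the textbook form, not taken from this repository's code. The symbols are introduced here for illustration, and whether the statistics are computed on F0 or on log-F0 is an assumption (log-F0 over voiced frames is a common choice).

$$
\hat{x}_t = \mu_{\mathrm{tgt}} + \frac{\sigma_{\mathrm{tgt}}}{\sigma_{\mathrm{src}}} \left( x_t - \mu_{\mathrm{src}} \right)
$$

Here $x_t$ is the source speaker's (log-)F0 at voiced frame $t$, $\mu$ and $\sigma$ are the mean and standard deviation of each speaker's (log-)F0 estimated from training data, and $\hat{x}_t$ is the converted value. Unvoiced frames are typically left unvoiced.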

Running into issues?

  • In the output folder of your experiment, you can find log files named in the form of log_[TIMESTAMP]. Try to read the logs and see if there is anything suspicious
  • The most common issues are:
    • Why does the aligner freeze? If the aligner cannot find all the words in your dictionary, it pauses and asks whether to abort and fix this or continue. Generally, I ignore this and continue. The whole point of the alignment is to find the silent segments more accurately, and there are many other ways to do this
    • Why does the PPG extraction fail?
      • The most probable reason is that the shell scripts under dependency/kaldi-posteriorgram do not have the execute permission
      • Sometimes Matlab loads a different libstdc++ than the one that Kaldi was compiled with, and this makes all the system calls from Matlab to Kaldi binaries fail. The solution is to load the libstdc++ that Kaldi uses when starting your Matlab session; see script/addDependencies.m for more details
    • Why does Matlab tell me that some functions are missing?
      • You probably did not add all the dependencies to the Matlab search path
      • If it is a built-in Matlab function, you probably need to use a different Matlab version or install the corresponding toolboxes (e.g., Statistics and Machine Learning Toolbox for pdist2, Communications Toolbox for vec2mat, and Parallel Computing Toolbox for parfor). Some of these functions have open-source solutions. For example, I found a pdist2 function here
  • Feel free to open an issue or initiate a pull request for any bugs you find
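
For the libstdc++ mismatch described above, one common workaround is to find which libstdc++ a Kaldi binary links against and preload it when launching Matlab. The sketch below is an assumption about how this could be done and may differ from what script/addDependencies.m describes. The helper name libstdcxx_from_ldd is hypothetical, and the Kaldi binary path in the comments must be adjusted to your installation.

```shell
# Hypothetical helper (not part of the repository): read `ldd` output on
# stdin and print the resolved path of the libstdc++ it lists, if any.
libstdcxx_from_ldd() {
  awk '/libstdc\+\+\.so/ && $2 == "=>" { print $3; exit }'
}

# Example use (paths are assumptions; adjust to your installation):
#   kaldi_bin="$KALDI_ROOT/src/featbin/compute-mfcc-feats"
#   LD_PRELOAD="$(ldd "$kaldi_bin" | libstdcxx_from_ldd)" matlab
```

Preloading only affects the Matlab session started this way, so it leaves the Matlab installation itself unchanged.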

Citation

Please cite the following paper if you use this system in your publication:

@article{zhao2019using,
  title={Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion},
  author={Zhao, Guanlong and Gutierrez-Osuna, Ricardo},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2019},
  month={Oct},
  volume={27},
  number={10},
  pages={1649-1660},
  doi={10.1109/TASLP.2019.2926754},
  ISSN={2329-9290}
}

License

  • For everything under dependency, please refer to their respective license terms
  • Code under function, script, and test is released under the Apache License 2.0

Contact

Guanlong Zhao (gzhao#tamu.edu) and Ricardo Gutierrez-Osuna (rgutier#tamu.edu), Department of Computer Science and Engineering, Texas A&M University

Replace # with @.
