bigpon/vcc20_baseline_cyclevae

Voice Conversion Challenge 2020 baseline: CycleVAE w/ PWG vocoder

Official homepage: http://www.vc-challenge.org/

News

  • 2020/10/18 Updated the paper information.

  • 2020/4/17 Uploaded the missing SEF2-TEM1 conversion pair of reference_v.10.

  • 2020/3/18 Released the generated samples of reference_v.10.

  • 2020/3/11 Released the first version of the repo and the generated samples of the development set (dv50_vcc2020_24kHz).

Introduction

This repo provides a voice conversion (VC) system based on a cyclic variational autoencoder (CycleVAE) with a Parallel WaveGAN (PWG) vocoder for the Voice Conversion Challenge 2020 (VCC2020). VCC2020 consists of an intra-lingual VC task (Task1) and a cross-lingual VC task (Task2). Task1 includes four English source speakers and four English target speakers. Task2 uses the same four English source speakers with six non-English (German/Finnish/Mandarin) target speakers. In both tasks, the goal is to convert the speaker identity of the source speech to that of the target speaker while preserving the English linguistic content.
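The task layout above implies 4 x 4 conversion pairs for Task1 and 4 x 6 for Task2. The sketch below enumerates them. The speaker IDs are assumptions for illustration (only SEF2 and TEM1 appear in this README), so check them against the corpus you downloaded.

```python
from itertools import product

# Speaker IDs are assumed for illustration; verify them against your corpus.
SOURCES = ["SEF1", "SEF2", "SEM1", "SEM2"]        # English source speakers
TASK1_TARGETS = ["TEF1", "TEF2", "TEM1", "TEM2"]  # English target speakers
TASK2_TARGETS = ["TFF1", "TFM1", "TGF1", "TGM1", "TMF1", "TMM1"]  # non-English targets


def conversion_pairs(sources, targets):
    """Return every source/target combination as a 'SRC-TGT' string."""
    return [f"{src}-{tgt}" for src, tgt in product(sources, targets)]


task1_pairs = conversion_pairs(SOURCES, TASK1_TARGETS)
task2_pairs = conversion_pairs(SOURCES, TASK2_TARGETS)
print(len(task1_pairs), len(task2_pairs))  # prints "16 24"
```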

CycleVAE w/ PWG vocoder

This baseline VC system uses WORLD-based acoustic features: spectral features (further parameterized into mel-cepstral coefficients, mcep), pitch (f0), and aperiodicity (ap). The CycleVAE model converts only the spectral features. The logarithmic f0 is linearly converted, and ap is copied unchanged from the source speaker.
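A minimal sketch of a linear log-f0 conversion follows. It assumes the common mean/variance form, in which the source log-f0 is normalized with the source speaker's statistics and rescaled with the target speaker's. The README only states that log-f0 is "linearly converted", so the exact form used in this repo may differ; the function names are illustrative and not part of the repo.

```python
import math
from statistics import mean, pstdev


def logf0_stats(f0):
    """Mean and standard deviation of log-f0 over voiced frames (f0 > 0)."""
    logf0 = [math.log(v) for v in f0 if v > 0]
    return mean(logf0), pstdev(logf0)


def convert_f0(f0_src, src_stats, tgt_stats):
    """Map source f0 to the target speaker's log-f0 mean and variance.

    Unvoiced frames (f0 == 0) are left at 0.
    """
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    converted = []
    for v in f0_src:
        if v > 0:
            log_v = (math.log(v) - mu_s) / sd_s * sd_t + mu_t
            converted.append(math.exp(log_v))
        else:
            converted.append(0.0)
    return converted


# Toy contours in Hz; zeros mark unvoiced frames.
src_f0 = [0.0, 100.0, 110.0, 120.0, 0.0]
tgt_f0 = [200.0, 220.0, 0.0, 240.0]
converted = convert_f0(src_f0, logf0_stats(src_f0), logf0_stats(tgt_f0))
```

In practice the statistics would be estimated over each speaker's whole training set rather than a single utterance as in this toy example.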

This repo provides two training procedures for the PWG vocoder. The first trains the vocoder on natural acoustic features paired with natural waveforms. The second trains it on both artificial and natural acoustic features, again paired with natural waveforms. The artificial acoustic features comprise self-reconstructed and pseudo-converted (target->source->target) features; both are generated by the CycleVAE and are therefore temporally aligned with the natural waveforms. Because this reduces the mismatch between training and testing data, the second PWG vocoder achieves higher speech quality when its input is converted acoustic features.
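The data flow for the second vocoder can be sketched as follows. Here `encode` and `decode` are hypothetical identity stand-ins for the trained CycleVAE, not functions from this repo. The point of the sketch is only that every feature sequence, natural or artificial, keeps the frame count of the natural utterance and is paired with the same natural waveform.

```python
def encode(mcep):
    """Hypothetical stand-in for the CycleVAE encoder (features -> latents)."""
    return [list(frame) for frame in mcep]


def decode(latent, speaker):
    """Hypothetical stand-in for the speaker-conditioned CycleVAE decoder."""
    return [list(frame) for frame in latent]


def reconstructed(mcep, speaker):
    """Self-reconstruction: decode with the utterance's own speaker code."""
    return decode(encode(mcep), speaker)


def pseudo_converted(mcep, speaker, other):
    """Cyclic conversion: speaker -> other -> speaker."""
    converted = decode(encode(mcep), other)
    return decode(encode(converted), speaker)


def vocoder_training_items(mcep, waveform, speaker, other_speakers):
    """Pair natural and artificial features with the same natural waveform."""
    items = [(mcep, waveform), (reconstructed(mcep, speaker), waveform)]
    for other in other_speakers:
        items.append((pseudo_converted(mcep, speaker, other), waveform))
    return items


# Toy utterance: three frames of two-dimensional features.
mcep = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
waveform = [0.0] * 6
items = vocoder_training_items(mcep, waveform, "TEM1", ["SEF1", "SEF2"])
```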

Model and demo

The trained CycleVAE and PWG models can be accessed here.
The generated samples can be accessed here.

Corpus

Only the VCC2020 corpus is used to train both the CycleVAE and the PWG vocoder.

  • VCC2020 contains all training data of the challenge. Please follow the instructions from the organizers to download it into the desired directory. (The default is baseline/egs/cyclevae/wav_24kHz/.)
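After downloading, a quick sanity check of the corpus directory can catch files with an unexpected sampling rate. The sketch below assumes the corpus is stored as PCM WAV files at 24 kHz, which is inferred from the directory name wav_24kHz rather than stated by the organizers; `find_wrong_rate_wavs` is an illustrative helper, not part of this repo.

```python
import wave
from pathlib import Path


def find_wrong_rate_wavs(corpus_dir, expected_rate=24000):
    """Return .wav files under corpus_dir whose sampling rate differs.

    Uses the standard-library wave module, so only PCM WAV files are supported.
    """
    wrong = []
    for path in sorted(Path(corpus_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            if wav.getframerate() != expected_rate:
                wrong.append(path)
    return wrong


# Example usage (the path is the default mentioned above):
# for path in find_wrong_rate_wavs("baseline/egs/cyclevae/wav_24kHz"):
#     print("unexpected sampling rate:", path)
```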

Usage and requirements

Please check baseline/README.md.


Citation

If you find this code helpful, please cite the following paper.

@InProceedings{vcc20vaebaseline,
  author    = {Tobing, Patrick Lumban and Wu, Yi-Chiao and Toda, Tomoki},
  title     = {Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN},
  booktitle = {Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020},
  year      = {2020},
  month     = {Oct.},
}

Authors

Development:
Patrick Lumban Tobing @ Nagoya University (@patrickltobing)
Yi-Chiao Wu @ Nagoya University (@bigpon)

Advisor:
Tomoki Toda @ Nagoya University

E-mail:
patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp
yichiao.wu@g.sp.m.is.nagoya-u.ac.jp
tomoki@icts.nagoya-u.ac.jp