Voice Conversion Challenge 2020 baseline: CycleVAE w/ PWG vocoder

Official homepage: http://www.vc-challenge.org/

News

2020/10/18 Updated the paper information.
2020/4/17 Uploaded the missing SEF2-TEM1 conversion pair for reference_v.10.
2020/3/18 Released the generated samples of reference_v.10.
2020/3/11 Released the first version of the repo and the generated samples of the development set (dv50_vcc2020_24kHz).

Introduction

This repo provides a cyclic variational autoencoder (CycleVAE)-based voice conversion (VC) system with a parallel WaveGAN (PWG)-based vocoder for the Voice Conversion Challenge 2020 (VCC2020). VCC2020 consists of an intra-lingual VC task (Task1) and a cross-lingual VC task (Task2). Task1 includes four English source speakers and four English target speakers. Task2 uses the same English source speakers but six non-English (German/Finnish/Mandarin) target speakers. In both tasks, the goal is to convert the speaker identity of the source speech to that of a target speaker while keeping the English content unchanged.

CycleVAE w/ PWG vocoder

This baseline VC system uses WORLD-based acoustic features: spectral features (further parameterized into mel-cepstrum, mcep), pitch (f0), and aperiodicity (ap). The CycleVAE model converts only the spectral features. The logarithmic f0 is linearly converted, and the ap features are kept the same as those of the source speech.
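The linear conversion of logarithmic f0 is commonly done by matching the mean and standard deviation of the source speaker's log-f0 to those of the target speaker. The following is a minimal sketch of that common approach, not necessarily the exact code in this repo; the function names (log_f0_stats, convert_f0) are hypothetical, and it assumes unvoiced frames are marked with f0 = 0.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_frames):
    """Return (mean, std) of log-f0 over voiced frames (f0 > 0).

    Hypothetical helper: in practice the statistics would be computed
    over all training utterances of one speaker.
    """
    log_f0 = [math.log(f) for f in f0_frames if f > 0]
    return mean(log_f0), pstdev(log_f0)


def convert_f0(f0_frames, src_stats, trg_stats):
    """Linearly map log-f0 from source to target speaker statistics.

    Unvoiced frames (f0 == 0) are passed through as 0.
    Assumes the source standard deviation is non-zero.
    """
    src_mean, src_std = src_stats
    trg_mean, trg_std = trg_stats
    converted = []
    for f in f0_frames:
        if f > 0:
            # Normalize with source statistics, rescale with target statistics.
            lf = (math.log(f) - src_mean) / src_std * trg_std + trg_mean
            converted.append(math.exp(lf))
        else:
            converted.append(0.0)
    return converted


# Toy example with made-up f0 values in Hz (0 marks an unvoiced frame).
src_stats = log_f0_stats([100.0, 0.0, 200.0])
trg_stats = log_f0_stats([200.0, 400.0])
converted_f0 = convert_f0([100.0, 0.0, 200.0], src_stats, trg_stats)
```

In this toy example the target speaker's f0 is one octave above the source speaker's with the same log-domain spread, so the voiced frames are shifted up by one octave while the unvoiced frame stays at 0.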

This repo provides two training procedures for the PWG vocoder. The first PWG vocoder is trained with natural acoustic features and natural waveforms. The second PWG vocoder is trained with both artificial and natural acoustic features, together with natural waveforms. Specifically, the artificial acoustic features include self-reconstructed and pseudo-converted (target->source->target) acoustic features, which are generated by the CycleVAE and have the same temporal structure as the natural waveforms. Because this reduces the mismatch between training and testing data, the second PWG vocoder achieves higher speech quality when its input is converted acoustic features.

Model and demo

The trained CycleVAE and PWG models can be accessed here.
The generated samples can be accessed here.

Corpus

Only the VCC2020 corpus is used for both CycleVAE and PWG training.

The VCC2020 corpus contains all the training data of the challenge. Please follow the instructions from the organizers to download it into the desired directory (the default is baseline/egs/cyclevae/wav_24kHz/).

Usage and requirements

Please check baseline/README.md.

References

CycleVAE [paper] [github]
PWG [paper] [github]

Citation

If you find the code helpful, please cite the following paper.

@InProceedings{vcc20vaebaseline,
author={Tobing, Patrick Lumban and Wu, Yi-Chiao and Toda, Tomoki},
title={Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational
Autoencoder and Parallel WaveGAN},
booktitle="Proc. Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020",
year="2020",
month="Oct.",
}

Authors

Development:
Patrick Lumban Tobing @ Nagoya University (@patrickltobing)
Yi-Chiao Wu @ Nagoya University (@bigpon)

Advisor:
Tomoki Toda @ Nagoya University

E-mail:
patrick.lumbantobing@g.sp.m.is.nagoya-u.ac.jp
yichiao.wu@g.sp.m.is.nagoya-u.ac.jp
tomoki@icts.nagoya-u.ac.jp
