nafiuny/voice_conversion_dataset

voice_conversion_dataset

Voice conversion is a technology that modifies the speech of a source speaker so that it sounds like that of a target speaker, without changing the linguistic information.

Voice conversion models need a dataset for training and evaluation. Here is a list of the most important datasets.

Each dataset contains audio recordings along with their text transcripts.

  • LJSpeech
    This is a public domain speech dataset consisting of 13,100 short audio clips of a single English speaker reading passages from 7 non-fiction books. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.
    download LJSpeech

  • VCTK
    The VCTK corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences.
    download VCTK

  • LibriTTS
    LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at 24kHz sampling rate, prepared by Heiga Zen with the assistance of Google Speech and Google Brain team members. The LibriTTS corpus is designed for TTS research.
    download LibriTTS

  • Common Voice
    Common Voice is a crowdsourced, multilingual speech dataset maintained by Mozilla. Volunteers record themselves reading sentences aloud and validate recordings made by other contributors, and the data is released under a CC0 license.
    download Common Voice
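
Because these datasets pair audio with text, a common first step is to build a list of (audio file, transcript) pairs. The following is a minimal sketch in Python, not part of this repository. It assumes an LJSpeech-style layout: a pipe-delimited metadata file whose first field is the clip ID and whose last field is the transcript, with clips stored as `<id>.wav` in a `wavs` folder. The function name `read_metadata` and the sample line are hypothetical; other datasets in the list use different layouts and would need their own reader.

```python
import csv
from pathlib import Path


def read_metadata(lines, wav_dir="wavs"):
    """Return (wav_path, transcript) pairs from pipe-delimited metadata lines.

    Assumed layout per line (LJSpeech-style): clip_id|...|transcript
    """
    pairs = []
    # QUOTE_NONE keeps quotation marks in transcripts as literal characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        clip_id, transcript = row[0], row[-1]
        pairs.append((Path(wav_dir) / f"{clip_id}.wav", transcript))
    return pairs


# Hypothetical sample line, for illustration only.
sample = ["clip-0001|Dr. Smith arrived in 1999.|Doctor Smith arrived in nineteen ninety-nine."]
for wav_path, transcript in read_metadata(sample):
    print(wav_path.name, "->", transcript)
```

With a real download, the same function can be given an open file handle instead of a list, for example `read_metadata(open("metadata.csv", encoding="utf-8"))`, where the file name is again an assumption about the dataset layout.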