GitHub - Adibian/ResGrad: Unofficial implementation of ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

ResGrad - PyTorch Implementation

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

This is an unofficial PyTorch implementation of ResGrad, a denoising diffusion model for high-quality text-to-speech. In short, the model generates a spectrogram with FastSpeech2 and then removes noise from that spectrogram with a diffusion model, so that the synthesized speech is of higher quality. As mentioned in the paper, the implementation is based on FastSpeech2 and Grad-TTS. The HiFiGAN model is used as the vocoder to generate waveforms from the synthesized spectrograms.
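The "residual" in the title can be illustrated with a toy example. This is only a sketch under the assumption that the diffusion model is trained on the difference between the ground-truth spectrogram and the FastSpeech2 output, and that the predicted residual is added back at inference. Plain Python lists stand in for tensors, and the function names are hypothetical, not the repository's API.

```python
# Toy sketch of residual refinement on a tiny "mel-spectrogram" (frames x bins).
# Assumption: the refinement target is ground truth minus the synthesizer output.
# Function names are hypothetical and do not come from the repository.

def residual_target(ground_truth, coarse):
    """Training target: what the coarse synthesizer output is missing."""
    return [[g - c for g, c in zip(g_row, c_row)]
            for g_row, c_row in zip(ground_truth, coarse)]

def refine(coarse, predicted_residual):
    """Inference: add the predicted residual back onto the coarse output."""
    return [[c + r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(coarse, predicted_residual)]

ground_truth = [[1.0, 2.0], [3.0, 4.0]]
coarse = [[0.5, 2.0], [2.0, 4.5]]  # stand-in for a FastSpeech2 prediction

target = residual_target(ground_truth, coarse)
# A perfect residual model would return exactly `target`;
# in practice the residual is sampled by the diffusion model.
refined = refine(coarse, target)
```

In the real pipeline the refined spectrogram would then be passed to the vocoder.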

Quickstart

Data structure:

dataset/data_name/synthesizer_data/
    test_data/
        speaker1/
            sample1.txt
            sample1.wav
            ...
        ...
    train_data/
        ...
    test.txt  (sample1|speaker1|*phoneme_sequence \n ...)
    train.txt (sample1|speaker1|*phoneme_sequence \n ...)
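The metadata lines and directory layout above could be read with a small helper like the following. It is a minimal sketch, assuming each line of `train.txt` or `test.txt` has exactly the three pipe-separated fields shown; the phoneme field is treated as an opaque string, since the meaning of the leading `*` is not specified here. `Entry`, `parse_metadata_line`, and `expected_files` are hypothetical names, not part of the repository.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

class Entry(NamedTuple):
    sample: str    # file stem, e.g. "sample1"
    speaker: str   # speaker directory, e.g. "speaker1"
    phonemes: str  # raw phoneme sequence, kept as-is

def parse_metadata_line(line: str) -> Entry:
    # Split on the first two pipes only, so a "|" inside the phoneme field survives.
    sample, speaker, phonemes = line.rstrip("\n").split("|", 2)
    return Entry(sample, speaker, phonemes)

def expected_files(entry: Entry, split_dir: str):
    # Layout assumed from the tree above: <split_dir>/<speaker>/<sample>.{wav,txt}
    base = PurePosixPath(split_dir) / entry.speaker
    return base / f"{entry.sample}.wav", base / f"{entry.sample}.txt"

entry = parse_metadata_line("sample1|speaker1|phoneme sequence here\n")
wav_path, txt_path = expected_files(
    entry, "dataset/data_name/synthesizer_data/test_data"
)
```

Checking that every listed sample has both files before preprocessing can catch layout mistakes early.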

Preprocessing:

python synthesizer/prepare_align.py config/data_name/config.yaml
python synthesizer/preprocess.py config/data_name/config.yaml

Train synthesizer:

python train_synthesizer.py --config config/data_name/config.yaml

Prepare data for ResGrad:

python resgrad_data.py --synthesizer_restore_step 1000000 --data_file_path dataset/data_name/synthesizer_data/train.txt \
                        --config config/data_name/config.yaml

Train ResGrad:

python train_resgrad.py --config config/data_name/config.yaml
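The training-side steps above share one config path and run in a fixed order, so they can be collected in a small driver script. The sketch below only assembles and prints the commands, using the scripts and flags listed in this Quickstart; `quickstart_commands` is a hypothetical helper, and the restore step of 1000000 is simply the example value used above.

```python
import shlex

CONFIG = "config/data_name/config.yaml"
DATA_FILE = "dataset/data_name/synthesizer_data/train.txt"

def quickstart_commands(config, data_file, synthesizer_step):
    """Return the Quickstart steps, in order, as argv lists."""
    return [
        ["python", "synthesizer/prepare_align.py", config],
        ["python", "synthesizer/preprocess.py", config],
        ["python", "train_synthesizer.py", "--config", config],
        ["python", "resgrad_data.py",
         "--synthesizer_restore_step", str(synthesizer_step),
         "--data_file_path", data_file,
         "--config", config],
        ["python", "train_resgrad.py", "--config", config],
    ]

commands = quickstart_commands(CONFIG, DATA_FILE, 1000000)
for argv in commands:
    # Print each step; to execute one, use subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Keeping the steps as argv lists avoids shell-quoting issues if a path contains spaces.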

Inference:

python inference.py --text "phonemes sequence example" \
                    --synthesizer_restore_step 1000000 --regrad_restore_step 1000000 --vocoder_restore_step 2500000 \
                    --config config/data_name/config.yaml --result_dir output/data_name/results

References 📔

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech, Z. Chen, et al.
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, Y. Ren, et al.
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech, V. Popov, et al.
