Speech Super-resolution with Unconditional Diffwave

Source code for the paper "Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution".

Training

Install the Python requirements.

pip install -r requirements.txt

Please convert all the data files into .wav format and put them under the same directory. The following command will train a 48 kHz UDM.

python train.py model.res_channels=64 epochs=50 sr=48000 train_T=0 dataset.size=120000 dataset.segment=32768 dataset.data_dir=/your/vctk/train/set/ loader.batch_size=12 scheduler.patience=1000000

Evaluation

The numbers in the paper can be reproduced with the following commands. The options are:

rate: the upscaling ratio.
downsample-type: the downsampling filter.
infer-type: the upscaling method.
lr: the $\eta$ value in the paper.

Spline Interpolation

python vctk_dsp_baseline.py /your/vctk/test/set/ --downsample-type sinc --infer-type spline --rate 2
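
For intuition, the idea behind a spline baseline can be sketched in a few lines. This is a minimal illustration and not the code in vctk_dsp_baseline.py: the helper name spline_upsample, the synthetic 440 Hz test tone, and the use of scipy's resample_poly as a stand-in for the sinc downsampling filter are all assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import resample_poly


def spline_upsample(low_res: np.ndarray, rate: int) -> np.ndarray:
    """Upsample a 1-D signal by an integer factor with a cubic spline."""
    # Positions of the known samples on the high-resolution time grid.
    known_t = np.arange(len(low_res)) * rate
    # Every position on the high-resolution grid (the tail is extrapolated).
    full_t = np.arange(len(low_res) * rate)
    return CubicSpline(known_t, low_res)(full_t)


# Toy input: one second of a 440 Hz tone at 48 kHz (a stand-in for a speech clip).
sr, rate = 48000, 2
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)

# Lowpass and decimate by `rate` (polyphase FIR, assumed stand-in for "sinc").
low = resample_poly(x, up=1, down=rate)

# Interpolate back to the original sample rate.
y = spline_upsample(low, rate)
print(x.shape, low.shape, y.shape)
```

The spline passes exactly through the low-rate samples, so it cannot restore content above the low-rate Nyquist frequency; that is why it serves only as a baseline.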

UDM+

python -W ignore vctk_infer.py outputs/XXXX/saved/training_checkpoint_500000.pt outputs/XXXX/.hydra/config.yaml /your/vctk/test/set --rate 2 -T 50 --infer-type manifold --downsample-type stft --lr 0.67

UDM+ without MCG

python -W ignore vctk_infer.py outputs/XXXX/saved/training_checkpoint_500000.pt outputs/XXXX/.hydra/config.yaml /your/vctk/test/set --rate 3 -T 50 --infer-type inpainting --downsample-type sinc

NU-Wave(+)

The checkpoint of UDM is used for noise scheduling. For training NU-Wave, please refer to here. For evaluating NU-Wave+, change infer-type to nuwave-manifold and specify the value of lr.

python -W ignore vctk_infer.py outputs/XXXX/saved/training_checkpoint_500000.pt outputs/XXXX/.hydra/config.yaml /your/vctk/test/set --nuwave-ckpt /XXXX/checkpoints_nuwave_x2/nuwave_x2_01_07_22_epoch\=645_EMA --rate 2 -T 50 --infer-type nuwave --downsample-type stft

NU-Wave 2(+)

The checkpoint of UDM is used for noise scheduling. For training NU-Wave 2, please refer to here. For evaluating NU-Wave 2+, change infer-type to nuwave2-manifold and specify the value of lr.

python -W ignore vctk_infer.py outputs/XXXX/saved/training_checkpoint_500000.pt outputs/XXXX/.hydra/config.yaml /your/vctk/test/set --nuwave-ckpt /XXXX/nuwave2_08_14_09_epoch\=72_EMA --rate 3 -T 50 --infer-type nuwave2 --downsample-type sinc

We'll release the script for evaluating WSRGlow and NVSR in the future.

Pre-trained Checkpoints

Extending to non-zero phase response lowpass filters

Downsampling audio with an IIR lowpass filter introduces non-linear phase delays, which breaks the assumption that the frequency mask is real-valued. A simple way to compensate for the delays is to apply the same filter again during upsampling, but backward in time. We repeated the 48 kHz experiment from the paper with an 8th-order Chebyshev Type I lowpass filter.

Method	2x	3x
NU-Wave	0.87	1.00
NU-Wave 2	0.73	0.87
NU-Wave+	1.03	1.32
NU-Wave 2+	0.86	1.00
UDM+	0.64	0.79
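
The forward-backward trick described above can be checked numerically. The sketch below is an illustration under assumptions, not the repository's implementation: the 0.05 dB ripple and the 0.8 / rate cutoff are borrowed from the default IIR design in scipy.signal.decimate, and the helper names forward and backward are hypothetical. It shows that filtering once forward and once backward yields a symmetric impulse response, so the combined frequency response is real and non-negative.

```python
import numpy as np
from scipy.signal import cheby1, sosfilt

rate = 2
# 8th-order Chebyshev Type I lowpass (ripple and cutoff are assumed values).
sos = cheby1(8, 0.05, 0.8 / rate, btype="low", output="sos")


def forward(x):
    """Causal filtering, as applied before downsampling."""
    return sosfilt(sos, x)


def backward(x):
    """Same filter run backward in time, as applied during upsampling."""
    return sosfilt(sos, x[::-1])[::-1]


# Impulse placed mid-signal so the causal and anti-causal tails both fit.
n = 4096
c = n // 2
impulse = np.zeros(n)
impulse[c] = 1.0

h_fwd = forward(impulse)    # one pass: non-linear phase
h_both = backward(h_fwd)    # two passes: phases cancel

# Centre the response at index 0 and inspect its spectrum.
H = np.fft.rfft(np.roll(h_both, -c))
print("max |imag part|:", np.max(np.abs(H.imag)))
```

The combined magnitude response is the square of the single-pass response, so the effective mask is steeper than the original filter, but it is real-valued as the method assumes.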

