Tutorials on DNN-based Vocoders

These are tutorials on several deep-neural-network (DNN) vocoders, implemented in Python and PyTorch.

Features of these tutorials:

  1. Pre-trained models are provided to produce audio samples.
  2. No painful installation of dependencies. Just run the notebooks directly on Google Colab.
  3. Detailed implementations, for example, how to cache intermediate outputs in a causal dilated convolution.
  4. Both DNN and DSP techniques are explained, e.g., linear prediction, overlap-add ...
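
To give a flavor of the caching idea mentioned in item 3, here is a minimal sketch. It is not the code from the notebooks: it uses plain Python instead of PyTorch to stay dependency-free, handles a single channel only, and the names `CachedCausalDilatedConv1d` and `causal_dilated_conv_full` are hypothetical. During sample-by-sample generation, the layer keeps the last `(K - 1) * dilation` input samples in a buffer, so each new output needs only `K` multiplications instead of a pass over the whole sequence.

```python
from collections import deque


class CachedCausalDilatedConv1d:
    """Single-channel causal dilated convolution with a streaming cache.

    y[n] = sum_{k=0}^{K-1} w[k] * x[n - k * dilation], with x[m] = 0 for m < 0.
    Illustrative sketch only; names and layout are assumptions.
    """

    def __init__(self, weights, dilation):
        self.weights = list(weights)
        self.dilation = dilation
        # The oldest sample ever needed lies (K - 1) * dilation steps back.
        self.history = (len(self.weights) - 1) * dilation
        self.reset()

    def reset(self):
        # A zero-filled buffer plays the role of causal (left) zero padding.
        size = self.history + 1
        self.cache = deque([0.0] * size, maxlen=size)

    def step(self, x_n):
        """Consume one new input sample and return one output sample."""
        self.cache.append(x_n)  # the oldest cached sample drops out
        # cache[-1] is x[n]; cache[-1 - k * dilation] is x[n - k * dilation].
        return sum(w * self.cache[-1 - k * self.dilation]
                   for k, w in enumerate(self.weights))


def causal_dilated_conv_full(x, weights, dilation):
    """Reference: compute every output from the complete input sequence."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            m = n - k * dilation
            if m >= 0:  # samples before the start count as zero
                acc += w * x[m]
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    layer = CachedCausalDilatedConv1d(weights=[0.5, -1.0], dilation=2)
    print([layer.step(v) for v in x])
    print(causal_dilated_conv_full(x, [0.5, -1.0], 2))
```

Both calls produce the same sequence; the cached version simply avoids revisiting old samples, which is what makes autoregressive generation tractable when many such layers are stacked.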

All notebooks are hosted on Google Colab.

| Link | Chapter |
| --- | --- |
| | Introduction and basics |
| Open In Colab | chapter_1_introduction.ipynb: entry point and Python/PyTorch conventions |
| Open In Colab | chapter_2_DSP_tools_Python.ipynb: selected DSP tools for speech processing |
| Open In Colab | chapter_3_DSP_tools_in_DNN_Pytorch.ipynb: selected DSP tools implemented as layers in neural networks |
| | DSP-based vocoder |
| Open In Colab | chapter_4_DSP-based_Vocoder: traditional DSP-based vocoder included in the SPTK toolkit |
| | Neural vocoders |
| Open In Colab | chapter_5_DSP+DNN_NSF.ipynb: neural source-filter (NSF) model |
| Open In Colab | chapter_6_AR_WaveNet.ipynb: autoregressive WaveNet vocoder |
| Open In Colab | chapter_7_AR_iLPCNet.ipynb: autoregressive iLPCNet |
| Open In Colab | chapter_8_Flow_WaveGlow.ipynb: flow-based WaveGlow model |
| Open In Colab | chapter_9_GAN_HiFiGAN_NSFw/GAN.ipynb: HiFiGAN, and NSF + HiFiGAN |
| | Appendix |
| Open In Colab | chapter_a1_Linear_prediction.ipynb: details on a naive implementation of linear prediction |
| Open In Colab | chapter_a2_Music_NSF.ipynb: application of NSF to instrumental music audio |
| Open In Colab | chapter_a3_pretrained_vocoders.ipynb: pretrained neural vocoders on a few speech datasets |
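
As a taste of the linear prediction topic covered in the appendix, here is a minimal sketch of the autocorrelation method solved with the Levinson-Durbin recursion. This is an assumption about one common way to do it, not necessarily what the notebook implements; the function names `autocorrelation` and `levinson_durbin` are hypothetical, and plain Python is used to avoid dependencies.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], where r[lag] = sum_i x[i] * x[i + lag]."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations recursively.

    Returns (a, err), where the predictor is
        x_hat[n] = sum_{k=1}^{order} a[k - 1] * x[n - k]
    and err is the remaining prediction error energy.
    """
    a = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("prediction error vanished; lower the order")
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        # Update the previous coefficients, then append the new one.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= (1.0 - k * k)
    return a, err


if __name__ == "__main__":
    # A decaying exponential is well predicted from one past sample.
    x = [0.9 ** n for n in range(200)]
    coeffs, err = levinson_durbin(autocorrelation(x, 1), order=1)
    print(coeffs, err)
```

For the decaying exponential in the usage example, the single predictor coefficient comes out very close to 0.9, the ratio between consecutive samples.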

Clicking Open In Colab opens the corresponding notebook. You can also download the notebooks from Google Drive.

The models and implementations are written for tutorial purposes and therefore lack intensive tuning and optimization, which is not my strong suit either. If you have ideas for improvement, your feedback is appreciated!

The above notebooks were used in the ICASSP 2022 short course and the ISCA Speech Processing Course in Crete.

@misc{Stylianou2022,
author = {Stylianou, Yannis and Tsiaras, Vassilis and Conkie, Alistair and Maiti, Soumi and Yamagishi, Junichi and Wang, Xin and Chen, Yutian and Slaney, Malcolm and Petkov, Petko and Padinjaru, Shifas and Kafentzis, George},
title = {{ICASSP2022 Short Course: Inclusive Neural Speech Synthesis -iNSS}},
year = {2022}
}

By Xin Wang