Tacotron 2 Explained

This repository is meant to teach the intricacies of writing advanced Recurrent Neural Networks in TensorFlow. The code is used as a guide in weekly Deep Learning meetings at Ohio State University for teaching -

  1. How to read a paper
  2. How to implement it in TensorFlow

I chose Tacotron 2 because -

  1. Encoder-Decoder architectures contain more complexities than standard DNNs. Implementing one helps you master concepts you would otherwise overlook
  2. Tacotron 2 was released less than a year ago (as of 2018) and is a relatively simple model (compared to something like GNMT). The associated paper explains the architecture well
  3. Other public implementations offer a benchmark to compare results
  4. Public datasets are available to achieve state of the art results
  5. Training requires ~10 days given access to a GPU comparable to a GTX 1080

Note: This code has no affiliation with the companies I have worked at. I used no proprietary knowledge from any of those companies to write this code. This was purely an exercise in self-study.

The paper followed in this repository is Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. The repository implements only the text-to-mel-spectrogram part (called Tacotron 2); it does not include the vocoder used to synthesize audio.

This is production-grade code that can be used as a state-of-the-art TTS frontend. The blog post [TODO] shows some audio samples synthesized with a Griffin-Lim vocoder. The code is heavily commented to aid novice TensorFlow users, which experienced readers may find verbose. To read the code, start from train.py.

The repository also uses TensorFlow's tf.data API for pre-processing and the [TODO] Estimator API for modularity.
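
For readers new to these APIs, the sketch below shows the bare input_fn/model_fn pattern that the repository's files follow. The dataset and single dense layer here are toy placeholders, not the actual Tacotron 2 pipeline or architecture:

    import tensorflow as tf

    def input_fn():
        # Toy stand-in for model/input_fn.py: a tf.data pipeline that
        # yields (features, labels) batches.
        dataset = tf.data.Dataset.from_tensor_slices(
            ({"text": [[1.0, 2.0, 3.0]]}, [[0.0]]))
        dataset = dataset.repeat().batch(1)
        return dataset.make_one_shot_iterator().get_next()

    def model_fn(features, labels, mode):
        # Toy stand-in for model/model_fn.py: one dense layer and an MSE loss.
        predictions = tf.layers.dense(features["text"], 1)
        loss = tf.losses.mean_squared_error(labels, predictions)
        train_op = tf.train.AdamOptimizer().minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="toy_model")
    estimator.train(input_fn=input_fn, steps=10)

Keeping the data pipeline and the model behind these two functions is what lets the real input_fn and model_fn in model/ be developed and tested independently.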

Directory Structure

The directory structure followed is as specified in Stanford's CS230 Notes on TensorFlow. We modify the structure a bit to suit our needs.

data/ (Contains all data)
model/ (Contains model architecture)
    input_fn.py (Input data pipeline)
    model_fn.py (Main model)
    utils.py (Utility functions)
    loss.py (Model loss)
    wrappers.py (Wrappers for RNN cells)
    helpers.py (Decoder helpers)
    external/ (Code adapted from other repositories)
        attention.py (Location sensitive attention)
        zoneout_wrapper.py (Zoneout)
train.py (Run training)
config.json (Hyper parameters)
synthesize_results.py (Generate Mels from text)

Requirements

The repository uses TensorFlow 1.8.0. Some code may be incompatible with older versions of TensorFlow (specifically the Location Sensitive Attention wrapper).
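
To confirm your environment matches before running anything, a quick check against the pinned version above:

    import tensorflow as tf
    # The attention wrapper relies on 1.8.0-era APIs, so fail fast otherwise.
    assert tf.__version__ == "1.8.0", "expected TensorFlow 1.8.0, got " + tf.__version__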

Setup

  1. Set up a Python 3 virtual environment. If you don't have virtualenv, install it with
pip install virtualenv
  2. Then create the environment with
virtualenv -p $(which python3) env
  3. Activate the environment
source env/bin/activate
  4. Install TensorFlow
pip install tensorflow==1.8.0
  5. Clone the repository
git clone https://gitlab.com/codetendolkar/tacotron-2-explained.git
  6. Run the training script
cd tacotron-2-explained
python train.py

Generate Mels from Text
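
As noted in the directory listing above, synthesize_results.py generates mel spectrograms from input text. Its exact command-line arguments are not documented here, so treat the invocation below as a hypothetical example in the spirit of the training step:

python synthesize_results.py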

Synthesize Audio from Mels
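
The repository stops at mel spectrograms; to hear output you need a vocoder. Below is a minimal Griffin-Lim sketch using librosa. The STFT parameters (n_fft, hop length, sample rate, mel count) are assumptions and must match whatever produced your spectrograms, and the mel-to-linear step uses the pseudo-inverse of the mel filterbank, which is only a crude approximation:

    import numpy as np
    import librosa

    # Assumed analysis parameters - change to match your feature extraction.
    N_FFT, HOP_LENGTH, SAMPLE_RATE, N_MELS = 2048, 512, 22050, 80

    def mel_to_linear(mel_spectrogram):
        # Crude mel -> linear magnitude inversion via the filterbank pseudo-inverse.
        mel_basis = librosa.filters.mel(sr=SAMPLE_RATE, n_fft=N_FFT, n_mels=N_MELS)
        linear = np.dot(np.linalg.pinv(mel_basis), mel_spectrogram)
        return np.maximum(1e-10, linear)

    def griffin_lim(magnitudes, n_iter=60):
        # Start from random phase, then alternate between enforcing the target
        # magnitudes and re-estimating phase from the resulting signal.
        angles = np.exp(2j * np.pi * np.random.rand(*magnitudes.shape))
        audio = librosa.istft(magnitudes * angles, hop_length=HOP_LENGTH)
        for _ in range(n_iter):
            stft = librosa.stft(audio, n_fft=N_FFT, hop_length=HOP_LENGTH)
            angles = np.exp(1j * np.angle(stft))
            audio = librosa.istft(magnitudes * angles, hop_length=HOP_LENGTH)
        return audio

    # audio = griffin_lim(mel_to_linear(mel))  # mel: [N_MELS, frames]

This is the same family of vocoder used for the blog post's samples; for higher quality, the paper pairs Tacotron 2 with a WaveNet vocoder instead.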

Credits and References

  1. "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions" Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu arXiv:1712.05884
  2. Location Sensitive Attention adapted from Tacotron 2 implementation by Keith Ito - GitHub link
  3. Zoneout Wrapper for RNNCell adapted from TensorFlow's official repository for MaskGAN. The code was contributed by A. Dai - GitHub link
  4. And obviously - all the contributors of TensorFlow
  5. Internet
