# Pytorch-End-to-End-ASR-on-TIMIT

A BiGRU encoder with an attention decoder, based on "Listen, Attend and Spell" [1].
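For orientation, here is a minimal PyTorch sketch of that encoder/decoder shape, using Luong-style dot-product attention [3]. All module choices and sizes (layer counts, the 256/512 hidden units, the 240-dimensional stacked input described below) are illustrative assumptions; the actual model in this repository may differ.

```python
import torch
import torch.nn as nn

# Illustrative sketch only; dimensions and structure are assumptions,
# not necessarily the repository's actual model.

class Encoder(nn.Module):
    def __init__(self, input_dim=240, hidden_dim=256):  # 240 = 80 fbanks x 3 stacked frames
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, num_layers=2,
                          bidirectional=True, batch_first=True)

    def forward(self, x):            # x: (batch, time, input_dim)
        out, _ = self.rnn(x)         # out: (batch, time, 2 * hidden_dim)
        return out

class AttentionDecoderStep(nn.Module):
    """One decoding step with Luong-style dot-product attention [3]."""
    def __init__(self, vocab_size, enc_dim=512, dec_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dec_dim)
        self.cell = nn.GRUCell(dec_dim + enc_dim, dec_dim)
        self.proj = nn.Linear(dec_dim + enc_dim, vocab_size)

    def forward(self, y_prev, h_prev, context_prev, enc_out):
        e = self.embed(y_prev)                        # (batch, dec_dim)
        h = self.cell(torch.cat([e, context_prev], -1), h_prev)
        scores = torch.bmm(enc_out, h.unsqueeze(2))   # (batch, time, 1)
        attn = torch.softmax(scores, dim=1)           # attention weights
        context = (attn * enc_out).sum(dim=1)         # (batch, enc_dim)
        logits = self.proj(torch.cat([h, context], -1))
        return logits, h, context
```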

The acoustic features are 80-dimensional filter banks. Every 3 consecutive frames are stacked into one, reducing the time resolution by a factor of 3.
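As a concrete illustration, here is a minimal sketch of that stacking step, assuming features of shape (time, 80); the repository's exact implementation may differ.

```python
import numpy as np

def stack_frames(feats, n=3):
    """Stack every n consecutive frames into one (illustrative sketch)."""
    T = feats.shape[0] // n * n              # drop trailing frames
    return feats[:T].reshape(T // n, -1)     # (time/n, 80*n)

fbanks = np.random.randn(300, 80)            # dummy 80-dim filter banks
stacked = stack_frames(fbanks)
print(stacked.shape)                         # (100, 240)
```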

Following the standard recipe, we use the 462-speaker training set with all SA records removed. Outputs are mapped to 39 phonemes when evaluating.
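The 61-to-39 fold is the standard mapping from Lee and Hon (1989). A partial, illustrative sketch follows; it shows only a few representative merges, not the full table, and is not necessarily the exact code used here.

```python
# Partial sketch of the standard 61 -> 39 phoneme fold (Lee & Hon, 1989).
FOLD = {
    'pcl': 'sil', 'tcl': 'sil', 'kcl': 'sil',   # closures fold to silence
    'bcl': 'sil', 'dcl': 'sil', 'gcl': 'sil',
    'h#': 'sil', 'pau': 'sil', 'epi': 'sil',
    'ix': 'ih', 'ax': 'ah', 'ux': 'uw',          # vowel merges
    'axr': 'er', 'zh': 'sh', 'hv': 'hh',
    'el': 'l', 'em': 'm', 'en': 'n',
    # the glottal stop 'q' is discarded entirely
}

def map_39(phones):
    return [FOLD.get(p, p) for p in phones if p != 'q']
```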

With this code you can achieve a phoneme error rate (PER) of about 22% on the core test set.
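PER is the phone-level edit (Levenshtein) distance between the hypothesis and the reference, divided by the reference length. A minimal sketch of that computation (not necessarily the scoring code used by eval.py):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via dynamic programming."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1]

def per(ref, hyp):
    return edit_distance(ref, hyp) / len(ref)
```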

## Usage

### Install requirements

```bash
$ pip install -r requirements.txt
```

### Prepare data

This will create lists (*.csv) of audio file paths along with their transcripts:

```bash
$ python prepare_data.py --root ${DIRECTORY_OF_TIMIT}
```
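A hedged sketch of consuming such a list; the file name and the (path, transcript) column layout are assumptions here, since the actual format is defined by prepare_data.py:

```python
import csv

# Assumes two columns per row: audio path, transcript. Check the files
# generated by prepare_data.py for the real layout.
with open('train.csv') as f:
    for path, transcript in csv.reader(f):
        print(path, transcript)
        break
```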

### Train

Check the available options:

```bash
$ python train.py -h
```

Use the default configuration for training:

```bash
$ python train.py exp/default.yaml
```

You can also write your own configuration file based on exp/default.yaml:

```bash
$ python train.py ${PATH_TO_YOUR_CONFIG}
```

### Show loss curve

With the default configuration, the training logs are stored in exp/default/history.csv. If you trained with a different configuration, point the script at the corresponding file:

```bash
$ python show_history.py exp/default/history.csv
```
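A minimal sketch of what such a plotting step can look like; it only assumes that loss columns contain "loss" in their header, so adjust to the actual columns in your history.csv:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Plot every column whose name mentions "loss" (assumption about the
# log format; inspect the CSV header to confirm).
hist = pd.read_csv('exp/default/history.csv')
for col in hist.columns:
    if 'loss' in col.lower():
        plt.plot(hist[col], label=col)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
```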

### Test

During training, the program keeps monitoring the error rate on the development set. The checkpoint with the lowest error rate is saved in the logging directory (by default exp/default/best.pth).
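This "keep the best checkpoint" pattern looks roughly like the following sketch, with a dummy model and stand-in dev error rates; train.py's actual bookkeeping and checkpoint contents are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                          # dummy stand-in model
dev_per_per_epoch = [0.40, 0.31, 0.28, 0.30]     # stand-in dev error rates

best_per = float('inf')
for epoch, dev_per in enumerate(dev_per_per_epoch):
    if dev_per < best_per:                       # new best on the dev set
        best_per = dev_per
        torch.save({'epoch': epoch, 'model': model.state_dict()},
                   'best.pth')
print(f'best dev PER: {best_per:.2f}')
```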

To evaluate the checkpoint on the test set, run:

```bash
$ python eval.py exp/default/best.pth
```

Or you can run inference on random audio from the test set and visualize the attention weights:

```bash
$ python inference.py exp/default/best.pth
```

Sample output:

```
Predict:
h# hh ih l pcl p gcl g r ey tcl d ix pcl p ih kcl k ix pcl p eh kcl k ix v dcl d ix tcl t ey dx ah v z h#
Ground-truth:
h# hh eh l pcl p gcl g r ey gcl t ix pcl p ih kcl k ix pcl p eh kcl k ix v pcl p ix tcl t ey dx ow z h#
```

## References

[1] W. Chan et al., "Listen, Attend and Spell", https://arxiv.org/pdf/1508.01211.pdf

[2] J. Chorowski et al., "Attention-Based Models for Speech Recognition", https://arxiv.org/pdf/1506.07503.pdf

[3] M. Luong et al., "Effective Approaches to Attention-based Neural Machine Translation", https://arxiv.org/pdf/1508.04025.pdf