SpecAugment with PyTorch

A PyTorch implementation of Google Brain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment is a state-of-the-art data augmentation approach for speech recognition.

The paper's authors did not publish code that I could find, and their implementation was in TensorFlow. We implemented all three SpecAugment transforms using PyTorch, torchaudio, and fastai / fastai-audio.

To use:

Run install.sh (I recommend using a dedicated conda env for the project).

After the install script runs, you should have a torchaudio folder in your project folder.

Check out SpecAugment.ipynb (a Jupyter notebook) for the functions.

Augmentations

Time Warp

Time Mask

Frequency Mask

Combined:
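For readers who want a feel for the two masking transforms without opening the notebook, here is a minimal sketch in plain PyTorch. It is not the code from SpecAugment.ipynb: the function names, parameter names, and default values below are illustrative assumptions, and the spectrogram is assumed to be a tensor shaped (channels, n_mels, n_frames).

```python
import torch


def time_mask(spec, max_width=40, num_masks=1, value=0.0):
    """Blank out up to `max_width` consecutive time steps (last axis).

    Hypothetical helper for illustration; `spec` is (channels, n_mels, n_frames).
    """
    out = spec.clone()  # leave the caller's tensor untouched
    n_frames = out.shape[-1]
    for _ in range(num_masks):
        # Mask width is drawn uniformly from 0..max_width, capped at the clip length.
        width = min(int(torch.randint(0, max_width + 1, (1,))), n_frames)
        start = int(torch.randint(0, n_frames - width + 1, (1,)))
        out[..., start:start + width] = value
    return out


def freq_mask(spec, max_width=27, num_masks=1, value=0.0):
    """Blank out up to `max_width` consecutive mel bins (second-to-last axis)."""
    out = spec.clone()
    n_mels = out.shape[-2]
    for _ in range(num_masks):
        width = min(int(torch.randint(0, max_width + 1, (1,))), n_mels)
        start = int(torch.randint(0, n_mels - width + 1, (1,)))
        out[..., start:start + width, :] = value
    return out


# Usage: chain both masks on a dummy spectrogram.
spec = torch.ones(1, 80, 200)
augmented = freq_mask(time_mask(spec))
```

Each call draws a fresh random mask, so the transform is meant to be applied per sample at training time rather than once up front.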

Note on Time Warp

The Time Warp augmentation relies on TensorFlow-specific functionality that PyTorch does not support. We implemented the supporting functions for this augmentation in SparseImageWarp.ipynb. You do not need to read that notebook to use the augmentations, but the Time Warp augmentation depends on the code it exposes.
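To illustrate what a time warp does to a spectrogram, the sketch below uses a deliberately simpler technique than the sparse image warp in SparseImageWarp.ipynb: it picks a random time step, shifts it left or right, and linearly resamples the two segments on either side so the total length is unchanged. The function name `simple_time_warp` and its parameter `max_shift` are hypothetical, and the input is assumed to be a float tensor shaped (channels, n_mels, n_frames).

```python
import torch
import torch.nn.functional as F


def simple_time_warp(spec, max_shift=5):
    """Piecewise-linear time warp; a simplified stand-in, not sparse image warp.

    Hypothetical helper; `spec` is a float tensor (channels, n_mels, n_frames).
    """
    n_mels, n_frames = spec.shape[-2], spec.shape[-1]
    if max_shift <= 0 or n_frames <= 2 * max_shift:
        return spec.clone()  # clip too short to warp safely

    # Pick a source time step away from the edges, then shift it.
    src = int(torch.randint(max_shift, n_frames - max_shift, (1,)))
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    dst = max(1, min(n_frames - 1, src + shift))  # keep both segments non-empty

    # interpolate expects a batch axis: (1, channels, n_mels, frames).
    left = spec[..., :src].unsqueeze(0)
    right = spec[..., src:].unsqueeze(0)

    # Stretch or squeeze each segment along time only; mel axis size is kept.
    left = F.interpolate(left, size=(n_mels, dst),
                         mode="bilinear", align_corners=False)
    right = F.interpolate(right, size=(n_mels, n_frames - dst),
                          mode="bilinear", align_corners=False)
    return torch.cat([left, right], dim=-1).squeeze(0)


# Usage: the output has the same shape as the input.
warped = simple_time_warp(torch.rand(1, 80, 200))
```

This resampling approach avoids the interpolation machinery that the notebook ports from TensorFlow, at the cost of a less smooth warp around the shifted point.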

Let's be friends!

zcaceres/spec_augment

Folders and files

Latest commit

History

Repository files navigation

SpecAugment with Pytorch

A Pytorch Implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

To use:

Augmentations

Note on Time Warp

About

Resources

License

Stars

Watchers

Forks

Languages