SpeechAugment

Motivation

Most AI algorithms are data-driven, so the quality and variety of the training data largely determine the quality of the model. This dependence on data is an inherent weakness of deep learning, and data augmentation is an effective way to mitigate it.

Methods

This repo supports the following audio data augmentations:

reverberation
background noise
distortion
packet loss simulation
farfield effect
speed perturbation

After these time-domain augmentations, a feature extraction step can be applied.

Installation

To install the stable release, use Pkg mode in the Julia REPL:

] add SpeechAugment

or use the Pkg API:

using Pkg
Pkg.add("SpeechAugment")

To install the development version, use Pkg mode in the Julia REPL:

] add https://github.com/sonosole/SpeechAugment.jl.git

Example

using WAV
using SpeechAugment

# 1. read a wav file as a speech example
batchsize = 8;
data,fs = wavread("/XXPath/ASpeechExample.wav");

# 2. init all the augmentation functions you want
echo  = initAddEcho(fs, (0.05,0.4), (3.0,3.2,2.5,3.5,2.0,3.0));
noise = initAddNoise("XXPathFullOfNoiseWAVs", 2, (5,15));
clip  = initClipWav((0.5,2.0));
drop  = initDropWav(fs, (0.09,0.15));
far   = initFarfieldWav(fs, (0.4,0.9));
speed = initSpeedWav((0.8,1.2));

# 3. make a function list or array
fnlist = [echo noise clip drop far speed];

# 4. augment a batch of `batchsize` audios
wavs = Vector(undef, batchsize)
for i = 1:batchsize
    wavs[i] = copy(data)
end
wavs = augmentWavs(fnlist, wavs)
for i = 1:batchsize
    wavwrite(wavs[i], "A$i.wav", Fs=fs, nbits=32)  # keep the input file's sampling rate
end

# there is also a function called `augmentWav`;
# it turns one audio into `batchsize` augmented audios.
audios = augmentWav(fnlist, data, batchsize)
for i = 1:batchsize
    wavwrite(audios[i], "B$i.wav", Fs=fs, nbits=32)  # keep the input file's sampling rate
end

Function Parameter Introduction

initAddEcho(fs::Number, T₆₀Span::NTuple{2,Number}, roomSpan::NTuple{6,Number}) -> addecho(wav::Array)

fs: sampling rate
T₆₀Span: range of the effective reverberation time, e.g. (minT60, maxT60)
roomSpan: range of the room size, e.g. (MinL, MaxL, MinW, MaxW, MinH, MaxH)

initAddNoise(path::String, period::Int, dBSpan::NTuple{2,Number}) -> addnoise(speech::Array)

path: a directory that contains only noise WAV files
period: switch to a different noise WAV every period uses
dBSpan: range of the SNR in dB, e.g. (mindB, maxdB)
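
As an illustration of what an SNR span means, the sketch below mixes noise into speech at an SNR drawn uniformly from (mindB, maxdB). The helper mix_at_snr and the synthetic signals are hypothetical and only use the Julia standard library; this is not necessarily how the package's own noise augmentation is implemented.

```julia
using Random

# Hypothetical helper, not part of SpeechAugment: mix `noise` into `speech` at `snr_db`.
function mix_at_snr(speech::Vector{Float64}, noise::Vector{Float64}, snr_db::Real)
    # Repeat or trim the noise so that it covers the whole utterance.
    n     = length(speech)
    tiled = repeat(noise, cld(n, length(noise)))[1:n]

    # Average power of each signal; eps() guards against division by zero.
    p_speech = sum(abs2, speech) / n
    p_noise  = sum(abs2, tiled) / n + eps()

    # Choose gain g so that 10*log10(p_speech / (g^2 * p_noise)) == snr_db.
    g = sqrt(p_speech / (p_noise * 10.0^(snr_db / 10)))
    return speech .+ g .* tiled
end

# Draw an SNR uniformly from a (mindB, maxdB) span (assumed sampling scheme).
rng    = MersenneTwister(0)
speech = sin.(2π .* 440 .* (0:15999) ./ 16000)   # 1 s synthetic tone at 16 kHz
noise  = randn(rng, 4000)
mindB, maxdB = 5, 15
snr_db = mindB + (maxdB - mindB) * rand(rng)
noisy  = mix_at_snr(speech, noise, snr_db)
```

A lower SNR means louder noise relative to the speech, so a span such as (5, 15) covers fairly noisy to moderately noisy conditions.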

initClipWav(clipSpan::NTuple{2,Number}) -> clipwav(wav::Array)

clipSpan: range of how strongly a wav is clipped, e.g. (0.5, 2.0)
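
One common way to produce clipping distortion is to amplify the signal and then hard-limit it to [-1, 1]. The sketch below shows that idea with a hypothetical helper hard_clip; whether the package interprets clipSpan as such a gain is an assumption, not something stated here.

```julia
# Hypothetical helper, not the package's clipWav: amplify, then hard-limit to [-1, 1].
function hard_clip(wav::AbstractVector{<:Real}, gain::Real)
    return clamp.(gain .* wav, -1.0, 1.0)
end

wav     = 0.8 .* sin.(2π .* 220 .* (0:15999) ./ 16000)  # tone with amplitude 0.8
clipped = hard_clip(wav, 2.0)    # peaks near 1.6 are flattened at ±1.0
intact  = hard_clip(wav, 0.5)    # peaks near 0.4 stay below the limit
```

Under this assumption, values above 1.0 introduce distortion while values below 1.0 only attenuate the signal.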

initDropWav(fs::Real, ratioSpan::NTuple{2,Number}) -> dropwav(wav::Array)

fs: sampling rate
ratioSpan: range of the dropping ratio, e.g. (0.02, 0.09). The upper limit is 1.0.
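
To make the dropping ratio concrete, here is a minimal sketch of packet loss simulation that silences a given fraction of fixed-length frames. The helper drop_frames and the 20 ms frame length are assumptions chosen for illustration; the package may segment and drop audio differently.

```julia
using Random

# Hypothetical helper, not the package's dropWav: zero out random fixed-length frames.
function drop_frames(wav::Vector{Float64}, fs::Real, ratio::Real;
                     frame_ms::Real = 20, rng = Random.default_rng())
    0.0 <= ratio <= 1.0 || throw(ArgumentError("ratio must lie in [0, 1]"))
    out      = copy(wav)
    framelen = max(1, round(Int, fs * frame_ms / 1000))
    nframes  = cld(length(wav), framelen)
    ndrop    = round(Int, ratio * nframes)

    # Pick `ndrop` distinct frames and silence them.
    for k in randperm(rng, nframes)[1:ndrop]
        lo = (k - 1) * framelen + 1
        hi = min(k * framelen, length(wav))
        out[lo:hi] .= 0.0
    end
    return out
end

fs      = 16000
wav     = ones(fs)                       # 1 s of non-zero samples
dropped = drop_frames(wav, fs, 0.1; rng = MersenneTwister(1))
```

With these settings the signal has 50 frames of 320 samples, so a ratio of 0.1 silences 5 frames.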

initFarfieldWav(fs::Real, maxvalueSpan::NTuple{2,Number}) -> farfieldwav(wav::Array)

fs: sampling rate
maxvalueSpan: a range within (0.0, 1.0). Smaller values mean farther away. (0.2, 0.9) is recommended.

initSpeedWav(speedSpan::NTuple{2,Number}) -> speedwav(wav::Array)

speedSpan: range of speed perturbation. (0.85, 1.15) is recommended.
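
Speed perturbation can be pictured as resampling the waveform on a stretched or compressed time axis. The sketch below does this with plain linear interpolation in a hypothetical helper change_speed; it is a simplified stand-in, and the package's own resampling method may differ.

```julia
# Hypothetical helper, not the package's speedWav: resample by linear interpolation.
function change_speed(wav::Vector{Float64}, speed::Real)
    speed > 0 || throw(ArgumentError("speed must be positive"))
    n   = length(wav)
    m   = max(1, round(Int, n / speed))      # faster speech => fewer samples
    out = Vector{Float64}(undef, m)
    for i in 1:m
        # Position of output sample i on the input time axis (1-based).
        t    = 1 + (i - 1) * speed
        lo   = clamp(floor(Int, t), 1, n)
        hi   = min(lo + 1, n)
        frac = clamp(t - lo, 0.0, 1.0)
        out[i] = (1 - frac) * wav[lo] + frac * wav[hi]
    end
    return out
end

wav  = sin.(2π .* 440 .* (0:15999) ./ 16000)
fast = change_speed(wav, 1.2)   # shorter; in this sketch the pitch rises as well
slow = change_speed(wav, 0.8)   # longer; in this sketch the pitch drops as well
```

A narrow span such as (0.85, 1.15) keeps the perturbed speech close to natural speaking rates.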

All NTuple{2,Number} parameters must be ordered with the smaller value first and the larger value second, i.e. (minvalue, maxvalue). To control the extent of augmentation precisely, the following functions can be used:

addEcho
addNoise
clipWav
dropWav
farfieldWav
speedWav

For details, check the documentation or use the REPL help mode (type ? to get the help?> prompt).
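
The difference between the init functions and their direct counterparts can be sketched as follows. It assumes that a function returned by an init call draws a fresh value from its (minvalue, maxvalue) span on every call; init_random_gain and fixed_gain are hypothetical stand-ins, not package functions.

```julia
using Random

# Hypothetical stand-in for the init pattern: capture a (min, max) span and
# return a one-argument function that draws a fresh value on every call.
function init_random_gain(span::NTuple{2,Real}; rng = Random.default_rng())
    lo, hi = span
    lo <= hi || throw(ArgumentError("span must be ordered as (minvalue, maxvalue)"))
    return wav -> (lo + (hi - lo) * rand(rng)) .* wav
end

# Deterministic counterpart, in the spirit of calling a direct function with an exact value.
fixed_gain(wav, gain::Real) = gain .* wav

gain = init_random_gain((0.5, 2.0); rng = MersenneTwister(42))
wav  = ones(4)
a = gain(wav)             # scaled by some factor in [0.5, 2.0]
b = gain(wav)             # scaled by another, independently drawn factor
c = fixed_gain(wav, 1.5)  # always scaled by exactly 1.5
```

Returning a closure keeps every augmentation callable with a single wav argument, which is what allows them to be collected in one function list.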
