
Keras-AudioDataGenerator

Augmented Audio Data Generator for 1D-Convolutional Neural Networks

Note: The API mirrors Keras's ImageDataGenerator; see the ImageDataGenerator documentation for background. The package also includes a balanced data generator (BalancedAudioDataGenerator) that creates balanced mini-batches.

The Audio Data Generator generates batches of audio data with real-time data augmentation, looping over the data in batches. This lets audio augmentation run on the CPU while training proceeds in parallel on the GPU. The code has been tested on Linux. During GPU training, CPU usage should look something like this:

[Screenshot: CPU usage during GPU training]

Basic Usage:

from AudioDataGenerator import AudioDataGenerator

datagen = AudioDataGenerator(
                featurewise_center=True,
                featurewise_std_normalization=True,
                shift=.2,
                horizontal_flip=True,
                zca_whitening=True)

datagen.fit(x_train)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=100)
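
Since augmentation runs on the CPU, throughput can often be improved by letting Keras prefetch batches with background workers. A minimal sketch; workers and use_multiprocessing are standard fit_generator arguments rather than features of this package, and whether multiprocessing is safe depends on your Keras version and platform:

model.fit_generator(datagen.flow(x_train, y_train, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=100,
                    workers=4,                 # CPU workers filling the batch queue
                    use_multiprocessing=True)  # verify against your Keras version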

Multiple sets of labels (y data) can be passed as a list. Data can also be balanced across multiple labels, but in that case a combined label (meta_label) is required and BalancedAudioDataGenerator must be used instead of AudioDataGenerator (see Arguments).

model.fit_generator(datagen.flow(x_train, [y_instrument, y_artist, y_genre], batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=100)

Arguments:

The following are the currently available augmentation parameters.

featurewise_center: Boolean. Set input mean to 0 over the dataset, feature-wise.

samplewise_center: Boolean. Set each sample mean to 0.

featurewise_std_normalization: Boolean. Divide inputs by std of the dataset, feature-wise.

samplewise_std_normalization: Boolean. Divide each input by its std.

zca_epsilon: epsilon for ZCA whitening. Default is 1e-6.

zca_whitening: Boolean. Apply ZCA whitening.

Note: When using a separate validation set, make sure the same ZCA whitening (computed with .fit on the training data) is applied to it as well.

roll_range: Float (fraction of total sample length). Range for horizontal circular shifts.

horizontal_flip: Boolean. Randomly flip inputs horizontally.

zoom_range: Float (fraction of zoom) or [lower, upper] range; the zoom factor is drawn from a uniform distribution over the range.

noise: [mean, std, 'Normal'] or [lower, upper, 'Uniform']. Adds random additive noise drawn from the given distribution. Noise is applied with probability 0.5.

noiseSNR: Float, required SNR in dB. Noise is added with probability 0.5. (Not implemented.)

shift: Float (fraction of total sample length). Range for horizontal shifts.
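
For intuition, the waveform-level transforms above (roll_range, horizontal_flip, noise) roughly correspond to the following numpy operations. This is an illustrative sketch, not the package's internal implementation:

import numpy as np

x = np.arange(10, dtype=float)  # a toy 1-D signal
n = len(x)

# roll_range=0.2: circular shift by a random fraction of the length
offset = int(np.random.uniform(-0.2, 0.2) * n)
rolled = np.roll(x, offset)

# horizontal_flip: reverse the signal along the time axis
flipped = x[::-1]

# noise=[0, 0.1, 'Normal']: additive Gaussian noise, applied with probability 0.5
noisy = x + np.random.normal(0, 0.1, size=x.shape) if np.random.rand() < 0.5 else x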

fill_mode: One of {"constant", "nearest", "reflect" or "wrap"}. Default is 'nearest'. Points outside the boundaries of the input are filled according to the given mode:

  • 'constant': kkkkkkkk|abcd|kkkkkkkk (cval=k)
  • 'nearest': aaaaaaaa|abcd|dddddddd
  • 'reflect': abcddcba|abcd|dcbaabcd
  • 'wrap': abcdabcd|abcd|abcdabcd

cval: Float or Int. Value used for points outside the boundaries when fill_mode = "constant".
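
The fill behaviour can be previewed with scipy.ndimage.shift, which accepts the same mode names. An illustrative sketch; the package may implement shifting differently:

import numpy as np
from scipy.ndimage import shift

x = np.array([1., 2., 3., 4.])
# order=1 keeps integer shifts exact (no spline smoothing at the boundary)
print(shift(x, 2, order=1, mode='constant', cval=0.))  # [0. 0. 1. 2.] -- gaps filled with cval
print(shift(x, 2, order=1, mode='nearest'))            # [1. 1. 1. 2.] -- edge value repeated
print(shift(x, 2, order=1, mode='reflect'))            # [2. 1. 1. 2.] -- mirrored at the edge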

rescale: rescaling factor. Defaults to None. If None or 0, no rescaling is applied, otherwise we multiply the data by the value provided (before applying any other transformation).

preprocessing_function: function that will be applied to each input.

  • The function will run after the audio is resized and augmented.
  • The function should take one argument:
    • one audio (Numpy tensor with rank 2),
    • should output a Numpy tensor with the same shape.
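
For example, a peak-normalisation step could be plugged in like this (peak_normalize is a hypothetical helper written for illustration):

import numpy as np
from AudioDataGenerator import AudioDataGenerator

def peak_normalize(audio):
    # audio: rank-2 numpy tensor (width, channels); returns the same shape
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio

datagen = AudioDataGenerator(preprocessing_function=peak_normalize)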

brightness_range: Tuple or list of Float. Range for adding random gain to the audio. Doesn't have any effect if normalization is performed.

data_format: One of {"channels_first", "channels_last"}. "channels_last" mode means that the audio should have shape (samples, width, channels), "channels_first" mode means that the data should have shape (samples, channels, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".

validation_split: Float. Fraction of data reserved for validation (strictly between 0 and 1).
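
If the ImageDataGenerator semantics carry over (an assumption worth verifying against the source), the split would be selected at flow() time via a subset keyword:

# Hypothetical usage, assuming flow() accepts subset= as in ImageDataGenerator
datagen = AudioDataGenerator(validation_split=0.2)
train_gen = datagen.flow(x_train, y_train, batch_size=32, subset='training')
val_gen = datagen.flow(x_train, y_train, batch_size=32, subset='validation')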

meta_label: array-like, same length as y_train. Defines the criterion on which balancing is done.

For example, suppose a binary classification problem has data from 3 different sources: the class labels are [0, 1] and the domain labels are [0, 1, 2], so each sample falls into one of 6 unique (class, domain) combinations:

| Class | Domain | meta_label |
|-------|--------|------------|
|   0   |   0    |     0      |
|   0   |   1    |     1      |
|   0   |   2    |     2      |
|   1   |   0    |     3      |
|   1   |   1    |     4      |
|   1   |   2    |     5      |

Note: This feature is used to balance the training data via the BalancedAudioDataGenerator class.
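
Following the table above, the combined label can be computed as class * n_domains + domain. A minimal sketch, assuming flow() accepts a meta_label keyword (check the class signature in the source; y_class and y_domain are hypothetical label arrays):

from AudioDataGenerator import BalancedAudioDataGenerator

n_domains = 3
meta_label = y_class * n_domains + y_domain  # maps (class, domain) to 0..5 as in the table

datagen = BalancedAudioDataGenerator()
model.fit_generator(datagen.flow(x_train, y_class, meta_label=meta_label, batch_size=32),
                    steps_per_epoch=len(x_train) // 32,
                    epochs=100)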

Dependencies:

  • Keras
  • Scipy
  • Numpy

Methods:

.fit() Method

Compute the internal data stats related to the data-dependent transformations, based on an array of sample data. Only required if featurewise_center or featurewise_std_normalization or zca_whitening is True.

datagen.fit(x_train, seed=1)  # computes mean/std/ZCA statistics from x_train

.flow() Method

Returns an iterator that yields batches of augmented data. The iterator loops indefinitely, so loops need to be broken manually:

batch_size = 32
for epoch in range(epochs):
    batch = 0
    for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=batch_size, shuffle=True):
        model.fit(x_batch, y_batch)
        batch += 1
        if batch >= len(x_train) / batch_size:
            break  # the generator loops indefinitely, so stop after one pass

.flow_from_directory() Method

Not Implemented
