Urban sounds classification with Convolutional Neural Networks

The objective of this project is to implement CNN models that recognize sound events from the UrbanSound8K dataset. The work is divided into the following notebooks:

Data analysis (and related papers brief)
Pre-processing and feature evaluation
CNN model with MFCC
CNN model with Log-MEL Spectrograms
Data augmentation
Data augmentation pre-processing
CNN model with augmented data (Log-MEL Spectrograms)

Notebooks

Data analysis: a summary of previous work on the UrbanSound8K dataset (scientific papers), dataset exploration, distribution analysis and listening.
Pre-processing: an introduction to the audio features we can use to work with digital audio, the pre-processing pipeline (STFT, MFCC and Log-MEL Spectrograms), feature extraction and data normalization.
CNN model with MFCC features: data preparation and a CNN model definition (with a detailed explanation) using Keras with the TensorFlow back-end. The notebook solves the multi-class classification problem, then evaluates and tests the model with Recall, Precision and F1 analysis.
CNN model with Log-MEL Spectrograms: a performance comparison that uses the same CNN architecture with MEL spectrograms, with the same training and evaluation as notebook #3.
Data augmentation: creation of augmented data from the original UrbanSound8K sounds with LibROSA, using common audio effects such as pitch shifting, time stretching and added noise.
Augmented pre-processing: audio feature extraction from the newly generated data.
CNN model with augmented data: the same CNN architecture and an almost identical training procedure applied to the generated data, with model evaluation and testing to compare against previous results.
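
The notebooks compute MFCC and Log-MEL features with LibROSA. The sketch below illustrates the pipeline named above: STFT, mel filterbank, log compression and, for MFCCs, a DCT. It uses NumPy only. The parameter values (22050 Hz sample rate, n_fft=2048, hop of 512 samples, 40 mel bands, 13 MFCCs) and the HTK-style mel formula are assumptions for illustration, not the project's settings. LibROSA differs in several details (centered frames, Slaney mel scale, filter normalization), so the numbers will not match it exactly.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (assumption; LibROSA defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def power_spectrogram(y, n_fft=2048, hop=512):
    """Short-time Fourier transform: Hann-windowed frames -> one-sided power spectrum."""
    y = np.asarray(y, dtype=float)
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))  # zero-pad very short clips to one frame
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(y, sr=22050, n_fft=2048, hop=512, n_mels=40, eps=1e-10):
    """Mel-filtered power spectrum in decibels, shape (n_mels, n_frames)."""
    mel_power = power_spectrogram(y, n_fft, hop) @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel_power + eps).T  # eps avoids log(0) in silent bands

def mfcc(log_mel, n_mfcc=13):
    """MFCCs as an orthonormal DCT-II of a log-mel spectrogram along the mel axis."""
    n_mels = log_mel.shape[0]
    if n_mfcc > n_mels:
        raise ValueError("n_mfcc cannot exceed the number of mel bands")
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels)) * np.sqrt(2.0 / n_mels)
    basis[0] /= np.sqrt(2.0)  # orthonormal scaling of the first coefficient
    return basis @ log_mel  # (n_mfcc, n_frames)
```

For a one-second clip at 22050 Hz, log_mel_spectrogram returns a (40, 40) array of 40 bands by 40 frames, and mfcc(...) reduces it to (13, 40). In practice the input would be a signal such as the one returned by librosa.load, which resamples to 22050 Hz by default.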

Getting the dataset

Download a copy of the UrbanSound8K dataset from the UrbanSound8K home page.

Uncompress the dataset root directory into the project root. You should end up with a directory named "UrbanSound8K" (or a symbolic link to it) in the project root.
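
You can confirm the layout before running the notebooks with a small pathlib check. This is a sketch, and the function name is hypothetical. The audio/fold1 to audio/fold10 sub-directories reflect the standard UrbanSound8K archive layout. Because the metadata file name can differ between copies, the check only looks for some CSV file under metadata/.

```python
from pathlib import Path

def check_dataset_layout(project_root="."):
    """Return a list of problems found in the expected UrbanSound8K layout (empty if OK)."""
    root = Path(project_root) / "UrbanSound8K"
    # is_dir() follows symbolic links, so a link named UrbanSound8K also passes
    if not root.is_dir():
        return [f"missing dataset directory: {root}"]
    problems = []
    if not any((root / "metadata").glob("*.csv")):
        problems.append(f"no CSV metadata file under {root / 'metadata'}")
    for fold in range(1, 11):
        # Assumed layout: UrbanSound8K/audio/fold1 ... UrbanSound8K/audio/fold10
        fold_dir = root / "audio" / f"fold{fold}"
        if not fold_dir.is_dir():
            problems.append(f"missing fold directory: {fold_dir}")
    return problems

if __name__ == "__main__":
    issues = check_dataset_layout()
    print("\n".join(issues) if issues else "Dataset layout looks OK")
```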

Install required libraries

Make sure that TensorFlow, Keras, LibROSA, IPython, NumPy, Pandas, Matplotlib and scikit-learn are installed in your environment.
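
A quick environment check can use importlib.util.find_spec, which locates a module without importing it. The list below mirrors the libraries named above, written as import names. Some of these differ from the package names: scikit-learn, for example, is imported as sklearn. The function name is hypothetical.

```python
import importlib.util

# Import names for the libraries listed above (scikit-learn imports as "sklearn")
REQUIRED_MODULES = ["tensorflow", "keras", "librosa", "IPython",
                    "numpy", "pandas", "matplotlib", "sklearn"]

def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```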

Note that we use TensorFlow as the Keras back-end. You must set this in your ~/.keras/keras.json file, for example:

{
    "image_dim_ordering": "tf",
    "image_data_format": "channels_first",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
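
To confirm which back-end Keras will pick up, you can read the same file back. The sketch below assumes the default location ~/.keras/keras.json, and the function name is hypothetical. In Keras 2, image_data_format decides how image-like inputs such as spectrograms are shaped: (channels, rows, cols) for "channels_first" or (rows, cols, channels) for "channels_last".

```python
import json
from pathlib import Path

DEFAULT_KERAS_CONFIG = Path.home() / ".keras" / "keras.json"

def read_keras_config(config_path=DEFAULT_KERAS_CONFIG):
    """Return (backend, image_data_format) from a keras.json file."""
    with open(config_path) as fh:
        config = json.load(fh)
    # .get() returns None for keys that are absent from the file
    return config.get("backend"), config.get("image_data_format")
```

With the example file above, read_keras_config() should return ("tensorflow", "channels_first").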

The UrbanSound8K dataset

The UrbanSound8K dataset is a compilation of urban sound recordings, classified into 10 categories according to the paper "A Dataset and Taxonomy for Urban Sound Research", which proposes a taxonomy to describe different types of environmental sounds.

The UrbanSound8K dataset contains 8732 labeled sound slices of varying duration, each up to 4 seconds long. The class labels are:

Air Conditioner
Car Horn
Children Playing
Dog Bark
Drilling
Engine Idling
Gun Shot
Jackhammer
Siren
Street Music
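
In the metadata, these labels appear in snake_case together with a numeric classID. The sketch below shows an ID-to-name lookup and the one-hot encoding used as the target for a 10-class softmax output. The ordering assumes the dataset's classID values 0-9, which you should verify against the classID and class columns of your copy. The helper names are hypothetical. For NumPy arrays, keras.utils.to_categorical performs the same encoding.

```python
# Assumed classID -> class mapping (list index = classID); verify against the metadata
CLASS_NAMES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]

def one_hot(class_id, num_classes=len(CLASS_NAMES)):
    """Encode an integer class ID as a one-hot list (target for categorical cross-entropy)."""
    if not 0 <= class_id < num_classes:
        raise ValueError(f"class_id must be in [0, {num_classes}), got {class_id}")
    vector = [0] * num_classes
    vector[class_id] = 1
    return vector
```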

Note that the dataset is already organized into 10 predefined folds for cross-validation. To compare our results with other published work, we should stick to this scheme.
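
With predefined folds, each of the 10 folds is held out once for testing while the other nine are used for training, and the reported score is the average over the 10 runs. Below is a minimal, library-free sketch of how those splits can be generated from metadata rows. The function name is hypothetical, and any record with an integer fold field works.

```python
from collections import defaultdict

def fold_splits(records, num_folds=10):
    """Yield (test_fold, train_records, test_records), holding out each fold exactly once."""
    by_fold = defaultdict(list)
    for rec in records:
        by_fold[int(rec["fold"])].append(rec)
    for test_fold in range(1, num_folds + 1):
        test = by_fold[test_fold]
        # Train on every other predefined fold; never reshuffle slices across folds
        train = [rec for f in range(1, num_folds + 1) if f != test_fold for rec in by_fold[f]]
        yield test_fold, train, test
```

Keeping slices within their assigned folds matters because several slices can come from the same Freesound recording. Mixing them across training and test data would inflate accuracy.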

Dataset metadata

The included metadata file ("UrbanSound8K/metadata/metadata.csv") provides all the required information about each audio file:

slice_file_name: The name of the audio file.
fsID: The Freesound ID of the recording from which this excerpt (slice) is taken.
start: The start time of the slice in the original Freesound recording.
end: The end time of the slice in the original Freesound recording.
salience: A (subjective) salience rating of the sound: 1 = foreground, 2 = background.
fold: The fold number (1-10) to which this file has been allocated.
classID: A numeric identifier of the sound class.
class: The class label name.
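
The sketch below reads these columns with Python's csv module, converts the numeric fields and attaches the path of each audio slice. It assumes the standard archive layout UrbanSound8K/audio/fold<N>/<slice_file_name>. The function name and the example row are made up for illustration. Pandas' pd.read_csv would do the same job.

```python
import csv
import io
from pathlib import Path

def load_metadata(csv_file, dataset_root="UrbanSound8K"):
    """Parse metadata rows from a file-like object, adding typed fields and the audio path."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["start"] = float(row["start"])
        row["end"] = float(row["end"])
        row["duration"] = row["end"] - row["start"]  # slice length in seconds
        for key in ("salience", "fold", "classID"):
            row[key] = int(row[key])
        # Assumed layout: <dataset_root>/audio/fold<N>/<slice_file_name>
        row["path"] = Path(dataset_root) / "audio" / f"fold{row['fold']}" / row["slice_file_name"]
        rows.append(row)
    return rows

if __name__ == "__main__":
    # Hypothetical example row, only to show the expected columns
    sample = io.StringIO(
        "slice_file_name,fsID,start,end,salience,fold,classID,class\n"
        "example-3-0-0.wav,12345,1.0,3.5,1,5,3,dog_bark\n"
    )
    for row in load_metadata(sample):
        print(row["path"], row["class"], row["duration"])
```

On the real file, open it with `open("UrbanSound8K/metadata/metadata.csv", newline="")` and pass the handle to load_metadata.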

References

1- Data analysis

Taxonomical categorization (summary): https://urbansounddataset.weebly.com/taxonomy.html
"A Dataset and Taxonomy for Urban Sound Research": http://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_urbansound_acmmm14.pdf
Basics of Digital Audio: https://theproaudiofiles.com/digital-audio-101-the-basics/
Reading wave file headers with Python: https://www.cameronmacleod.com/blog/reading-wave-python
The Wave PCM file specification: http://soundfile.sapp.org/doc/WaveFormat/

2- Data pre-processing

The Nyquist theorem: https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
Sampling : Signal Digitalization: http://support.ircam.fr/docs/AudioSculpt/3.0/co/Sampling.html
Digital audio conversion: what is Aliasing? https://theproaudiofiles.com/digital-audio-aliasing/
Mel Frequency Cepstral Coefficient (MFCC) tutorial: http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/
Discussion on odd kernel sizes: https://datascience.stackexchange.com/questions/23183/why-convolutions-always-use-odd-numbers-as-filter-size
Discussion about normalized audio for CNNs: https://stackoverflow.com/questions/55513652/which-spectrogram-best-represents-features-of-an-audio-file-for-cnn-based-model/56727927#56727927
A Comparison of Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging: https://arxiv.org/abs/1709.01922

4- Model optimization

Cyclical Learning Rates for Training Neural Networks: https://arxiv.org/abs/1506.01186
CLR Keras implementation by Brad Kenstler: https://github.com/bckenstler/CLR
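
For reference, the "triangular" policy from the Cyclical Learning Rates paper ramps the learning rate linearly from base_lr up to max_lr over step_size iterations, then back down again. Below is a minimal sketch of that schedule as a plain function. The default values are arbitrary placeholders, not the project's settings. The CLR Keras implementation linked above wraps the same idea in a Keras callback.

```python
import math

def triangular_clr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Learning rate at a given training iteration under the triangular CLR policy."""
    cycle = math.floor(1 + iteration / (2 * step_size))  # index of the current cycle
    x = abs(iteration / step_size - 2 * cycle + 1)        # 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

With these defaults, the learning rate starts at base_lr, peaks at max_lr after 2000 iterations and returns to base_lr after 4000.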

5- Related papers

Environmental sound classification with convolutional neural networks, Karol J. Piczak
Dilated convolution neural network with LeakyReLU for environmental sound classification, Xiaohu Zhang, Yuexian Zou, Wei Shi
Deep Convolutional Neural Network with Mixup for Environmental Sound Classification, Zhichao Zhang, Shugong Xu, Shan Cao, Shunqing Zhang
End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network, Sajjad Abdoli, Patrick Cardinal, Alessandro Lameiras Koerich
An Ensemble Stacked Convolutional Neural Network Model for Environmental Event Sound Recognition, Shaobo Li, Yong Yao, Jie Hu, Guokai Liu, Xuemei Yao, Jianjun Hu
Classifying environmental sounds using image recognition networks, Venkatesh Boddapati, Andrej Petef, Jim Rasmusson, Lars Lundberg
Environment Sound Classification Using a Two-Stream CNN Based on Decision-Level Fusion, Yu Su, Ke Zhang, Jingyu Wang, Kurosh Madani

GorillaBus/urban-audio-classifier

Comments, suggestions and corrections always welcome