Group project to classify audio samples with Deep ML models for the Curricular Unit Machine Learning 2 on FCUP @ Porto, Portugal

Nia3324/ML_SoundCassification_2324

ML2-Project

Authorship

Description

This project is focused on the development of deep learning models for audio classification.
The models were designed and built with the UrbanSound8K dataset, which was used throughout the development of this project. The dataset contains 8732 labeled audio recordings of urban sounds, each up to four seconds long. Each excerpt is labeled with one of the following classes:

Label               Class ID
air conditioner     0
car horn            1
children playing    2
dog bark            3
drilling            4
engine idling       5
gun shot            6
jackhammer          7
siren               8
street music        9

The objective of this project was to define, compile, train and evaluate two Deep Learning (DL) classifiers, chosen from the following model types:

  • Multilayer Perceptron (MLP)
  • Convolutional Neural Network (CNN)
  • Recurrent Neural Network (RNN)

Furthermore, both models were to be evaluated by running 10-fold cross validation on the 10 predefined folds that come with the dataset.
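A minimal sketch of how the predefined folds can drive the cross validation loop. It assumes the fold number of each clip (1 to 10) has already been read from the dataset's metadata file into an array; the function name `predefined_fold_splits` is hypothetical and is not taken from the project notebooks.

```python
import numpy as np

def predefined_fold_splits(fold_ids, n_folds=10):
    """Yield (test_fold, train_idx, test_idx) for each predefined fold.

    fold_ids: one fold number (1..n_folds) per audio clip, assumed to come
    from the dataset metadata. Clips are never reshuffled between folds.
    """
    fold_ids = np.asarray(fold_ids)
    for test_fold in range(1, n_folds + 1):
        test_idx = np.flatnonzero(fold_ids == test_fold)   # held-out fold
        train_idx = np.flatnonzero(fold_ids != test_fold)  # remaining nine folds
        yield test_fold, train_idx, test_idx

# Toy metadata: 3 clips in each of the 10 folds.
example_fold_ids = np.repeat(np.arange(1, 11), 3)
example_splits = list(predefined_fold_splits(example_fold_ids))
```

Keeping the predefined folds instead of drawing random splits means each iteration trains on nine folds and tests on the remaining one, so every clip is used for testing exactly once.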

Finally, an experiment was conducted to evaluate each model's robustness against adversarial examples, using an implementation of the DeepFool algorithm.

Solutions Implemented

  • Convolutional Neural Network
  • Recurrent Neural Network with LSTM units

Project Development Phases

CNN Development and Performance Evaluation Notebook

RNN Development and Performance Evaluation Notebook

Models Robustness Assessment with DeepFool Algorithm

Convolutional Neural Network

  • Data Pre-Processing
  • Feature Extraction
    • Mel-scaled Spectrograms (2D arrays)
    • Chromagrams (2D arrays)
    • Spectral Flatness, Bandwidth, Roll-off, Centroid (1D arrays stacked)
  • CNN Architecture Definition
  • Performance Assessment
  • Architecture Changes for Overfit Prevention

Recurrent Neural Network

  • Data Pre-Processing
  • Feature Extraction
    • Log Mel-Scaled Spectrograms (2D arrays)
  • RNN Architecture Definition
  • Performance Assessment
  • Architecture Changes for Overfit Prevention
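As an illustration of how log Mel-scaled spectrograms can be shaped for LSTM units, the sketch below converts a batch of power spectrograms to decibels and moves time to the second axis, since Keras recurrent layers expect input of shape (batch, timesteps, features). The function name `to_lstm_sequences`, the input layout and the `eps` floor are assumptions, not details taken from the notebook.

```python
import numpy as np

def to_lstm_sequences(mel_power, eps=1e-10):
    """Convert power Mel spectrograms to log scale and time-major layout.

    mel_power: array of shape (batch, n_mels, frames) with non-negative values
    (assumed layout). Returns an array of shape (batch, frames, n_mels).
    """
    # Log compression in decibels; eps avoids log10(0) on silent frames.
    log_mel = 10.0 * np.log10(np.maximum(mel_power, eps))
    # Each time frame becomes one timestep with n_mels features.
    return np.transpose(log_mel, (0, 2, 1))

# Toy batch: 2 clips, 40 Mel bands, 173 frames, unit power everywhere.
demo_sequences = to_lstm_sequences(np.ones((2, 40, 173)))
```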

Performance Evaluation

  • 10-fold Cross Validation
  • Mean and Standard Deviation values of the Accuracy obtained in each iteration
  • Confusion Matrix
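The evaluation items above can be sketched with NumPy alone. The helper names are hypothetical, and the accuracy values in the demo are placeholders rather than results from this project. Note that `np.std` returns the population standard deviation by default (`ddof=0`).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """Count predictions: rows are true class IDs, columns are predicted class IDs."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def summarize_accuracies(fold_accuracies):
    """Return (mean, standard deviation) of the per-fold accuracies."""
    acc = np.asarray(fold_accuracies, dtype=float)
    return acc.mean(), acc.std()

# Placeholder values for illustration only.
demo_cm = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], n_classes=3)
demo_mean, demo_std = summarize_accuracies([0.5, 0.7])
```

The diagonal of the matrix holds the correctly classified examples, so `np.trace(cm) / cm.sum()` recovers the accuracy of a fold.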

Robustness Evaluation Against Adversarial Examples

  • Implementation of the DeepFool algorithm
  • For each model trained in a Cross Validation iteration, compute the model's robustness from the DeepFool results obtained for every example in the corresponding test fold
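To make the procedure concrete, here is a minimal NumPy sketch of DeepFool on an affine classifier f(x) = Wx + b, used as a stand-in because its class gradients are simply the rows of W. For the CNN and RNN in this project the gradients would instead come from automatic differentiation of the trained network. The function names, the `overshoot` value and the choice of robustness measure (mean ratio between perturbation norm and input norm over the test fold) are assumptions for illustration.

```python
import numpy as np

def deepfool_affine(x, W, b, overshoot=0.02, max_iter=50):
    """Return (perturbation, adversarial example) for f(x) = W @ x + b."""
    x0 = np.asarray(x, dtype=float)
    k0 = int(np.argmax(W @ x0 + b))          # originally predicted class
    r_tot = np.zeros_like(x0)
    x_adv = x0.copy()
    for _ in range(max_iter):
        logits = W @ x_adv + b
        if int(np.argmax(logits)) != k0:     # label flipped: done
            break
        best_dist, best_w = np.inf, None
        for k in range(W.shape[0]):
            if k == k0:
                continue
            w_k = W[k] - W[k0]               # gradient difference
            f_k = logits[k] - logits[k0]     # score difference
            norm = np.linalg.norm(w_k)
            if norm == 0.0:
                continue
            dist = abs(f_k) / norm           # distance to boundary k
            if dist < best_dist:
                best_dist, best_w = dist, w_k
        if best_w is None:
            break
        # Smallest step reaching the closest decision boundary.
        r_tot = r_tot + (best_dist + 1e-4) * best_w / np.linalg.norm(best_w)
        x_adv = x0 + (1.0 + overshoot) * r_tot
    return x_adv - x0, x_adv

def mean_robustness(X, W, b):
    """Average ||r||_2 / ||x||_2 over the examples of a test fold."""
    ratios = []
    for x in X:
        r, _ = deepfool_affine(x, W, b)
        ratios.append(np.linalg.norm(r) / np.linalg.norm(x))
    return float(np.mean(ratios))

# Toy 3-class problem with identity weights.
W_demo, b_demo = np.eye(3), np.zeros(3)
x_demo = np.array([2.0, 1.0, 0.0])
r_demo, x_adv_demo = deepfool_affine(x_demo, W_demo, b_demo)
```

A larger value of this measure means larger perturbations are needed to change the predictions, which is read as a more robust model.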

Python and Libraries Versions

Python 3.10.13

Library       Version
ipykernel     6.25.2
keras         2.10.0
librosa       0.10.1
matplotlib    3.7.2
numpy         1.26.0
pandas        2.1.3
soundfile     0.12.1
tensorflow    2.10.0
