The MiLe End Hums and Whistles Machine Learning Project

This year at Queen Mary University of London we are going to create a new dataset consisting of labelled audio recordings. Each audio recording will consist of a unique interpretation of a small fragment of one of 8 iconic movie songs.

We will consider fragments of approximately 15 seconds in duration from 8 songs. The names of the songs, the labels we will use to identify them (in parentheses, bold font) and a link to an online resource where you can listen to them are listed below:


Data Interpretations

We will record two types of interpretations of the above-mentioned songs:

  • Humming.
  • Whistling.

There is no right or wrong way of humming or whistling a song. When recording ourselves, we just hummed or whistled as we would normally do (da-da-da, la-la-la, hm-hm-hm, ti-ro-ri, pa-rapa…). We did not sing the lyrics.


Jupyter Notebooks

Basic solution:

Using the MLEnd Hums and Whistles dataset, build a machine learning pipeline that takes as an input a Potter or a StarWars audio segment and predicts its song label (either Potter or StarWars).

Outline of steps:

  • Importing the required Python libraries
  • Data cleaning function
  • Reading and processing the Harry Potter and StarWars audio files
  • Merging and creating the final dataframe
  • Feature extraction from the audio: power, pitch mean, pitch standard deviation, voiced fraction, interpretation label, song label (see the sketch after this list)
  • Data exploration, data normalization, data split
  • Dummy check for humming vs. whistling classification
  • Model 1: SVM classifier for classifying Harry Potter vs. StarWars files
  • Analysing the results:
    • Training accuracy: 0.684
    • Validation accuracy: 0.587
    • Testing accuracy: 0.56
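
The following is a minimal sketch of this pipeline using librosa and scikit-learn. The get_features helper and the files/labels variables are illustrative stand-ins for the dataframe built in the earlier steps, not the exact notebook code.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def get_features(path):
    """Extract power, pitch mean, pitch std and voiced fraction from one audio file."""
    y, sr = librosa.load(path, sr=None)
    power = np.mean(y ** 2)                               # average signal power
    f0, voiced_flag, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                                      fmax=librosa.note_to_hz('C7'), sr=sr)
    pitch_mean = np.nanmean(f0)                           # mean pitch over voiced frames
    pitch_std = np.nanstd(f0)                             # pitch variability
    voiced_fraction = np.mean(voiced_flag)                # share of voiced frames
    return [power, pitch_mean, pitch_std, voiced_fraction]

# `files` and `labels` (e.g. 0 = Potter, 1 = StarWars) are assumed to come from
# the merged dataframe built in the earlier steps.
X = np.array([get_features(f) for f in files])
y = np.array(labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)                    # normalise the features
clf = SVC(kernel='rbf').fit(scaler.transform(X_train), y_train)
print('Test accuracy:', clf.score(scaler.transform(X_test), y_test))
```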

We can improve the model by including more advanced audio-processing features such as MFCC, chroma and mel-frequency features (these are included in the advanced solution below).

Advanced solution:

An advanced machine learning solution that predicts the song label of an audio segment across the 7 songs.

Outline of steps:

  • Data processing of the 7 songs
  • Feature extraction: previously we used the following features from the audio data: power, pitch mean, pitch standard deviation, voiced fraction, interpretation label, song label
  • The advanced features we have added are: MFCC, chroma, mel spectrogram and spectral contrast (see the sketch after this list)
  • Feature scaling using z-scores
  • Model 1: modified SVM model
    • Training accuracy: 0.525
    • Validation accuracy: 0.381
    • Testing accuracy: 0.391
  • Model 2: CNN
    • Training accuracy: 0.986
    • Validation accuracy: 0.431
    • Testing accuracy: 0.414
  • Unsupervised gender classification using hierarchical clustering based on the MFCC features of the audio files, combined with an SVM model. Referenced paper: Gender Identification using MFCC for Telephone Applications – A Comparative Study
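
As a rough illustration of the advanced feature set (the exact parameters may differ from the notebooks), each recording can be summarised by time-averaged MFCC, chroma, mel-spectrogram and spectral-contrast vectors, then scaled with z-scores before being fed to the SVM or CNN:

```python
import numpy as np
import librosa
from scipy.stats import zscore

def get_advanced_features(path):
    """Return one feature vector: MFCC, chroma, mel spectrogram and spectral
    contrast, each averaged over time."""
    y, sr = librosa.load(path, sr=None)
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    contrast = np.mean(librosa.feature.spectral_contrast(y=y, sr=sr), axis=1)
    return np.concatenate([mfcc, chroma, mel, contrast])

# `files` is assumed to be the list of audio paths from the data-processing step.
X = np.vstack([get_advanced_features(f) for f in files])
X = zscore(X, axis=0)   # z-score scaling of every feature column
```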

Common approaches to gender recognition are based on analysing the pitch of the speech. However, gender recognition using a single feature is not sufficiently accurate for a large variety of speakers. To capture differences in both the time domain and the frequency domain, a set of features known as Mel-frequency cepstral coefficients (MFCC) is used. These are widely used, state-of-the-art features for automatic speech and speaker recognition. MFCC features are extracted from the speech signal over small windows of 20 to 40 milliseconds. They are also known to work efficiently in noisy environments, and this robustness is why they are widely used in speaker recognition tasks.
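
A hedged sketch of how such a clustering step could look with librosa and scikit-learn (the 25 ms window, 10 ms hop and 13 coefficients are assumptions, not the notebooks' exact settings):

```python
import numpy as np
import librosa
from sklearn.cluster import AgglomerativeClustering

def mfcc_vector(path, n_mfcc=13):
    """Time-averaged MFCC vector for one recording."""
    y, sr = librosa.load(path, sr=None)
    # 25 ms windows with 10 ms hops, in line with the 20-40 ms framing described above.
    n_fft = int(0.025 * sr)
    hop = int(0.010 * sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, n_fft=n_fft, hop_length=hop)
    return mfcc.mean(axis=1)

# `files` is assumed to be the same list of audio paths used in the notebooks.
X = np.vstack([mfcc_vector(f) for f in files])
# Two hierarchical (agglomerative) clusters act as an unsupervised gender proxy.
clusters = AgglomerativeClustering(n_clusters=2, linkage='ward').fit_predict(X)
```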
