Speaker Recognition

Project Definition:

Identifying which celebrity is speaking using deep neural networks. Applications include matching voice commands to individuals, so that in the future machines can anticipate personalized commands.

Process:

  1. Collected data from VoxCeleb[1], a database of celebrities' voices and images.
  2. Extracted audio features, referencing Aaqib Saeed's feature-extraction code:
  • MFCC: Mel-frequency cepstral coefficients
  • Mel spectrogram: a Mel-scaled power spectrogram
  • Chroma-stft: a chromagram computed from a waveform or power spectrogram. "In music, the term chroma feature or chromagram closely relates to the twelve different pitch classes. Chroma-based features, which are also referred to as 'pitch class profiles', are a powerful tool for analyzing music whose pitches can be meaningfully categorized and whose tuning approximates to the equal-tempered scale." (Wikipedia)
  • Spectral contrast: the level difference between peaks and valleys in the spectrum
  • Tonnetz: tonal centroid features
  3. Stored the features in a pandas DataFrame and prepared the data for modeling
  4. Ran models

Visualization examples of mel and tonnetz features:

[Mel spectrograms: Miranda Cosgrove vs. Smokey Robinson]
[Tonnetz features: Miranda Cosgrove vs. Smokey Robinson]

Best Model:

The best model was an 18-layer CNN using the SELU activation function, yielding a 0.73 F1-score.
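The repository text does not spell out the architecture, so the following is only a minimal sketch of a SELU-activated 1-D CNN over the 193-dimensional feature vectors (layer counts, filter sizes, and the Keras framework are all assumptions; the actual model stacked 18 layers):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(n_features=193, n_classes=10):
    """Small SELU CNN sketch; SELU pairs with lecun_normal initialization."""
    model = models.Sequential([
        layers.Input(shape=(n_features, 1)),
        layers.Conv1D(64, 3, activation="selu", kernel_initializer="lecun_normal"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 3, activation="selu", kernel_initializer="lecun_normal"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(128, activation="selu", kernel_initializer="lecun_normal"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

SELU is self-normalizing, which can reduce the need for explicit batch normalization in deeper stacks; that property is one plausible reason it outperformed other activations here.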

Error and Validation Plots:

ROC Curve for all celebrities:
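Per-celebrity ROC curves for a multi-class model are typically computed one-vs-rest; a sketch using scikit-learn (the helper name `per_class_roc` is hypothetical):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

def per_class_roc(y_true, y_score, n_classes):
    """One-vs-rest ROC curve and AUC for each class.

    y_true: integer class labels, shape (n_samples,)
    y_score: predicted class probabilities, shape (n_samples, n_classes)
    Returns {class_index: (fpr, tpr, auc)}.
    """
    y_bin = label_binarize(y_true, classes=list(range(n_classes)))
    curves = {}
    for i in range(n_classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
        curves[i] = (fpr, tpr, auc(fpr, tpr))
    return curves
```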

Demo:

I challenged my audience to guess the celebrity from an audio clip I played, then ran my model on the same clip. My model won, with three more correct guesses than the audience.

Citation:
[1] A. Nagrani, J. S. Chung, A. Zisserman. VoxCeleb: a large-scale speaker identification dataset. INTERSPEECH, 2017.