Audio Classifier Using DL

An audio classifier that labels an audio clip as either music or speech. The dataset consists of 128 tracks, each 30 seconds long, with 64 examples per class (music/speech). All tracks are 22050 Hz, mono, 16-bit audio files in .wav format.

The raw audio is processed using a discrete Fourier transform. This returns complex values (real and imaginary parts), which are then converted into polar coordinates giving magnitude and phase. Finally, the resulting magnitudes are converted to a pseudo-decibel scale. All these values are plotted on a frequency vs. time graph (a spectrogram).
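The processing steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the notebook's actual code: the function name `pseudo_db_spectrogram`, the Hann window, the frame size of 512 samples, the hop of 256 samples, and the `1e-6` offset inside the logarithm are all assumptions chosen for the example.

```python
import numpy as np

SAMPLE_RATE = 22050  # sample rate of the dataset's .wav files


def pseudo_db_spectrogram(signal, fft_size=512, hop_size=256):
    """Return (log_magnitude, phase), each shaped (time_frames, frequency_bins).

    fft_size and hop_size are illustrative values, not taken from the notebook.
    """
    window = np.hanning(fft_size)
    n_frames = 1 + (len(signal) - fft_size) // hop_size

    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_size:i * hop_size + fft_size] * window
        for i in range(n_frames)
    ])

    # Discrete Fourier transform of each frame: complex (real + imaginary) values.
    spectrum = np.fft.rfft(frames, axis=1)

    # Rectangular -> polar: magnitude and phase.
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # "Pseudo" decibels: a log scale with no calibrated reference level.
    # The small offset (an assumption) avoids log10(0) on silent frames.
    log_magnitude = 20.0 * np.log10(magnitude + 1e-6)
    return log_magnitude, phase


# Usage: one second of a 440 Hz sine wave as stand-in audio.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
log_mag, phase = pseudo_db_spectrogram(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `log_mag` is one time step and each column one frequency bin, so displaying the transposed array as an image gives the frequency vs. time graph.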

A sliding window then snaps the Xs and ys (one-hot encoded labels) from the graph every 250 ms. These values are stored in lists as inputs and labels for the network. All the collected inputs and labels are then fed to the network (a 4-layer CNN followed by 2 fully connected layers).
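A rough sketch of the windowing step is shown below. It is an illustration under stated assumptions rather than the notebook's code: `window_examples` is a hypothetical helper, the input is assumed to be a (time, frequency) array, and 21 frames per window approximates 250 ms only if frames are 256 samples apart at 22050 Hz. Whether consecutive windows overlap is not stated in this README, so the step size is left as a parameter.

```python
import numpy as np


def window_examples(spectrogram, label, n_classes=2,
                    frames_per_window=21, step=21):
    """Slice a (time, frequency) array into fixed-size windows.

    Returns two lists: the window arrays (Xs) and one one-hot label per
    window (ys). All parameter defaults are illustrative assumptions.
    """
    one_hot = np.eye(n_classes)[label]  # e.g. label 1 of 2 -> [0., 1.]
    Xs, ys = [], []
    last_start = spectrogram.shape[0] - frames_per_window
    for start in range(0, last_start + 1, step):
        Xs.append(spectrogram[start:start + frames_per_window])
        ys.append(one_hot)
    return Xs, ys


# Usage: a placeholder spectrogram with 85 time frames and 257 frequency bins,
# labelled with class index 1 (which class is index 1 is an assumption).
Xs, ys = window_examples(np.zeros((85, 257)), label=1)
```

Every window taken from a track carries that track's label, so a single 30-second file yields many training examples.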

