MelihGulum/Music-Genre-Classification


MGC Logo

📖 Table of Contents

  1. ABOUT THE PROJECT
  2. DATASET
  3. PREPROCESS
  4. DEEP LEARNING
  5. WEB APPLICATION - FLASK
  6. HOW TO RUN

📝 About The Project

This project aims to classify music genres. Music genre classification is an audio signal processing task; signal processing is one of the application areas of deep learning, alongside image processing and natural language processing. The GTZAN dataset consists of WAV audio files, and the Librosa library was used to extract features from them (more on this in the Preprocess section). Several architectures were built for the classification (NN, LSTM, CNN, ...).


💾 DATASET

The GTZAN dataset was used. Briefly, the dataset consists of 10 genre classes, and it ships with CSV files containing many precomputed features such as MFCCs, chroma, and RMS. There are two such CSV files: one with features extracted from 3-second excerpts and one from the full 30-second clips.


The Classes of GTZAN (Image by Author)


🛠️ PREPROCESS

As mentioned earlier, Librosa was used for feature extraction. The precomputed features in the CSV files were not used; instead, I extracted my own features, namely the first 13 MFCCs. Each audio file was read in turn, its MFCCs were extracted, and the features and the corresponding label were appended to a JSON file.

The code cell below shows how the MFCCs are extracted.

import librosa

# librosa >= 0.10 requires keyword arguments here
y, sample_rate = librosa.load(file_path, sr=SAMPLE_RATE)
mfcc = librosa.feature.mfcc(y=y, sr=sample_rate, n_mfcc=13, n_fft=2048, hop_length=512)
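
For context, a minimal sketch of such a preprocessing loop is shown below, assuming the standard GTZAN layout of one sub-folder per genre. DATASET_PATH, JSON_PATH, and the dictionary keys are illustrative placeholders, not the project's exact code.

import json
import os

import librosa

SAMPLE_RATE = 22050
DATASET_PATH = "genres"      # placeholder: one sub-folder per genre
JSON_PATH = "data.json"      # placeholder output file

data = {"mapping": [], "mfcc": [], "labels": []}

for label, genre in enumerate(sorted(os.listdir(DATASET_PATH))):
    data["mapping"].append(genre)
    genre_dir = os.path.join(DATASET_PATH, genre)
    for file_name in sorted(os.listdir(genre_dir)):
        file_path = os.path.join(genre_dir, file_name)
        y, sample_rate = librosa.load(file_path, sr=SAMPLE_RATE)
        mfcc = librosa.feature.mfcc(y=y, sr=sample_rate, n_mfcc=13,
                                    n_fft=2048, hop_length=512)
        data["mfcc"].append(mfcc.T.tolist())   # (frames, 13), JSON-serializable
        data["labels"].append(label)

with open(JSON_PATH, "w") as fp:
    json.dump(data, fp, indent=2)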

🖥️ DEEP LEARNING

Various architectures were built for training: an ANN, a vanilla LSTM, a stacked LSTM, and several CNN variants. The CNN architecture performed best, and it was later strengthened with regularization, normalization, etc. In short, the model consists of three convolutional layers and an output layer, with a pooling layer and a normalization layer following each convolutional layer. The model reaches almost 80% accuracy on the test set. The model architecture can be seen below.


Model Architecture (Image by Author)
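
To make the shape of the network concrete, here is a hypothetical Keras sketch of a model along these lines: three convolutional blocks, each followed by pooling and batch normalization, then a regularized dense head and a 10-way softmax output. The filter counts, kernel sizes, and dense-layer size are assumptions, not the exact trained configuration.

from tensorflow import keras

def build_model(input_shape, num_classes=10):
    """Three conv blocks (conv -> pooling -> batch norm) and a softmax output."""
    model = keras.Sequential([
        keras.layers.Input(shape=input_shape),   # e.g. (frames, 13, 1)
        keras.layers.Conv2D(32, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((3, 3), strides=(2, 2), padding="same"),
        keras.layers.BatchNormalization(),
        keras.layers.Conv2D(32, (3, 3), activation="relu"),
        keras.layers.MaxPooling2D((3, 3), strides=(2, 2), padding="same"),
        keras.layers.BatchNormalization(),
        keras.layers.Conv2D(32, (2, 2), activation="relu"),
        keras.layers.MaxPooling2D((2, 2), strides=(2, 2), padding="same"),
        keras.layers.BatchNormalization(),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu",
                           kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model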


🚀 WEB APPLICATION - FLASK

After the deep learning part was finished, it was time for the web application, which was built with Flask. The application consists of 4 pages: Home (where the audio file is uploaded), Project (a brief description of the project), About (dedicated to the team), and finally the Contact page.
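
As an illustration, a minimal sketch of the upload-and-predict flow on the Home page might look like the following. The route, the model path, the form field name, and the template name are assumptions, and the MFCC input shape must match whatever the model was trained on.

import numpy as np
import librosa
from flask import Flask, render_template, request
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("model.h5")          # placeholder model path
GENRES = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]  # the 10 GTZAN classes

@app.route("/", methods=["GET", "POST"])
def home():
    prediction = None
    if request.method == "POST":
        request.files["audio"].save("upload.wav")    # placeholder field name
        y, sr = librosa.load("upload.wav", sr=22050)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                    n_fft=2048, hop_length=512).T
        mfcc = mfcc[np.newaxis, ..., np.newaxis]     # add batch and channel dims
        prediction = GENRES[int(np.argmax(model.predict(mfcc)))]
    return render_template("home.html", prediction=prediction)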

📦 DATABASE

Users can contact the team on the Contact page. After a user submits the form, the submitted information is saved to a MySQL database and also mailed to a predefined email address.

The SQL statement that creates the contacts table in the MySQL database:

CREATE TABLE contacts (
    id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    fullname VARCHAR(30) NOT NULL,
    email VARCHAR(30) NOT NULL,
    phone_number VARCHAR(50),
    url VARCHAR(50),
    message VARCHAR(200),
    reg_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
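
From the Flask side, the form data could then be saved with a parameterized query, for example via the mysql-connector-python package. The connection settings and the helper name below are placeholders, not the project's exact code.

import mysql.connector   # assumption: the mysql-connector-python package

def save_contact(fullname, email, phone_number, url, message):
    # Connection settings are placeholders; use your own credentials.
    conn = mysql.connector.connect(host="localhost", user="root",
                                   password="secret", database="mgc")
    cursor = conn.cursor()
    # A parameterized query guards against SQL injection.
    cursor.execute(
        "INSERT INTO contacts (fullname, email, phone_number, url, message) "
        "VALUES (%s, %s, %s, %s, %s)",
        (fullname, email, phone_number, url, message),
    )
    conn.commit()
    cursor.close()
    conn.close()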

🎵 MP3

The model cannot predict MP3 audio files directly, so FFmpeg is used to convert uploaded MP3 files to WAV ( ⚠️ note that the FFmpeg executable's path must first be added to the system environment variables).
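
A minimal way to do such a conversion from Python is to shell out to FFmpeg, for example:

import subprocess

def mp3_to_wav(mp3_path, wav_path):
    # Requires ffmpeg on the system PATH; -y overwrites an existing output file.
    subprocess.run(["ffmpeg", "-y", "-i", mp3_path, wav_path], check=True)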

📷 SCREENSHOTS



🏃 HOW TO RUN

1. Clone this repository (fork it first if you want your own copy):

git clone https://github.com/MelihGulum/Music-Genre-Classification.git

2. Install the project dependencies:

pip install -r requirements.txt

3. Now you can run the project:

flask --app MGC_flask.py --debug run
