
Our goal is to push the general performance of music genre recognition forward and introduce a new method for pre-processing which allows for faster experimentation and model tuning in the future. We experimented with two different musical representations: mel-spectrograms and manually extracted features.


Music Genre Classification Project

The purpose of this repository is to offer an overview of the methods used during our project and to make it possible to reproduce the experiments and results presented in the report.

File Structure

Some scripts may assume the following file structure (you might have to create missing directories; a small helper sketch for creating them follows the directory list below):

Directories

  • Datasets/ : Directory containing all training-, test- and preprocessed data (and original data)

  • Datasets/fma_medium/ : Directory containing all the original data from the FMA dataset

  • Datasets/fma_metadata/ : Directory containing all the metadata of the FMA dataset

  • Datasets/preprocess_mfcc/ : Directory containing 3 subfolders with 30s, 10s and 3s cuts after pre-processing and folder preparation

  • Models/ : Directory containing 3 subdirectories with all the models you train (we added some pre-trained models from our experiments for reference)

  • Models/30sec/ : Directory containing all the models you train with 30sec inputs

  • Models/10sec/ : Directory containing all the models you train with 10sec inputs

  • Models/3sec/ : Directory containing all the models you train with 3sec inputs

  • Figures/ : Directory containing the confusion matrix and the training history (loss and accuracy) in .png format, produced during evaluation

  • Results/ : Directory containing .txt files with the accuracy of each previously evaluated model

  • ManualFeatures/ : Directory containing a small subproject for dimensionality reduction of the librosa features.
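If some of these directories are missing, they can be created in one go. The snippet below is only a small convenience sketch based on the structure listed above; later sections also show the equivalent shell commands where they are needed.

# Convenience sketch: create the writable directories listed above if they do not exist.
# fma_medium/ and fma_metadata/ come from the downloaded archives (see below).
import os

for path in [
    "Datasets/preprocess_mfcc",
    "Models/30sec", "Models/10sec", "Models/3sec",
    "Figures",
    "Results",
    "ManualFeatures/data",
]:
    os.makedirs(path, exist_ok=True)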

Scripts

  • preprocessing_melspect.py : Script running the preprocessing pipeline
  • training.py : Script to train any of the models presented in the paper
  • evaluate.py : Script to evaluate any model and save the results as .png and .txt files in the corresponding directories

Additional Notes

Datasets/, Models/, Figures/, and Results/ are empty at the beginning. To avoid issues when reproducing our experiments, we added some previously obtained figures and results so that the expected file structure of the project is already in place.

Mel-spectrogram and CRNN Methods

Datasets

[All the (preprocessed) datasets used for our experiments were too large to be added to Polybox. Therefore, to run the experiments you will first have to download the original datasets and run the preprocessing script.]

Download the FMA dataset and its metadata:

  1. fma_medium.zip: 25,000 tracks of 30s, 16 unbalanced genres (22GiB)
  2. fma_metadata.zip

Move them to the Datasets/ directory, and copy the metadata to the ManualFeatures/data/ directory as well:

unzip fma_medium.zip

mv fma_medium Datasets/

unzip fma_metadata.zip

cp -r fma_metadata/ ManualFeatures/data/
mv fma_metadata Datasets/

The Datasets/fma_metadata/ directory should contain the following files:

  • tracks.csv: per track metadata such as ID, title, artist, genres, tags and play counts, for all 106,574 tracks.
  • genres.csv: all 163 genres with name and parent (used to infer the genre hierarchy and top-level genres).
  • features.csv: common features extracted with librosa.
  • echonest.csv: audio features provided by Echonest (now Spotify) for a subset of 13,129 tracks.

The Datasets/fma_medium/ directory should contain the following:

  • 156 folders, each containing tracks in .mp3 format (a loading sketch follows below)
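For orientation, the snippet below shows one way to read this metadata and locate a track's audio file. It is a hedged sketch based on the published FMA layout (tracks.csv has a two-row column header, and audio files are grouped into folders named after the first three digits of the zero-padded six-digit track ID); the repository's own preprocessing script may handle this differently.

# Hedged sketch: read FMA metadata and map a track ID to its mp3 file.
# Assumes the published FMA layout; the repository's own scripts may differ.
import os
import pandas as pd

# tracks.csv uses a two-row column header, hence header=[0, 1]
tracks = pd.read_csv("Datasets/fma_metadata/tracks.csv", index_col=0, header=[0, 1])

# the FMA subsets are nested, so the medium subset also includes "small" tracks
medium = tracks[tracks[("set", "subset")].isin(["small", "medium"])]
labels = medium[("track", "genre_top")]

def track_path(track_id: int, audio_dir: str = "Datasets/fma_medium") -> str:
    """Audio files live in folders named after the first 3 digits of the
    6-digit, zero-padded track ID (e.g. track 2 -> 000/000002.mp3)."""
    tid = f"{track_id:06d}"
    return os.path.join(audio_dir, tid[:3], tid + ".mp3")

print(track_path(labels.index[0]), labels.iloc[0])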

Running Our Experiments

Guidelines for running our experiments are presented here. We assume that the repository has been cloned, that the correct file structure has been set up (i.e. missing directories have been added according to the description above), and that the datasets have been downloaded and placed in the Datasets/ directory.

Further Preparations

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install dependencies (make sure to be in venv):

pip install -r requirements.txt

Preprocessing

Before running the preprocessing, ensure that Datasets/ contains the following directories:

  • fma_medium/
  • fma_metadata/
  • preprocess_mfcc/

You may need to create the last folder yourself with the following command line:

cd Datasets/
mkdir preprocess_mfcc
cd ..

You are now ready to run the preprocessing script, which builds the 30sec, 10sec, and 3sec datasets with the directory layout needed for the rest of the project. To do so, run the following command:

python3 preprocessing_melspect.py

Note that preprocessing_melspect.py is set up to reproduce the exact experiments we performed during the project. However, if you want to try different cut lengths, change the hyperparameters used for mel-spectrogram generation, and so on, you can modify the global variables at the top of the preprocessing file. Disclaimer: we only verified that the code is reliable for 10s and 3s cuts; other cut lengths might lead to errors in the process.
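To give an idea of what such a pipeline involves, here is a heavily simplified sketch of turning one clip into fixed-length mel-spectrogram cuts with librosa. The parameter values and function names are illustrative assumptions; preprocessing_melspect.py remains the authoritative implementation.

# Simplified sketch of mel-spectrogram preprocessing; the hyperparameters here
# are illustrative, not the values used by preprocessing_melspect.py.
import librosa
import numpy as np

def melspec_cuts(mp3_path: str, cut_seconds: int = 10, sr: int = 22050,
                 n_mels: int = 128, hop_length: int = 512):
    """Load a ~30s clip and split its mel-spectrogram into fixed-length cuts."""
    y, sr = librosa.load(mp3_path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop_length)
    mel_db = librosa.power_to_db(mel, ref=np.max)      # log-scaled spectrogram

    frames_per_cut = int(cut_seconds * sr / hop_length)
    n_cuts = mel_db.shape[1] // frames_per_cut
    return [mel_db[:, i * frames_per_cut:(i + 1) * frames_per_cut]
            for i in range(n_cuts)]

# e.g. three 10s cuts (or ten 3s cuts) from one fma_medium track
cuts = melspec_cuts("Datasets/fma_medium/000/000002.mp3", cut_seconds=10)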

Training

Before training the models, ensure that the Models/30sec/, Models/10sec/, and Models/3sec/ directories exist in your structure. You can run the following commands to create them if needed:

mkdir Models
cd Models/
mkdir 30sec
mkdir 10sec
mkdir 3sec
cd ..

Using training.py you can train any of the models we used during our experiments. To make the process easier, we built the script so that it takes different arguments, allowing you to train different model architectures (a hedged sketch of such a CRNN follows the example below):

  • "-30sec" or "-10sec" or "-3sec" : chose if you want to train the model with 30, 10 or 3 seconds samples (mandatory)
  • "-4c" or "-3c" or "-3c" : chose the number of convolution block in the model (mandatory)
  • "-l1" or "-l2" : chose the regularization loss you want to use, l1 loss or l2 loss (optional)
  • "-lrs" : if you want to use the learning rate scheduler (optional)
  • "-gru2" : if you want add a second consecutive GRU layer (optional)
  • "-ep20" : if you want to run only 20 epochs instead of 50 epochs (optional)

When the training is done, the model as well as the history of the training process are stored in the Models/ directory.

Example: train a model on 30sec samples with 4 convolution blocks, L2 loss, a learning rate scheduler, two consecutive GRU layers, and only 20 epochs:

python3 training.py -30sec -4c -l2 -lrs -gru2 -ep20
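To make these options concrete, here is a hedged, Keras-style sketch of the kind of CRNN the flags describe: a stack of convolution blocks followed by one or two GRU layers, with L2 regularization on the convolutions. Every shape, width, and name below is an illustrative assumption, not the repository's actual implementation (see training.py for that).

# Hedged sketch of a CRNN in the spirit of the models trained by training.py.
# Shapes and hyperparameters are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

N_MELS = 128        # assumed mel-spectrogram height
N_FRAMES = 1280     # assumed number of time frames for a 30s cut
N_GENRES = 16       # fma_medium has 16 top-level genres

def build_crnn(n_conv_blocks=4, l2_weight=1e-4, second_gru=False):
    inputs = layers.Input(shape=(N_MELS, N_FRAMES, 1))
    x = inputs
    for _ in range(n_conv_blocks):
        # each block: conv -> batch norm -> ReLU -> 2x2 max pooling
        x = layers.Conv2D(64, 3, padding="same",
                          kernel_regularizer=regularizers.l2(l2_weight))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    # collapse the frequency axis and keep time as the recurrent axis
    x = layers.Permute((2, 1, 3))(x)                         # (time, freq, channels)
    x = layers.Reshape((N_FRAMES // 2**n_conv_blocks, -1))(x)
    x = layers.GRU(64, return_sequences=second_gru)(x)
    if second_gru:
        x = layers.GRU(64)(x)
    outputs = layers.Dense(N_GENRES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_crnn(n_conv_blocks=4, l2_weight=1e-4, second_gru=True)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])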

If you want to run the model from the CRNN for Music Classification paper by Keunwoo Choi et al., you need to use the specific argument:

  • "-papermodel"

This argument can be combined ONLY with the size of the sample you want to use.

Example: train the model from the paper with 30sec samples:

python3 training.py -30sec -papermodel

Again, note that the script is set up to reproduce our exact experiments. If you want to try other cut lengths or architectures, you might need to modify the script according to your needs. You might also want to use a different batch size or number of epochs (we fixed them at 32 and 50/20, as this configuration gave us the best results); to do so, modify the global variables at the beginning of the script.

Evaluation

Once a model has been trained, you can evaluate it. We offer a script that:

  1. Evaluates a given model and saves the results in a .txt file
  2. Allows you to use our voting system (divide and conquer) on models trained with 10s and 3s samples
  3. Saves the confusion matrix and the training history (loss and accuracy) in .png format

The script takes as inputs the following arguments:

  • "model_name" : the name of the model you want to evaluate aka. the name of the model's directory saved after the training step (mandatory)
  • "-30sec", "-10sec" or "-3sec" : the sample's size used to train the model you want to evaluate (mandatory)
  • "-voting" : if you want to apply the voting methods (divide and conquer) --> only possible for models trained with 10s or 3s samples (optional)

Example: evaluate the model trained with 4 convolution blocks for 20 epochs on 10sec samples, using the voting (divide and conquer) method:

python3 evaluate.py "4conv_20epochs" -10 -voting

Warning: since the figures are displayed as they are generated, you might have to close them for the process to continue.
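For intuition, here is a minimal sketch of the divide-and-conquer idea: each short segment of a track gets its own prediction, and the track-level genre is chosen by majority vote over the segments. The function and variable names are illustrative assumptions, not the actual code in evaluate.py.

# Hedged sketch of the voting (divide and conquer) idea used with 10s/3s models.
# Names and shapes are illustrative; evaluate.py is the authoritative implementation.
import numpy as np

def vote_track_genre(segment_probs: np.ndarray) -> int:
    # segment_probs: (n_segments, n_genres) softmax outputs for one track
    segment_votes = segment_probs.argmax(axis=1)                        # per-segment predictions
    counts = np.bincount(segment_votes, minlength=segment_probs.shape[1])
    return int(counts.argmax())                                         # most frequent genre wins

# e.g. a 30s track cut into ten 3s segments:
# track_genre = vote_track_genre(model.predict(segments_of_one_track))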

Feature-based Method and Latent Space Representation

For our setup of the feature-based approach, we used the PyTorchLightning-Hydra Template. The entire structure can be found within the ManualFeatures/ directory and run directly from there. For exact instructions on how to run the project and adjust the configs, we refer to the template's documentation.

You can find the experiment runs under ManualFeatures/runs/, where:

  • wandb_export_wd_search.csv contains the runs we used to select appropriate hyperparameters
  • wandb_export_ae.csv contains our experiment runs with our auto-encoder architecture (a minimal sketch of such an auto-encoder follows below)
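For illustration, below is a minimal PyTorch sketch of a feature auto-encoder of the kind used for the latent-space representation, assuming the 518-dimensional librosa feature vectors from features.csv. The layer sizes are assumptions; the actual architecture and training setup live in ManualFeatures/ and its Hydra configs.

# Minimal sketch of a feature auto-encoder; layer sizes are illustrative assumptions,
# not the configuration used in ManualFeatures/.
import torch
from torch import nn

class FeatureAutoEncoder(nn.Module):
    def __init__(self, n_features: int = 518, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_features),
        )

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)           # compressed latent representation
        return self.decoder(z), z     # reconstruction and latent code

# Training would minimise a reconstruction loss, e.g. nn.MSELoss()(x_hat, x).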

Final Note

If you want to know more about our preprocessing methods, model architectures, results, and more, please refer to our report. You can also have a look at our code (available in this repository) for a better understanding of the different processes that were executed.

@authors Auguste, Marc and Lukas
