A Custom CNN for Onset Detection

This repository contains a MATLAB example of onset detection using a convolutional neural network (CNN). It includes a MatConvNet distribution for training and evaluation.

Training and Testing Databases

Before running the example, download the training and testing databases and place them in a folder on your local machine.

Training: Leveau Onset Database (available here)
Testing: Prosemus Onset Database (available here)

Evaluation

Several trained models are included in the learned_models folder for evaluation. To try them, run onset_test.m.

Input spectrograms

The spectrograms of the audio files are first extracted at different time resolutions.


Figure: Input spectrograms. From top to bottom: 23 ms, 46 ms and 93 ms time resolution.
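As a rough illustration of this step, the sketch below computes magnitude spectrograms of one signal with three window lengths using only base MATLAB functions. The 44.1 kHz sample rate, the 10 ms hop, the Hann window and the synthetic test tone are assumptions made for the example; the scripts in this repository may use different settings.

```matlab
% Sketch: magnitude spectrograms of one signal at three time resolutions.
% Assumed values (not taken from the repository): sample rate, hop size,
% window type and the synthetic input signal.
fs      = 44100;                 % assumed sample rate in Hz
hop     = 441;                   % assumed hop size (10 ms)
winLens = [1024 2048 4096];      % about 23 ms, 46 ms and 93 ms at 44.1 kHz

t = (0:fs-1)' / fs;              % one second of audio
x = sin(2*pi*440*t);             % placeholder signal: 440 Hz sine

nFrames = floor(numel(x) / hop); % same number of frames for every resolution
specs   = cell(1, numel(winLens));

for k = 1:numel(winLens)
    N   = winLens(k);
    win = 0.5 - 0.5*cos(2*pi*(0:N-1)' / N);   % periodic Hann window
    % Zero-pad so that frame m is centred on the same instant for every N.
    xp  = [zeros(N/2, 1); x; zeros(N/2, 1)];
    S   = zeros(N/2 + 1, nFrames);
    for m = 1:nFrames
        first   = (m-1)*hop + 1;
        frame   = xp(first:first+N-1) .* win;
        X       = fft(frame);
        S(:, m) = abs(X(1:N/2 + 1));          % keep non-negative frequencies
    end
    specs{k} = S;                             % (N/2 + 1) x nFrames
end
```

Because the hop is shared, the three spectrograms have the same number of frames and differ only in their number of frequency bins, which the mel filter bank in the next step brings to a common size.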

Dimensionality reduction

The spectrograms are then filtered with a mel-spaced frequency filter bank of 80 filters to reduce redundancy during learning.


Figure: Mel-filtered spectrum.
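The filtering can be sketched as a matrix of triangular filters applied to each spectrogram. Only the number of filters (80) comes from the description above; the frequency range, FFT size, sample rate, mel formula and the random placeholder spectrogram are assumptions for illustration.

```matlab
% Sketch: triangular mel filter bank applied to a magnitude spectrogram.
fs       = 44100;     % assumed sample rate in Hz
nFft     = 2048;      % assumed FFT size
nFilters = 80;        % number of filters stated in this README
fMin     = 27.5;      % assumed lower band edge in Hz
fMax     = 16000;     % assumed upper band edge in Hz

% One common mel-scale formula (an assumption; other variants exist).
hz2mel = @(f) 2595 * log10(1 + f / 700);
mel2hz = @(m) 700 * (10.^(m / 2595) - 1);

% nFilters + 2 edge frequencies, equally spaced on the mel scale.
edges    = mel2hz(linspace(hz2mel(fMin), hz2mel(fMax), nFilters + 2));
binFreqs = (0:nFft/2) * fs / nFft;            % centre frequency of each bin

fb = zeros(nFilters, numel(binFreqs));
for k = 1:nFilters
    lo = edges(k);  ce = edges(k+1);  hi = edges(k+2);
    rising   = (binFreqs - lo) / (ce - lo);   % slope up to the centre
    falling  = (hi - binFreqs) / (hi - ce);   % slope down from the centre
    fb(k, :) = max(0, min(rising, falling));  % triangle, zero outside [lo, hi]
end

S       = rand(nFft/2 + 1, 100);              % placeholder spectrogram
melSpec = fb * S;                             % 80 x 100 mel-filtered result
```

Building one filter bank per FFT size in the same way maps all three spectrograms to 80 bands, whatever their original number of bins.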

The input to the CNN therefore consists of three channels. Each channel covers the same frequency bands at a different temporal resolution, so the channels look alike along the frequency axis but are increasingly smeared along the time axis as the analysis window grows. The next picture shows this effect by displaying the three channels as the red, green and blue planes of an RGB image.


Figure: An RGB representation of the input for the network.

The CNN itself looks for relationships and differences across this input, in both time and frequency.
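A minimal sketch of how such a three-channel input could be assembled is shown below. The variable names, the array sizes and the random placeholder data are hypothetical, and the conversion to single precision is an assumption about what the network code expects.

```matlab
% Sketch: stack three mel spectrograms into one three-channel input.
nBands  = 80;                     % mel bands per channel
nFrames = 100;                    % assumed number of frames

mel23 = rand(nBands, nFrames);    % placeholder for the 23 ms resolution
mel46 = rand(nBands, nFrames);    % placeholder for the 46 ms resolution
mel93 = rand(nBands, nFrames);    % placeholder for the 93 ms resolution

% Frequency x time x channel array; single precision is assumed here.
input = single(cat(3, mel23, mel46, mel93));

% Scale each channel to [0, 1] so the array can be viewed as an RGB image.
rgb = zeros(size(input));
for c = 1:3
    ch = double(input(:, :, c));
    rgb(:, :, c) = (ch - min(ch(:))) / (max(ch(:)) - min(ch(:)));
end
% image(rgb); axis xy;            % uncomment to display the RGB view
```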

Trained model

The model used in the example is a 10-layer convolutional network with varying filter sizes and a rectifier unit after each convolutional layer. The training target is an onset detection function.

The output of the CNN is then post-processed to deliver a generated onset detection function, which is shown below.


Figure: Generated onset detection function and its ground truth.

In this example, the generated function correctly identifies the onsets in the audio file.
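The post-processing used in this repository is not described here. As one possible illustration, the sketch below smooths a frame-wise network output with a short Hamming window and reports local maxima above a threshold as onsets. The frame rate, window length, threshold and the placeholder activations are all assumptions.

```matlab
% Sketch: smooth a frame-wise activation and pick peaks as onsets.
% This is a common approach, not necessarily the one used in onset_test.m.
fps = 100;                               % assumed frame rate (frames/second)
act = zeros(1, 300);                     % placeholder network output
act([50 120 121 240]) = [0.9 0.6 0.8 0.3];

win = 0.54 - 0.46*cos(2*pi*(0:4) / 4);   % 5-point Hamming window
win = win / sum(win);                    % normalise to unit gain
odf = conv(act, win, 'same');            % smoothed onset detection function

threshold = 0.2;                         % assumed detection threshold
mid    = odf(2:end-1);
isPeak = [false, (mid > odf(1:end-2)) & (mid >= odf(3:end)), false];
isPeak = isPeak & (odf > threshold);     % keep only sufficiently strong peaks

onsetFrames = find(isPeak);              % frame indices of detected onsets
onsetTimes  = (onsetFrames - 1) / fps;   % onset times in seconds
```

In this toy input the two neighbouring activations merge into a single onset after smoothing, and the weak activation falls below the threshold.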

Training

A training script is included in onset_train.m. Optionally, training can use the mel-filtered spectral flux, which exploits the relationships between subsequent time bins in the spectrogram.
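For reference, a mel-filtered spectral flux can be sketched as the positive frame-to-frame difference of the mel spectrogram. The half-wave rectification and the zero padding of the first frame are assumptions for this example, and the input is random placeholder data; onset_train.m may compute the feature differently.

```matlab
% Sketch: spectral flux of a mel-filtered spectrogram.
melSpec = rand(80, 100);                     % placeholder: bands x frames

d         = diff(melSpec, 1, 2);             % difference between subsequent frames
fluxBands = max(d, 0);                       % keep energy increases only (assumed)
fluxBands = [zeros(size(melSpec, 1), 1), fluxBands];  % pad to the original length

flux = sum(fluxBands, 1);                    % one flux value per frame
```

Here fluxBands keeps the per-band resolution and could serve as a network input, while flux collapses it to a single curve per frame.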

References

This implementation is based on [1].

[1] Jan Schlüter and Sebastian Böck. "Improved Musical Onset Detection with Convolutional Neural Networks." 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014.

For more information on MatConvNet, visit its official repository.
