Play Drums in your Browser.

Drums-app allows you to simulate any percussion instrument in your browser, using only your webcam. All machine learning models run locally, so no user information is sent to any server.

Check the demo at drums-app.com

Quick Start

Simply serve src/index.html from a local web server, or visit drums-app.com.
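
As a minimal sketch (assuming Python is installed; any static file server works just as well), you can serve the repository root locally and then open src/index.html in your browser:

```python
# Minimal static server sketch using only the Python standard library.
# Run it from the repository root, then open http://localhost:8000/src/index.html
from http.server import HTTPServer, SimpleHTTPRequestHandler

HTTPServer(("localhost", 8000), SimpleHTTPRequestHandler).serve_forever()
```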

Select Set Template to build your own drum template by uploading some images and attaching your sounds to them.

Set Template

Turn on your webcam and enjoy!

Play!

*No cats were harmed during this recording

Implementation Details

This web application is built with MediaPipe and TensorFlow.js.
The pipeline uses two Machine Learning models.

  • Hands Model: The Computer Vision model provided by MediaPipe, which detects 21 landmarks (x, y, z) for each hand.
  • HitNet: An LSTM model developed in Keras for this application and then converted to TensorFlow.js. It takes the last N positions of a hand and predicts the probability that the sequence corresponds to a hit (see the sketch after this list).
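
The following sketch outlines how the two models fit together per frame. The real app implements this in JavaScript with TensorFlow.js; the Python names here (hitnet, get_hand_landmarks, play_drum_sound) are hypothetical stand-ins, and the sequence length of 4 and threshold of 0.5 come from the architecture and results sections below.

```python
# Hedged sketch of the per-frame pipeline: buffer the last detections of a hand,
# ask HitNet for a hit probability, and clear the buffer whenever a hit fires.
from collections import deque
import numpy as np

SEQ_LEN = 4          # HitNet looks at the last 4 detections of a hand
HIT_THRESHOLD = 0.5  # confidence threshold used in the confusion matrices below

buffer = deque(maxlen=SEQ_LEN)

def on_frame(frame):
    landmarks = get_hand_landmarks(frame)         # 21 landmarks, each (x, y, z)
    if landmarks is None:
        return
    buffer.append(np.asarray(landmarks).ravel())  # flatten to a 63-value vector
    if len(buffer) == SEQ_LEN:
        sequence = np.stack(buffer)[None]         # shape (1, 4, 63)
        hit_probability = float(hitnet.predict(sequence, verbose=0)[0, 0])
        if hit_probability > HIT_THRESHOLD:
            play_drum_sound()                     # hypothetical sound trigger
            buffer.clear()                        # empty the buffer after each hit
```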

HitNet Details

Building the Dataset

The dataset used for training was built in the following way:

  1. A representative landmark (Index Finger Dip [Y]) of each detected hand is plotted on an interactive chart, using Chart.js.
  2. Every time a key is pressed, a grey mark is plotted on the same chart.
  3. I start playing drums with one hand while pressing a key on the keyboard (with the other hand) every time I hit an imaginary drum. [Gif Left]
  4. I use the mouse to select in the chart those points that should be considered as hits. [Gif Right]
  5. When the "Save Dataset" button is clicked, all hand positions together with their corresponding labels (1 if the frame was considered a hit, 0 otherwise) are downloaded as a JSON file (see the windowing sketch below).

Dataset Generation

Data Tagging
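
The sketch below shows one way such a file could be turned into fixed-length training windows. The exact JSON layout is not part of the README, so the field names ("landmarks", "is_hit"), the file name, and the convention that each window is labeled by its last frame are all assumptions.

```python
# Hedged sketch: convert recorded hand positions + hit labels into HitNet windows.
import json
import numpy as np

SEQ_LEN = 4  # HitNet consumes windows of the last 4 detections

with open("dataset.json") as f:      # hypothetical file name
    frames = json.load(f)

positions = np.array([frame["landmarks"] for frame in frames])  # (T, 63)
labels = np.array([frame["is_hit"] for frame in frames])        # (T,)

# Each training sample is a window of SEQ_LEN consecutive frames, labeled with
# whether its last frame was tagged as a hit (assumed convention).
X = np.stack([positions[i:i + SEQ_LEN]
              for i in range(len(positions) - SEQ_LEN + 1)])    # (T-3, 4, 63)
y = labels[SEQ_LEN - 1:]                                        # (T-3,)
```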

Defining the Architecture

HitNet has been built in Python, using Keras, and then exported to TensorFlow.js. To avoid any dissonance between the hit on the drum and the produced sound, HitNet must run as fast as possible; for this reason it implements an extremely simple architecture.

HitNet Architecture

It takes as input the last 4 detections of a hand (the flattened version of its 21 landmarks (x, y, z)) and outputs the probability that the sequence corresponds to a hit. It is composed only of an LSTM layer followed by a ReLU activation (with dropout, p = 0.25) and a Dense output layer with a single unit, followed by a sigmoid activation.
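
A hedged Keras sketch of that description follows. The number of LSTM units is not stated in the README, so 32 is only a placeholder, and folding the ReLU and dropout into the LSTM layer is one possible reading of the text.

```python
# Hedged sketch of the HitNet architecture described above (unit count assumed).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(4, 63)),               # last 4 detections x 21 landmarks x (x, y, z)
    layers.LSTM(32, activation="relu", dropout=0.25),
    layers.Dense(1, activation="sigmoid"),     # probability that the sequence is a hit
])
model.summary()
```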

Training the model

HitNet has been trained in Keras, using the following parameterization:

  • Epochs: 3000.
  • Optimizer: Adam.
  • Loss: Weighted Binary Cross Entropy*.
  • Training/Val Split: 0.85-0.15.
  • Data Augmentation (see the sketch after this list):
    • Mirroring: X axis.
    • Shift: Shift applied in block for the whole sequence.
      • X Shift: ±0.3.
      • Y Shift: ±0.3.
      • Z Shift: ±0.5.
    • Interframe Noise: Small shift applied independently to each frame of the sequence.
      • Interframe Noise X: ±0.01.
      • Interframe Noise Y: ±0.01.
      • Interframe Noise Z: ±0.0025.
    • Intraframe Noise: Extremely small shift applied independently to each single part of a hand.
      • Intraframe Noise X: ±0.0025.
      • Intraframe Noise Y: ±0.0025.
      • Intraframe Noise Z: ±0.0001.
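
A rough numpy sketch of this augmentation is shown below. The coordinate layout (frames, landmarks, xyz), the mirroring convention (x -> 1 - x, assuming normalized coordinates), and applying the mirror 50% of the time are assumptions; the repository's actual augmentation code may differ.

```python
# Hedged numpy sketch of the data augmentation described above.
import numpy as np

def augment(sequence, rng=np.random.default_rng()):
    """sequence: array of shape (4, 21, 3) holding (x, y, z) landmarks."""
    seq = sequence.copy()

    # Mirroring over the X axis (applied half of the time, assumed).
    if rng.random() < 0.5:
        seq[..., 0] = 1.0 - seq[..., 0]

    # Shift applied in block to the whole sequence.
    seq += rng.uniform([-0.3, -0.3, -0.5], [0.3, 0.3, 0.5])

    # Interframe noise: one small shift per frame.
    seq += rng.uniform([-0.01, -0.01, -0.0025], [0.01, 0.01, 0.0025],
                       size=(seq.shape[0], 1, 3))

    # Intraframe noise: an extremely small shift per individual landmark.
    seq += rng.uniform([-0.0025, -0.0025, -0.0001], [0.0025, 0.0025, 0.0001],
                       size=seq.shape)
    return seq
```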

The weights exported to TensorFlow.js are not those of the last epoch, but those that minimized the validation loss at any intermediate epoch.

*Loss is weighted since the positive class is extremely underrepresented in the training set.
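
A hedged training sketch matching this parameterization is shown below; it reuses the `model`, `X`, and `y` placeholders from the earlier sketches. The positive-class weight (10.0 here) is an assumption, since the README only states that the loss is weighted because hits are heavily underrepresented.

```python
# Hedged Keras training sketch: Adam, weighted binary cross entropy (via class
# weights), a 0.85/0.15 split, 3000 epochs, and keeping the weights with the
# lowest validation loss.
from tensorflow import keras

checkpoint = keras.callbacks.ModelCheckpoint(
    "hitnet_best.h5", monitor="val_loss", save_best_only=True)

model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(
    X, y,
    validation_split=0.15,            # 0.85/0.15 train/validation split
    epochs=3000,
    class_weight={0: 1.0, 1: 10.0},   # assumed weighting ratio
    callbacks=[checkpoint],
)
```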

Analyzing Results

The confusion matrices show that results are strong for both classes with the confidence threshold set at 0.5.

Train Confusion Matrix

Validation Confusion Matrix
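
For reference, a matrix like the ones above could be computed as follows; the README does not specify the tooling, so scikit-learn is only one option, and X_val / y_val stand for the 0.15 validation split.

```python
# Hedged sketch of computing a confusion matrix at the 0.5 confidence threshold.
from sklearn.metrics import confusion_matrix

probabilities = model.predict(X_val).ravel()
predictions = (probabilities > 0.5).astype(int)
print(confusion_matrix(y_val, predictions))
```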

Although these False Positives and False Negatives could worsen the user experience in a network that runs several times per second, they do not really affect playtime in a real situation. This is due to three factors:

  1. Most False Positives come from the frames immediately before or after a hit. In practice, this is solved by emptying the sequence buffers every time a hit is detected.
  2. The small number of False Negatives in the training set comes from Data Augmentation, or from the hit being detected on the previous or the following frame. In real cases, these displacements do not affect the experience.
  3. The remaining False Positives rarely appear in real cases since, during playtime, only the sequences whose detections enter the predefined drum regions are analyzed. In practice this works as a double check for positive cases.

The evolution of the train/validation loss during training confirms that there was no overfitting.

Loss