hf0 is a monophonic pitch tracker based on a shallow convolutional neural network operating over the time-domain normalized autocorrelation function.
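For readers unfamiliar with the input representation, here is a minimal sketch of a normalized autocorrelation function for a single frame. It assumes one common definition (the autocorrelation at each lag divided by the zero-lag energy); the exact normalization, frame length, and lag range used by hf0 are not stated here, and the function name `normalized_autocorrelation` is hypothetical rather than part of this code base.

```python
import math


def normalized_autocorrelation(frame, max_lag):
    """Return r[k] / r[0] for k = 0..max_lag (assumed definition, see text)."""
    n = len(frame)
    energy = sum(x * x for x in frame)  # zero-lag autocorrelation r[0]
    if energy == 0:
        # A silent frame has no periodicity; return all zeros.
        return [0.0] * (max_lag + 1)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


# Synthetic frame: a sine wave with a period of 20 samples.
frame = [math.sin(2 * math.pi * i / 20) for i in range(200)]
nacf = normalized_autocorrelation(frame, max_lag=40)

# The strongest peak away from lag 0 sits at the period of the signal.
best_lag = max(range(10, 31), key=lambda k: nacf[k])
print(best_lag)  # 20 for this synthetic frame
```

A periodic signal produces peaks at multiples of its period, which is the structure a network operating on this representation can learn to map to pitch.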
This code was tested in MATLAB 2018a and requires the MIR Toolbox 1.7.2.
To run it, set the `filename` variable in `demo.m` to the path of your audio file, then execute `demo.m`.
The proposed method uses roughly one-sixth of the parameters used by CREPE (231681 versus 1337473). A layer-by-layer breakdown is provided below. The activation, max-pooling, and dropout layers have no parameters. The parameter count of the fully connected layer depends on its number of input and output neurons, which are listed in the table as the width and height of the receptive field. The bias term is included for all layers.
| Layers | No. of Filters | Width of the Receptive Field | Height of the Receptive Field | No. of Parameters |
|---|---|---|---|---|
| CREPE | | | | |
| Conv1 | 1024 | 1 | 512 | 525312 |
| Conv2 | 128 | 1 | 64 | 8320 |
| Conv3 | 128 | 1 | 64 | 8320 |
| Conv4 | 128 | 1 | 64 | 8320 |
| Conv5 | 256 | 1 | 64 | 16640 |
| Conv6 | 512 | 1 | 64 | 33280 |
| Softmax | 1 | 2048 | 360 | 737281 |
| Total number of parameters | | | | 1337473 |
| Proposed Method | | | | |
| Conv1 | 64 | 3 | 3 | 640 |
| Conv2 | 64 | 3 | 3 | 640 |
| Softmax | 1 | 25600 | 9 | 230401 |
| Total number of parameters | | | | 231681 |
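
The totals in the table can be reproduced with a few lines of arithmetic. The sketch below is not taken from the hf0 code; the counting formulas are inferred from the table's own numbers, assuming each convolutional layer contributes `filters * width * height` weights plus one bias per filter, and the fully connected layer contributes `inputs * outputs` weights plus a single bias term. The helper names are hypothetical.

```python
# Layer specs transcribed from the table: (name, filters, rf_width, rf_height)
CREPE_CONV = [
    ("Conv1", 1024, 1, 512),
    ("Conv2", 128, 1, 64),
    ("Conv3", 128, 1, 64),
    ("Conv4", 128, 1, 64),
    ("Conv5", 256, 1, 64),
    ("Conv6", 512, 1, 64),
]
CREPE_DENSE = (2048, 360)  # (input neurons, output neurons)

HF0_CONV = [("Conv1", 64, 3, 3), ("Conv2", 64, 3, 3)]
HF0_DENSE = (25600, 9)


def conv_params(filters, width, height):
    # Assumed convention: one width x height kernel and one bias per filter.
    return filters * width * height + filters


def dense_params(n_in, n_out):
    # Assumed convention: full weight matrix plus a single bias term.
    return n_in * n_out + 1


def total_params(conv_layers, dense):
    conv_total = sum(conv_params(f, w, h) for _, f, w, h in conv_layers)
    return conv_total + dense_params(*dense)


crepe_total = total_params(CREPE_CONV, CREPE_DENSE)
hf0_total = total_params(HF0_CONV, HF0_DENSE)
ratio = crepe_total / hf0_total

print(crepe_total)  # 1337473
print(hf0_total)    # 231681
print(ratio)        # between 5.7 and 5.8, i.e. roughly one-sixth the parameters
```

Under these assumptions, almost all of the proposed method's parameters sit in the final fully connected layer (230401 of 231681), while the two convolutional layers contribute 640 each.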
Experiments were conducted on audio files from several datasets, comparing hf0 with the standard pYIN and CREPE pitch estimation methods. Because a ground-truth pitch contour is not available for every audio sample, the estimated pitch is superimposed on the spectrogram for visual comparison.