Pradeepiit/hf0

hf0

hf0: A hybrid pitch extraction method for multimodal voice

hf0 is a monophonic pitch tracker based on a shallow convolutional neural network operating over the time-domain normalized autocorrelation function. $hf_0$ works reliably on monophonic speech, monophonic songs, emotional speech, paralinguistic speech, and infant cry signals. $hf_0$ is robust to a variety of noise types and performs comparably to state-of-the-art methods.
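As an illustration of the input representation only, the following is a minimal Python sketch of a time-domain normalized autocorrelation function (NACF) for one frame. It is independent of the repository's MATLAB code. The normalization by frame energy, the function names, and the naive peak-picking step are all assumptions made for this example; hf0 itself feeds the NACF to a convolutional network rather than picking the largest peak.

```python
import math


def normalized_autocorrelation(frame, max_lag):
    """Return r[0..max_lag] with r[k] = sum(x[n] * x[n + k]) / sum(x[n] ** 2).

    Assumption: normalization by the zero-lag energy of the frame.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def peak_lag_pitch(frame, sample_rate, f_min, f_max):
    """Naive stand-in estimator: lag of the largest NACF value in the search range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    nacf = normalized_autocorrelation(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: nacf[k])
    return sample_rate / best_lag


# Synthetic test tone (hypothetical input): 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * i / sample_rate) for i in range(400)]
estimate = peak_lag_pitch(frame, sample_rate, f_min=80.0, f_max=400.0)
print(estimate)
```

For a clean periodic tone the NACF peaks at the lag equal to one period, which is why the autocorrelation domain is a natural input for a pitch estimator; real voice signals produce noisier curves with competing peaks.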

Dependencies

This code was tested in MATLAB 2018a. The MIR Toolbox 1.7.2 is required to run the program.

Execution of hf0

Run the demo.m file after setting the filename variable to the audio file you want to analyze.

Calculation of Number of Parameters in CREPE vs Proposed method

The proposed method uses roughly one-sixth of the parameters used in CREPE (231681 versus 1337473). A detailed layer-by-layer analysis is provided below. The activation, max-pooling, and dropout layers contribute no parameters. The parameter count of the fully connected layer depends on the number of input and output neurons, which are entered in the table as the width and height of the receptive field. A bias term is included for every layer.

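CREPE

| Layer | No. of Filters | Width of the Receptive Field | Height of the Receptive Field | No. of Parameters |
|---|---|---|---|---|
| Conv1 | 1024 | 1 | 512 | 525312 |
| Conv2 | 128 | 1 | 64 | 8320 |
| Conv3 | 128 | 1 | 64 | 8320 |
| Conv4 | 128 | 1 | 64 | 8320 |
| Conv5 | 256 | 1 | 64 | 16640 |
| Conv6 | 512 | 1 | 64 | 33280 |
| Softmax | 1 | 2048 | 360 | 737281 |
| Total number of parameters | | | | 1337473 |

Proposed Method

| Layer | No. of Filters | Width of the Receptive Field | Height of the Receptive Field | No. of Parameters |
|---|---|---|---|---|
| Conv1 | 64 | 3 | 3 | 640 |
| Conv2 | 64 | 3 | 3 | 640 |
| Softmax | 1 | 25600 | 9 | 230401 |
| Total number of parameters | | | | 231681 |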
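The per-layer figures above are consistent with counting, for each filter, one weight per cell of the receptive field plus one bias: parameters = filters × (width × height + 1). The following Python sketch reproduces the totals under that convention. The helper names are hypothetical, and the formula is inferred from the listed numbers rather than taken from the repository's code.

```python
def layer_params(n_filters, width, height):
    """Parameter count for one layer under the convention inferred from the table:
    width * height weights plus one bias, per filter."""
    return n_filters * (width * height + 1)


# (layer name, number of filters, receptive-field width, receptive-field height)
CREPE = [
    ("Conv1", 1024, 1, 512),
    ("Conv2", 128, 1, 64),
    ("Conv3", 128, 1, 64),
    ("Conv4", 128, 1, 64),
    ("Conv5", 256, 1, 64),
    ("Conv6", 512, 1, 64),
    ("Softmax", 1, 2048, 360),
]

PROPOSED = [
    ("Conv1", 64, 3, 3),
    ("Conv2", 64, 3, 3),
    ("Softmax", 1, 25600, 9),
]


def total_params(layers):
    """Sum the per-layer counts for a list of layer tuples."""
    return sum(layer_params(n, w, h) for _, n, w, h in layers)


crepe_total = total_params(CREPE)
proposed_total = total_params(PROPOSED)
ratio = crepe_total / proposed_total

print(crepe_total, proposed_total, round(ratio, 2))  # 1337473 231681 5.77
```

The ratio of about 5.77 is what the text rounds to "one-sixth".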

Sample Experiments

Experiments were conducted on audio files from several datasets, comparing hf0 with the standard pYIN and CREPE pitch estimation methods. Because a reference pitch contour is not available for all the audio samples, the estimated pitch is superimposed on the spectrogram.

Pitch Contour of a neutral speech taken from CMU-ARCTIC Dataset

ARCTIC_A0038_SPEECH_Comp

Pitch Contour of Crescendo singing voice taken from LYRICS Dataset.

B4_CRESC_U_G3_M1_Comp

Pitch Contour of Glissando singing voice taken from LYRICS Dataset.

B6_GLIS_2_NOTSURE_A_MJ_Comp

Pitch Contour of Soprano singing voice taken from LYRICS Dataset.

S3_VT_U_G4C5_F_M2_Comp

Pitch Contour of an Anger emotion taken from Hindi Emotional Speech Corpus

Anger09_Comp

Pitch Contour of a Disgust emotion taken from Hindi Emotional Speech Corpus

Disgust02_Comp

Pitch Contour of a Happy emotion taken from Hindi Emotional Speech Corpus

Happy08_Comp
