hf0

hf0: A hybrid pitch extraction method for multimodal voice

$hf_0$ is a monophonic pitch tracker based on a shallow convolutional neural network that operates on the time-domain normalized autocorrelation function. $hf_0$ works reliably on monophonic speech, monophonic songs, emotional speech, para-linguistic speech and infant cry signals. It is robust to various types of noise and performs comparably to state-of-the-art methods.
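To make the input representation concrete, the following is a minimal Python sketch of a time-domain normalized autocorrelation function on a single frame. It is not the repository's MATLAB code: the function names, the exact normalization, and the 60 to 400 Hz search range are assumptions chosen for illustration. Where $hf_0$ passes the autocorrelation to a shallow CNN, this sketch simply picks the lag with the largest value as a stand-in.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Correlate the frame with a copy of itself shifted by `lag`, scaled to [-1, 1]."""
    head = frame[:len(frame) - lag]
    tail = frame[lag:]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return num / den if den > 0 else 0.0


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Peak-picking stand-in for the CNN: return sample_rate / best lag (assumed range)."""
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda k: normalized_autocorrelation(frame, k))
    return sample_rate / best_lag


# Synthetic test frame: a 100 Hz sine sampled at 8 kHz (50 ms long).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(400)]
print(estimate_f0(frame, sample_rate))
```

For a clean periodic frame the function peaks near 1 at the period lag (80 samples here) and near -1 at half the period. Real voice signals are noisier, which is the situation a learned model over the autocorrelation is meant to handle.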

Dependencies

This code was tested in MATLAB 2018a. The MIR Toolbox 1.7.2 is required to run the program.

Execution of hf0

Run demo.m after setting the filename variable to the audio file to be analyzed.

Calculation of Number of Parameters in CREPE vs Proposed method

The proposed method uses about one-sixth of the parameters used in CREPE (231681 versus 1337473). A detailed layer-by-layer analysis is provided below. The activation, max-pooling and dropout layers have no parameters. The number of parameters in the fully connected layer depends on the numbers of input and output neurons, which are entered in the table as the width and height of the receptive field. A bias term is included for every layer.

CREPE
Layer	No. of Filters	Width of the Receptive Field	Height of the Receptive Field	No. of Parameters
Conv1	1024	1	512	525312
Conv2	128	1	64	8320
Conv3	128	1	64	8320
Conv4	128	1	64	8320
Conv5	256	1	64	16640
Conv6	512	1	64	33280
Softmax	1	2048	360	737281
Total number of parameters				1337473

Proposed Method
Layer	No. of Filters	Width of the Receptive Field	Height of the Receptive Field	No. of Parameters
Conv1	64	3	3	640
Conv2	64	3	3	640
Softmax	1	25600	9	230401
Total number of parameters				231681
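The figures in both tables can be checked with a short Python sketch. The counting rule used here, filters × (width × height + 1), is inferred from the table entries rather than taken from the repository code. It counts one bias per filter and does not multiply by the number of input channels, so it will differ from the count a deep learning framework reports for the same networks.

```python
def layer_params(filters, width, height):
    """Weights (filters * width * height) plus one bias per filter."""
    return filters * (width * height + 1)


# (layer, no. of filters, width, height) copied from the tables above.
CREPE = [
    ("Conv1", 1024, 1, 512),
    ("Conv2", 128, 1, 64),
    ("Conv3", 128, 1, 64),
    ("Conv4", 128, 1, 64),
    ("Conv5", 256, 1, 64),
    ("Conv6", 512, 1, 64),
    ("Softmax", 1, 2048, 360),
]
PROPOSED = [
    ("Conv1", 64, 3, 3),
    ("Conv2", 64, 3, 3),
    ("Softmax", 1, 25600, 9),
]


def total_params(layers):
    """Sum the per-layer counts for one network."""
    return sum(layer_params(f, w, h) for _, f, w, h in layers)


crepe_total = total_params(CREPE)
proposed_total = total_params(PROPOSED)
ratio = crepe_total / proposed_total

print(crepe_total, proposed_total, round(ratio, 2))  # prints 1337473 231681 5.77
```

The ratio of about 5.77 is the basis for the "about one-sixth" figure.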

Sample Experiments

Experiments were conducted on audio files from several datasets, comparing hf0 with the standard pYIN and CREPE pitch estimation methods. Because a reference pitch contour is not available for every audio sample, the estimated pitch is superimposed on the spectrogram.

Pitch Contour of neutral speech taken from CMU-ARCTIC Dataset

Pitch Contour of Crescendo singing voice taken from LYRICS Dataset

Pitch Contour of Glissando singing voice taken from LYRICS Dataset

Pitch Contour of Soprano singing voice taken from LYRICS Dataset

Pitch Contour of an Anger emotion taken from Hindi Emotional Speech Corpus

Pitch Contour of a Disgust emotion taken from Hindi Emotional Speech Corpus

Pitch Contour of a Happy emotion taken from Hindi Emotional Speech Corpus
