About training and prediction #66

Open
VishwaasHegde opened this issue Aug 22, 2020 · 5 comments

@VishwaasHegde

Hello,
First of all, thanks for the amazing paper and the repo!!
I have a basic question: the RWC dataset documentation says the annotations are at semitone intervals, i.e., the annotated pitch is only accurate to within ±50 cents.
How is CREPE able to predict at 10- or 20-cent resolution?

@jongwook (Member)

I don't have access to the dataset at the moment, but it was not the RWC dataset itself; it consisted of re-synthesized vocal tracks as described in the pYIN paper, produced in a similar manner to the MDB-stem-synth dataset. We obtained the resynthesized files from the authors, and their labels contained continuous frequency annotations.

@VishwaasHegde (Author)

Thanks. Also, since you take just one pitch output per frame, why do you use a 'sigmoid' activation in `Dense(360, activation='sigmoid', name="classifier")` for the output layer? Would 'softmax' be a better option? I believe sigmoid is usually used for multi-label classification, whereas this is multi-class classification.

@jongwook (Member)

It's one of the tricks used in this approach, and it's not quite orthodox for classification tasks in ML: the model is also trained with binary cross-entropy against soft labels, whereas classification models usually use one-hot labels.
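
For concreteness, a minimal sketch of that pairing in Keras, a sigmoid output layer trained with binary cross-entropy against soft 360-bin targets; the input shape here is a placeholder, not the actual CREPE architecture:

```python
# Sketch: sigmoid output + binary cross-entropy with soft targets.
# The input shape and the rest of the network are placeholders.
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(1024,))  # placeholder frame features
outputs = layers.Dense(360, activation='sigmoid', name='classifier')(inputs)
model = tf.keras.Model(inputs, outputs)

# Soft (non-one-hot) 360-dimensional targets go straight into fit().
model.compile(optimizer='adam', loss='binary_crossentropy')
```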

We found that this combination (binary cross-entropy with soft labels) worked more robustly for pitch estimation, combined with a decoding heuristic that takes the weighted average of the activations near the argmax.
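
A minimal sketch of that decoding step, assuming CREPE's 360 pitch bins spaced 20 cents apart; the bin offset constant and the ±4-bin window follow my reading of the repository's `to_local_average_cents` and should be double-checked against the code:

```python
# Sketch of weighted-average decoding around the argmax bin.
import numpy as np

# Center of each of the 360 bins, in cents relative to a 10 Hz reference
# (offset constant taken from my reading of the CREPE repo).
cents_mapping = np.linspace(0, 7180, 360) + 1997.3794084376191

def to_local_average_cents(activation, window=4):
    """Weighted average of bin centers in a small window around the argmax."""
    center = int(np.argmax(activation))
    lo = max(0, center - window)
    hi = min(len(activation), center + window + 1)
    weights = activation[lo:hi]
    return float(np.sum(weights * cents_mapping[lo:hi]) / np.sum(weights))

def cents_to_hz(cents):
    """Convert cents (relative to 10 Hz) back to frequency in Hz."""
    return 10.0 * 2.0 ** (cents / 1200.0)
```

Because the estimate averages over neighboring bins rather than snapping to the argmax, it can resolve pitch more finely than the 20-cent bin spacing.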

@VishwaasHegde (Author)

Thanks for the info. May I ask how you obtained the soft labels? Were they labelled that way in the data itself? I have a similar dataset with hard pitch-frequency labels, and the only way I can think of to create soft labels is to place a Gaussian around each pitch frequency with a standard deviation of 5-10 cents.

@jongwook (Member)

The labels I had contained Hz values (which don't necessarily align with semitone intervals), from which I calculated soft labels using a Gaussian-shaped curve with a standard deviation of 25 cents. You can find example code in the comments of this issue.
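
A minimal sketch of that soft-label construction, reusing the same (assumed) 20-cent bin grid as above; `hz_to_cents` and `soft_label` are illustrative helper names, not functions from the repo:

```python
# Sketch: Gaussian soft labels from a Hz annotation, 25-cent std dev.
import numpy as np

# Same assumed 360-bin grid as in the decoding sketch above.
cents_mapping = np.linspace(0, 7180, 360) + 1997.3794084376191

def hz_to_cents(freq_hz):
    """Frequency in Hz to cents relative to a 10 Hz reference."""
    return 1200.0 * np.log2(freq_hz / 10.0)

def soft_label(freq_hz, std_cents=25.0):
    """Gaussian-shaped 360-dim target centered on the annotated pitch."""
    target = hz_to_cents(freq_hz)
    return np.exp(-((cents_mapping - target) ** 2) / (2 * std_cents ** 2))

# e.g. soft_label(440.0) peaks at the bin nearest A4 and decays smoothly,
# and is used as the target for the binary cross-entropy loss.
```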
