Replies: 1 comment
-
Rest notes are common in labels, just like APs and SPs in phoneme transcriptions, so excluding them doesn't seem reasonable. We did not use voiced/unvoiced (UV) masks because we would then have to predict the masks, which would make the architecture much more complicated than it is now (and UV masks themselves aren't easy to extract). Instead, we interpolate the pitch curve over unvoiced parts and expect the pitch model to learn this as well. In practice, this doesn't seem to affect accuracy. By the way, the pitch_acc metric on TensorBoard excludes all unvoiced frames, so it reflects the real accuracy. If your model cannot learn well, check your labels or try enabling the melody encoder, etc.
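To make the two ideas in this reply concrete, here is a minimal NumPy sketch of interpolating a pitch curve over unvoiced frames and of scoring accuracy on voiced frames only. It is an illustration, not the project's code: the function names, the use of MIDI semitones, the convention that 0 marks an unvoiced frame, and the 0.5-semitone tolerance are all assumptions made for the example.

```python
import numpy as np

def interpolate_unvoiced(f0, uv):
    """Fill unvoiced frames by linear interpolation between voiced neighbours."""
    f0 = np.asarray(f0, dtype=float)
    uv = np.asarray(uv, dtype=bool)
    if uv.all():
        return f0.copy()  # no voiced frame to interpolate from
    frames = np.arange(len(f0))
    out = f0.copy()
    # np.interp holds the nearest voiced value at the leading/trailing edges.
    out[uv] = np.interp(frames[uv], frames[~uv], f0[~uv])
    return out

def voiced_accuracy(pred, target, uv, tol=0.5):
    """Fraction of voiced frames predicted within `tol` semitones (assumed tolerance)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    voiced = ~np.asarray(uv, dtype=bool)
    if not voiced.any():
        return float("nan")
    return float(np.mean(np.abs(pred[voiced] - target[voiced]) <= tol))

# Toy curve in MIDI semitones; 0 marks an unvoiced frame (assumed convention).
pitch = np.array([60.0, 0.0, 0.0, 63.0, 64.0, 0.0])
uv = pitch == 0
filled = interpolate_unvoiced(pitch, uv)

# Large errors on unvoiced frames do not change the voiced-only score.
pred = filled + np.array([0.0, 5.0, 5.0, 0.2, 1.0, 9.0])
acc = voiced_accuracy(pred, filled, uv)
```

In this toy case the unvoiced frames carry the largest errors, yet the score depends only on the three voiced frames, which is the sense in which a voiced-only metric is unaffected by the interpolated regions.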
-
I've been trying out the variance model generation, which is amazing!
I am struggling to get the model to learn the pitch accurately from the data, though, and I wonder whether the sections containing note rests are interfering with training.
Would it be possible to add a parameter that excludes pitch samples at note rests (note_rest) from the loss?
Just before this code (DiffSinger/training/variance_task.py, line 202 in f958001), could it check for any note rests and then scrub the sections of pitch_pred that relate to them?
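Roughly what I have in mind, as a NumPy sketch rather than a patch: I'm assuming here that note_rest has already been expanded to a per-frame boolean mask aligned with pitch_pred, and I'm using plain mean squared error as a stand-in, since the actual loss at that line may be computed differently. The function name is hypothetical.

```python
import numpy as np

def masked_pitch_loss(pitch_pred, pitch_target, note_rest):
    """Mean squared error over frames that do not belong to rest notes.

    `note_rest` is assumed to be a per-frame boolean mask (True = rest).
    """
    pitch_pred = np.asarray(pitch_pred, dtype=float)
    pitch_target = np.asarray(pitch_target, dtype=float)
    keep = ~np.asarray(note_rest, dtype=bool)
    if not keep.any():
        return 0.0  # every frame is a rest, so nothing contributes
    sq_err = (pitch_pred - pitch_target) ** 2
    return float(sq_err[keep].mean())

# Toy example: the third frame is a rest and its large error is ignored.
pred = np.array([60.0, 61.0, 50.0, 64.0])
target = np.array([60.0, 62.0, 0.0, 64.0])
rest = np.array([False, False, True, False])
loss = masked_pitch_loss(pred, target, rest)
```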
Thanks again for the project!