Inspired by https://arxiv.org/abs/2110.08813

From their abstract:

> Finally, to improve the singing rhythm, we modify the duration predictor to specifically predict the phoneme to note duration ratio, helped with singing note normalization.
I think this is a neat idea. I am going to try it when I have time.
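If I read the abstract right, the predictor regresses each phoneme's duration as a fraction of its parent note's duration rather than as an absolute value, and the score's note lengths scale the predictions back at synthesis time. A rough sketch of the target/inverse transform (my own reading; names, shapes, and frame-based units are made up, not VISinger's actual code):

```python
# Hypothetical sketch of the phoneme-to-note duration ratio idea described above.
# Names, shapes, and the frame-based units are assumptions, not VISinger's code.
import numpy as np

def duration_ratio_targets(phoneme_durations, note_durations):
    """Turn absolute phoneme durations into ratios of their parent note's duration.

    phoneme_durations: (num_phonemes,) duration of each phoneme in frames
    note_durations:    (num_phonemes,) duration of the note each phoneme belongs to
    """
    note_durations = np.maximum(note_durations, 1)  # guard against zero-length notes
    return phoneme_durations / note_durations

def durations_from_ratios(predicted_ratios, note_durations):
    """At synthesis time, scale predicted ratios back to frames using the score's note lengths."""
    return np.round(predicted_ratios * note_durations).astype(int)

# Toy example: one 40-frame note split across two phonemes.
phoneme_durations = np.array([12, 28])
note_durations = np.array([40, 40])
ratios = duration_ratio_targets(phoneme_durations, note_durations)  # [0.3, 0.7]
print(durations_from_ratios(ratios, note_durations))                # [12 28]
```

Normalizing by the note length should keep the targets in a similar range regardless of note length or tempo, which is presumably what "singing note normalization" refers to.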
I haven't gotten relative duration modeling to work yet, but here are some other experimental results on duration modeling that may be interesting:
Dataset: Ritsu (100/5/5 songs for train/dev/eval)
Findings:
- VariancePredictor (with an extra MDN layer) worked better than the normal MDN. Note that the extra MDN layer relates to the discussion in Duration modeling considering variances #80.
- Increasing the number of Gaussians further improved performance.
Also, although it cannot be seen in the figure, I found that adding dropout layers (which had not been used until now) for the MDN did improve the dev loss/RMSE. Without dropout, the normal MDN tends to overfit. I put the configs below for the record:
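For what it's worth, here is a minimal sketch of an MDN duration head with dropout and a configurable number of Gaussians, in plain PyTorch. It is only an illustration of the idea; the layer sizes, names, and loss are my own assumptions, not the code or configs used for the experiments above.

```python
# Minimal sketch (assumptions, not the project's actual implementation): a duration
# head that outputs a 1-D Gaussian mixture per phoneme, with dropout before the
# MDN output layer to reduce overfitting.
import torch
import torch.nn as nn

class MDNDurationHead(nn.Module):
    def __init__(self, in_dim=256, hidden_dim=256, num_gaussians=8, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),                       # the dropout discussed above
            nn.Linear(hidden_dim, 3 * num_gaussians),  # mixture weights, means, log-sigmas
        )

    def forward(self, x):
        # x: (batch, time, in_dim) per-phoneme encoder features
        log_pi, mu, log_sigma = self.net(x).chunk(3, dim=-1)
        return torch.log_softmax(log_pi, dim=-1), mu, log_sigma

def mdn_nll(log_pi, mu, log_sigma, target):
    """Negative log-likelihood of the target durations under the predicted mixture."""
    target = target.unsqueeze(-1)  # (batch, time, 1) broadcasts over mixture components
    component_log_prob = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(target)
    return -torch.logsumexp(log_pi + component_log_prob, dim=-1).mean()
```

In a setup like this, `num_gaussians` and the dropout rate would be the two knobs corresponding to the findings above.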