Improved duration modeling with relative note duration prediction #112

Open
r9y9 opened this issue Jun 9, 2022 · 1 comment

Comments


r9y9 commented Jun 9, 2022

Inspired by https://arxiv.org/abs/2110.08813

From their abstract:

"Finally, to improve the singing rhythm, we modify the duration predictor to specifically predict the phoneme to note duration ratio, helped with singing note normalization."

I think this is a neat idea. I am going to try it when I have time.
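
As a rough illustration of the idea (not the paper's exact formulation, and not nnsvs code), the regression target can be switched from absolute phoneme durations to each phoneme's share of the note it belongs to. The sketch below assumes every phoneme is already assigned to a note index and that no note has zero total duration; the function names and data layout are hypothetical.

```python
from collections import defaultdict


def to_note_ratios(phoneme_durations, note_ids):
    """Convert absolute phoneme durations (frames) into phoneme-to-note ratios.

    note_ids[i] is the index of the note that phoneme i belongs to
    (a hypothetical alignment; real score/label formats will differ).
    """
    note_totals = defaultdict(float)
    for dur, nid in zip(phoneme_durations, note_ids):
        note_totals[nid] += dur
    return [dur / note_totals[nid] for dur, nid in zip(phoneme_durations, note_ids)]


def from_note_ratios(ratios, note_ids, note_durations):
    """Recover absolute durations from predicted ratios and score note lengths.

    Ratios are renormalized per note so the phonemes fill the note exactly.
    """
    ratio_sums = defaultdict(float)
    for r, nid in zip(ratios, note_ids):
        ratio_sums[nid] += r
    return [note_durations[nid] * r / ratio_sums[nid] for r, nid in zip(ratios, note_ids)]


# Two notes: two phonemes on note 0 (40 frames), two on note 1 (80 frames).
durations = [10.0, 30.0, 20.0, 60.0]
note_ids = [0, 0, 1, 1]
ratios = to_note_ratios(durations, note_ids)
restored = from_note_ratios(ratios, note_ids, {0: 40.0, 1: 80.0})
```

With this kind of target, the model only has to learn how a note is split among its phonemes, while the note length itself comes from the score.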


r9y9 commented Jun 11, 2022

I haven't gotten relative duration modeling to work yet, but here are some other experimental results on duration modeling that may be interesting:

Dataset: Ritsu (100/5/5 songs for train/dev/eval)

[Figure: Screenshot 2022-06-11 22 42 48]

Findings:

  • VariancePredictor (with an extra MDN layer) worked better than the normal MDN. Note that adding the extra MDN layer lets the model benefit from "Duration modeling considering variances" (#80).
  • Increasing the number of Gaussians further improved performance.
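
To make the second finding concrete, here is a minimal, framework-free sketch of the negative log-likelihood that a mixture density output is typically trained with. It is not the nnsvs implementation: the function name, the parameterization (log weights, means, log standard deviations) and the toy numbers are all assumptions, and the comparison uses one versus two components only to keep the example small.

```python
import math


def mixture_nll(y, log_pi, mu, log_sigma):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture.

    log_pi, mu, log_sigma are per-component lists; log_pi is assumed to be
    already normalized (e.g. the output of a log-softmax).
    """
    log_terms = []
    for lp, m, ls in zip(log_pi, mu, log_sigma):
        z = (y - m) / math.exp(ls)
        log_normal = -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        log_terms.append(lp + log_normal)
    # log-sum-exp over components for numerical stability
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))


# Toy targets: durations cluster around 5 and 20 frames (made-up numbers).
targets = [5.0, 5.0, 20.0, 20.0]

# A single Gaussian has to cover both clusters with a wide variance.
nll_1 = sum(mixture_nll(y, [0.0], [12.5], [math.log(7.5)]) for y in targets)

# Two components can place one narrow Gaussian on each cluster.
half = math.log(0.5)
nll_2 = sum(
    mixture_nll(y, [half, half], [5.0, 20.0], [0.0, 0.0]) for y in targets
)
```

On this toy data the two-component mixture reaches a lower NLL than the single Gaussian, which is the general reason more components can help when the target distribution is multimodal. It says nothing by itself about the Ritsu results.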

Also, although it cannot be seen in the figure, I found that adding dropout layers to the MDN (which it did not have until now) improved the dev loss/RMSE. Without dropout, the normal MDN tends to overfit. I put the configs below for the record:

1: MDN

stream_sizes: [1]
has_dynamic_features: [false]
stream_weights: [1]

netG:
  _target_: nnsvs.model.MDN
  in_dim: 337
  out_dim: 1
  hidden_dim: 256
  num_layers: 3
  num_gaussians: 1

(NOTE: I hardcoded nn.Dropout(0.5) in the source code)
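
For reference, nn.Dropout(0.5) in training mode zeroes each activation with probability 0.5 and scales the survivors by 1 / (1 - 0.5) = 2, and in eval mode it passes values through unchanged. A framework-free sketch of that behavior follows; the function name and signature are illustrative only and are not part of nnsvs or PyTorch.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, scale the rest.

    Scaling survivors by 1 / (1 - p) keeps the expected activation unchanged,
    so nothing needs to be rescaled at inference time. Assumes 0 <= p < 1.
    """
    if not training or p == 0.0:
        return list(values)
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
hidden = [1.0] * 8
train_out = dropout(hidden, p=0.5, training=True, rng=rng)
eval_out = dropout(hidden, p=0.5, training=False)
```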

2: VariancePredictor (with an extra MDN layer)

stream_sizes: [1]
has_dynamic_features: [false]
stream_weights: [1]

netG:
  _target_: nnsvs.model.VariancePredictor
  in_dim: 337
  out_dim: 1
  hidden_dim: 256
  num_layers: 5
  kernel_size: 5
  dropout: 0.5
  use_mdn: true
  num_gaussians: 1

3: VariancePredictor with an increased number of Gaussians

stream_sizes: [1]
has_dynamic_features: [false]
stream_weights: [1]

netG:
  _target_: nnsvs.model.VariancePredictor
  in_dim: 337
  out_dim: 1
  hidden_dim: 256
  num_layers: 5
  kernel_size: 5
  dropout: 0.5
  use_mdn: true
  num_gaussians: 4
