Inspired by https://arxiv.org/abs/2110.08813

From their abstract:

> Finally, to improve the singing rhythm, we modify the duration predictor to specifically predict the phoneme to note duration ratio, helped with singing note normalization.
I think this is a neat idea. I am going to try it when I have time.
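If I read the abstract right, the predictor regresses each phoneme's duration as a fraction of its parent note's duration rather than as an absolute value, and the score's note lengths scale the predictions back at synthesis time. A rough sketch of the target/inverse transform (my own reading; names, shapes, and frame-based units are made up, not VISinger's actual code):

```python
# Hypothetical sketch of the phoneme-to-note duration ratio idea described above.
# Names, shapes, and the frame-based units are assumptions, not VISinger's code.
import numpy as np

def duration_ratio_targets(phoneme_durations, note_durations):
    """Turn absolute phoneme durations into ratios of their parent note's duration.

    phoneme_durations: (num_phonemes,) duration of each phoneme in frames
    note_durations:    (num_phonemes,) duration of the note each phoneme belongs to
    """
    note_durations = np.maximum(note_durations, 1)  # guard against zero-length notes
    return phoneme_durations / note_durations

def durations_from_ratios(predicted_ratios, note_durations):
    """At synthesis time, scale predicted ratios back to frames using the score's note lengths."""
    return np.round(predicted_ratios * note_durations).astype(int)

# Toy example: one 40-frame note split across two phonemes.
phoneme_durations = np.array([12, 28])
note_durations = np.array([40, 40])
ratios = duration_ratio_targets(phoneme_durations, note_durations)  # [0.3, 0.7]
print(durations_from_ratios(ratios, note_durations))                # [12 28]
```

Normalizing by the note length should keep the targets in a similar range regardless of note length or tempo, which is presumably what "singing note normalization" refers to.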
I haven't gotten relative duration modeling to work yet, but here are some other experimental results on duration modeling that may be interesting:
Dataset: Ritsu (100/5/5 songs for train/dev/eval)
Findings:
- VariancePredictor (with an extra MDN layer) worked better than the normal MDN. Note that the extra MDN layer relates to the discussion in Duration modeling considering variances #80.
- Increasing the number of Gaussians further improved performance.
Also, although it cannot be seen in the figure, I found that adding dropout layers (which had not been used until now) for the MDN did improve the dev loss/RMSE. Without dropout, the normal MDN tends to overfit. I put the configs below for the record:
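For what it's worth, here is a minimal sketch of an MDN duration head with dropout and a configurable number of Gaussians, in plain PyTorch. It is only an illustration of the idea; the layer sizes, names, and loss are my own assumptions, not the code or configs used for the experiments above.

```python
# Minimal sketch (assumptions, not the project's actual implementation): a duration
# head that outputs a 1-D Gaussian mixture per phoneme, with dropout before the
# MDN output layer to reduce overfitting.
import torch
import torch.nn as nn

class MDNDurationHead(nn.Module):
    def __init__(self, in_dim=256, hidden_dim=256, num_gaussians=8, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),                       # the dropout discussed above
            nn.Linear(hidden_dim, 3 * num_gaussians),  # mixture weights, means, log-sigmas
        )

    def forward(self, x):
        # x: (batch, time, in_dim) per-phoneme encoder features
        log_pi, mu, log_sigma = self.net(x).chunk(3, dim=-1)
        return torch.log_softmax(log_pi, dim=-1), mu, log_sigma

def mdn_nll(log_pi, mu, log_sigma, target):
    """Negative log-likelihood of the target durations under the predicted mixture."""
    target = target.unsqueeze(-1)  # (batch, time, 1) broadcasts over mixture components
    component_log_prob = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(target)
    return -torch.logsumexp(log_pi + component_log_prob, dim=-1).mean()
```

In a setup like this, `num_gaussians` and the dropout rate would be the two knobs corresponding to the findings above.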