
how to control upsample scales #169

Open
james20141606 opened this issue Sep 29, 2019 · 12 comments

@james20141606

james20141606 commented Sep 29, 2019

I used the default setting of [4,4,4,4] in 20180510_mixture_lj_checkpoint_step000320000_ema.json for the upsample parameters, and I got an assertion error from

if c is not None and self.upsample_net is not None:
    c = self.upsample_net(c)
    assert c.size(-1) == x.size(-1)

in wavenet.py.
I printed the sizes of c and x: torch.Size([2, 32, 19968]) and torch.Size([2, 1, 9984]).
It seems c is twice the size of x. I tried changing the parameters to [2,4,4,4], but it did not work.
Or should I change other parameters?

@james20141606

By the way, I customized some parameters in the json file as:

{
  "name": "wavenet_vocoder",
  "builder": "wavenet",
  "input_type": "raw",
  "quantize_channels": 65536,
  "sample_rate": 16000,
  "silence_threshold": 2,
  "num_mels": 32,
  "fmin": 125,
  "fmax": 7600,
  "fft_size": 1024,
  "hop_size": 128,
  "frame_shift_ms": null,
  "min_level_db": -100,
  "ref_level_db": 20,
  "rescaling": true,
  "rescaling_max": 0.999,
  "allow_clipping_in_normalization": true,
  "log_scale_min": -32.23619130191664,
  "out_channels": 30,
  "layers": 24,
  "stacks": 4,
  "residual_channels": 512,
  "gate_channels": 512,
  "skip_out_channels": 256,
  "dropout": 0.050000000000000044,
  "kernel_size": 3,
  "weight_normalization": true,
  "cin_channels": 32,
  "upsample_conditional_features": true,
  "upsample_scales": [
    2,
    4,
    4,
    4
  ],
  "cin_pad": 2,
  "freq_axis_kernel_size": 3,
  "gin_channels": -1,
  "n_speakers": 1,
  "pin_memory": true,
  "num_workers": 2,
  "test_size": 0.0441,
  "test_num_samples": null,
  "random_state": 1234,
  "batch_size": 2,
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "adam_eps": 1e-08,
  "amsgrad": false,
  "initial_learning_rate": 0.001,
  "lr_schedule": "noam_learning_rate_decay",
  "lr_schedule_kwargs": {},
  "nepochs": 2000,
  "weight_decay": 0.0,
  "clip_thresh": -1,
  "max_time_sec": null,
  "max_time_steps": 10000,
  "exponential_moving_average": true,
  "ema_decay": 0.9999,
  "checkpoint_interval": 10000,
  "train_eval_interval": 10000,
  "test_eval_epoch_interval": 5,
  "save_optimizer_state": true
}

Could you help me see what's wrong with these settings?

@james20141606

I think I solved it. I found that although I changed the upsample parameters to [2,4,4,4], train.py did not receive them, so I changed the code in build_model from

upsample_params = hparams.upsample_params
upsample_params["cin_channels"] = hparams.cin_channels
upsample_params["cin_pad"] = hparams.cin_pad

to

upsample_params = hparams.upsample_params
upsample_params["cin_channels"] = hparams.cin_channels
upsample_params["cin_pad"] = hparams.cin_pad
upsample_params['upsample_scales'] = hparams.upsample_scales

and this time hparams.upsample_params passes the upsample scale parameters from the json file.

@r9y9

r9y9 commented Sep 29, 2019

As noted in hparams.py:

"upsample_scales": [4, 4, 4, 4],  # should np.prod(upsample_scales) == hop_size

np.prod(upsample_scales) must be equal to hop_size. That is the reason you got the assertion error.
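
For reference, a quick check against the json posted above (a sketch using the values from this thread):

import numpy as np

hop_size = 128                  # from the json above
upsample_scales = [4, 4, 4, 4]  # old preset default

prod = int(np.prod(upsample_scales))
print(prod)  # 256 == 2 * hop_size, which is why c came out twice as long as x
if prod != hop_size:
    print("np.prod(upsample_scales) != hop_size; use e.g. [2, 4, 4, 4] for hop_size 128")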

Looks like you are using an old json file. The top-level upsample_scales doesn't exist anymore (it did in v0.1.1, though).
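
(For anyone reading this later: in newer revisions the scales sit inside upsample_params rather than at the top level. A rough sketch of that layout is below; the exact keys depend on the commit you check out, so compare against that revision's hparams.py.)

# sketch of the newer hparams layout, not the exact preset
hparams_sketch = dict(
    upsample_conditional_features=True,
    upsample_params={
        "upsample_scales": [2, 4, 4, 4],  # np.prod of these must equal hop_size
    },
)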

@r9y9

r9y9 commented Sep 29, 2019

Ah, I haven't updated https://github.com/r9y9/wavenet_vocoder/tree/c0ac05e41f9f563421172034e9398633df172b4f/presets, which may confuse you. I will simply delete them.

@james20141606

I used the json file you provided as the Hyper params URL under Pre-trained models. Do you mean we do not need the upsample_scales parameter anymore? Could you provide the new json file? I encountered a similar upsample problem when I tried to use the trained model to synthesize audio files: the upsampled c's size(-1) at line 276 of wavenet.py does not match T.

@r9y9

r9y9 commented Sep 30, 2019

For pretrained models, please check out the specific git commit as noted in the README.

@james20141606

Yeah, I checked out the specific version when trying synthesis. But for training a new model on my own data, I think I mixed the older version with that specific version.
As for the error: in one case I have a c with size(-1) 1016, and after upsampling it is 129536, a ratio of 127.49606299212599, which does not match the hop size of 128 I provided.
The weird thing is that I use the same parameters and the same wavenet.py in my train.py, which also uses upsampling, and it runs fine. I am not sure why upsampling fails in the synthesis.py part.

@james20141606

Hey, I'd like to ask again: although the model trains smoothly with the specified upsample scales, it can't be used to synthesize audio with the same json file, because the upsample network does not upsample the input conditioning c by the exact scale (for me it gives 127.xxxx instead of 128). I am not sure what causes this problem.

@r9y9

r9y9 commented Oct 4, 2019

# ensure length of raw audio is multiple of hop_size so that we can use
# transposed convolution to upsample
out = out[:N * audio.get_hop_size()]
assert len(out) % audio.get_hop_size() == 0

If you use our preprocessing script, upsampling is expected to work correctly.

I'm not really sure what you are hitting. You might want to try pdb or ipdb debugging to isolate your problem.
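
A minimal sketch of the invariant that snippet enforces (illustrative names, not the repo's actual functions):

import numpy as np

hop_size = 128

def trim_to_hop_multiple(wav, mel):
    # trim raw audio so len(wav) == n_frames * hop_size, mirroring the snippet quoted above
    n_frames = mel.shape[0]
    wav = wav[:n_frames * hop_size]
    assert len(wav) % hop_size == 0
    return wav, mel

mel = np.zeros((1000, 32), dtype=np.float32)             # 1000 conditioning frames
wav = np.zeros(1000 * hop_size + 57, dtype=np.float32)   # slightly too long
wav, mel = trim_to_hop_multiple(wav, mel)
print(len(wav) // hop_size == mel.shape[0])              # True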

@james20141606

Hey, I tried to see what happens in upsample_net. I specified the scales as [2, 4, 4, 4] (which should upsample by 128), and during training, when I print c.size(-1) and x.size(-1) in wavenet.py before and after line 196, the upsample scale is not 128 (for example: torch.Size([2, 32, 82]) before and torch.Size([2, 32, 9984]) after), but fortunately c.size(-1) and x.size(-1) match.

However, during synthesis, which uses the code at wavenet.py line 275:

c = self.upsample_net(c)
assert c.size(-1) == T

this time upsample_net does not produce c.size(-1) == T.

@james20141606

I did some further debugging and there is still something confusing me:
At first, in synthesis.py, it seems the batch_wavegen function's parameters have some problem when applied at line 243.
Then I found that the length mismatch may be due to cin_pad: cin_pad makes len(x)/len(c) != hop_size, and upsample_net(c) does not produce the same length as x. I am not sure how to deal with it.
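
If it helps, the numbers in this thread line up with cin_pad consuming frames before upsampling; a rough sketch of the bookkeeping (assuming the upsample net eats cin_pad frames on each side of c, e.g. via a conv with kernel_size 2*cin_pad + 1 and no padding):

import numpy as np

cin_pad = 2
upsample_scales = [2, 4, 4, 4]
hop_size = int(np.prod(upsample_scales))  # 128

def upsampled_len(c_frames):
    # frames lost to padding on both sides, then upsampled by hop_size
    return (c_frames - 2 * cin_pad) * hop_size

print(upsampled_len(1016), upsampled_len(1016) / 1016)  # 129536, ~127.496 (the synthesis case above)
print(upsampled_len(82))                                # 9984, matches x during training

So the ratio is below 128 exactly because c's length includes the 2*cin_pad extra frames; presumably the conditioning fed at synthesis time needs those extra cin_pad frames on each side (as the training data loader provides), or cin_pad set to 0, for the assert against T to hold.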

@stale

stale bot commented Dec 9, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 9, 2019