
About Different Size between Predicted-mel and Preprocess-mel #205

Open
ymzlygw opened this issue Aug 20, 2020 · 0 comments


ymzlygw commented Aug 20, 2020

Hi, I am trying to combine "deepvoice3-pytorch" with "wavenet_vocoder", both of which are from your work, and I really appreciate that.

I extract the mel output from deepvoice3-pytorch in its synthesis.py, around line 108:

with torch.no_grad():
    mel_outputs, linear_outputs, alignments, done = model(
        sequence, text_positions=text_positions, speaker_ids=speaker_ids)
linear_output = linear_outputs[0].cpu().data.numpy()
spectrogram = audio._denormalize(linear_output)
alignment = alignments[0].cpu().data.numpy()
mel = mel_outputs[0].cpu().data.numpy()
mel = audio._denormalize(mel)

I save the mel output to a .npy file and try to use it in wavenet_vocoder, but I hit a size mismatch: the preprocessed mel has shape (T, 80) and can be synthesized to a waveform, while the predicted mel from deepvoice3 has shape (80, T) and raises a size-mismatch error.

At first I thought it was a transpose problem, so I changed the predicted mel from (80, T) to (T, 80) with .T, but that didn't work either.
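For what it's worth, a minimal sketch of the transpose step described above, using a stand-in array instead of a real deepvoice3 prediction (the shape (80, 123) is hypothetical, chosen only to illustrate the (n_mels, T) layout; the filename is made up):

```python
import numpy as np

# Stand-in for a predicted mel that comes out time-last, i.e. (n_mels, T),
# while wavenet_vocoder's preprocessing saves mels time-first, i.e. (T, n_mels).
mel = np.random.rand(80, 123).astype(np.float32)

# Transpose to (T, n_mels) and make the array contiguous before saving,
# so the on-disk layout matches what the preprocessed .npy files use.
mel_t = np.ascontiguousarray(mel.T)
assert mel_t.shape == (123, 80)

np.save("predicted_mel.npy", mel_t)
```

Note that if the transpose alone does not fix the error, the mismatch may instead come from the conditioning-feature settings (e.g. hop size or normalization) differing between the two repos, so the transposed array is the right shape but the wrong scale or length for the vocoder.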

Could you please tell me why this happens, and how to modify the shape of the predicted mel from deepvoice3 so that it matches the input of wavenet?

I really want to understand this. Thanks.
