`valid_signal_crop` in `validation_step`? #185

victor-shepardson · 2023-02-08T22:17:07Z

I noticed when training causal models with RAVE v2 that the validation audio sounds pretty bad. If I'm understanding correctly, it's because v2 crops to the valid (as in convolution) portion of the signal during training, so the part of the reconstruction that is affected by zero padding (~2/3 of it with v2 defaults) is not trained at all. But `validation_step` doesn't do the same cropping, so the validation curve looks very noisy and the audio sounds bad in TensorBoard.

Would it make sense to include the same cropping in validation_step?
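To make the "valid portion" idea concrete, here is a minimal sketch, assuming a stack of stride-1 causal convolutions with left zero padding. The names `receptive_field` and `crop_valid` and the toy layer list are hypothetical illustrations, not RAVE's actual code or defaults.

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 causal convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    return 1 + sum((k - 1) * d for k, d in layers)


def crop_valid(signal, layers):
    """Drop the leading samples whose value depends on left zero padding."""
    n_padded = receptive_field(layers) - 1
    return signal[n_padded:]


# Hypothetical toy stack (assumption for illustration, not RAVE's architecture).
layers = [(3, 1), (3, 2), (3, 4)]
rf = receptive_field(layers)

signal = list(range(20))
valid = crop_valid(signal, layers)
```

With this toy stack the first `rf - 1` output samples see at least one padded zero, so only the tail of the signal is "valid" in the convolution sense.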

domkirke · 2023-12-20T17:27:21Z

The cropping is only useful during training, where it drops the part of the signal that has zero gradients. Cropping in `validation_step` would not make much sense, and would change the output dimensionality. Furthermore, the logged audio is not related to the curves; causal configurations unfortunately limit RAVE's modelling capacity, so the sound quality may be due to the training and configuration. I don't know whether @caillonantoine would have additional comments?

victor-shepardson · 2023-12-21T00:03:15Z

Agree that this change doesn't affect training, only logging.

However, I'm quite certain it works as described; I've been using it on my fork. Since the beginning part of the reconstruction gets cropped from the loss during training, I believe the model is incentivized to collapse the corresponding latents (i.e., those influenced by zero padding) to the prior. So the beginning of the reconstruction ends up unrelated to the input. This leads to high reconstruction error when that part isn't cropped at validation time, which makes the validation curve in TensorBoard noisy and unreadable. Also, I'm quite certain the logged audio is affected. It's the same audio computed in `validation_step` that gets logged in `validation_epoch_end` (RAVE/rave/model.py, line 457 in b67a187: `audio, z = list(zip(*out))`), no?

This change only shortens the logged audio, by slicing off the 'random' prior-collapsed part. But I find it easier to hear how faithful the reconstructions are this way.
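The effect on the validation metric can be sketched as follows. This is only an illustration under stated assumptions: `validation_error` and `n_invalid` are hypothetical names, plain mean squared error stands in for whatever distance the model actually logs, and the toy signals are made up.

```python
def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def validation_error(x, y, n_invalid=0):
    """Error between input `x` and reconstruction `y`,
    skipping the first `n_invalid` samples (the zero-padding-affected prefix)."""
    return mse(x[n_invalid:], y[n_invalid:])


# Toy data: the first two reconstructed samples are unrelated to the input,
# the rest is reconstructed perfectly.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
y = [0.9, -0.7, 0.0, -1.0, 0.0, 1.0]

full = validation_error(x, y)                  # dominated by the untrained prefix
cropped = validation_error(x, y, n_invalid=2)  # measures only the trained part
cropped_audio = y[2:]                          # what would be logged; shorter than y
```

In this sketch the uncropped error is driven entirely by the prefix, while the cropped error reflects the part the loss was trained on. The logged audio gets shorter, which is the dimensionality change mentioned above.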

victor-shepardson mentioned this issue Feb 9, 2023

valid crop at validation time #187

Open
