valid_signal_crop in validation_step? #185

Open
victor-shepardson opened this issue Feb 8, 2023 · 2 comments

victor-shepardson commented Feb 8, 2023

I noticed when training causal models with RAVE v2 that the validation audio sounds pretty bad. If I'm understanding correctly, it's because v2 crops to the valid (as in convolution) portion of the signal, so the part of the reconstruction that is affected by zero padding (~2/3 of it with v2 defaults) is not trained at all. But validation_step doesn't do the same cropping, so the validation curve looks very noisy and the audio sounds bad in TensorBoard.

Would it make sense to include the same cropping in validation_step?
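To make "valid (as in convolution)" concrete, here is a minimal pure-Python sketch, not RAVE's actual code. The helpers `causal_conv1d` and `crop_valid` are hypothetical names for illustration. It shows that, with causal (left) zero padding, the first `receptive_field - 1` output samples depend on padded zeros, and that the valid crop removes exactly those samples.

```python
def causal_conv1d(signal, kernel):
    """Causal 1-D convolution (cross-correlation, as in deep learning)
    with left zero padding, so the output has the same length as the input."""
    pad = len(kernel) - 1
    padded = [0.0] * pad + list(signal)
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def crop_valid(output, receptive_field):
    """Drop the leading samples whose receptive field overlaps the padding.
    Hypothetical helper, not the library's own cropping function."""
    return output[receptive_field - 1:]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 1.0, 1.0]  # receptive field of 3 samples

full = causal_conv1d(signal, kernel)   # first 2 samples see padded zeros
valid = crop_valid(full, len(kernel))  # only samples computed from real input

print(full)   # [1.0, 3.0, 6.0, 9.0, 12.0]
print(valid)  # [6.0, 9.0, 12.0]
```

In a deep causal model the receptive field is much larger than a single kernel, which is why a large fraction of a short training example can fall in the padded region.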

@domkirke
Collaborator

The cropping is only useful during training, where it drops the part of the signal that has zero gradients. Cropping in validation_step would not make much sense, and would change the output dimensionality. Furthermore, the audio is not related to the curves; causal configurations unfortunately limit RAVE's modelling capacity, so the sound quality may be due to the training and configuration. I don't know if @caillonantoine would have additional comments?

@victor-shepardson
Author

Agree that this change doesn't affect training, only logging.

However, I'm quite certain it works as described; I've been using it on my fork. Since the beginning of the reconstruction gets cropped from the loss during training, I believe the model is incentivized to collapse the corresponding latents (i.e., those influenced by zero padding) to the prior. So the beginning of the reconstruction ends up unrelated to the input. This leads to high reconstruction error when that part isn't cropped at validation time, which makes the validation curve in TensorBoard noisy and unreadable. Also, I'm quite certain the logged audio is affected. It's the same audio computed in validation_step that gets logged in validation_epoch_end (audio, z = list(zip(*out))), no?

This change only shortens the logged audio, by slicing off the 'random' prior-collapsed part. But I find it easier to hear how faithful the reconstructions are this way.
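The effect on the validation metric can be illustrated with a toy sketch. It is not taken from the fork or from RAVE itself: `validation_distance`, the sample values, and the `crop` length are all made up for illustration. The reconstruction matches the input everywhere except in the leading region, yet the uncropped error is dominated by that region.

```python
def mean_abs_error(x, y):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def validation_distance(x, y, crop=0):
    """Reconstruction error, optionally ignoring the first `crop` samples.
    Hypothetical helper; `crop` stands in for the padded (untrained) region."""
    return mean_abs_error(x[crop:], y[crop:])


# Toy input and reconstruction: the first 3 samples of y are unrelated to x
# (as if generated from the prior), the rest are reconstructed perfectly.
x = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
y = [-0.5, 0.5, -0.5, -0.5, 0.5, -0.5]
crop = 3  # assumed length of the region affected by zero padding

uncropped = validation_distance(x, y)       # includes the untrained region
cropped = validation_distance(x, y, crop)   # only the trained region

print(uncropped)  # 0.5
print(cropped)    # 0.0
```

Because the leading region is effectively random from one validation pass to the next, the uncropped number fluctuates even when the trained portion is stable, which is consistent with the noisy curve described above.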
