Encountering persistently poor reconstruction ability of DiffAE when used for 9-frame videos #54

Open
evaprakash opened this issue Apr 15, 2023 · 0 comments


@evaprakash

Hello,

I'm experimenting with extending the DiffAE model to capture the semantic information of short video clips and then reconstruct them. The input is video clips of nine 3-channel RGB frames, and the semantic code is a 512-dimensional vector. However, the model quickly plateaus at a loss on the order of 1e-3 to 1e-4 within just 20 epochs, at which point I stopped training, and I found that the semantic code is not learning anything (see the sketch below for the kind of check I mean). Thinking that I needed a higher-quality semantic encoder for videos, I swapped the half-UNet semantic encoder for an off-the-shelf pretrained model, but this does not seem to help: the model plateaus at a similar loss in the same number of epochs.

Would you have any suggestions for model modifications that might help the semantic encoder learn better? One thought I had is that the decoder may not be deep enough to decode the complex semantic codes of a video. How did you decide on the dimensions of the stochastic encoder/decoder UNet?
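For concreteness, here is a minimal sketch of the kind of check I mean by "not learning anything": reconstruct the same clips once with the true semantic code and once with shuffled codes, then compare the errors. The handles `semantic_encoder`, `decoder`, and `ddim_reconstruct` are placeholders for my adaptation, not names from the DiffAE codebase:

```python
import torch

# Placeholder handles (not DiffAE API): `semantic_encoder` maps a
# (B, 9, 3, H, W) clip batch to a (B, 512) code, and `ddim_reconstruct`
# runs the deterministic DDIM reconstruction loop with the conditional
# denoising UNet `decoder`.

@torch.no_grad()
def semantic_code_usage_gap(semantic_encoder, decoder, ddim_reconstruct, clips):
    """Reconstruction error with the true z_sem vs. a shuffled z_sem."""
    z_sem = semantic_encoder(clips)              # (B, 512) semantic codes
    x_T = torch.randn_like(clips)                # shared stochastic code
    recon_true = ddim_reconstruct(decoder, x_T, z_sem)
    recon_shuf = ddim_reconstruct(decoder, x_T, z_sem[torch.randperm(len(z_sem))])
    err_true = (recon_true - clips).pow(2).mean().item()
    err_shuf = (recon_shuf - clips).pow(2).mean().item()
    return err_true, err_shuf
```

If the two errors come out essentially identical, the decoder has learned to ignore `z_sem` entirely, rather than the semantic code merely being low-quality.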

Thanks!
