
The results for CausalConv3d #11

Open
Epiphqny opened this issue Nov 16, 2023 · 8 comments

Comments

@Epiphqny

Hi @lucidrains, thanks for your awesome work! I used your causal conv implementation to train a video VQGAN. The results are as follows:
Original clip sequence:
[image: 36500_image]
Reconstructed clip sequence:
[image: 36500_image_prime]
I've noticed that the reconstruction seems to rely heavily on the initial frame: as the sequence progresses, the frames lose clarity and become increasingly blurry. Could you provide any insight into this phenomenon? Thank you for your time and assistance!
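To confirm that quality really drops with the frame index, rather than judging by eye, one option is to track a reconstruction metric per frame. The following is a minimal NumPy sketch using PSNR; the function name and the assumed (T, H, W, C) layout scaled to [0, 1] are illustrative choices, and the example data is synthetic, not the results above.

```python
import numpy as np


def per_frame_psnr(original: np.ndarray, recon: np.ndarray, max_val: float = 1.0) -> np.ndarray:
    """PSNR (dB) of each frame. Inputs: (T, H, W, C) arrays scaled to [0, max_val]."""
    diff = original.astype(np.float64) - recon.astype(np.float64)
    mse = np.mean(diff ** 2, axis=tuple(range(1, diff.ndim)))  # one MSE per frame
    mse = np.maximum(mse, 1e-12)  # avoid dividing by zero for a perfect frame
    return 10.0 * np.log10(max_val ** 2 / mse)


# Synthetic example (not real results): the error grows with the frame index.
rng = np.random.default_rng(0)
clip = rng.uniform(size=(4, 32, 32, 3))
noise_std = np.array([0.01, 0.05, 0.1, 0.2]).reshape(-1, 1, 1, 1)
recon = clip + noise_std * rng.standard_normal(clip.shape)
print(per_frame_psnr(clip, recon))  # one value per frame, lower for the noisier later frames
```

A steadily falling curve over real reconstructions would back up the visual impression; a flat curve would suggest the blur is perceptual rather than a pixel-level error trend.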

@lucidrains
Owner

@Epiphqny wow Yuqing! those results do not look half bad! i'll have to think about them a bit more. this work builds upon the C-ViViT from the Phenaki paper. in that paper, i believe they encode the first frame separately from the rest (to allow for single-image pretraining). in this work, however, they simply pad on the left and use the same encoding for the first frame as for the rest. perhaps i can add the C-ViViT approach so the two can be compared
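The "pad on the left" idea can be sketched as follows. This is a minimal single-channel NumPy illustration, not the repository's `CausalConv3d`; the function name, the replicate padding of the first frame, and the "same" spatial padding are assumptions. Because frame 0 has no past, its temporal window is filled with copies of itself, so the first frame passes through exactly the same kernel as every later frame, just with a padded history.

```python
import numpy as np


def causal_conv3d(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 3D convolution that is causal along the time axis.

    video:  (T, H, W) array of frames
    kernel: (kt, kh, kw) array, with kh and kw odd
    """
    kt, kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Pad time on the LEFT only, with kt - 1 copies of the first frame
    # (replicate padding is an assumption; zero padding is also common).
    # Output frame t therefore sees input frames t - kt + 1 .. t, never the future.
    padded = np.concatenate([np.repeat(video[:1], kt - 1, axis=0), video], axis=0)
    # Height and width get ordinary symmetric zero padding ("same" size).
    padded = np.pad(padded, ((0, 0), (ph, ph), (pw, pw)))
    T, H, W = video.shape
    out = np.zeros((T, H, W), dtype=np.result_type(video, kernel))
    for t in range(T):
        for i in range(H):
            for j in range(W):
                window = padded[t:t + kt, i:i + kh, j:j + kw]
                # Cross-correlation, as deep-learning conv layers compute it.
                out[t, i, j] = np.sum(window * kernel)
    return out


# Causality check: changing only the last input frame leaves earlier outputs intact.
rng = np.random.default_rng(0)
video = rng.standard_normal((5, 4, 4))
kernel = rng.standard_normal((3, 3, 3))
changed = video.copy()
changed[-1] += 10.0
print(np.allclose(causal_conv3d(video, kernel)[:-1], causal_conv3d(changed, kernel)[:-1]))  # True
```

The C-ViViT alternative mentioned above would instead route the first frame through its own spatial-only path, so it never sees padded copies of itself.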

@lucidrains
Owner

@Epiphqny once i circle back to this, i also want to build a few more specialized discriminators (Fourier-domain as well as temporal)

@lucidrains
Owner

lucidrains commented Nov 16, 2023

@Epiphqny did you use LFQ or FSQ btw? could you share your hyperparameters?

@lucidrains
Owner

@Epiphqny added it here if you want to run some experiments

@Epiphqny
Author

Hi @lucidrains, thanks for your prompt response! Actually, I didn't use LFQ or FSQ; instead, I used the quantization from CVQ-VAE https://github.com/lyndonzheng/CVQ-VAE and extended its 2D convs to 3D causal convs as in MAGVIT-v2. For training, I followed the VQGAN setup and initialized the weights from a CVQ-VAE model pretrained on image data. I will train with the updated first-frame code and am looking forward to the updated discriminators!
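The comment does not say how the 2D image weights were mapped onto the 3D causal kernels. One common option, sketched here under that assumption, is to copy each 2D kernel into the last temporal tap (the current frame) and zero the earlier taps, so the inflated network initially reproduces the image model frame by frame. All names are hypothetical, and the code is single-channel NumPy for illustration only.

```python
import numpy as np


def inflate_2d_to_causal_3d(kernel_2d: np.ndarray, kt: int) -> np.ndarray:
    """Hypothetical helper: turn a pretrained (kh, kw) kernel into a causal
    (kt, kh, kw) kernel that initially ignores past frames."""
    kernel_3d = np.zeros((kt,) + kernel_2d.shape, dtype=kernel_2d.dtype)
    kernel_3d[-1] = kernel_2d  # last temporal tap = the current frame
    return kernel_3d


def conv2d_same(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Plain 2D cross-correlation with symmetric zero padding.
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


def causal_conv3d(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # Left-only temporal padding (copies of frame 0), symmetric spatial zero padding.
    kt, kh, kw = kernel.shape
    padded = np.concatenate([np.repeat(video[:1], kt - 1, axis=0), video], axis=0)
    padded = np.pad(padded, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(video.shape)
    for t in range(video.shape[0]):
        for i in range(video.shape[1]):
            for j in range(video.shape[2]):
                out[t, i, j] = np.sum(padded[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
video = rng.standard_normal((4, 5, 5))
k2d = rng.standard_normal((3, 3))
k3d = inflate_2d_to_causal_3d(k2d, kt=3)
per_frame = np.stack([conv2d_same(frame, k2d) for frame in video])
print(np.allclose(causal_conv3d(video, k3d), per_frame))  # True
```

An alternative is I3D-style inflation, which repeats the 2D kernel across all `kt` taps and divides by `kt`; that mixes neighbouring frames from the start, whereas the last-tap variant leaves temporal mixing to be learned during training.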

@lucidrains
Owner

@Epiphqny ohh i see! i didn't know you only used the causal conv

i'm not sure what the issue is then

@Epiphqny
Author

@lucidrains Thanks for your response! I will try more modules from this implementation and post updated results later.

@sijeh

sijeh commented Mar 11, 2024

> @lucidrains Thanks for your response! I will try more modules in this implementation and update the results later.

Hi @Epiphqny, is there any progress on improving the results?
