The performance gap between using vanilla 3dVAE and Causal3dCNN #236

Open
Sutongtong233 opened this issue Apr 15, 2024 · 1 comment
Comments

Sutongtong233 commented Apr 15, 2024

Hi, I am interested in how the VAE performs when trained with plain (non-causal) 3D convolutions, and in the corresponding generation results. As I understand it, the purpose of Causal3dCNN in MAGVIT is to support the autoregressive model in stage two. Since our stage two is still a diffusion model, is there a particular reason for using Causal3dCNN?

Sutongtong233 changed the title from "Have you" to "The performance gap between using vanilla 3dVAE and Causal3dCNN" on Apr 15, 2024

LinB203 (Member) commented Apr 18, 2024

We currently just inflate the 2D VAE into a 3D VAE and replace Conv3d with CausalConv3d. This already supports joint training on images and videos.
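
To make the reply above concrete, here is a minimal PyTorch sketch of the idea, not the project's actual code. The class name `CausalConv3d`, the helper `inflate_from_2d`, the choice to pad time by repeating the first frame, and the choice to copy the 2D kernel into the last temporal slice are all assumptions made for illustration. It shows why causal padding lets one network take both images (T = 1) and videos (T > 1): every output frame depends only on the current and earlier frames, so a single frame is a valid input.

```python
import torch
import torch.nn as nn


class CausalConv3d(nn.Module):
    """Hypothetical causal 3D conv: output frame t sees only input frames <= t."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.time_pad = kernel_size - 1      # all temporal padding goes in front
        space_pad = kernel_size // 2         # symmetric zero padding in H and W
        self.conv = nn.Conv3d(
            in_ch, out_ch, kernel_size, padding=(0, space_pad, space_pad)
        )

    def forward(self, x):                    # x: (B, C, T, H, W)
        # Assumption: pad the past by repeating the first frame.
        first = x[:, :, :1].repeat(1, 1, self.time_pad, 1, 1)
        return self.conv(torch.cat([first, x], dim=2))


def inflate_from_2d(conv2d: nn.Conv2d) -> CausalConv3d:
    """Illustrative inflation: put the 2D kernel in the last temporal slice.

    Assumes a square kernel, a bias term, and padding = kernel_size // 2.
    With zeros in the other slices, the 3D layer initially reproduces the
    2D layer frame by frame.
    """
    k = conv2d.kernel_size[0]
    conv3d = CausalConv3d(conv2d.in_channels, conv2d.out_channels, k)
    with torch.no_grad():
        conv3d.conv.weight.zero_()
        conv3d.conv.weight[:, :, -1] = conv2d.weight
        conv3d.conv.bias.copy_(conv2d.bias)
    return conv3d


conv2d = nn.Conv2d(3, 8, kernel_size=3, padding=1)
conv3d = inflate_from_2d(conv2d)

video = torch.randn(1, 3, 5, 16, 16)         # 5 frames
image = video[:, :, :1]                      # a single frame, T = 1
video_out = conv3d(video)                    # same layer handles both inputs
image_out = conv3d(image)
```

Under these assumptions the temporal length is preserved (T frames in, T frames out), and a single image passes through the same weights as a clip. A non-causal `nn.Conv3d` with symmetric temporal padding would also run on T = 1, but each output frame would then mix in future frames, so the first frame of a video would not be encoded the same way as a standalone image.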
