The performance gap between using vanilla 3dVAE and Causal3dCNN #236

Sutongtong233 · 2024-04-15T03:23:11Z

Hi, I am interested in how a 3D CNN without causal convolutions performs for VAE training, and in the corresponding generation results. As I understand it, the purpose of the causal 3D CNN is to support the autoregressive stage-two model in MagViT. Since our stage two is still diffusion, what is the reason for using a causal 3D CNN here?

LinB203 · 2024-04-18T03:43:41Z

We are currently just inflating the 2D VAE to a 3D VAE and replacing conv3d with causalconv3d. This already supports joint training on images and videos.
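To illustrate why a causal convolution along time lets images and videos share one model, here is a minimal pure-Python sketch, not the repository's actual causalconv3d. It assumes the time axis is padded only on the past side by repeating the first frame; the function name `causal_temporal_conv`, the scalar "frames", and the kernel weights are all hypothetical and chosen for readability.

```python
def causal_temporal_conv(frames, kernel):
    """Convolve along time only; `frames` is a list of per-frame scalars."""
    k = len(kernel)
    # Assumed padding scheme: repeat the first frame k-1 times on the past side.
    padded = [frames[0]] * (k - 1) + list(frames)
    # Output t reads padded[t : t + k], i.e. original frames t-k+1 .. t,
    # so it never depends on a future frame.
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]


kernel = [0.25, 0.25, 0.5]    # hypothetical weights; the last tap is the current frame

image = [2.0]                 # a single frame, i.e. an image
video = [2.0, 4.0, 6.0, 8.0]  # a 4-frame clip that starts with the same frame

image_out = causal_temporal_conv(image, kernel)
video_out = causal_temporal_conv(video, kernel)

print(image_out)  # [2.0]
print(video_out)  # [2.0, 3.0, 4.5, 6.5]
```

Under these assumptions, a one-frame input yields a one-frame output, and the first output of the video equals the output for the image alone, because later frames never influence earlier ones. A symmetric (non-causal) temporal kernel would mix future frames into the first output, so the image case and the first video frame would no longer match.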

