About GumbelQuantization training #67
This config is probably meant for inference only: DummyLoss simply doesn't compute any loss. Hopefully the author will release the training config soon. For now I'll stick to vanilla VQ, although it suffers from index collapse: the utilization of the 16384-code model is ~6% (~1000 valid codes), and the 1024-code model is around 50% (~500 valid codes). EDIT: The quantization loss for GumbelVQ is not large, around 0.005 when training from scratch with …
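For anyone who wants to measure utilization the same way, here's a minimal sketch. It assumes a taming-transformers-style `VQModel` whose `encode()` returns `(quant, emb_loss, info)` with the code indices in `info[2]`; treat those details as assumptions:

```python
import torch

@torch.no_grad()
def codebook_utilization(model, dataloader, n_embed, device="cuda"):
    """Fraction of codebook entries selected at least once on a dataset."""
    used = torch.zeros(n_embed, dtype=torch.bool, device=device)
    for x in dataloader:
        # VQModel.encode returns (quant, emb_loss, info); in taming-transformers
        # the code indices sit in info[2].
        _, _, info = model.encode(x.to(device))
        used[info[2].flatten()] = True
    return used.sum().item() / n_embed
```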
I would also like some clarity on the best KL weight for training from scratch (and whether it should be warmed up over time).
@TomoshibiAkira Why do you expect better utilization of the codes with Gumbel quantization?
@borisdayma Because the codebook in the f=8 GumbelVQ model does not contain invalid codes, unlike the ImageNet model. By "invalid codes", I mean the kind shown in the image attached here. Here's the visualization of the first 1024 codes in the f=8 GumbelVQ's codebook for comparison.
Thanks @TomoshibiAkira for this great explanation!
You're welcome! @borisdayma
Thanks for your explanation, @TomoshibiAkira.
@TomoshibiAkira wondering if you looked at codebook utilization for other models (like OpenAI's dVAE).
@borisdayma DALL-E suffers much harsher information loss than VQ. The reason might be that every code in DALL-E is an integer (or "a class") and thus carries much less information than VQ's codes (feature vectors). As for the utilization, though, DALL-E's discretization method is different from VQ's. EDIT: for the visualization code, you can directly use something like the sketch below. @sczhou
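A minimal sketch of such a visualization, assuming a taming-transformers-style `VQModel`: each codebook vector is decoded as a 1x1 latent, which yields one small image patch per code. Note that the codebook attribute is `quantize.embedding` for the vanilla `VectorQuantizer` but `quantize.embed` for `GumbelQuantize`; names and shapes here are assumptions.

```python
import torch
from torchvision.utils import save_image

@torch.no_grad()
def visualize_codebook(model, n_codes=1024, device="cuda"):
    # Vanilla VQ stores the codebook in quantize.embedding; GumbelQuantize in quantize.embed.
    quant = model.quantize
    codebook = (quant.embedding if hasattr(quant, "embedding") else quant.embed).weight
    codes = codebook[:n_codes].to(device)        # (n, embed_dim)
    latents = codes[:, :, None, None]            # each code as a 1x1 feature map
    patches = model.decode(latents)              # (n, 3, f, f) image patches
    save_image(patches, "codebook.png", nrow=32, normalize=True)
```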
Thanks, @TomoshibiAkira. Where can I find GumbelVQ's model.yaml? I didn't see this config file in the repo. Many thanks.
It's in the pretrained model zoo.
@TomoshibiAkira Don't they both use a codebook where you can use either the codebook index or the corresponding feature vector?
@borisdayma I personally don't think so. To put it in VQ terms, you could say that all 8192 distinct one-hot vectors form DALL-E's codebook.
Well, the GAN and perceptual losses are definitely helping, but I do think that even without them, VQGAN (or a plain VQ-VAE) could achieve better reconstruction results.
I think the new Gumbel VQGAN type already has a DALL-E-style codebook. It does indeed seem better, but I think this comes down to the quantization method preventing codebook collapse. The DALL-E decoder just uses a simple 1x1 conv2d layer to transform the one-hots into feature vectors (a one-to-one mapping); I have opened the decoder up and used the features directly instead.
They have to be, because the second-stage transformer models only produce indices, not features.
Ah, now I see. If we fold the 1x1 Conv2D layer into the one-hots (considering only the output of the 1x1 Conv2D), it's actually the same as VQ's codebook, with or without Gumbel: the codebook here is just the weight of the Conv2D layer (see the check after this comment). Since they're the same, …
This hypothesis is invalid from the start.
Sorry if my statement was not clear before.
I'm not sure at this point. From my personal experience with an AE with VQ, at f=8 vs. f=16 the network's behavior on the same dataset is vastly different.
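To make that equivalence concrete, here's a small self-contained check (all sizes hypothetical): a 1x1 conv applied to one-hot planes produces exactly the same features as indexing into the conv's weight matrix, i.e., an embedding lookup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_codes, dim, h, w = 8192, 256, 32, 32          # hypothetical sizes
conv = nn.Conv2d(n_codes, dim, kernel_size=1, bias=False)

idx = torch.randint(n_codes, (1, h, w))                          # code indices
one_hot = F.one_hot(idx, n_codes).permute(0, 3, 1, 2).float()    # (1, n_codes, h, w)

# DALL-E style: 1x1 conv over the one-hot planes.
via_conv = conv(one_hot)

# VQ style: treat the conv weight as an (n_codes, dim) codebook and look it up.
codebook = conv.weight.view(dim, n_codes).t()
via_lookup = codebook[idx].permute(0, 3, 1, 2)

assert torch.allclose(via_conv, via_lookup, atol=1e-5)
```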
Hi @TomoshibiAkira, this is a really valuable discussion! May I ask whether you validated the performance of f=8 without Gumbel? Actually, I just want to see the effect of Gumbel: does adding Gumbel to vanilla VQ always improve reconstruction and codebook utilization (e.g., at f=8 and f=16), or is there some trade-off, such as high codebook utilization but relatively low accuracy of code matching? Do you have any ideas on that?
@fnzhan I didn't conduct the experiment, so I can't give any concrete answer. Personally, I'd like to believe that Gumbel can improve performance without any trade-off, since it's basically a better method for sampling discrete data. But I didn't dive deep into the theory behind Gumbel-Softmax, so please take this with a grain of salt.
I think the trade-off is during training: you have to train longer because you have to slowly decrease the Gumbel-Softmax temperature to zero or very near zero. But I think it is straightforwardly better at inference time.
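A sketch of what such a schedule can look like; taming-transformers drives the temperature through a scheduler in the model config, and the constants here are made up for illustration:

```python
import math

def gumbel_temperature(step, t_start=1.0, t_end=1e-6, decay_rate=1e-5):
    """Exponential decay from t_start toward t_end over training steps.

    Illustrative numbers only; the real schedule and values come from the config.
    """
    return max(t_end, t_start * math.exp(-decay_rate * step))

# e.g. set model.quantize.temperature = gumbel_temperature(global_step) each step
```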
@TomoshibiAkira @crowsonkb Thanks for sharing your insights. I am working on this at the moment and will post an update if I reach a concrete conclusion.
@fnzhan any updates? Really interested to see if there are any key improvements.
Hi @EmaadKhwaja, I am preparing a paper on this. Here is a brief observation: comparing original VQ and GumbelVQ (both f=16), the improvement with Gumbel tends to be marginal, although its codebook utilization is nearly 100%.
@fnzhan
Thanks for sharing.
Hey guys, if you are still interested in optimizing codebooks: I tried using the codebook with projection and l2 norm from https://arxiv.org/pdf/2110.04627v3.pdf, and it works well. Here is a codebook.
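For anyone curious, here's a sketch of that lookup in the spirit of the paper (factorized low-dimensional codes plus l2 normalization). This is my reading of the idea, not the paper's reference implementation, and all names and sizes are made up:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class L2ProjectedQuantizer(nn.Module):
    """Factorized codes + l2-normalized lookup, in the spirit of ViT-VQGAN
    (arXiv:2110.04627). Commitment/codebook losses omitted for brevity.
    """
    def __init__(self, n_codes=8192, code_dim=32, feat_dim=256):
        super().__init__()
        self.proj_in = nn.Linear(feat_dim, code_dim)   # project into a low-dim lookup space
        self.proj_out = nn.Linear(code_dim, feat_dim)
        self.codebook = nn.Embedding(n_codes, code_dim)

    def forward(self, z):                               # z: (B, N, feat_dim)
        z_e = F.normalize(self.proj_in(z), dim=-1)      # unit-norm encoder features
        codes = F.normalize(self.codebook.weight, dim=-1)
        idx = (z_e @ codes.t()).argmax(dim=-1)          # nearest neighbor on the sphere
        z_q = codes[idx]
        z_q = z_e + (z_q - z_e).detach()                # straight-through estimator
        return self.proj_out(z_q), idx
```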
@fnzhan Congratulations on your article being accepted to CVPR 2023! Would you kindly share your code and pre-trained weights? It would help us better understand and follow up on your work.
Hi guys, I have tried training another VQModel (first stage) on my own dataset (I modified the encoder and decoder a little). However, during training the vector quantization loss rises, and the KL loss also rises. Any suggestions?
Thank you for the great work!
I tried to reproduce the VQGAN OpenImages (f=8), 8192, GumbelQuantization model based on the config file from the cloud (the detailed config file is below). However, I encountered some errors when training with GumbelQuantization.
The first error was an unexpected keyword argument error. I could fix this error by removing `return_pred_indices=True` from taming/models/vqgan.py, line 336 at commit 9d17ea6.
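Paraphrasing the change (the exact surrounding code depends on the commit):

```python
# taming/models/vqgan.py, in the model's encode() -- paraphrased, not verbatim.
# GumbelQuantize.forward() has no return_pred_indices kwarg, so this raises
# "TypeError: forward() got an unexpected keyword argument 'return_pred_indices'":
#   quant, emb_loss, info = self.quantize(h, return_pred_indices=True)
# Dropping the kwarg works for both quantizers:
quant, emb_loss, info = self.quantize(h)
```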
The second error occurs because of `DummyLoss`. This can be fixed by changing `target: taming.modules.losses.vqperceptual.DummyLoss` to `target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator`.
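That is, something along these lines in the model yaml. The parameter names are ones `VQLPIPSWithDiscriminator` accepts, but the values are placeholders, since the true training settings are exactly what's unknown here:

```yaml
lossconfig:
  target: taming.modules.losses.vqperceptual.VQLPIPSWithDiscriminator
  params:
    disc_start: 0          # step at which the GAN loss switches on; unknown for this model
    disc_weight: 0.8       # placeholder, borrowed from other taming configs
    codebook_weight: 1.0   # placeholder
```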
But the thing is, I'm not sure whether the VQGAN OpenImages (f=8), 8192, GumbelQuantization model was trained with the discriminator loss, and if it was, when it was switched on and with what parameters. Can you share the detailed config file of the VQGAN OpenImages (f=8), 8192, GumbelQuantization model and fix the above issues so that the model can be reproduced? Thank you in advance!