Training #1

Open
inferense opened this issue Aug 25, 2020 · 3 comments
@inferense

Thanks for the implementation. A few questions about training:

  1. Does training on another RGB dataset like COCO require any changes besides the hyperparameters of the priors?
  2. When it comes to conditioning on class labels / captions, I'm not quite sure about y=None in the forward pass of the priors. Does it need to be changed to refer to the one-hot encoded labels / captions?

Thanks!
@kamenbliznashki
Owner

kamenbliznashki commented Aug 27, 2020

I assume you are referring to the VQVAE2 implementation, since you mention priors. To answer your questions:

  1. Yes, you should be able to train on COCO by just creating another dataset. You only need to specify input_dims, which is currently set for each dataset in the fetch_vqvae_dataloader function; it is then saved to the model config and automatically scaled down to the dims of the latent maps when training each prior (see the sketch after this list).
  2. y is the conditioning one-hot vector that your dataset's dataloader outputs. It is set to None if you want to sample without conditioning on a class. In all my experiments I did condition on a one-hot vector; you can see that on line 80 of vqvae.py.
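
Here is a minimal sketch of what adding a COCO entry could look like. The function shape, paths, and dims below are assumptions for illustration, not the repo's exact code; the real fetch_vqvae_dataloader may differ in signature and return values.

```python
# Hypothetical COCO entry in a fetch_vqvae_dataloader-style function.
# Paths, dims, and the return signature are assumptions, not the repo's code.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms as T

def fetch_vqvae_dataloader(dataset, batch_size, train=True):
    if dataset == 'cifar10':
        input_dims = (3, 32, 32)   # saved to the model config; latent map
                                   # dims are scaled down from this
        ds = datasets.CIFAR10('data/', train=train, download=True,
                              transform=T.ToTensor())
    elif dataset == 'coco':
        input_dims = (3, 256, 256)
        split = 'train2017' if train else 'val2017'
        ds = datasets.CocoCaptions(
            root=f'data/coco/{split}',
            annFile=f'data/coco/annotations/captions_{split}.json',
            transform=T.Compose([T.Resize((256, 256)), T.ToTensor()]))
    else:
        raise RuntimeError(f'dataset {dataset} not recognized')
    return DataLoader(ds, batch_size=batch_size, shuffle=train), input_dims
```

The second element each batch yields (the label / caption target) is what would be passed as y when conditioning.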

Hope this helps.

@inferense
Author

inferense commented Aug 31, 2020

Thanks! And correct, I'm referring to VQVAE2 (sorry for not specifying that earlier).

I've trained the VQVAE (with my own script) and extracted the codes. Since I'm using COCO, I decided to use a word embedding instead of a one-hot. Going through vqvae_prior.py, I'm curious about the n_cond_classes value; it seems to be used mainly for a linear transformation in the GatedResidualLayer. Any suggestions on how it might work with an embedding vector instead of a one-hot?
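
(For concreteness, a sketch of the kind of conditioning vector meant above; the vocab size and dimension are illustrative assumptions.)

```python
import torch
import torch.nn as nn

# Illustrative only: pool learned word embeddings into one fixed-size
# conditioning vector per caption. Vocab size and dim are assumptions.
embed = nn.Embedding(num_embeddings=10000, embedding_dim=512)
token_ids = torch.tensor([[12, 845, 3, 99]])   # one tokenized caption
y = embed(token_ids).mean(dim=1)               # (batch, 512) conditioning vector
```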

@kamenbliznashki
Owner

n_cond_classes sets the dimension of a linear projection layer from the one-hot encoding of the class to the internal dimension (n_channels) of the gated residual layers - i.e. it is the size of the one-hot 'embedding'. You can set n_cond_classes to the size of your embedding vector and the model should work.

It is also used in the dataset constructor to set the size of the one-hot encoding, but since you are using your own dataset constructor, you don't need to worry about that bit.
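
To make the mechanics concrete, here is a minimal sketch of how n_cond_classes could size that projection inside a gated residual layer. This is an assumed simplification for illustration, not the repo's exact layer; the point is that the projection only needs the width of y, whether y is a one-hot or a dense embedding.

```python
import torch
import torch.nn as nn

class GatedResidualLayer(nn.Module):
    """Simplified sketch: n_cond_classes only sizes the y -> channels projection."""
    def __init__(self, n_channels, n_cond_classes):
        super().__init__()
        self.conv = nn.Conv2d(n_channels, 2 * n_channels, kernel_size=3, padding=1)
        self.proj_y = nn.Linear(n_cond_classes, 2 * n_channels)  # conditioning projection

    def forward(self, x, y=None):
        h = self.conv(x)
        if y is not None:
            # project y to the channel dim and broadcast over spatial positions
            h = h + self.proj_y(y).unsqueeze(-1).unsqueeze(-1)
        a, b = h.chunk(2, dim=1)
        return x + torch.tanh(a) * torch.sigmoid(b)  # gated residual update

# Works unchanged with a 512-dim caption embedding in place of a one-hot:
layer = GatedResidualLayer(n_channels=128, n_cond_classes=512)
x = torch.randn(2, 128, 8, 8)    # latent feature map
y = torch.randn(2, 512)          # e.g. mean-pooled word embeddings
out = layer(x, y)                # (2, 128, 8, 8)
```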
