Training #1

Open
inferense opened this issue Aug 25, 2020 · 3 comments
@inferense

Thanks for the implementation. A few questions about training:

  1. Does training on another RGB dataset like COCO require any changes besides the hyperparameters of the priors?
  2. When it comes to conditioning on class labels / captions, I'm not quite sure about y=None in the forward pass of the priors. Does it need to be changed to refer to the one-hot encoded labels / captions?

Thanks!
@kamenbliznashki
Owner

kamenbliznashki commented Aug 27, 2020

I assume you are referring to the VQVAE2 implementation, since you mention priors. To answer your questions:

  1. Yes, you should be able to train on COCO by just creating another dataset. You only need to specify input_dims, which is currently set for each dataset in the fetch_vqvae_dataloader function; it is then saved to the model config and automatically scaled down to the dims of the latent maps when training each prior (see the sketch after this list).
  2. y is the conditioning one-hot vector that your dataset's dataloader outputs. It is set to None if you want to sample without conditioning on a class. In all my experiments I did condition on a one-hot vector; you can see that on line 80 of vqvae.py.
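
Here is a minimal sketch of what adding a COCO entry could look like. The function shape, paths, and dims below are assumptions for illustration, not the repo's exact code; the real fetch_vqvae_dataloader may differ in signature and return values.

```python
# Hypothetical COCO entry in a fetch_vqvae_dataloader-style function.
# Paths, dims, and the return signature are assumptions, not the repo's code.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms as T

def fetch_vqvae_dataloader(dataset, batch_size, train=True):
    if dataset == 'cifar10':
        input_dims = (3, 32, 32)   # saved to the model config; latent map
                                   # dims are scaled down from this
        ds = datasets.CIFAR10('data/', train=train, download=True,
                              transform=T.ToTensor())
    elif dataset == 'coco':
        input_dims = (3, 256, 256)
        split = 'train2017' if train else 'val2017'
        ds = datasets.CocoCaptions(
            root=f'data/coco/{split}',
            annFile=f'data/coco/annotations/captions_{split}.json',
            transform=T.Compose([T.Resize((256, 256)), T.ToTensor()]))
    else:
        raise RuntimeError(f'dataset {dataset} not recognized')
    return DataLoader(ds, batch_size=batch_size, shuffle=train), input_dims
```

The second element each batch yields (the label / caption target) is what would be passed as y when conditioning.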

Hope this helps.

@inferense
Author

inferense commented Aug 31, 2020

Thanks! And correct, I'm referring to VQVAE2 (sorry for not specifying that earlier).

I've trained the VQVAE (with my own script) and extracted the codes. Since I'm using COCO, I decided to use a word embedding instead of a one-hot. Going through vqvae_prior.py, I'm curious about the n_cond_classes value; it seems to be used mainly for a linear transformation in the GatedResidualLayer. Any suggestions on how it might work with an embedding vector instead of a one-hot?
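
(For concreteness, a sketch of the kind of conditioning vector meant above; the vocab size and dimension are illustrative assumptions.)

```python
import torch
import torch.nn as nn

# Illustrative only: pool learned word embeddings into one fixed-size
# conditioning vector per caption. Vocab size and dim are assumptions.
embed = nn.Embedding(num_embeddings=10000, embedding_dim=512)
token_ids = torch.tensor([[12, 845, 3, 99]])   # one tokenized caption
y = embed(token_ids).mean(dim=1)               # (batch, 512) conditioning vector
```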

@kamenbliznashki
Owner

n_cond_classes sets the dimension of a linear projection layer from the one-hot encoding of the class to the internal dimension (n_channels) of the gated residual layers - i.e. it is the size of the one-hot 'embedding'. You can set n_cond_classes to the size of your embedding vector and the model should work.

It is also used in the dataset constructor to set the size of the one-hot encoding, but since you are using your own dataset constructor, you don't need to worry about that bit.
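
To make the mechanics concrete, here is a minimal sketch of how n_cond_classes could size that projection inside a gated residual layer. This is an assumed simplification for illustration, not the repo's exact layer; the point is that the projection only needs the width of y, whether y is a one-hot or a dense embedding.

```python
import torch
import torch.nn as nn

class GatedResidualLayer(nn.Module):
    """Simplified sketch: n_cond_classes only sizes the y -> channels projection."""
    def __init__(self, n_channels, n_cond_classes):
        super().__init__()
        self.conv = nn.Conv2d(n_channels, 2 * n_channels, kernel_size=3, padding=1)
        self.proj_y = nn.Linear(n_cond_classes, 2 * n_channels)  # conditioning projection

    def forward(self, x, y=None):
        h = self.conv(x)
        if y is not None:
            # project y to the channel dim and broadcast over spatial positions
            h = h + self.proj_y(y).unsqueeze(-1).unsqueeze(-1)
        a, b = h.chunk(2, dim=1)
        return x + torch.tanh(a) * torch.sigmoid(b)  # gated residual update

# Works unchanged with a 512-dim caption embedding in place of a one-hot:
layer = GatedResidualLayer(n_channels=128, n_cond_classes=512)
x = torch.randn(2, 128, 8, 8)    # latent feature map
y = torch.randn(2, 512)          # e.g. mean-pooled word embeddings
out = layer(x, y)                # (2, 128, 8, 8)
```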
