
Low FVD scores and generating inverted samples? #16

Open
skymanaditya1 opened this issue May 18, 2022 · 10 comments


For comparison, I am training StyleGAN-V on a relatively small dataset of faces (faces from the How2Sign dataset). In particular, I am training StyleGAN-V on 10,000 videos, each with exactly 25 frames. After training for a sufficient amount of time (once I started noticing really good perceptual results), I ran inference, and the generated videos come out inverted (upside down). The orientation of the intermediate generated videos also keeps changing during training.
Secondly, I measured the fvd2048_16f score using this checkpoint against the dataset the model was trained on, and I am getting a very high FVD of ~1100. Is this expected since the model is trained on fewer samples, or is something wrong given that the generated videos are inverted? On the other datasets (UCF, SkyTimelapse, Rainbow Jelly), I get videos in the correct orientation. Attached below is one frame extracted from a generated video (its perceptual quality looks good to me).

[one_frame: a frame extracted from one of the generated videos]

skymanaditya1 (Author) commented May 18, 2022

For the Rainbow Jelly dataset, the generated images have a white background whereas the real images have a black background. Please find some samples here --
[fakes005961: generated samples (white background)]
[reals: real samples (black background)]

The training configuration is the same as described above. Did you also notice this at any point during training?

@universome (Owner)

Hi! The inverted images are generated because of the differentiable augmentations (from StyleGAN2-ADA).
The white/black background flip happens for the same reason.
If your generator produces inverted samples, the FVD will certainly be very high.

Typically, one just needs to train for longer to get those diffaugs sorted out (you can check the StyleGAN2-ADA paper).
For how many kimgs do you train?

A natural way to solve the issue would be to increase the dataset size, but I suspect that's not possible in your case.
You can disable any specific augmentations here (see the sketch below). That would remove the effect, but it might make it more difficult for G to learn the data, since D would be winning too severely.

For RainbowJelly, please note that it is not a symmetric dataset, so it makes sense to disable mirroring for it here if you want to obtain better results on it (we didn't do this in our case, to stay comparable with other methods).
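
For illustration, here is a minimal sketch of what disabling specific augmentations could look like, written in the style of StyleGAN2-ADA's augpipe_specs dictionary (the exact file, key names, and defaults in this repository may differ). The transforms most plausibly behind the artifacts discussed above are rotate/rotate90 (upside-down frames) and lumaflip (flipped luma, i.e. white vs. black backgrounds):

```python
# Sketch only: StyleGAN2-ADA-style augmentation pipeline specs.
# Setting an entry to 0 disables that transform; 'bgc_custom' is a
# hypothetical name for a pipe with the suspected offenders turned off.
augpipe_specs = {
    # Default 'bgc' pipe: blit + geometric + color augmentations.
    'bgc': dict(xflip=1, rotate90=1, xint=1, scale=1, rotate=1, aniso=1, xfrac=1,
                brightness=1, contrast=1, lumaflip=1, hue=1, saturation=1),
    # Same pipe, but without rotations (upside-down leakage) and without
    # lumaflip (white/black background flips).
    'bgc_custom': dict(xflip=1, rotate90=0, xint=1, scale=1, rotate=0, aniso=1, xfrac=1,
                       brightness=1, contrast=1, lumaflip=0, hue=1, saturation=1),
}
```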

skymanaditya1 (Author) commented May 22, 2022

I see, I will follow this advice and retrain the network on all the datasets. I don't remember the exact number of kimgs I trained the network for, but I trained each network on a 4-GPU setup of NVIDIA RTX 2080 Tis for close to 2 days with the following resolution and batch size --

  1. How2Sign_faces -- 256x256, batch size 32
  2. Rainbow Jelly -- 128x128, batch size 64
  3. SkyTimelapse -- 128x128, batch size 64

I did manually invert the generated videos for How2Sign_faces, ran calc_metrics_for_dataset.py on 100 generated videos, and calculated the FVD -- it came to 297. As for SkyTimelapse, I observed an FVD of around 51 even on the smaller dataset, and those images were perceptually the best as well.
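
(For reference, a minimal sketch of that manual inversion step, assuming each generated video is stored as a directory of image frames; the directory names below are illustrative only.)

```python
import os
import imageio.v2 as imageio
import numpy as np

# Hypothetical layout: one subdirectory of frames per generated video.
src_root, dst_root = 'generated_videos', 'generated_videos_flipped'

for video_dir in sorted(os.listdir(src_root)):
    os.makedirs(os.path.join(dst_root, video_dir), exist_ok=True)
    for frame_name in sorted(os.listdir(os.path.join(src_root, video_dir))):
        frame = imageio.imread(os.path.join(src_root, video_dir, frame_name))
        flipped = np.flipud(frame)  # undo the upside-down orientation
        imageio.imwrite(os.path.join(dst_root, video_dir, frame_name), flipped)
```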

I will try to retrain the models with the mentioned augmentations turned off and re-report the metrics and generated samples.

@universome (Owner)

Ok, sounds good. Also note that computing FVD on a small number of videos (100 instead of 2048) might lead to worse FVD values, because the metric will effectively see mode collapse in your statistics and penalize for it.
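
To make the small-sample effect concrete: FVD, like FID, fits a Gaussian to feature statistics (I3D features, in FVD's case) of the real and generated sets and measures the Frechet distance between them; with only ~100 videos the estimated mean and covariance are noisy, which by itself inflates the distance. A minimal sketch of the final distance computation (feature extraction omitted):

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets of shape
    [num_videos, feature_dim]. With few videos, mu/sigma are noisy estimates,
    so the returned value tends to be inflated."""
    mu1, sigma1 = feats_real.mean(0), np.cov(feats_real, rowvar=False)
    mu2, sigma2 = feats_fake.mean(0), np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))
```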

skymanaditya1 (Author) commented May 24, 2022

In that case, I will train with augmentations disabled on the smaller datasets. I don't think generating and manually inverting 2048 generated videos would be a good idea.

@skymanaditya1 (Author)

@universome I tried running with augmentations disabled using augpipe: noaug, and I get an AssertionError -- assert c.augpipe is None or c.augpipe in augpipe_specs. From what I understand, noaug is not an accepted value for augpipe (I could remove the assertion). Are you expecting at least one augmentation pipeline to be specified as input?

@skymanaditya1 (Author)

Would it be this particular option in the base.yaml file under configs/training?
aug: noaug # One of ['noaug', 'ada', 'fixed']

Also, what should the discriminator augmentation be to avoid the situation described at the top of this issue?

Currently I am using noaug for aug: and bgc for augpipe:. I am inclined to change augpipe from bgc to noise, though. Please let me know what you think.

universome (Owner) commented May 30, 2022

Hi! The question is somewhat difficult to answer, since it is hard to predict how the model will perform with one set of augmentations or another. I believe you would need some augmentations enabled to make your model fit a small dataset. If you want to disable augmentations completely, you should specify aug: noaug, in which case the augpipe parameter is ignored. If you only want to disable rotations, set rotate90=0 and rotate=0 for the bgc augmentation pipe here (or create your own augpipe, as we did for bgc_norgb).

How are your results going without any augmentations? If the model does not overfit, then you can disable them completely.
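
A rough sketch of how these options interact, following the description above (build_augment_pipe and the config field names here are placeholders, not this repository's actual identifiers):

```python
# Sketch of the implied control flow; not the repo's actual code.
def make_augment_pipe(cfg):
    if cfg.aug == 'noaug':
        return None                            # augpipe is ignored entirely
    # 'fixed' and 'ada' both build the pipeline named by cfg.augpipe (e.g. 'bgc');
    # they differ only in how the augmentation probability p is handled.
    pipe = build_augment_pipe(cfg.augpipe)     # placeholder builder
    if cfg.aug == 'fixed':
        pipe.p = cfg.p                         # constant probability
    else:                                      # 'ada'
        pipe.p = 0.0                           # adapted during training toward cfg.target
    return pipe
```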

skymanaditya1 (Author) commented Jun 1, 2022

So the augmentations specified in the augpipe parameter are the ones applied, and supplying noaug to the aug parameter disables augmentations. There are, however, two other modes that can be supplied, 'ada' and 'fixed'. Do both of those modes also use the same augpipe parameter? I will also share the results with aug: noaug. Also, when setting the aug parameter to noaug, I get an error that says raise UserError('--target can only be specified with --aug=ada'). Is specifying ada as the aug parameter necessary?

@universome (Owner)

There are three possible choices for augmentations: 1) no augmentations (aug: noaug), 2) fixed augmentations (aug: fixed), and 3) adaptive augmentations (aug: ada). If you choose adaptive augmentations, you can set the target option for them (i.e. the threshold that decides when to increase/decrease the augmentation probability). However, if you choose either of the other augmentation types, you should leave target unset.
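
For context, here is a minimal sketch of the adaptive rule that target controls, following the StyleGAN2-ADA heuristic (variable names are illustrative): an overfitting statistic r_t, e.g. the mean sign of the discriminator's outputs on real images, is compared against target, and the augmentation probability p is nudged up or down accordingly. This is also why target is meaningless for noaug and fixed.

```python
import numpy as np

def adjust_ada_probability(p, rt, target, batch_size, ada_interval=4, ada_kimg=500):
    """One ADA update step (sketch). rt is the measured overfitting statistic,
    e.g. E[sign(D(real))]; target is the value it is steered toward."""
    step = np.sign(rt - target) * (batch_size * ada_interval) / (ada_kimg * 1000)
    return float(np.clip(p + step, 0.0, 1.0))
```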
