Reproduce Table 1 IS and FID score on Birds #17

Open
yangyu12 opened this issue Aug 28, 2020 · 2 comments
Comments

@yangyu12

Hi, nice work!

I am evaluating your released model on the Birds dataset. In particular, I want to reproduce the numbers claimed in your paper to make sure everything I have done is correct. The paper reports IS=52.53±0.45 and FID=11.25, but I got IS=43.20±0.54 and FID=22.08. I think I might have made some mistakes in the details.

What I did

I generated 30K 128×128 child images with your released Birds model, 150 images per child category.
(1) I computed the IS with your released finetuned Inception model. The generated images are resized to 299×299 and normalized to [-1, 1] before being fed into the network. Mean and std are computed over 10 splits.
(2) I computed the FID with the default Inception model using

calculate_fid_given_paths(['/path/to/generated/images', '/path/to/real/images'], batch_size=1, cuda=True, dims=2048)

Note that /path/to/real/images contains the original CUB images without any cropping or resizing.
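For reference, the [-1, 1] normalization in step (1) can be sketched as follows (a minimal pure-Python illustration of the scaling; the actual evaluation code of course operates on image tensors):

```python
def normalize_to_unit_range(pixels):
    """Map uint8 pixel values in [0, 255] to [-1, 1], as in step (1)."""
    return [p / 127.5 - 1.0 for p in pixels]

# 0 maps to -1.0, 127.5 would map to 0.0, and 255 maps to 1.0.
print(normalize_to_unit_range([0, 255]))  # [-1.0, 1.0]
```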

Evaluation codes

IS: https://github.com/sbarratt/inception-score-pytorch
FID: https://github.com/mseitzer/pytorch-fid
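The "mean and std over 10 splits" convention I used can be sketched like this (a simplified stand-in where each split is scored by its mean; the linked IS repo computes the actual exp-KL score per split):

```python
import statistics

def mean_std_over_splits(per_image_values, n_splits=10):
    # Split the values into n_splits equal chunks, score each chunk
    # (here just the chunk mean, as a stand-in for the per-split
    # Inception Score), and report mean and sample std over splits.
    chunk = len(per_image_values) // n_splits
    splits = [per_image_values[i * chunk:(i + 1) * chunk] for i in range(n_splits)]
    per_split = [sum(s) / len(s) for s in splits]
    return statistics.mean(per_split), statistics.stdev(per_split)
```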

Questions

  1. Do you use the finetuned inception model for computing FID?
  2. Should I use the images that are cropped with 1.5x bounding boxes from original images to compute FID?
  3. Should I first resize the real images to 128x128 and then feed them into the inception network (which automatically resizes input to 299x299) to compute FID?
  4. Do the numbers I got lie within normal variation? I suppose the quality of the generated images may differ across generation runs.

By the way, I am also curious about the LR-GAN results you report in Table 1. Did you train LR-GAN on the original CUB images or on the cropped ones?

Also, I am using pytorch==1.3.0; I'm not sure if there is any version issue.

@yangyu12
Author

UPDATE:
I first resized the real images to 128×128 and then fed them into the Inception model to compute FID. This gives FID=12.39, which I think is quite close to the number claimed in the paper. However, I still can't match the reported IS.

@utkarshojha
Collaborator

Hi, apologies for the delayed response. The FID will depend on how exactly you're generating the 30k images. Are you iterating through all the child codes, and generating the images with parent/background code fixed?

And yes, you should use the real images cropped with 1.5× bounding boxes for computing the FID.
The Inception model used for FID is the one pre-trained on ImageNet. We use the fine-tuned version only for computing the Inception Score, where the model is finetuned on all 200 categories (for birds).
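For what it's worth, the 1.5× box expansion can be sketched as below (a hypothetical helper, expanding the box about its center and clamping to the image; the exact cropping convention used by the authors may differ, e.g. in rounding):

```python
def scale_bbox(x, y, w, h, img_w, img_h, factor=1.5):
    # Expand a bounding box (x, y, w, h) about its center by `factor`,
    # clamped to the image boundaries; returns (x0, y0, x1, y1).
    cx, cy = x + w / 2.0, y + h / 2.0
    nw, nh = w * factor, h * factor
    x0 = max(0, int(cx - nw / 2.0))
    y0 = max(0, int(cy - nh / 2.0))
    x1 = min(img_w, int(cx + nw / 2.0))
    y1 = min(img_h, int(cy + nh / 2.0))
    return x0, y0, x1, y1

print(scale_bbox(40, 40, 20, 20, 100, 100))  # (35, 35, 65, 65)
```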

As for the variation: I believe the FID could still fall in that range, but the IS does appear lower than usual.
Double-check the image generation process and see if that improves the score.

Thanks
