Datasets 404 Not Found #21

Open
Pudding-0503 opened this issue Jun 20, 2023 · 16 comments

Labels
question Further information is requested

@Pudding-0503

Hello! I used the dataset from Council-GAN to train the model, and my FID results differ from the reported ones by about 2 to 5. So I tried to download the preprocessed dataset using the .sh file you provided, in order to reproduce the results as closely as possible. However, the download address of the dataset seems to be invalid.

[Two screenshots attached]

@usert5432
Collaborator

Thank you for reporting this issue as well, @Pudding-0503!

Yes, it seems like the dataset is gone. Let us figure out how to proceed...

@usert5432 usert5432 self-assigned this Jun 20, 2023
@usert5432 usert5432 added the bug Something isn't working label Jun 20, 2023
@usert5432
Collaborator

Hi @Pudding-0503,

Could I please ask you a follow-up question about the dataset you are referring to? It may help us find the quickest way to resolve this issue.

In the beginning of the message, you mention that

I use the dataset from Council-GAN to train the model, and the FID results have about 2~5 deviations.

Would it be correct to assume that the dataset you refer to is either celeba_male2female or celeba_glasses_removal, obtained with CouncilGAN's download script?

@Pudding-0503
Author

Pudding-0503 commented Jun 23, 2023

I apologise for taking a day to reply to you. @usert5432
Yes. I downloaded the dataset via CouncilGAN's script about half a month ago, then trained the model with your commands and later evaluated it using the torch-fidelity package, with input1 set to the fake (translated) images and input2 set to the test images.
Could the preprocessing be different? Should I first scale the test images to 256, then center-crop them, then translate them, and then compare the two? But the selfie2anime images are already 256 x 256, and the discrepancy is still there. I don't know whether this is normal; maybe I'm doing something wrong?

@Pudding-0503
Author

Here are some of my test results (Council-GAN takes too long to train...):

[Screenshot of test results]

@usert5432
Collaborator

I apologise for taking a day to reply to you. @usert5432

No worries. Please feel free to reply when it is convenient.

I downloaded the dataset via CouncilGAN's script about half a month ago

If you used CouncilGAN's script, then we should be using the same datasets. For reference, here are the md5 hashes of our files:

$ md5sum celeba_glasses.zip celeba_male2female.zip
6c111193f06d4b5f815ca48acdda2c11  celeba_glasses.zip
8fd9759ba8028810d7c9e480906167c9  celeba_male2female.zip
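For readers without md5sum at hand, the same check can be sketched in Python with the standard-library hashlib module. The file names and hashes are the ones posted above; md5_of and check_archives are hypothetical helper names, and the archives are assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# Hashes posted in this thread for the CouncilGAN archives.
EXPECTED_MD5 = {
    "celeba_glasses.zip": "6c111193f06d4b5f815ca48acdda2c11",
    "celeba_male2female.zip": "8fd9759ba8028810d7c9e480906167c9",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_archives(expected, root="."):
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, want in expected.items():
        path = Path(root) / name
        results[name] = md5_of(path) == want if path.is_file() else None
    return results


if __name__ == "__main__":
    for name, ok in check_archives(EXPECTED_MD5).items():
        print(name, {True: "OK", False: "MISMATCH", None: "MISSING"}[ok])
```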

If your files have the same hashes, then we are probably using the same datasets. The only processing we did was to flatten the celeba_glasses dataset: it was initially separated by gender, and we combined both genders into a single dataset.
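The flattening step described above can be sketched as follows. The directory layout is an assumption for illustration only (gender folders such as male/ and female/, each holding split folders like trainA/ or testB/); the real CouncilGAN archive layout may differ, and flatten_by_gender is a hypothetical helper, not a script from either repository. It also assumes file names are unique across genders, which holds for CelebA's ID-based names.

```python
import shutil
from pathlib import Path


def flatten_by_gender(src, dst, genders=("male", "female")):
    """Merge <src>/<gender>/<split>/<image> into <dst>/<split>/<image>.

    Hypothetical layout: each gender folder contains split folders
    (e.g. trainA, testA). Files are copied, not moved.
    Returns the number of files copied.
    """
    src, dst = Path(src), Path(dst)
    copied = 0
    for gender in genders:
        for split_dir in sorted(p for p in (src / gender).iterdir() if p.is_dir()):
            out_dir = dst / split_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for image in sorted(split_dir.iterdir()):
                if not image.is_file():
                    continue
                target = out_dir / image.name
                # A name clash would silently drop an image, so fail loudly instead.
                if target.exists():
                    raise FileExistsError(f"duplicate file name: {target}")
                shutil.copy2(image, target)
                copied += 1
    return copied
```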

Could the preprocessing be different? Should I first scale the test images to 256, then center-crop them, then translate them, and then compare the two? But the selfie2anime images are already 256 x 256, and the discrepancy is still there. I don't know whether this is normal; maybe I'm doing something wrong?

Unfortunately, I was not involved in retraining the other models, so I cannot tell what the problem may be. I will contact the expert who performed this training, but they are on vacation right now, so it will take about a week before I hear back from them.

then trained the model via your commands and evaluated it later using the torch-fidelity package.

We previously tried to document the retraining procedure in the repository https://github.com/LS4GAN/benchmarking . Could you please confirm that you are referring to the commands from that repository?

@Pudding-0503
Author

Pudding-0503 commented Jun 24, 2023

Thank you very much for your patient reply. @usert5432

If your files have the same hashes, then probably we are using the same datasets.

Unfortunately, to save space on the data drive, I deleted the dataset archives after unzipping them, so I can no longer verify their hashes...

We previously tried to document the retraining procedure in the repository https://github.com/LS4GAN/benchmarking . Could you please confirm that you are referring to the commands from that repository?

Yes, I used your commands to test the models. Today I tested the pre-trained models you provided: the UVCGAN results match perfectly, but the other models still show similar FID and KID discrepancies, so the problem is not the dataset or the commands; I think I must have made a mistake somewhere.
I examined the translated images from ACL-GAN and found that their size was (256, 313), i.e. the height was not cropped to 256. CycleGAN's translated images were (256, 256), but the test images used for comparison were (178, 218), which may account for the difference in the evaluation.

So I resized the CelebA test images to 256 x 256 and recalculated FID and KID. The results are now close, but there is still a slight difference; maybe some detail is still wrong...

[Screenshot of CycleGAN FID/KID results]

@usert5432
Collaborator

I examined the translated images from ACL-GAN and found that their size was (256, 313), i.e. the height was not cropped to 256. CycleGAN's translated images were (256, 256), but the test images used for comparison were (178, 218), which may account for the difference in the evaluation.

Yes, I think this may be the reason. We tried to make all training and evaluation uniform, and for the evaluation on CelebA we used two transformations:

  1. Resize the image so that its shorter side is 256 pixels, preserving the aspect ratio.
  2. Take a (256, 256) pixel center crop of the resized image.
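A minimal sketch of these two transformations in Python, assuming Pillow for the actual image operations. The helper names are hypothetical and this is not the code used for the paper. The longer side is truncated with int(), which, to my understanding, matches torchvision's Resize with an integer size; the crop offset uses floor division.

```python
def resized_size(width, height, shorter=256):
    """Size (w, h) after scaling so the shorter side equals `shorter`, keeping the aspect ratio."""
    if width <= height:
        return shorter, int(shorter * height / width)
    return int(shorter * width / height), shorter


def center_crop_box(width, height, size=256):
    """(left, top, right, bottom) of a centered size x size crop, as PIL's Image.crop expects."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def preprocess(img, size=256):
    """Resize a PIL image (shorter side -> size, bilinear), then take the central crop."""
    from PIL import Image  # lazy import: the geometry helpers above need only the stdlib

    img = img.resize(resized_size(*img.size, size), Image.BILINEAR)
    return img.crop(center_crop_box(*img.size, size))
```

For a 178 x 218 CelebA image, resized_size gives 256 x 313, the same uncropped size reported above for the ACL-GAN outputs; the crop then keeps rows 28 to 283.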

The image-to-image translation was done on these center crops. I remember that we had to modify the source code of the other models to support these transformations, but unfortunately I am not familiar with the details. I will ask our expert for more details once they return from vacation (at the end of this week).

@zhushuqi2333

zhushuqi2333 commented Jul 3, 2023

Hello! I used the dataset from Council-GAN to train the model, and my FID results differ from the reported ones by about 2 to 5. So I tried to download the preprocessed dataset using the .sh file you provided, in order to reproduce the results as closely as possible. However, the download address of the dataset seems to be invalid.

[Two screenshots attached]

Hello Dr, could you send me a copy of the dataset you downloaded? Thank you very much!
This is my email address: 541971079@qq.com

@usert5432
Collaborator

Hi @zhushuqi2333,

Hello Dr, could you send me a copy of the dataset you downloaded? Thank you very much!

Unfortunately, I cannot provide a copy of the CelebA dataset, as it was not created by us. We are reusing the dataset developed by CouncilGAN, and redistributing it would likely violate copyright laws. We are currently working on an alternative way to obtain the CouncilGAN datasets, which should take a few days to finalize.

In the meantime, you can consider the following options to access the dataset:

  1. Contact the authors of the CouncilGAN paper and ask whether they can re-upload the dataset.
  2. Try to reproduce it starting from the original CelebA dataset.

To the best of my understanding, to reproduce the dataset, one needs to do the following:
a. Download the raw CelebA dataset.
b. Run the dataset converter script from CouncilGAN.

Again, we are actively working on detailed instructions to help researchers reproduce CouncilGAN's dataset, and we aim to make them available soon. We apologize for any inconvenience this may cause.

@usert5432
Collaborator

Hi @Pudding-0503,

I wanted to give a brief update on reproducing the results of the alternative GANs. I have reached out to our expert about this, and they are currently investigating the issue. I will follow up in this thread as soon as I hear back from them.

@pphuangyi
Contributor

Thank you very much for your patient reply. @usert5432

If your files have the same hashes, then probably we are using the same datasets.

Unfortunately, to save space on the data drive, I deleted the dataset archives after unzipping them, so I can no longer verify their hashes...

We previously tried to document the retraining procedure in the repository https://github.com/LS4GAN/benchmarking . Could you please confirm that you are referring to the commands from that repository?

Yes, I used your commands to test the models. Today I tested the pre-trained models you provided: the UVCGAN results match perfectly, but the other models still show similar FID and KID discrepancies, so the problem is not the dataset or the commands; I think I must have made a mistake somewhere. I examined the translated images from ACL-GAN and found that their size was (256, 313), i.e. the height was not cropped to 256. CycleGAN's translated images were (256, 256), but the test images used for comparison were (178, 218), which may account for the difference in the evaluation.

So I resized the CelebA test images to 256 x 256 and recalculated FID and KID. The results are now close, but there is still a slight difference; maybe some detail is still wrong...

[Screenshot of CycleGAN FID/KID results]

Hi @Pudding-0503,

You mentioned that the FID and KID differ for ACL-GAN. However, I only see CycleGAN results in the attached image. Could you also share the ACL-GAN scores you got?

Thank you so much!

@pphuangyi
Contributor

I examined the translated images from ACL-GAN and found that their size was (256, 313), i.e. the height was not cropped to 256. CycleGAN's translated images were (256, 256), but the test images used for comparison were (178, 218), which may account for the difference in the evaluation.

Hi @Pudding-0503,

As @usert5432 mentioned in his earlier reply, to make the FID/KID comparison uniform, we did the following:

  1. Create the processed input (raw) image dataset:
    • resize the shorter edge to 256 with bilinear interpolation;
    • take a center 256 x 256 crop.
  2. Post-process the output images: all the algorithms we considered output images with a shorter edge of 256, except Council-GAN on glasses removal (128 x 128). So we only need to resize the Council-GAN outputs, and to center-crop whenever the longer edge is not 256.
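The post-processing rule in step 2 can be expressed as a small decision function. This is an illustrative sketch, not the authors' script: postprocess_plan is a hypothetical name, and it only computes which operations an output of a given size needs (a resize when the shorter edge is not 256, then a center crop when the longer edge is not 256), leaving the actual image calls to a library such as Pillow.

```python
def postprocess_plan(width, height, size=256):
    """List the operations needed to bring a (width, height) output to size x size.

    Each entry is ("resize", (w, h)) or ("crop", (left, top, right, bottom)).
    """
    ops = []
    if min(width, height) != size:
        # Scale so the shorter edge equals `size`, keeping the aspect ratio.
        if width <= height:
            width, height = size, int(size * height / width)
        else:
            width, height = int(size * width / height), size
        ops.append(("resize", (width, height)))
    if max(width, height) != size:
        # Centered size x size crop of the (possibly resized) output.
        left, top = (width - size) // 2, (height - size) // 2
        ops.append(("crop", (left, top, left + size, top + size)))
    return ops
```

For example, a 128 x 128 Council-GAN glasses-removal output only needs the resize, whereas a 256 x 313 output only needs the crop.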

Please let me know whether these pre/post-processing steps resolve the problem. I am not sure whether the results will be exactly the same, but the difference shouldn't be too large.

Thank you so much for your interest in our work! 😄

@usert5432
Collaborator

Hi @zhushuqi2333,

Hello Dr, could you send me a copy of the dataset you downloaded? Thank you very much!

We have created a set of detailed instructions on how to recreate the missing CelebA datasets. If this issue is still relevant, you could try obtaining the CelebA datasets by using this repository: https://github.com/LS4GAN/celeba4cyclegan

@happy-hahaha

Hello, I'm sorry to bother you again, but I'm stuck at this step:

$ 7z x img_align_celeba_png.7z
Extracting archive: img_align_celeba_png.7z/img_align_celeba_png.7z.016
ERROR: img_align_celeba_png.7z/img_align_celeba_png.7z.016
Can not open the file as archive
Archives: 3
OK archives: 0
Can't open as archive: 3
Files: 0
Size: 0
Compressed: 0

I tried many ways to fix it, but it seems that something is wrong with the files downloaded from the official CelebA website; maybe they are incomplete or corrupted. I have downloaded them many times, and it still doesn't work. Is that the reason, and how else can I obtain the dataset?

@YHRen
Contributor

YHRen commented Jul 12, 2023

@happy-hahaha Thank you for reporting. You can check the integrity of each 7z file using md5sum. @usert5432 has provided the checksums so others can compare. Can you check if they are the same?
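A sketch of the suggested check in Python: it reads a checksum listing in the format printed by md5sum (hash, then file name) and reports which listed archive parts are missing or have a different hash. The listing file and the helper names verify_listing and md5_of are placeholders; use whatever file holds the published checksums.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks since the archive parts are large."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_listing(listing_path, root="."):
    """Check every '<md5>  <name>' line of an md5sum-style listing.

    Returns (missing, mismatched), two lists of file names; both are empty
    when every listed part is present and intact.
    """
    missing, mismatched = [], []
    for line in Path(listing_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with a leading '*'
        path = Path(root) / name
        if not path.is_file():
            missing.append(name)
        elif md5_of(path) != expected.lower():
            mismatched.append(name)
    return missing, mismatched
```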

@happy-hahaha

Hello, I checked the integrity of all the 7z files and found that they are complete. But then I remembered that I had downloaded them this morning through a Baidu Pan link: https://pan.baidu.com/s/1dkp-d7g8Pmb5qgKuSPUZyw (extraction code: 21eq). Right now I can't open the "Align&Cropped Images" page on the official CelebA website to download img_align_celeba_png.7z. After I download the 7z file from the official website and check its integrity with md5sum, I will get back to you with the result!

@usert5432 usert5432 added question Further information is requested and removed bug Something isn't working labels May 22, 2024
6 participants