Difference between prepare_celeba_tfrecords.py and prepare_celeba_hq_tfrecords.py #76

Open · udithhaputhanthri opened this issue Feb 22, 2021 · 2 comments


@udithhaputhanthri

@podgorskiy

Hi, thanks for the great paper.

I wonder what the aim of prepare_celeba_hq_tfrecords.py is compared to prepare_celeba_tfrecords.py.

I have successfully generated the CelebA dataset using the **prepare_celeba_tfrecords.py** script. Model training using those tfrecords also worked perfectly.

But when it comes to the CelebA-HQ dataset, even though prepare_celeba_hq_tfrecords.py is able to generate the tfrecords (~230GB), training does not start properly. Basically, training terminates when the script calls `batches = make_dataloader()`.
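For anyone reproducing this, a minimal way to check whether a generated shard even parses, before touching the training code (the shard name and the `shape`/`data` feature layout here are my assumptions about the record format, not the repo's exact code):

```python
import tensorflow as tf

# Peek at one record from a generated shard to confirm it parses.
ds = tf.data.TFRecordDataset('celeba-hq-r08.tfrecords')  # hypothetical shard name
features = {
    'shape': tf.io.FixedLenFeature([3], tf.int64),
    'data': tf.io.FixedLenFeature([], tf.string),
}
for raw in ds.take(1):
    ex = tf.io.parse_single_example(raw, features)
    print(ex['shape'].numpy())  # e.g. [3 256 256] if the shard is level 8
```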

So I have changed prepare_celeba_tfrecords.py a bit to accommodate the CelebA-HQ dataset. The changes I made are (a rough sketch follows the list):

  1. removed all the preprocessing/dataset-organizing parts that use the list_eval_partition.txt and identity_CelebA.txt files from the CelebA dataset.
  2. the CelebA-HQ images are read from the zip archive and resized to 256x256 on load.
  3. the for loop (`for i in range(5)`) in prepare_celeba_tfrecords.py is changed to `for i in range(6)` to accommodate the extra resolution level.
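A minimal sketch of what the modified script does; the archive name, shard naming, downsampling method, and the `shape`/`data` record layout are illustrative assumptions, not the exact repo code:

```python
import io
import zipfile
import numpy as np
import tensorflow as tf
from PIL import Image

archive = zipfile.ZipFile('celeba-hq.zip', 'r')  # hypothetical archive name
names = [n for n in archive.namelist() if n.endswith('.png')]

# One shard per resolution level: levels 8..2 -> 256x256 .. 4x4.
writers = [tf.io.TFRecordWriter('celeba-hq-r%02d.tfrecords' % (8 - i))
           for i in range(7)]

def write_record(writer, img):
    ex = tf.train.Example(features=tf.train.Features(feature={
        'shape': tf.train.Feature(int64_list=tf.train.Int64List(value=img.shape)),
        'data': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img.tobytes()]))}))
    writer.write(ex.SerializeToString())

for name in names:
    img = Image.open(io.BytesIO(archive.read(name))).convert('RGB').resize((256, 256))
    img = np.asarray(img, dtype=np.uint8).transpose(2, 0, 1)  # HWC -> CHW
    write_record(writers[0], img)          # 256x256 base level
    for i in range(6):                     # was range(5) for the 128x128 CelebA script
        img = img[:, ::2, ::2]             # simple 2x downsample, for illustration
        write_record(writers[i + 1], img)  # 128, 64, 32, 16, 8, 4

for w in writers:
    w.close()
```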

With these changes, I was able to generate tfrecords with resolution levels 2 -> 8 as in the CelebA-HQ config file, and training also ran perfectly. The generated images are realistic as well.

But my concern is that my generated tfrecords are only ~11GB, whereas in the previous case (generating tfrecords with prepare_celeba_hq_tfrecords.py) they were ~230GB (train and test).

So I would like to know where this large size difference comes from.

@udithhaputhanthri (Author)

@podgorskiy
I have found that:

  1. the prepare_celeba_hq_tfrecords.py script generated data at resolution levels 1024, 512, 256, 128, 64, 32, and 16 (7 levels)
  2. the prepare_celeba_tfrecords.py script generated data at resolution levels 128, 64, 32, 16, 8, and 4 (6 levels)

I think this is what causes the larger size (~230GB) of the CelebA-HQ tfrecords (a rough size estimate is sketched after the screenshot below). But now my question is: should the model be trained on the 1024x1024 CelebA-HQ dataset or on the 256x256 one? After going through the paper, I thought it should be 256, but line 34 of prepare_celeba_hq_tfrecords.py (shown below) uses the 1024x1024 data, which is what produces the ~230GB dataset above.

[screenshot: line 34 of prepare_celeba_hq_tfrecords.py, which reads the images at 1024x1024]
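For what it's worth, a rough back-of-envelope estimate (my own numbers, assuming uncompressed uint8 RGB for the 30,000 CelebA-HQ images, TFRecord overhead ignored) is consistent with both observed sizes:

```python
# Raw uint8 RGB bytes across the levels each script writes,
# for the 30,000 CelebA-HQ images.
hq_levels = [1024, 512, 256, 128, 64, 32, 16]  # prepare_celeba_hq_tfrecords.py
my_levels = [256, 128, 64, 32, 16, 8, 4]       # my modified script

for label, levels in [('hq', hq_levels), ('256', my_levels)]:
    gb = sum(r * r * 3 for r in levels) * 30000 / 1e9
    print(label, round(gb, 1))
# hq  -> ~125.8 GB (same order as the ~230 GB observed; I have not traced
#        where the remaining factor of ~2 comes from)
# 256 -> ~7.9 GB  (same order as the ~11 GB observed)
```

The top level dominates, so moving the base resolution from 256 to 1024 alone multiplies the total by ~16, which accounts for most of the ~230GB vs ~11GB gap.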

I would be really thankful if you could give me a clue about what is happening here.

@udithhaputhanthri (Author) commented Feb 24, 2021

accidentally closed
