Sampling method parameters are unclear and it crashes Google Colab #38

Open
tirthajyoti opened this issue Sep 9, 2019 · 3 comments
Labels
question Further information is requested

Comments

@tirthajyoti

tirthajyoti commented Sep 9, 2019

Hi,

I am trying out this excellent implementation of tabular GAN.

While the parameters of the `fit` method are well illustrated in the docs and in the Jupyter notebook example, the parameters of the `sample` method are not.

Currently, the `sample` method runs very slowly. That may be fine for a GAN generating data, but is there a way to control how many iterations it will run before generating the data? Could more documentation be added for the `sample` method?

For example, I was just trying out your example notebook in Google Colab, and the instance crashed after using all of its memory!

Fitting was fine. I changed the number of steps/epochs to 1000. The crash happened after 44 minutes of GPU compute during sampling. The output log showed `144328it [44:19, 54.66it/s]`.

Is this normal?


@tirthajyoti tirthajyoti changed the title Sampling method parameters are unclear Sampling method parameters are unclear and it crashes Google Colab Sep 9, 2019
@csala
Contributor

csala commented Sep 16, 2019

@tirthajyoti you can find all the API documentation here: https://dai-lab.github.io/TGAN/api/tgan.model.html

In particular, the sample method is here: https://dai-lab.github.io/TGAN/api/tgan.model.html#tgan.model.TGANModel.sample

As you can see, there isn't much documentation about the sampling arguments because there is only one argument, which needs no documentation: the number of samples to generate.

Regarding Google Colab and the memory consumption, we cannot tell you whether it is normal or not because we do not have any insight into your data. Also, we have never run TGAN on Google Colab ourselves.

However, yes, TGAN is memory intensive, just like any other GAN or data synthesis tool, and yes, it is normal for memory consumption to increase during sampling, since you are generating and allocating new data that did not exist before you started sampling.

If your resources are limited and you can fit but not sample, one option is to delete the data variables and run garbage collection after fitting and before sampling, so that there is enough memory to allocate the new data as it is generated.
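A minimal sketch of that advice, assuming a model object with `fit(data)` and `sample(num_samples)`, the same call shape as `tgan.model.TGANModel`. `StubModel`, `fit_then_sample`, and the toy columns are hypothetical, so the snippet runs without TGAN or TensorFlow. Note that `del` only frees memory if nothing else (for example a notebook global or the output cache) still refers to the training data.

```python
import gc


class StubModel:
    """Hypothetical stand-in exposing fit(data) and sample(num_samples), the
    same call shape as tgan.model.TGANModel, so the sketch runs without TGAN."""

    def fit(self, data):
        # Remember only the column names, not a reference to `data` itself,
        # so deleting the caller's variable can actually free the table.
        self.columns = list(data)

    def sample(self, num_samples):
        # Every synthetic row is a fresh allocation, which is why memory
        # grows while sampling.
        return [dict.fromkeys(self.columns, 0.0) for _ in range(num_samples)]


def fit_then_sample(load_data, model, num_samples):
    data = load_data()
    model.fit(data)
    # `del` drops this name's reference; the object is freed once no other
    # references remain, and gc.collect() reclaims unreachable cycles.
    del data
    gc.collect()
    return model.sample(num_samples)


samples = fit_then_sample(
    lambda: {"age": [30, 41], "income": [1.0, 2.5]}, StubModel(), 200
)
```

In a notebook, the equivalent is running `del data` and `gc.collect()` in a cell between the `fit` call and the `sample` call.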

@csala csala added the question Further information is requested label Sep 16, 2019
@csala
Contributor

csala commented Sep 19, 2019

Hi @qwerkkk, I think you are hitting a different issue. Please have a look at issue #41, which I just opened.

As you can see there, a problem makes TGAN go into an infinite loop if the number of samples is lower than `batch_size`, which defaults to 200.

So, in your case, the snippet of code that you pasted will never finish. However, if you change `SAMPLES` to 200 or higher, the fit process will take a couple of minutes and then sampling will be almost immediate.
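To illustrate how a request below `batch_size` can hang, here is a simplified, hypothetical model of a batch-based sampling loop. It assumes the loop stops only when its iteration counter equals `num_samples // batch_size`; this is a sketch of the failure mode, not TGAN's actual code, and `batches`, `buggy_batch_count`, `fixed_sample`, and `safety_cap` are invented names. The fixed variant rounds the number of batches up and trims the surplus rows.

```python
import itertools
import math


def batches(batch_size):
    """Hypothetical endless generator standing in for the model's predictor:
    every step yields one batch of `batch_size` synthetic rows."""
    for step in itertools.count():
        yield [step * batch_size + i for i in range(batch_size)]


def buggy_batch_count(num_samples, batch_size, safety_cap=1000):
    """Stops only when idx + 1 == num_samples // batch_size. If num_samples is
    below batch_size, that target is 0 and is never reached, so the loop keeps
    appending batches; `safety_cap` exists only so this demo terminates."""
    max_iters = num_samples // batch_size
    results = []
    for idx, batch in enumerate(batches(batch_size)):
        results.append(batch)
        if idx + 1 == max_iters or idx + 1 >= safety_cap:
            break
    return len(results)


def fixed_sample(num_samples, batch_size):
    """Rounds the batch count up so at least one batch is drawn whenever
    num_samples > 0, then trims the surplus to return exactly num_samples."""
    max_iters = math.ceil(num_samples / batch_size)
    rows = []
    for batch in itertools.islice(batches(batch_size), max_iters):
        rows.extend(batch)
    return rows[:num_samples]


print(buggy_batch_count(400, 200))  # 2: stops normally
print(buggy_batch_count(50, 200))   # 1000: only the safety cap stops it
print(len(fixed_sample(50, 200)))   # 50
```

Without the safety cap, the second call would keep allocating batches until memory runs out, which is why requesting at least `batch_size` samples avoids the hang.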

3 participants
@csala @tirthajyoti and others