[HELP] CTGAN has Reproducibility? #380
Comments
Hi there @limhasic, I'm not able to reproduce this. With both 1 and 10 epochs, I was able to generate exactly the same data from 2 different CTGAN models.
^ The last line returns
Is it possible to share your environment? Damn, I got False again. I have run it on
I ran my code in Google Colab: https://colab.research.google.com/
A few things to consider:
@limhasic after some more investigation, it turns out we actually don't support reproducibility when fitting a synthesizer. The reproducibility we do support right now is only during sampling (generating 2 samples from the same synthesizer with the same random state). Out of curiosity, what's the motivation to have reproducibility during model fitting itself?
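To make the distinction between fit-time and sample-time randomness concrete, here is a minimal toy sketch. `ToySynthesizer` is a hypothetical stand-in written only for illustration; it is not the CTGAN or SDV API, and it simplifies sampling by re-seeding on every call. It only assumes that fitting draws from an unseeded generator while sampling uses a fixed random state.

```python
import random


class ToySynthesizer:
    """Hypothetical stand-in for a synthesizer; NOT the CTGAN or SDV API."""

    def __init__(self, sampling_seed=0):
        self.sampling_seed = sampling_seed
        self.weight = None

    def fit(self, data):
        # Fitting draws from an unseeded generator (OS entropy), so the
        # learned parameter differs between separately fitted models.
        fit_rng = random.Random()
        mean = sum(data) / len(data)
        self.weight = mean + fit_rng.gauss(0, 1)

    def sample(self, num_rows):
        # Sampling uses a fixed seed, so repeated calls on the same
        # fitted model return identical rows.
        sample_rng = random.Random(self.sampling_seed)
        return [self.weight + sample_rng.gauss(0, 1) for _ in range(num_rows)]


data = [1.0, 2.0, 3.0]
model_a = ToySynthesizer()
model_a.fit(data)
model_b = ToySynthesizer()
model_b.fit(data)

# Same fitted model, same sampling seed: identical samples.
same_model_repeatable = model_a.sample(5) == model_a.sample(5)
# Two separately fitted models: samples differ because fitting differed.
across_models_equal = model_a.sample(5) == model_b.sample(5)
print(same_model_repeatable, across_models_equal)
```

Under these assumptions, samples from one fitted model repeat, while two models fitted on the same data will almost surely disagree, which matches the behaviour described in this thread.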
@srinify I am working on synthetic data, so I am very interested in evaluation metrics and generation methods for comparing original and synthetic data. However, when I generated data with CTGAN for evaluation, I got different results each time. Since the samples were not reproducible, I started thinking about seed control for fitting. It is still morning here, so I will test it in the Colab environment you sent. also,
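As a rough sketch of what "seed control for fitting" could look like on the user side, the helper below seeds the global random number generators a training run might draw from. `seed_everything` is a hypothetical helper name, not part of CTGAN or SDV, and whether seeding these generators is enough to make a CTGAN fit repeatable is an untested assumption; the maintainers state above that fit-time reproducibility is not supported.

```python
import random


def seed_everything(seed):
    """Hypothetical helper: seed the global RNGs a training run may use."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency
        torch.manual_seed(seed)
    except ImportError:
        pass


# Re-seeding with the same value replays the same random draws.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

The idea would be to call the helper immediately before each fit, so that both models start from the same generator state. Other sources of nondeterminism (for example GPU kernels) are not covered by this sketch.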
Closing after confirming sampling reproducibility in the latest version of CTGANSynthesizer.
Reproducibility holds for simple data, but it is lost once the number of columns exceeds 25. When I wake up, I observe the generator emitting different data.
Thanks for sharing context on your use case, @limhasic. I've opened a feature request to add reproducibility at the model fitting level, citing your use case: sdv-dev/SDV#2022. DataCebo is a very small team and we use community interest to help us prioritize what to work on, so we hope more people will add their use cases to that issue over time. Closing this issue out, as the software is working as intended right now.
Environment details
If you are already running CTGAN, please indicate the following details about the environment in
which you are running it:
Problem description
I have tried this a thousand times, but synthetic_data1 and synthetic_data2 are still not equal.