Loss values are good, but the quality of the synthetic data is bad... How?? #2010
Labels: question (general question about the software), resolution:cannot replicate (the problem cannot be replicated)
I am using the CTGAN model for my master's thesis. I want to generate synthetic data from the UNSW_NB15 dataset (an intrusion detection dataset, so it contains attacks). Specifically, I want to generate synthetic samples of 'Generic attacks', for which there are 58871 real samples to train on.
I have trained my CTGAN model with the following code:
Loss values:
These are the loss values for my generator and discriminator. Based on discussion #980, you would expect the CTGAN model to generate really good synthetic data with losses like these.
But when I compare the real data with the synthetic data using the SDV metrics, the scores are bad:
KS_complement:
TV_complement:
The visual distributions of each feature are also bad.
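For context, here is a rough sketch of what the two scores measure, as I understand the SDMetrics definitions: KS_complement is 1 minus the largest gap between the empirical CDFs of a numerical column, and TV_complement is 1 minus the total variation distance between the category frequencies of a categorical column. The function names and the toy values below are hypothetical and written in plain Python for illustration; this is not the SDMetrics implementation.

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synthetic):
    """Return 1 - KS statistic for two numerical samples (1.0 = identical CDFs)."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    max_gap = 0.0
    # The largest gap between two empirical CDFs occurs at an observed value.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


def tv_complement(real, synthetic):
    """Return 1 - total variation distance between category frequencies."""
    real_freq = Counter(real)
    synth_freq = Counter(synthetic)
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in set(real_freq) | set(synth_freq)
    )
    return 1.0 - distance


# Toy values (assumptions, not taken from UNSW_NB15).
same = ks_complement([1, 2, 3, 4], [1, 2, 3, 4])
disjoint = ks_complement([0, 0, 1, 1], [2, 2, 3, 3])
categories = tv_complement(
    ["tcp", "tcp", "udp", "udp"],
    ["tcp", "tcp", "tcp", "udp"],
)
print(same, disjoint, categories)
```

Both scores range from 0 to 1, so a low value for a column means its marginal distribution in the synthetic data differs strongly from the real one. Neither score is derived from the generator or discriminator loss.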
Can you help me? What did I do wrong? Why are the synthetic samples of such bad quality?
PS: If I use SMOTE, the scores of the SDV metrics are better. But I have to use a GAN model...