
v1.4.0 - 2023-08-23

@amontanez24 released this on 23 Aug 19:44

This release makes multiple improvements to the metadata. Both the single-table and multi-table metadata classes now have a validate_data method, which checks the data against the specifications currently stored in the metadata. The SingleTableMetadata.visualize method has also been improved: the sequence index is now shown in the same section as the sequence key, and all key and index information (e.g. sequence key, primary key, sequence index) now appears in one section.
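To make the idea behind validate_data concrete, here is a minimal, self-contained sketch of the kind of checks it runs. It is not SDV's implementation: check_data_against_metadata is a hypothetical function, the data is a plain list of row dicts rather than a pandas DataFrame, and the metadata dict only loosely mirrors SDV's single-table layout (columns, sdtype, primary_key). The specific checks shown (column presence, primary key uniqueness, numerical values) are illustrative assumptions, not SDV's full list.

```python
from typing import Any


def check_data_against_metadata(rows: list[dict[str, Any]], metadata: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the data passed.

    Hypothetical stand-in for the kind of checks validate_data performs.
    """
    errors: list[str] = []
    expected = set(metadata["columns"])
    seen = set().union(*(row.keys() for row in rows))

    # Every column in the metadata should appear in the data, and vice versa.
    for name in sorted(expected - seen):
        errors.append(f"column '{name}' is in the metadata but missing from the data")
    for name in sorted(seen - expected):
        errors.append(f"column '{name}' is in the data but not in the metadata")

    # The primary key must have no missing values and no duplicates.
    key = metadata.get("primary_key")
    if key is not None:
        values = [row.get(key) for row in rows]
        if any(value is None for value in values):
            errors.append(f"primary key '{key}' contains missing values")
        present = [value for value in values if value is not None]
        if len(present) != len(set(present)):
            errors.append(f"primary key '{key}' contains duplicate values")

    # Numerical columns may only hold numbers (missing values are tolerated here).
    for name, spec in metadata["columns"].items():
        if spec.get("sdtype") != "numerical":
            continue
        for row in rows:
            value = row.get(name)
            if value is not None and (isinstance(value, bool) or not isinstance(value, (int, float))):
                errors.append(f"column '{name}' has non-numerical value {value!r}")
                break  # one report per column is enough
    return errors


metadata = {
    "primary_key": "user_id",
    "columns": {
        "user_id": {"sdtype": "id"},
        "age": {"sdtype": "numerical"},
        "is_member": {"sdtype": "boolean"},
    },
}
good = [{"user_id": 1, "age": 34, "is_member": True}, {"user_id": 2, "age": 51, "is_member": False}]
bad = [{"user_id": 1, "age": "thirty", "is_member": True}, {"user_id": 1, "age": 51, "is_member": False}]

print(check_data_against_metadata(good, metadata))  # []
print(check_data_against_metadata(bad, metadata))   # duplicate primary key, non-numerical age
```

Returning a list keeps the sketch easy to test; consult the SDV documentation for how validate_data itself reports failures.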

The CTGANSynthesizer has been made more efficient in the following ways:

  1. Boolean columns are now skipped during preprocessing, just as categorical columns are.
  2. It is now possible to apply other transformers to categorical columns and have CTGAN skip its one-hot encoding step.
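Both changes come down to one rule, sketched below as a simplified model: CTGAN one-hot encodes a column itself only when the column is categorical or boolean and reaches it untransformed (transformer None); once you assign another transformer to a categorical column, its transformed output is modeled as-is. The function ctgan_discrete_columns is hypothetical, and the transformer names are plain strings used only as labels; this is not SDV's code.

```python
from typing import Optional


def ctgan_discrete_columns(
    sdtypes: dict[str, str], transformers: dict[str, Optional[str]]
) -> list[str]:
    """Return the columns CTGAN itself would one-hot encode (simplified model).

    sdtypes maps column name -> sdtype; transformers maps column name -> the name of
    the transformer applied during preprocessing, or None when the column is passed
    to CTGAN untouched.
    """
    discrete = []
    for column, sdtype in sdtypes.items():
        untouched = transformers.get(column) is None
        # Only raw categorical/boolean columns are handed to CTGAN's own one-hot step.
        if sdtype in ("categorical", "boolean") and untouched:
            discrete.append(column)
    return discrete


sdtypes = {"color": "categorical", "is_member": "boolean", "age": "numerical", "city": "categorical"}

# Default: categorical and boolean columns skip preprocessing (transformer None).
default = {"color": None, "is_member": None, "age": "FloatFormatter", "city": None}
print(ctgan_discrete_columns(sdtypes, default))  # ['color', 'is_member', 'city']

# Assigning another transformer to 'city' means CTGAN no longer one-hot encodes it.
custom = dict(default, city="LabelEncoder")
print(ctgan_discrete_columns(sdtypes, custom))  # ['color', 'is_member']
```

Under this model, a boolean column that received a LabelEncoder (the behavior fixed in Issue #1530) would never reach CTGAN's one-hot step.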

Additionally, columns with the sdtype id now go through the IDGenerator transformer by default, and constraint transformations that were previously overwritten during sampling are now respected.
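The class below is a hypothetical, pure-Python sketch of what an ID transformer does in spirit; it is not RDT's IDGenerator. The assumption is that an ID column carries nothing worth learning, so it is dropped before modeling and fresh sequential values are generated on the way back out, which keeps them unique. The prefix, starting_value, and suffix parameters are illustrative assumptions.

```python
from itertools import count
from typing import Iterator, Optional


class SequentialIdSketch:
    """Hypothetical ID transformer: drop IDs on the way in, generate new ones on the way out."""

    def __init__(self, prefix: Optional[str] = None, starting_value: int = 0,
                 suffix: Optional[str] = None) -> None:
        self.prefix = prefix or ""
        self.suffix = suffix or ""
        self._counter: Iterator[int] = count(starting_value)

    def transform(self, rows: list[dict], column: str) -> list[dict]:
        # Nothing about an ID column is worth learning, so it is simply removed.
        return [{k: v for k, v in row.items() if k != column} for row in rows]

    def reverse_transform(self, rows: list[dict], column: str) -> list[dict]:
        # The counter keeps advancing across calls, so repeated sampling never reuses an ID.
        return [
            {**row, column: f"{self.prefix}{next(self._counter)}{self.suffix}"} for row in rows
        ]


gen = SequentialIdSketch(prefix="user_", starting_value=100)
training = [{"user_id": "a9", "age": 34}, {"user_id": "zz", "age": 51}]
modeled = gen.transform(training, "user_id")  # [{'age': 34}, {'age': 51}]
first = gen.reverse_transform(modeled, "user_id")
second = gen.reverse_transform(modeled, "user_id")
print([r["user_id"] for r in first])   # ['user_100', 'user_101']
print([r["user_id"] for r in second])  # ['user_102', 'user_103']
```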

New Features

  • Add validate_data method to Metadata - Issue #1518 by @fealho
  • Use IDGenerator for ID columns - Issue #1519 by @frances-h
  • Metadata visualization for sequential data: Only create 2 sections - Issue #1543 by @frances-h

Bugs Fixed

  • Inefficient CTGAN modeling when adding categorical transformers - Issue #1450 by @fealho
  • CTGANSynthesizer is assigning LabelEncoder to boolean columns (instead of None) - Issue #1530 by @fealho
  • Metadata visualization for sequential data: Missing sequence index - Issue #1542 by @frances-h
  • Constraint outputs are being overwritten in DataProcessor.reverse_transform - Issue #1551 by @amontanez24