v1.4.0 - 2023-08-23
This release makes multiple improvements to the metadata. Both the single and multi table metadata classes now have a validate_data method, which checks the data against the current specifications in the metadata. SingleTableMetadata.visualize is also improved: it now shows all key and index information (e.g. sequence key, primary key, sequence index) in one section, so the sequence index appears alongside the sequence key.
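To make the idea of validating data against metadata concrete, here is a minimal plain-Python sketch of the kind of checks involved. It is not SDV's implementation: the function validate_rows, the dictionary layout, and the two checks shown (missing columns, null or duplicate primary keys) are assumptions chosen for illustration only.

```python
# Illustrative only: a plain-Python stand-in for metadata-driven validation.
# The names and the metadata layout below are hypothetical, not SDV's API.

def validate_rows(metadata, rows):
    """Return a list of error messages; an empty list means the rows conform."""
    errors = []
    expected = set(metadata["columns"])
    primary_key = metadata.get("primary_key")
    seen_keys = set()

    for index, row in enumerate(rows):
        # Every column declared in the metadata must be present in the row.
        missing = expected - set(row)
        if missing:
            errors.append(f"row {index}: missing columns {sorted(missing)}")

        if primary_key is None or primary_key not in row:
            continue

        # Primary keys must be non-null and unique across rows.
        key = row[primary_key]
        if key is None:
            errors.append(f"row {index}: primary key '{primary_key}' is null")
        elif key in seen_keys:
            errors.append(f"row {index}: duplicate primary key {key!r}")
        else:
            seen_keys.add(key)

    return errors


metadata = {
    "primary_key": "user_id",
    "columns": {
        "user_id": {"sdtype": "id"},
        "is_active": {"sdtype": "boolean"},
    },
}
rows = [
    {"user_id": 1, "is_active": True},
    {"user_id": 1, "is_active": False},    # duplicate primary key
    {"user_id": None, "is_active": True},  # null primary key
]
errors = validate_rows(metadata, rows)
```

Collecting every problem into a list, rather than stopping at the first one, lets a caller report all violations in a single pass.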
The CTGANSynthesizer has been made more efficient in the following ways:
- Boolean columns are now skipped during preprocess, just as categorical columns are.
- It is possible to apply other transformations to categorical columns and have CTGAN skip the one-hot encoding step.
Additional changes: columns labeled with the sdtype id now go through the IDGenerator transformer by default, and constraint transformations that were being overwritten during sampling are now respected.
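As a rough illustration of why ID columns get a dedicated generator, the sketch below drops the ID column before modeling and attaches fresh sequential IDs to sampled rows. The class SimpleIDGenerator and its prefix, starting_value, and suffix parameters are hypothetical stand-ins written for this example; they are not the actual IDGenerator transformer.

```python
# Illustrative only: a hypothetical stand-in, not the real IDGenerator.

class SimpleIDGenerator:
    """Drop ID values on transform and regenerate them on reverse_transform."""

    def __init__(self, prefix="", starting_value=0, suffix=""):
        self.prefix = prefix
        self.suffix = suffix
        self._counter = starting_value

    def transform(self, rows, column):
        # IDs carry no statistical signal, so remove them before modeling.
        return [{k: v for k, v in row.items() if k != column} for row in rows]

    def reverse_transform(self, rows, column):
        # Attach a fresh, unique ID to every sampled row.
        restored = []
        for row in rows:
            new_row = dict(row)
            new_row[column] = f"{self.prefix}{self._counter}{self.suffix}"
            self._counter += 1
            restored.append(new_row)
        return restored


generator = SimpleIDGenerator(prefix="user_")
modeled = generator.transform([{"user_id": "a91", "age": 30}], column="user_id")
sampled = generator.reverse_transform([{"age": 28}, {"age": 41}], column="user_id")
```

Because the counter persists between calls, repeated sampling keeps producing IDs that do not collide with earlier ones.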
New Features
- Add validate_data method to Metadata - Issue #1518 by @fealho
- Use IDGenerator for ID columns - Issue #1519 by @frances-h
- Metadata visualization for sequential data: Only create 2 sections - Issue #1543 by @frances-h
Bugs Fixed
- Inefficient CTGAN modeling when adding categorical transformers - Issue #1450 by @fealho
- CTGANSynthesizer is assigning LabelEncoder to boolean columns (instead of None) - Issue #1530 by @fealho
- Metadata visualization for sequential data: Missing sequence index - Issue #1542 by @frances-h
- Constraint outputs are being overwritten in DataProcessor.reverse_transform - Issue #1551 by @amontanez24