Add ExcelHandler #1950

amontanez24 · 2024-04-22T15:58:36Z

As a user, I'd like a streamlined way to load my data and metadata from files so that I can get right to using SDV.

Parameters

from sdv.io.local import ExcelHandler

handler = ExcelHandler()

Functionality

Internally, reading should use the read_excel function from pandas. A few things should be hardcoded by default:

Pandas should not detect an index column from the data
Pandas should not try to infer datetime formats (or cast them to np.datetime64 objects). Any datetime column should be left as dtype 'object'
After reading the data, we should use it to infer a MultiTableMetadata object. (Even if there is only 1 table, we should still create a MultiTableMetadata object.)

Parameters

(required) file_path: A string describing the path of the Excel file to read
sheet_name: A list of strings denoting which sheets in the Excel file to read from
- (default) None: Read all the sheets in the file
- list(str): Read only the sheets listed

Returns

data: A dictionary mapping each table name to a pandas DataFrame with the data. The table name is the same as the sheet name
metadata: A MultiTableMetadata object that describes the data
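
A minimal sketch of the read behavior described above, not the final implementation. The helper names `read_sheets` and `infer_metadata` are hypothetical. It assumes that `sheet_name=None` makes `pandas.read_excel` return every sheet as a dictionary, and that `MultiTableMetadata.detect_from_dataframes` is available in the installed SDV version.

```python
import pandas as pd


def read_sheets(file_path, sheet_name=None):
    """Read Excel sheets into a dict mapping sheet name to DataFrame.

    Hypothetical helper: sheet_name=None reads every sheet, while a list
    of strings reads only the sheets listed.
    """
    if sheet_name is not None and not isinstance(sheet_name, list):
        raise ValueError("'sheet_name' must be None or a list of strings.")

    data = pd.read_excel(
        file_path,
        sheet_name=sheet_name,  # None or a list both yield a dict of tables
        index_col=None,         # do not detect an index column
        parse_dates=False,      # do not ask pandas to parse date columns
    )
    return dict(data)


def infer_metadata(data):
    """Infer multi-table metadata, even when there is only one table."""
    # Imported lazily so read_sheets can be used without SDV installed.
    # Assumption: detect_from_dataframes exists in the installed version.
    from sdv.metadata import MultiTableMetadata

    metadata = MultiTableMetadata()
    metadata.detect_from_dataframes(data)
    return metadata
```

One caveat: `parse_dates=False` only stops pandas from parsing columns as dates. Cells that Excel itself stores as dates may still arrive as datetime values, so an extra cast to `object` might be needed to meet the requirement above.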

Functionality
Internally, writing should use the to_excel function from pandas. A few things should be hardcoded by default:

Do not write the index column
Each table of the synthetic data should be written as a new sheet within the file. The name of the sheet should be the same as the name of the table
If a sheet already exists with the same name, completely overwrite it

Parameters

(required) synthetic_data: A dictionary that maps each table name to a pandas.DataFrame containing data from it
(required) file_name: The name of the Excel file to write
sheet_name_suffix: A string with a suffix to add to each sheet name
- (default) None: The name of the table should be the name of the sheet
- (str) Append this string as the suffix. E.g. a suffix of "_synthetic" will make sheets named "TABLENAME_synthetic"
mode: A string signaling which mode of writing to use
- (default) 'w': Write sheets to a new file, clearing any existing file that may exist
- 'a': Append new sheets within the existing file. Note: You cannot append data to existing sheets.
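
The write behavior above could be sketched as follows. The function names are hypothetical, and this is one possible approach rather than the agreed design. It relies on two pandas details: `pandas.ExcelWriter` only accepts `if_sheet_exists` in append mode, and append mode requires the openpyxl engine and an existing file.

```python
import pandas as pd


def sheet_name_for(table_name, sheet_name_suffix=None):
    """Return the sheet name for a table, with the optional suffix added."""
    if sheet_name_suffix is None:
        return table_name
    return f'{table_name}{sheet_name_suffix}'


def writer_options(mode='w'):
    """Build keyword arguments for pandas.ExcelWriter for the given mode."""
    if mode not in ('w', 'a'):
        raise ValueError("'mode' must be 'w' or 'a'.")

    options = {'mode': mode}
    if mode == 'a':
        # Append mode needs openpyxl; replace any sheet with the same name.
        options['engine'] = 'openpyxl'
        options['if_sheet_exists'] = 'replace'
    return options


def write_tables(synthetic_data, file_name, sheet_name_suffix=None, mode='w'):
    """Write each table to its own sheet, without the index column."""
    with pd.ExcelWriter(file_name, **writer_options(mode)) as writer:
        for table_name, table in synthetic_data.items():
            table.to_excel(
                writer,
                sheet_name=sheet_name_for(table_name, sheet_name_suffix),
                index=False,  # do not write the index column
            )
```

For example, `write_tables(data, 'synthetic.xlsx', sheet_name_suffix='_synthetic', mode='a')` would add or replace sheets named `TABLENAME_synthetic` in an existing file and leave the other sheets untouched.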

We will add a number of local file handlers for different file types. Therefore the implementation of this class should also add a base class.
Optionally, the init, read and write functions can include a subset of arguments that the corresponding pandas functions use
- If a parameter is shared by the pandas read and write functions (e.g. decimal), put it in the init.
- We can ignore most of these parameters. Only add ones that seem impactful
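
One possible shape for the shared base class, as a rough sketch only. The names `BaseLocalHandler` and `InMemoryHandler` are hypothetical and are not taken from this issue. The toy subclass exists only to show the contract and touches no files.

```python
from abc import ABC, abstractmethod


class BaseLocalHandler(ABC):
    """Hypothetical base class for local file handlers."""

    def __init__(self, decimal='.'):
        # Options shared by reading and writing live on the instance.
        self.decimal = decimal

    @abstractmethod
    def read(self, file_path, **kwargs):
        """Return a (data, metadata) pair read from file_path."""

    @abstractmethod
    def write(self, synthetic_data, file_name, **kwargs):
        """Persist synthetic_data under file_name."""


class InMemoryHandler(BaseLocalHandler):
    """Toy subclass that stores tables in a dict instead of on disk."""

    def __init__(self, decimal='.'):
        super().__init__(decimal=decimal)
        self._store = {}

    def read(self, file_path, **kwargs):
        # A real handler would also infer metadata from the data here.
        return self._store[file_path], None

    def write(self, synthetic_data, file_name, **kwargs):
        self._store[file_name] = dict(synthetic_data)
```

Keeping shared options in the init means each format-specific handler only has to define how it reads and writes, while callers configure it once.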

amontanez24 added the "feature request" label Apr 22, 2024

amontanez24 mentioned this issue Apr 22, 2024

Add CSVHandler #1949

Closed

pvk-developer mentioned this issue Apr 25, 2024

Add ExcelHandler #1962

Merged

pvk-developer closed this as completed in #1962 May 2, 2024

amontanez24 added this to the 1.12.2 milestone May 13, 2024

amontanez24 assigned pvk-developer May 13, 2024
