DataProcessor never gets assigned a table_name. #1964

Closed
pvk-developer opened this issue Apr 26, 2024 · 0 comments · Fixed by #2024
Labels: bug (Something isn't working)

pvk-developer commented Apr 26, 2024

Error Description

The DataProcessor is designed to log the name of the table it is processing, which is particularly useful with multi-table synthesizers such as HMASynthesizer. Currently, however, its table_name attribute is never assigned, because no such argument is passed to the BaseSingleTableSynthesizer that owns the DataProcessor instance.
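The missing hand-off can be pictured with a small, self-contained model. The classes below are hypothetical stand-ins named after the ones in this issue, not SDV's actual implementation. The only thing taken from the description is that the processor holds a table_name and the synthesizer builds the processor without supplying one.

```python
class DataProcessor:
    """Stand-in: stores the table name it would use in log messages."""

    def __init__(self, metadata, table_name=None):
        self.metadata = metadata
        self.table_name = table_name


class BaseSingleTableSynthesizer:
    """Stand-in: builds its own processor but has no table_name to give it."""

    def __init__(self, metadata):
        # No table_name is accepted here, so none can be forwarded.
        self._data_processor = DataProcessor(metadata)


# The metadata value is a placeholder; its contents do not matter here.
synthesizer = BaseSingleTableSynthesizer(metadata={"columns": {}})
print(synthesizer._data_processor.table_name)  # prints None
```

In this simplified model, the name stays at its default regardless of which table the synthesizer was created for.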

Steps to reproduce

from sdv.datasets.demo import download_demo
from sdv.multi_table import HMASynthesizer
import logging

# Configure logging to see INFO level messages
logging.basicConfig(level=logging.INFO)

# Download demo data
data, metadata = download_demo('multi_table', 'fake_hotels')

# Initialize HMASynthesizer with metadata
hmas = HMASynthesizer(metadata)

# Fit the synthesizer to the data
hmas.fit(data)

Expected Behavior

The DataProcessor should log the name of the table it is processing, which aids debugging and makes the synthesis workflow easier to follow.
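One way to check this behavior is to capture log messages and look for the table name. The sketch below uses a hypothetical stand-in processor; the logger name, the message wording, and the table name "hotels" are assumptions for illustration and are not taken from SDV.

```python
import logging

LOGGER = logging.getLogger("data_processing_sketch")  # hypothetical logger name


class DataProcessor:
    """Stand-in processor; the log message format is an assumption."""

    def __init__(self, table_name=None):
        self.table_name = table_name

    def fit(self, data):
        LOGGER.info("Fitting table %s metadata", self.table_name)


class ListHandler(logging.Handler):
    """Collects log messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
LOGGER.addHandler(handler)
LOGGER.setLevel(logging.INFO)

DataProcessor(table_name="hotels").fit(data=[])  # expected: name appears
DataProcessor().fit(data=[])  # reported: name was never assigned

print(handler.messages)
```

The first message names the table, while the second shows None where the name should be, which is the kind of output this issue describes.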

Multiple approaches

  1. Enhance BaseSingleTableSynthesizer Interface:

    • Add an additional table_name argument to the BaseSingleTableSynthesizer constructor.
    • Propagate this argument to the DataProcessor instance during synthesizer initialization.
    • Note: This approach requires adjustments to methods like get_parameters.
    • Example implementation can be found in this PR.
  2. Manual Attribute Setting in MultiTable Context:

    • Manually set the table_name attribute while utilizing the synthesizer within a MultiTable context.
    • Access the _data_processor attribute of the synthesizer instance and set table_name manually.
    • This approach offers a workaround without modifying the synthesizer's core interface.
    • Example implementation:
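    from sdv.single_table import GaussianCopulaSynthesizer

    for table_name in metadata.tables:
        synthesizer_instance = GaussianCopulaSynthesizer(metadata.tables[table_name])
        # Workaround: set the name directly on the private DataProcessor
        synthesizer_instance._data_processor.table_name = table_name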
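For comparison, the first approach could look roughly like the sketch below. It uses hypothetical stand-in classes rather than SDV's real ones, and the handling of get_parameters (leaving table_name out of the returned dictionary) is only one possible adjustment, assumed here for illustration. The enforce_rounding parameter and the table names are likewise placeholders.

```python
class DataProcessor:
    """Stand-in processor that remembers which table it belongs to."""

    def __init__(self, metadata, table_name=None):
        self.metadata = metadata
        self.table_name = table_name


class BaseSingleTableSynthesizer:
    """Stand-in synthesizer that accepts and forwards table_name."""

    def __init__(self, metadata, enforce_rounding=True, table_name=None):
        self.metadata = metadata
        self.enforce_rounding = enforce_rounding
        self.table_name = table_name
        # Forward the name so the processor can include it in its logs.
        self._data_processor = DataProcessor(metadata, table_name=table_name)

    def get_parameters(self):
        # Assumption: table_name identifies the table and is not a model
        # setting, so it is omitted from the reported parameters.
        return {"enforce_rounding": self.enforce_rounding}


# Placeholder per-table metadata, keyed by table name.
tables = {"hotels": {"columns": {}}, "guests": {"columns": {}}}

synthesizers = {
    name: BaseSingleTableSynthesizer(table_metadata, table_name=name)
    for name, table_metadata in tables.items()
}

for name, synthesizer in synthesizers.items():
    print(name, synthesizer._data_processor.table_name)
```

Compared with setting the attribute manually, this keeps the assignment inside the constructor, at the cost of widening the synthesizer's public interface.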
pvk-developer added the bug and new labels, then removed the new label, on Apr 26, 2024
fealho self-assigned this on May 21, 2024
fealho added this to the 1.13.2 milestone on May 21, 2024