Should we apply NewRowSynthesis by default? #188

Open
npatki opened this issue Jan 20, 2023 · 0 comments
Labels
feature request Request for a new feature

Version: 0.8.0 (in development)

Problem Description

Currently, we are applying the SDMetrics NewRowSynthesis by default in the benchmark_single_table script. The motivation was to capture whether new synthetic data is being created at all -- or whether the rows are being re-used as in DataIdentity.

But in practice, the NewRowSynthesis metric may not be very robust. It may error out on datasets with a large number of columns, and it generally leads to longer benchmarking runs.

Expected behavior

We should consider the behavior of the default NewRowSynthesis metric that we apply:

  1. We could disable it. That is, set sdmetrics=None by default.
  2. We could fix the underlying issues with it in the SDMetrics library. Perhaps that can be achieved by subsetting or some other means.
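
To make the subsetting idea in option 2 concrete, here is a minimal standalone sketch. It is not the SDMetrics implementation: `new_row_fraction`, `sample_size`, and `seed` are hypothetical names, and the sketch only checks exact row matches on plain tuples. It scores a random subset of the synthetic rows, so the cost scales with the sample size rather than with the full synthetic table.

```python
import random


def new_row_fraction(real_rows, synthetic_rows, sample_size=None, seed=0):
    """Fraction of (optionally sampled) synthetic rows that match no real row.

    Hypothetical helper for illustration only; exact matching, no tolerance.
    """
    rows = list(synthetic_rows)
    if not rows:
        raise ValueError("synthetic_rows must not be empty")

    # Subsetting: score only a random sample of the synthetic rows.
    if sample_size is not None and sample_size < len(rows):
        rows = random.Random(seed).sample(rows, sample_size)

    # A set of tuples gives constant-time exact-match lookups.
    real_set = {tuple(row) for row in real_rows}
    new_count = sum(1 for row in rows if tuple(row) not in real_set)
    return new_count / len(rows)


real = [(1, "a"), (2, "b"), (3, "c")]
copied = list(real)  # rows re-used verbatim, as an identity baseline would do
fresh = [(4, "d"), (5, "e"), (1, "a"), (6, "f")]  # three new rows, one copy

identity_score = new_row_fraction(real, copied)
fresh_score = new_row_fraction(real, fresh)
subset_score = new_row_fraction(real, fresh, sample_size=2, seed=0)
```

With re-used rows the score is 0.0, which is the case the default metric was meant to flag. The sampled score is an estimate of the full score, so a fix along these lines would trade some precision for shorter benchmarking runs.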
@npatki npatki added the feature request Request for a new feature label Jan 20, 2023