Allow constraints in PARSynthesizer (for all context cols, or all non-context columns) #1936

Open
npatki opened this issue Apr 18, 2024 · 4 comments · May be fixed by #2044
Labels
data:sequential Related to timeseries datasets feature request Request for a new feature

Comments

@npatki
Contributor

npatki commented Apr 18, 2024

Problem Description

There is increasing user demand for applying constraints to the PARSynthesizer (see #570). One reason we have been unable to support constraints is that the PARSynthesizer has unchanging context_columns, while other synthesizers do not.

To at least address some constraint uses, it would be good to allow constraints if:

  • All of the involved columns are NOT context columns or
  • All of the involved columns ARE context columns

i.e. mixing and matching context and non-context columns within a single constraint is not allowed.

Another reason is that PAR (and sequence-based synthesizers in general) cannot easily accommodate approaches based on reject sampling. So it would only be possible to add non-overlapping constraints.

Expected behavior

Enable the PARSynthesizer to be used with constraints in the same manner as any other synthesizer, but only for select cases:

  1. You can only use a constraint for columns that are all contextual, or all non-contextual.
  2. You cannot supply multiple overlapping constraints (that is, two constraints that each independently act on the same column).
  3. You cannot use it for custom constraints (for now, as it is complicated to determine whether points 1 and 2 are met for a custom constraint).

If a user passes in constraints that violate these assumptions, we should throw a SynthesizerInputError.

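from sdv.sequential import PARSynthesizer

my_constraint = {
  'constraint_class': 'Inequality',
  'constraint_parameters': {
    'high_column_name': 'A',
    'low_column_name': 'B'
  }
}

# 'A' is a context column but 'B' is not, so the constraint mixes the two kinds.
# `metadata` is assumed to be already defined for the sequential table.
synthesizer = PARSynthesizer(metadata, context_columns=['A'])
synthesizer.add_constraints([my_constraint])

SynthesizerInputError: The PARSynthesizer cannot accommodate constraints with a mix of context
and non-context columns.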

For overlapping constraints:

SynthesizerInputError: The PARSynthesizer cannot accommodate multiple constraints that overlap on 
the same columns.

For custom constraints:

SynthesizerInputError: The PARSynthesizer cannot accommodate custom constraints.
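
The three rules above could be checked along the following lines. This is only a rough sketch, not SDV's implementation: `validate_par_constraints`, `constraint_columns` and the `PREDEFINED` set are hypothetical names, the local `SynthesizerInputError` class is a stand-in so the snippet runs without SDV, and it assumes that every column reference in a predefined constraint lives under a parameter whose name ends in `column_name` or `column_names`.

```python
class SynthesizerInputError(Exception):
    """Local stand-in for the SDV error, so this sketch runs without SDV."""


# Assumption: any constraint class not listed here is treated as custom.
PREDEFINED = {
    'Inequality', 'ScalarInequality', 'Range', 'ScalarRange',
    'FixedCombinations', 'FixedIncrements', 'OneHotEncoding',
    'Positive', 'Negative', 'Unique',
}


def constraint_columns(constraint):
    """Collect the column names referenced by a constraint dictionary."""
    columns = set()
    for key, value in constraint.get('constraint_parameters', {}).items():
        if key.endswith('column_name'):
            columns.add(value)
        elif key.endswith('column_names'):
            columns.update(value)
    return columns


def validate_par_constraints(constraints, context_columns):
    """Raise SynthesizerInputError if any of the three rules is violated."""
    context = set(context_columns)
    seen = set()  # columns already used by an earlier constraint
    for constraint in constraints:
        # Rule 3: no custom constraints.
        if constraint['constraint_class'] not in PREDEFINED:
            raise SynthesizerInputError(
                'The PARSynthesizer cannot accommodate custom constraints.')
        columns = constraint_columns(constraint)
        # Rule 1: all columns are context columns, or none of them are.
        if columns & context and columns - context:
            raise SynthesizerInputError(
                'The PARSynthesizer cannot accommodate constraints with a mix '
                'of context and non-context columns.')
        # Rule 2: no two constraints may act on the same column.
        if columns & seen:
            raise SynthesizerInputError(
                'The PARSynthesizer cannot accommodate multiple constraints '
                'that overlap on the same columns.')
        seen |= columns


my_constraint = {
    'constraint_class': 'Inequality',
    'constraint_parameters': {'high_column_name': 'A', 'low_column_name': 'B'},
}

try:
    validate_par_constraints([my_constraint], context_columns=['A'])
except SynthesizerInputError as error:
    print(error)  # reports the mix of context and non-context columns
```

With `context_columns=['A', 'B']` (or with neither column as context), the same constraint would pass the check.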
@npatki npatki added the feature request Request for a new feature label Apr 18, 2024
@Ng-ms

Ng-ms commented Apr 25, 2024

Is there any update on this feature request? It would be EXTREMELY beneficial if it were integrated into the PARSynthesizer.

@npatki
Contributor Author

npatki commented Apr 25, 2024

Hi @Ng-ms, I understand this is an important issue to you. We (at SDV) are a small team working to support many users and features. We appreciate your patience as we prioritize issues and bug fixes. Thanks.

If you have any urgent requests that require more investment from the team, you may want to Contact Us about starting a business relationship.

@naiomi-mo

Hi @npatki, thank you for the amazing work on this library. I was wondering if there is any workaround for the context_columns in PAR. All of my date columns are context columns; I just need to make some of them larger than others, and some equal to each other. Thank you.

@npatki
Contributor Author

npatki commented May 3, 2024

Hi @naiomi-mo, no problem. One simple workaround for now might be that instead of modeling both datetime columns, you can preprocess your data to model: (a) only the lower date column, and then (b) the difference. That is to say, get rid of the higher datetime column. The SDV will then synthesize lower date + the difference column, which you can then use to reconstruct the higher date column. Hopefully that makes sense.
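
A minimal pandas sketch of this workaround might look as follows. The helper names (`to_difference`, `from_difference`), the `days_between` column and the example column names are all hypothetical. Clipping negative offsets at zero is an added assumption that keeps the rebuilt date from falling before the lower date; it is not part of the suggestion above.

```python
import pandas as pd


def to_difference(data, low, high, diff='days_between'):
    """Replace the `high` date column with its offset in days from `low`."""
    out = data.copy()
    out[diff] = (out[high] - out[low]).dt.days
    return out.drop(columns=[high])


def from_difference(data, low, high, diff='days_between'):
    """Rebuild the `high` date column as `low` plus the offset."""
    out = data.copy()
    # Assumption: clip at zero so that high >= low holds even if the
    # synthesizer produces a negative offset.
    days = out[diff].clip(lower=0)
    out[high] = out[low] + pd.to_timedelta(days, unit='D')
    return out.drop(columns=[diff])


real = pd.DataFrame({
    'start_date': pd.to_datetime(['2024-01-01', '2024-02-10']),
    'end_date': pd.to_datetime(['2024-01-15', '2024-02-10']),
})

# Train the synthesizer on `train` instead of `real` ...
train = to_difference(real, 'start_date', 'end_date')

# ... then apply the inverse step to the synthetic output (shown here on
# `train` itself as a round trip).
restored = from_difference(train, 'start_date', 'end_date')
```

The metadata would need to describe the offset column (as numerical) instead of the dropped date column. For two columns that must be equal, the same idea reduces to modeling one of them and copying it after sampling.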

@srinify srinify added the data:sequential Related to timeseries datasets label Jun 3, 2024
@lajohn4747 lajohn4747 linked a pull request Jun 3, 2024 that will close this issue