Improve strategies internals: accumulate check statistics instead of filtering #1625

Open
cosmicBboy opened this issue May 8, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@cosmicBboy
Collaborator

Is your feature request related to a problem? Please describe.

Currently, pandera converts multiple checks into strategies by applying hypothesis filters. This is inefficient: it causes slowdowns and low-entropy samples (see #1579).

Describe the solution you'd like

We'd like some way of accumulating check statistics/constraints (the values users provide in checks; e.g. in Check.ge(0), 0 would be the check statistic) before defining the element strategy of a particular column in a dataframe. This would obviate the need to use filters.

This might be implemented as a class that maintains the state of all the check statistics and then builds the element strategy from them:

import pandera as pa
from hypothesis.strategies import SearchStrategy

class Strategy:
    def __init__(self):
        # maps hypothesis strategy argument names to values collected from checks
        self.check_statistics = {}

    def add(self, check: pa.Check):
        # translate check statistics into args/kwargs to be fed into
        # hypothesis strategy
        self.check_statistics["arg"] = ...  # placeholder for the translated value

    def element(self) -> SearchStrategy:
        # returns a search strategy for a single element in the
        # dataframe column
        ...
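To make the idea more concrete, here is a minimal sketch of how bounds from several checks could be merged before any strategy is built. It is only an illustration under stated assumptions: `BoundsAccumulator`, its string labels ("ge", "gt", "le", "lt"), and the restriction to integer columns are all hypothetical and not part of pandera. The only real library call is `hypothesis.strategies.integers`, which accepts `min_value` and `max_value`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class BoundsAccumulator:
    """Hypothetical accumulator for integer bound constraints (not pandera API)."""

    kwargs: Dict[str, Any] = field(default_factory=dict)

    def add(self, name: str, value: int) -> "BoundsAccumulator":
        # `name` is a hypothetical label standing in for a pandera check.
        if name == "ge":
            self._raise_min(value)
        elif name == "gt":
            self._raise_min(value + 1)  # strict bound on integers
        elif name == "le":
            self._lower_max(value)
        elif name == "lt":
            self._lower_max(value - 1)  # strict bound on integers
        else:
            raise ValueError(f"unsupported check: {name}")
        return self

    def _raise_min(self, value: int) -> None:
        # Keep the tightest lower bound seen so far.
        current = self.kwargs.get("min_value")
        self.kwargs["min_value"] = value if current is None else max(current, value)

    def _lower_max(self, value: int) -> None:
        # Keep the tightest upper bound seen so far.
        current = self.kwargs.get("max_value")
        self.kwargs["max_value"] = value if current is None else min(current, value)

    def element(self):
        # Imported lazily so that accumulating constraints does not need hypothesis.
        from hypothesis import strategies as st

        # One bounded strategy, with no .filter() calls.
        return st.integers(**self.kwargs)


acc = BoundsAccumulator().add("ge", 0).add("lt", 10).add("ge", 3)
print(acc.kwargs)  # {'min_value': 3, 'max_value': 9}
```

Because every check only tightens the stored bounds, the order in which checks are added does not matter, and the final strategy samples directly from the valid range instead of rejecting values.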

Describe alternatives you've considered

An alternative approach would be some kind of functional API that accumulates the check constraints, ultimately producing a hypothesis SearchStrategy.
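A rough sketch of what such a functional API might look like follows. All names here (`ge`, `le`, `accumulate`, `to_strategy`) are hypothetical and chosen only for illustration; the sketch assumes float columns and uses `hypothesis.strategies.floats`, which accepts `min_value` and `max_value`.

```python
from functools import reduce
from typing import Any, Callable, Dict

Constraints = Dict[str, Any]
Constraint = Callable[[Constraints], Constraints]


def ge(value: float) -> Constraint:
    """Hypothetical constraint: element >= value."""
    def apply(acc: Constraints) -> Constraints:
        # Return a new dict with the tightest lower bound.
        return {**acc, "min_value": max(acc.get("min_value", value), value)}
    return apply


def le(value: float) -> Constraint:
    """Hypothetical constraint: element <= value."""
    def apply(acc: Constraints) -> Constraints:
        # Return a new dict with the tightest upper bound.
        return {**acc, "max_value": min(acc.get("max_value", value), value)}
    return apply


def accumulate(*constraints: Constraint) -> Constraints:
    # Fold the constraints over an empty dict, without mutating any input.
    return reduce(lambda acc, constraint: constraint(acc), constraints, {})


def to_strategy(kwargs: Constraints):
    # Imported lazily so that accumulation itself does not need hypothesis.
    from hypothesis import strategies as st

    return st.floats(**kwargs)


kwargs = accumulate(ge(0), le(10), ge(2))
print(kwargs)  # {'min_value': 2, 'max_value': 10}
```

Compared with the class-based version, nothing is mutated here: each constraint is a pure function from one set of keyword arguments to the next, which may make composition and testing simpler.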

Additional context

It would also be nice to come up with a friendlier user-facing API for defining custom strategies.

@cosmicBboy cosmicBboy added the enhancement New feature or request label May 8, 2024