Is your feature request related to a problem? Please describe.
Currently, pandera converts multiple checks into a strategy by applying hypothesis `filter`s. This is inefficient and causes slowdowns and low-entropy samples; see #1579.
Describe the solution you'd like
We'd like some way of accumulating check statistics/constraints (the values users provide in checks; e.g. in `Check.ge(0)`, `0` is the check statistic) before defining the element strategy of a particular column in a dataframe. This would obviate the need to use `filter`s.
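To make the idea concrete, here is a minimal sketch contrasting the two approaches for a single integer column with `Check.ge(0)` and `Check.le(10)`. It assumes the collected statistics are stored in a dict whose keys (`min_value`, `max_value`) match the keyword arguments of `hypothesis.strategies.integers`; that mapping is an assumption for illustration, not pandera's current behavior. With filters, each check is applied as a predicate to values drawn from an unconstrained base strategy; with accumulated statistics, the bounds go directly into the strategy.

```python
from hypothesis import given
from hypothesis import strategies as st

# Statistics collected from Check.ge(0) and Check.le(10) on one column.
# Assumption: the keys match the keyword arguments of st.integers.
check_statistics = {"min_value": 0, "max_value": 10}

# Filter-based: generate from an unconstrained base strategy and apply
# each check as a predicate afterwards.
filtered = st.integers().filter(lambda x: x >= 0).filter(lambda x: x <= 10)

# Statistics-based: the bounds are part of the strategy itself, so every
# generated value already satisfies both checks.
constrained = st.integers(**check_statistics)


@given(constrained)
def test_constrained_values_satisfy_checks(x):
    assert 0 <= x <= 10
```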
This might be implemented as a class that maintains the state of all the check statistics and then produces a hypothesis search strategy for a single element of the column:
```python
import pandera as pa
from hypothesis.strategies import SearchStrategy


class Strategy:
    def __init__(self):
        self.check_statistics = {}

    def add(self, check: pa.Check):
        # translate check statistics into args/kwargs to be fed into
        # the hypothesis strategy
        self.check_statistics["arg"] = ...  # <value> derived from the check

    def element(self) -> SearchStrategy:
        # returns a search strategy for a single element in the
        # dataframe column
        ...
```
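One way the class sketched above could be filled in, restricted to integer columns. `CheckSpec` is a hypothetical stand-in for `pa.Check` (a check name plus its statistics dict), and the check names `greater_than_or_equal_to`, `less_than_or_equal_to` and `isin` with the keys `min_value`, `max_value` and `allowed_values` are assumed to mirror pandera's built-in checks. Repeated checks are merged in `add` by keeping the tightest bound, so `element` can build one strategy without any `filter`.

```python
from dataclasses import dataclass
from typing import Any, Dict

from hypothesis import given
from hypothesis import strategies as st
from hypothesis.strategies import SearchStrategy


@dataclass
class CheckSpec:
    """Hypothetical stand-in for pa.Check: a check name plus its statistics."""

    name: str
    statistics: Dict[str, Any]


class Strategy:
    """Accumulates check statistics, then builds a single element strategy."""

    def __init__(self) -> None:
        self.check_statistics: Dict[str, Any] = {}

    def add(self, check: CheckSpec) -> "Strategy":
        stats = self.check_statistics
        if check.name == "greater_than_or_equal_to":
            # Keep the tightest (largest) lower bound seen so far.
            new = check.statistics["min_value"]
            stats["min_value"] = max(stats.get("min_value", new), new)
        elif check.name == "less_than_or_equal_to":
            # Keep the tightest (smallest) upper bound seen so far.
            new = check.statistics["max_value"]
            stats["max_value"] = min(stats.get("max_value", new), new)
        elif check.name == "isin":
            # Intersect the allowed values of repeated isin checks.
            new = set(check.statistics["allowed_values"])
            stats["allowed_values"] = stats.get("allowed_values", new) & new
        else:
            # This sketch raises; a fuller version might fall back to a filter.
            raise NotImplementedError(f"no statistics mapping for {check.name!r}")
        return self

    def element(self) -> SearchStrategy:
        # Build one strategy for a single integer element of the column.
        stats = self.check_statistics
        lo, hi = stats.get("min_value"), stats.get("max_value")
        if "allowed_values" in stats:
            # Apply any bounds to the finite set up front instead of filtering draws.
            values = sorted(
                v
                for v in stats["allowed_values"]
                if (lo is None or v >= lo) and (hi is None or v <= hi)
            )
            return st.sampled_from(values)
        return st.integers(min_value=lo, max_value=hi)


# Usage: two lower bounds and one upper bound collapse to [3, 10].
column = (
    Strategy()
    .add(CheckSpec("greater_than_or_equal_to", {"min_value": 0}))
    .add(CheckSpec("greater_than_or_equal_to", {"min_value": 3}))
    .add(CheckSpec("less_than_or_equal_to", {"max_value": 10}))
)


@given(column.element())
def test_elements_satisfy_all_checks(x):
    assert 3 <= x <= 10
```

Merging in `add` is what removes the need for filters: the constraints are resolved once, before any value is drawn.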
Describe alternatives you've considered
An alternative approach would be some kind of functional API that accumulates the check constraints, ultimately producing a hypothesis `SearchStrategy`.
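The functional alternative could look roughly like this: each check becomes a pure function that maps a set of constraints to a tightened copy, and `functools.reduce` folds the checks into the keyword arguments of a single `st.integers` call. The helpers `ge`, `le` and `integer_strategy` are hypothetical names for illustration, not an existing pandera API.

```python
from functools import reduce
from typing import Any, Callable, Iterable, Mapping

from hypothesis import given
from hypothesis import strategies as st
from hypothesis.strategies import SearchStrategy

# Hypothetical representation: constraints are a mapping of keyword
# arguments for st.integers, and each check is a pure function that
# returns a new, tightened mapping (the input is never mutated).
Constraints = Mapping[str, Any]
CheckFn = Callable[[Constraints], Constraints]


def ge(min_value: int) -> CheckFn:
    """Hypothetical functional counterpart of Check.ge."""

    def apply(c: Constraints) -> Constraints:
        return {**c, "min_value": max(c.get("min_value", min_value), min_value)}

    return apply


def le(max_value: int) -> CheckFn:
    """Hypothetical functional counterpart of Check.le."""

    def apply(c: Constraints) -> Constraints:
        return {**c, "max_value": min(c.get("max_value", max_value), max_value)}

    return apply


def integer_strategy(checks: Iterable[CheckFn]) -> SearchStrategy:
    # Fold every check over an empty constraint set, then build the
    # strategy once from the accumulated keyword arguments.
    constraints = reduce(lambda c, check: check(c), checks, {})
    return st.integers(**constraints)


@given(integer_strategy([ge(0), le(10), ge(4)]))
def test_folded_constraints(x):
    assert 4 <= x <= 10
```

Compared with the stateful class, this keeps every intermediate constraint set immutable, so partial pipelines of checks can be reused and composed freely.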
Additional context
It would also be good to come up with a nicer user-facing API for defining custom strategies.