Support an Expr.assert_all_true expression that preserves the current values but raises an error at runtime if any value is False #16120

Open
cosmicBboy opened this issue May 8, 2024 · 0 comments
Labels
enhancement New feature or an improvement of an existing feature


@cosmicBboy

Description

Hi,

I'm the maintainer of pandera, and we recently released support for validating polars LazyFrames and DataFrames: https://pandera.readthedocs.io/en/latest/polars.html

One caveat is that LazyFrame and DataFrame validation behave differently: for a LazyFrame, pandera only validates schema-level properties (column names and data types), because we don't want to materialize any data on a `schema.validate` call (see, e.g., the built-in check below).

```python
from typing import Any

import polars as pl

from pandera.api.extensions import register_builtin_check
from pandera.api.polars.types import PolarsData


@register_builtin_check(
    aliases=["eq"],
    error="equal_to({value})",
)
def equal_to(data: PolarsData, value: Any) -> pl.LazyFrame:
    """Ensure all elements of a data container equal a certain value.

    :param data: NamedTuple PolarsData containing the LazyFrame and the column
        name for the check, accessible as ``data.lazyframe`` and ``data.key``.
    :param value: values in this polars data structure must be
        equal to this value.
    """
    # Builds a lazy boolean column; no data is materialized here.
    return data.lazyframe.select(pl.col(data.key).eq(value))
```

We designed this to be forward-looking: a data-level check is expected to return a LazyFrame and to leverage the lazy API as much as possible. A great next iteration for this integration would be the ability to do data-level validations that preserve the values of the expression but raise an error at runtime if any value is False:

```python
def equal_to(data: PolarsData, value: Any) -> pl.LazyFrame:
    # Proposed API: Expr.assert_all_true() does not exist in polars yet.
    return data.lazyframe.select(pl.col(data.key).eq(value).assert_all_true())
```

Context: this was suggested by @MarcoGorelli here: https://discord.com/channels/897120336003334214/1160076741394563102/1237738610837557289

cosmicBboy added the enhancement label on May 8, 2024