feat(python): Overhaul parametric test implementations and update Hypothesis to latest version #16062

Merged
stinodego merged 4 commits into main from hyp-strats on May 13, 2024

Conversation

stinodego
Member

@stinodego stinodego commented May 5, 2024

The newest Hypothesis version is stricter about randomness - using Python's built-in random module in strategies is not allowed (for good reason), and there is additional detection for tests that do not function properly.

This is great, but it prohibited us from upgrading. A thorough revision of the code was needed to adhere to the new requirements. Some minor user-facing changes were necessary, but since this concerns code that is only used in test suites, I think we can go ahead with these changes without a major version increase. More details below.
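To illustrate the randomness rule described above, here is a minimal sketch (not code from this PR) of a strategy whose random choices all go through Hypothesis. The strategy name `name_and_length` and the values it draws are hypothetical; only `st.composite`, `st.sampled_from`, `st.integers`, and `given` are Hypothesis APIs.

```python
from hypothesis import given, strategies as st


@st.composite
def name_and_length(draw):
    """Hypothetical strategy: every random choice is made via `draw`."""
    # The pattern to avoid inside a strategy would be something like
    # `random.choice(["a", "b", "c"])`, which Hypothesis cannot replay or shrink.
    name = draw(st.sampled_from(["a", "b", "c"]))
    length = draw(st.integers(min_value=0, max_value=5))
    return name, length


@given(name_and_length())
def test_name_and_length(pair):
    name, length = pair
    assert name in {"a", "b", "c"}
    assert 0 <= length <= 5
```

Because the choices are drawn through `draw`, Hypothesis can reproduce a failing input and shrink it, which is presumably what the stricter checks are meant to protect.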

Changes

  • Randomness is now restricted to Hypothesis-controlled randomness within strategies. This affects the following functions:

    • column no longer selects a dtype upon creation. This now happens within the dataframes strategy. Passing column to dataframes works as before, but users who used column outside of this intended usage will notice this as a breaking change.
    • columns had built-in functionality to randomly determine a number of columns. This is no longer possible. The function has been deprecated, with the recommendation to build your own columns using column in conjunction with a list comprehension. It continues to function for now by leveraging .example().
    • create_list_strategy needs to determine the exact inner data type to select an appropriate strategy. This is no longer possible in the same way. It has been deprecated. Users can use the lists strategy to do something similar, but they should supply a fully instantiated data type or defaults will be used - this avoids the randomness. The function continues to work for now by leveraging .example().
  • A new strategy, dtypes, has been added. It generates a random Polars data type.

  • Various improvements to the data generation strategies

    • Fixed an issue with the decimal strategy.
    • Extended the range of timedeltas. This exposed some bugs in parsing millisecond durations, so the full range for those will be enabled later.
    • Categoricals now have a minimum length of 1. Hypothesis shrinks to empty strings, which leads to reproducible examples that are hard to read.
  • Update Hypothesis to the latest version.

  • Change the default row/column limit from 10/8 to 5/5. This should be enough for parametrized testing in general. It can be overridden by the user if they require more rows/cols.

  • Various refactorings / cleanups to make everything work nicely.

@github-actions github-actions bot added the enhancement and python labels May 5, 2024

codecov bot commented May 5, 2024

Codecov Report

Attention: Patch coverage is 81.31579% with 71 lines in your changes are missing coverage. Please review.

Project coverage is 80.99%. Comparing base (0b66308) to head (78349e8).

| Files | Patch % | Lines |
|---|---|---|
| ...lars/polars/testing/parametric/strategies/dtype.py | 70.33% | 26 Missing and 9 partials ⚠️ |
| ...ars/polars/testing/parametric/strategies/legacy.py | 56.75% | 11 Missing and 5 partials ⚠️ |
| ...olars/polars/testing/parametric/strategies/data.py | 89.53% | 6 Missing and 3 partials ⚠️ |
| ...olars/polars/testing/parametric/strategies/core.py | 91.91% | 4 Missing and 4 partials ⚠️ |
| py-polars/polars/testing/parametric/__init__.py | 40.00% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #16062   +/-   ##
=======================================
  Coverage   80.99%   80.99%           
=======================================
  Files        1387     1392    +5     
  Lines      178832   178884   +52     
  Branches     2877     2893   +16     
=======================================
+ Hits       144839   144887   +48     
- Misses      33500    33501    +1     
- Partials      493      496    +3     

@alexander-beedie
Collaborator

Nice; happy to see these methods getting some love 😎👍

@stinodego stinodego force-pushed the hyp-strats branch 2 times, most recently from 4025ad3 to cef503e May 8, 2024 10:30
@stinodego stinodego marked this pull request as ready for review May 9, 2024 14:04
@stinodego
Member Author

This got a little out of hand 😅 but I think we have something clean now that works well. @alexander-beedie would you mind taking a look at this if you have the time?

@alexander-beedie
Collaborator

If you can wait until Sunday then I can go over it thoroughly, with pleasure ;))

@stinodego
Member Author

@alexander-beedie I'm going to go ahead with this one, there's a few more things I want to build on top of this. If you have any comments I'd be happy to address them in a follow-up!

@stinodego stinodego merged commit dbfc6b2 into main May 13, 2024
14 checks passed
@stinodego stinodego deleted the hyp-strats branch May 13, 2024 04:51
@alexander-beedie
Collaborator

alexander-beedie commented May 13, 2024

@stinodego: Been working my way through it slowly, heh; looks good to me so far, and I'm really happy to see the foundations being built out and incorporated in more places! Will be poking at some of the updated API design to give it a proper test shortly ✌️

@c-peters c-peters added the accepted label May 21, 2024