Releases: unionai-oss/pandera
Release 0.13.0: Option to Report All Errors, "Try Pandera" with Jupyterlite
Highlights ⭐️
- try pandera: add jupyterlite notebooks, add support for py3.7 (#951) @cosmicBboy
- Feature/922 add other ways to report unique errors as an argument (#914) @ng-henry
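The option added in #914 controls which occurrences of a duplicated value are reported as failure cases. Below is a minimal stdlib-only sketch of three such reporting policies. It assumes the option values are "all", "exclude_first" and "exclude_last" (as in pandera's report_duplicates argument) and that they follow pandas' duplicated(keep=False / "first" / "last") semantics. The duplicate_failure_cases helper is hypothetical, not pandera's code.

```python
from typing import Dict, Hashable, List, Sequence


def duplicate_failure_cases(values: Sequence[Hashable], report: str = "all") -> List[int]:
    """Return the indices of duplicated values that would be reported.

    report="all":           every occurrence of a duplicated value fails
    report="exclude_first": the first occurrence is treated as valid
    report="exclude_last":  the last occurrence is treated as valid
    """
    if report not in ("all", "exclude_first", "exclude_last"):
        raise ValueError(f"unknown report option: {report!r}")
    # Group the positions of each distinct value in order of appearance.
    positions: Dict[Hashable, List[int]] = {}
    for i, value in enumerate(values):
        positions.setdefault(value, []).append(i)
    failures: List[int] = []
    for idxs in positions.values():
        if len(idxs) < 2:
            continue  # values that occur once never fail
        if report == "exclude_first":
            idxs = idxs[1:]
        elif report == "exclude_last":
            idxs = idxs[:-1]
        failures.extend(idxs)
    return sorted(failures)


values = ["a", "b", "a", "c", "a", "b"]
print(duplicate_failure_cases(values, "exclude_first"))  # [2, 4, 5]
```

Reporting "all" occurrences is the most informative choice when debugging, while the exclude_* options match the usual "keep one copy" deduplication mindset.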
What's Changed 📈
- Bugfix/910: Support ordered=True in yaml schemas (#943) @dstumpy
- docs: Fix typo in pyspark.rst (#948) @smoothml
- update rename_columns not to error on {key: key, ...} rename_dict (#941) @hsorsky
- Fix #937: Handle empty MultiIndex validation (#938) @davidandreoletti
- Fix infer_schema for 'empty' dataframes (#944) @tpvasconcelos
- Bugfix/Fix with_pydantic mypy error (#934) @brrm
- Updating Fugue section docs (#927) @kvnkho
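#941 makes rename_columns accept identity entries such as {"a": "a"}. One plausible way to get that behavior is sketched below in plain Python over a list of column names: drop identity mappings before checking that the new names don't collide with existing ones. The rename_columns helper here is hypothetical and works on a plain list, not on a pandera schema. The exact checks pandera performs are an assumption.

```python
from typing import Dict, List


def rename_columns(columns: List[str], rename_dict: Dict[str, str]) -> List[str]:
    """Hypothetical helper: rename column names, tolerating {key: key} entries."""
    # Every key being renamed must exist.
    missing = [k for k in rename_dict if k not in columns]
    if missing:
        raise KeyError(f"columns not in schema: {missing}")
    # A mapping of a column to itself is a no-op, so drop it before the
    # collision check; otherwise "a" -> "a" would look like a clash with "a".
    effective = {k: v for k, v in rename_dict.items() if k != v}
    clashes = [v for v in effective.values() if v in columns]
    if clashes:
        raise ValueError(f"new names already exist in schema: {clashes}")
    return [effective.get(c, c) for c in columns]


print(rename_columns(["a", "b"], {"a": "a", "b": "c"}))  # ['a', 'c']
```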
New Contributors 🎉
Shout out to all the first-time contributors!
Full Changelog: v0.12.0...v0.13.0
Beta Release: v0.13.0b1
Beta Release v0.13.0b0
Release 0.12.0: Logical Types, New Built-in Check, Bugfixes, Doc Improvements
Highlights ⭐️
This release features:
- Support for Logical Data Types (#798): these data types check the actual values of the data container at runtime to support data types like "URL", "Name", etc.
- Check.unique_values_eq (#858): make sure that all of the values in the data container cover the entire domain of the specified finite set of values.
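Both highlights come down to value-level checks. The stdlib-only sketch below rests on two assumptions. First, a URL-like logical type is taken to mean "a string with an http(s) scheme and a host", which is a hypothetical rule and not pandera's definition. Second, unique_values_eq is assumed to compare the set of observed unique values for equality with the given set. The function names are illustrative only.

```python
from typing import Hashable, Iterable, List
from urllib.parse import urlparse


def is_url(value: str) -> bool:
    """Runtime value check for a hypothetical logical "URL" type."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def logical_type_failures(values: Iterable[str]) -> List[str]:
    # The physical dtype is just str; the logical type inspects each value.
    return [v for v in values if not is_url(v)]


def unique_values_eq(values: Iterable[Hashable], allowed: Iterable[Hashable]) -> bool:
    # Passes only if the observed unique values are exactly the allowed set:
    # every allowed value must appear, and nothing else may appear.
    return set(values) == set(allowed)


print(logical_type_failures(["https://example.com", "not a url"]))  # ['not a url']
print(unique_values_eq(["low", "high", "low"], {"low", "high"}))  # True
```

The point of a logical type is that the container's physical dtype (here plain strings) says nothing about whether the values are valid, so the check has to look at the data itself.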
What's Changed 📈
- Lazy SchemaErrors contain schema name by @fleimgruber in 0d10f39
- Support for logical data types by @jeffzi in #798
- fix for Index of type category fails on validation by @kuutsav in #840
- Add new check unique_values_eq by @johnkangw in #858
- Add from records to pandera's dataframe #850 by @borissmidt in #859
- Doc fix: incorrect default value by @plague006 in #862
- Handle cases of reset_index level being None or an empty list by @plague006 in #865
- fixing unique multi index in SchemaModel by @mattB1989 in #870
- Adding description and title to column serializations by @dantheand in #877
- Fix modin and pyspark CI by @jeffzi and @cosmicBboy in #886
- Add pandas_engine.Date by @jeffzi in #887
- fix typo in docs by @jonwiggins in #895
- Update strict type-hints by @the-matt-morris in #898
- fix strategies ci by @cosmicBboy in #899
- Bugfix/882 don't coerce datatypes twice by @ng-henry in #901
- bugfix/904: ignore_na only ignores df records if all are NaN by @cosmicBboy in #909
- fix sphinx docs by @cosmicBboy in #912
- ExtensionDtype path should follow documentation by @pepelovesvim in #915
- pin pandas-stubs version, bump mypy by @cosmicBboy in #916
- Docs/867 by @the-matt-morris in #919
New Contributors 🎉
- @johnkangw made their first contribution in #858
- @pepelovesvim made their first contribution in #915
- @dantheand made their first contribution in #877
- @jonwiggins made their first contribution in #895
- @kuutsav made their first contribution in #840
- @borissmidt made their first contribution in #859
- @plague006 made their first contribution in #862
- @mattB1989 made their first contribution in #870
- @the-matt-morris made their first contribution in #898
- @ng-henry made their first contribution in #901
Full Changelog: v0.11.0...v0.12.0
Beta release v0.12.0b0
0.11.0: Docs support dark mode, custom names and errors for built-in checks, bug fixes
Big shoutout to the contributors on this release!
Highlights
Docs Gets Dark Mode 🌓
Just a little something for folks who prefer dark mode!
Enhancements
- Make DataFrameSchema respect subclassing #830
- Feature: Add support for Generic to SchemaModel #810
- feat: make schema available in SchemaErrors #831
- add support for custom name and error in builtin checks #843
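#843 lets built-in checks take a custom name and error message. The sketch below models that idea with a hypothetical SketchCheck dataclass and a greater_than_or_equal_to factory that forwards extra keyword arguments to the check object, in the spirit of pandera's built-ins. None of these names are pandera's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class SketchCheck:
    """Hypothetical stand-in for a built-in check (not pandera's Check)."""

    predicate: Callable[[float], bool]
    default_name: str
    name: Optional[str] = None   # user-supplied name overrides default_name
    error: Optional[str] = None  # user-supplied error message

    def validate(self, values: Iterable[float]) -> None:
        failures = [v for v in values if not self.predicate(v)]
        if failures:
            label = self.name or self.default_name
            message = self.error or f"check {label} failed"
            raise ValueError(f"{label}: {message}; failure cases: {failures}")


def greater_than_or_equal_to(min_value: float, **kwargs) -> SketchCheck:
    # Extra keyword arguments (name=, error=) are forwarded to the check object.
    return SketchCheck(
        lambda v: v >= min_value, f"greater_than_or_equal_to({min_value})", **kwargs
    )


check = greater_than_or_equal_to(0, name="non_negative", error="prices must be >= 0")
try:
    check.validate([1, -2, 3])
except ValueError as exc:
    print(exc)  # non_negative: prices must be >= 0; failure cases: [-2]
```

Custom names and messages mostly pay off in error reports, where "non_negative" says more than a generated check name.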
Bugfixes
- Make DataFrameSchema respect subclassing #830
- fix pandas_engine.DateTime.coerce_value not consistent with coerce #827
- fix mypy 9c5eaa3
Documentation Improvements
- Dark docs #841
0.11.0b1: fix mypy error
0.11.0b0: Docs support dark mode, custom names and errors for built-in checks, bug fixes
Pre-release
v0.11.0b0 beta release for 0.11.0
0.10.1: Pyspark documentation fixes
0.10.0: Pyspark.pandas Support, PydanticModel datatype, Performance Improvements
Highlights
pandera now supports pyspark dataframe validation via pyspark.pandas. The pandera koalas integration is now deprecated.
You can now pip install pandera[pyspark] and validate pyspark.pandas dataframes:
```python
import pyspark.pandas as ps
import pandas as pd
import pandera as pa

from pandera.typing.pyspark import DataFrame, Series


class Schema(pa.SchemaModel):
    state: Series[str]
    city: Series[str]
    price: Series[int] = pa.Field(in_range={"min_value": 5, "max_value": 20})


# create a pyspark.pandas dataframe that's validated on object initialization
df = DataFrame[Schema](
    {
        'state': ['FL', 'FL', 'FL', 'CA', 'CA', 'CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
print(df)
```
PydanticModel DataType Enables Row-wise Validation with a pydantic model
Pandera now supports row-wise validation by applying a pydantic model as a dataframe-level dtype:
```python
import pandas as pd
import pandera as pa
from pandera.engines.pandas_engine import PydanticModel
from pydantic import BaseModel


class Record(BaseModel):
    name: str
    xcoord: str
    ycoord: int


class PydanticSchema(pa.SchemaModel):
    """Pandera schema using the pydantic model."""

    class Config:
        """Config with dataframe-level data type."""

        dtype = PydanticModel(Record)
        coerce = True  # this is required, otherwise a SchemaInitError is raised
```
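Conceptually, a dataframe-level PydanticModel dtype validates each row as one model instance, coercing field values along the way. The dependency-free sketch below illustrates that idea by substituting a standard-library dataclass for pydantic and plain dicts for dataframe rows. Both substitutions are assumptions for illustration, pandera's implementation differs, and validate_rows is a hypothetical helper.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Tuple


@dataclass
class Record:
    # Stand-in for the pydantic model above. Dataclasses don't coerce on
    # their own, so validate_rows coerces each field explicitly.
    name: str
    xcoord: str
    ycoord: int


def validate_rows(rows: List[Dict]) -> Tuple[List[Record], List[int]]:
    """Apply the record model to every row, coercing each field to its annotated type."""
    valid: List[Record] = []
    failed: List[int] = []
    for i, row in enumerate(rows):
        try:
            coerced = {f.name: f.type(row[f.name]) for f in fields(Record)}
            valid.append(Record(**coerced))
        except (KeyError, TypeError, ValueError):
            failed.append(i)  # record the index of the failing row
    return valid, failed


rows = [
    {"name": "foo", "xcoord": "a", "ycoord": "1"},     # "1" coerces to 1
    {"name": "bar", "xcoord": "b", "ycoord": "oops"},  # not an int
    {"name": "baz", "xcoord": "c"},                    # missing ycoord
]
valid, failed = validate_rows(rows)
print(failed)  # [1, 2]
```

Row-wise validation is the most flexible option but also the slowest, since the model runs once per row rather than once per column.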
Improved conda installation experience
Before this release there were only two conda packages: one to install pandera-core and another to install pandera (which would install all of the extras).
The conda packaging now supports finer-grained control:
```bash
conda install -c conda-forge pandera-hypotheses  # hypothesis checks
conda install -c conda-forge pandera-io          # yaml/script schema io utilities
conda install -c conda-forge pandera-strategies  # data synthesis strategies
conda install -c conda-forge pandera-mypy        # enable static type-linting of pandas
conda install -c conda-forge pandera-fastapi     # fastapi integration
conda install -c conda-forge pandera-dask        # validate dask dataframes
conda install -c conda-forge pandera-pyspark     # validate pyspark dataframes
conda install -c conda-forge pandera-modin       # validate modin dataframes
conda install -c conda-forge pandera-modin-ray   # validate modin dataframes with ray
conda install -c conda-forge pandera-modin-dask  # validate modin dataframes with dask
```
Enhancements
- Add option to disallow duplicate column names #758
- Make SchemaModel use class name, define own config #761
- implement coercion-on-initialization for DataFrame[SchemaModel] types #772
- Update filtering columns for performance reasons. #777
- implement pydantic model data type #779
- make finding coerce failure cases faster #792
- add pyspark support, deprecate koalas #793
- Add overloads to schema.to_yaml #790
- Add overloads to infer_schema #789
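#758 adds an option to reject dataframes whose column names repeat. The core check is small. Below is a stdlib-only sketch over a plain list of column names. The helper names are hypothetical, and the schema-level option itself is not shown.

```python
from collections import Counter
from typing import List


def duplicate_column_names(columns: List[str]) -> List[str]:
    """Return column names that appear more than once, in first-seen order."""
    counts = Counter(columns)  # Counter preserves insertion order (Python 3.7+)
    return [name for name in counts if counts[name] > 1]


def check_unique_column_names(columns: List[str]) -> None:
    # Raise if any column name repeats; otherwise do nothing.
    dupes = duplicate_column_names(columns)
    if dupes:
        raise ValueError(f"dataframe contains duplicate column names: {dupes}")


print(duplicate_column_names(["a", "b", "a", "c", "b"]))  # ['a', 'b']
```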
Bugfixes
Deprecations
Docs Improvements
- add imports to fastapi docs
- add documentation for pandas_engine.DateTime #780
- update docs for 0.10.0 #795
- update docs with fastapi #804