Conversion from 1 pydantic dtype dataframe to another fails when strict = 'filter' #1511

Open
2 of 3 tasks
Daniel-Vetter-Coverwhale opened this issue Feb 22, 2024 · 0 comments
Labels
bug Something isn't working

Describe the bug
Converting a pandera DataFrame with one pydantic dtype to another drops the extra columns when strict = 'filter' is not set, but raises a coercion error when strict = 'filter' is set.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

Code Sample, a copy-pastable example

import pandas
import pandera
import pydantic
from pandera.engines.pandas_engine import PydanticModel
from polyfactory.factories.pydantic_factory import ModelFactory


class Test1(pydantic.BaseModel):
    a: str
    b: int
    c: str


class Test2(pydantic.BaseModel):
    a: str
    b: int


class TestDF1(pandera.DataFrameModel):
    class Config:
        dtype = PydanticModel(Test1)
        coerce = True


if __name__ == "__main__":
    factory = ModelFactory.create_factory(Test1)

    t = pandas.DataFrame([factory.build().model_dump() for _ in range(10)])

    tdf = t.pipe(pandera.typing.DataFrame[TestDF1])
    print(tdf.columns)
    try:

        class TestDF2(pandera.DataFrameModel):
            class Config:
                dtype = PydanticModel(Test2)
                coerce = True
                strict = "filter"

        tdf2 = tdf.pipe(pandera.typing.DataFrame[TestDF2])
    except Exception as e:
        print(e)

    class TestDF2(pandera.DataFrameModel):
        class Config:
            dtype = PydanticModel(Test2)
            coerce = True

    tdf2 = tdf.pipe(pandera.typing.DataFrame[TestDF2])
    print(tdf2.columns)

yields

Index(['a', 'b', 'c'], dtype='object')
Error while coercing 'TestDF2' to type <class '__main__.Test2'>: Could not coerce <class 'pandas.core.frame.DataFrame'> data_container into type <class '__main__.Test2'>
Empty DataFrame
Columns: [column, index, failure_case]
Index: []
Index(['a', 'b'], dtype='object')

Expected behavior

I expected tdf2 to be successfully created with columns a and b (dropping column c), but only when strict = 'filter', and not without it.
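
The expectation above can be sketched in plain Python. This is only an illustration of the column semantics described for `strict` on ordinary column-based schemas, not pandera's implementation; `expected_columns` is a hypothetical helper, and the column names are taken from the example above.

```python
from typing import Iterable, List, Literal, Union

# Assumption: strict takes the three values pandera documents for schemas.
StrictMode = Union[bool, Literal["filter"]]


def expected_columns(
    frame_columns: Iterable[str],
    model_fields: Iterable[str],
    strict: StrictMode = False,
) -> List[str]:
    """Hypothetical helper: columns expected to survive validation."""
    columns = list(frame_columns)
    allowed = set(model_fields)
    extras = [c for c in columns if c not in allowed]

    if strict == "filter":
        # Extra columns are silently dropped.
        return [c for c in columns if c in allowed]
    if strict is True and extras:
        # Extra columns are an error.
        raise ValueError(f"unexpected columns: {extras}")
    # strict=False: extra columns pass through untouched.
    return columns


print(expected_columns(["a", "b", "c"], ["a", "b"], strict="filter"))
print(expected_columns(["a", "b", "c"], ["a", "b"]))
```

Under this reading, only the `strict="filter"` call would return just `a` and `b`; the reported output shows the opposite, with the column dropped when `strict` is unset and an error when it is `"filter"`.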

Desktop (please complete the following information):

  • OS: macOS
  • Version: Sonoma 14.3.1 (23D60)

Additional context

Pandera rocks. I am so happy to have something to validate my dataframes in a clean way. I really love the ability to make a dataframe based on Pydantic models. It gives a lot of reusability to the types, letting them pull triple duty across sqlmodel, fastapi, and dataframes, all from one convenient definition. It's awesome that pandera has strict = 'filter' to drop extra columns, and combining these two things gives a nice way to do automatic conversions. It's counterintuitive to me that it would drop the extra columns when strict = 'filter' is not set, though maybe I'm not understanding something.
