TypeError("Subscripted generics cannot be used with class and instance checks") #1500

Open
cjthorley opened this issue Feb 20, 2024 · 8 comments
Labels
question Further information is requested

Comments

@cjthorley

cjthorley commented Feb 20, 2024

Question about pandera

I am running the code below on two environments:

ENV1 = Databricks Spark cluster driver node
ENV2 = local standard venv

Both environments have the same pandas and pandera versions.

The dataframe validates locally (i.e. ENV2) with no schema errors, but I get TypeError("Subscripted generics cannot be used with class and instance checks") on the Databricks Spark cluster driver node (i.e. ENV1).


Why am I getting this failure and what does it mean? It is specific to pa.Check.isin.

import pandas as pd
import pandera as pa
from IPython.display import HTML, display

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'], 'Age': [20, 21, 19, 18]}
data_df = pd.DataFrame(data)

list_age = [20, 21, 19, 18]

# define pandera custom schema
validation_schema = pa.DataFrameSchema(
    columns = {
        'Name': pa.Column(str),
        'Age': pa.Column(int, pa.Check.isin(list_age), nullable=False)
    }
)

try:
    validated_df = validation_schema(data_df, lazy=True)
    print('VALID')
except pa.errors.SchemaErrors as err:
    err_sum_df = err.failure_cases[['column', 'check', 'failure_case']].value_counts().reset_index(name='no_rows_with_errors')
    err_sum_df = err_sum_df.rename(columns={'column': 'affected_column', 'check': 'error_failure'})
    err_sum_html = err_sum_df.to_html()
    display(HTML(err_sum_html))
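
For context on the wording of the error: the message is produced by Python's standard `typing` module whenever a subscripted generic such as `typing.List[int]` is passed to `isinstance()` or `issubclass()`. The sketch below reproduces it with plain Python and no pandera involved. It only illustrates what the message means and is not a diagnosis of which library makes that call in this thread. The helper name `describe_instance_check` is made up for illustration.

```python
import typing


def describe_instance_check(value, annotation):
    """Return the outcome of isinstance(value, annotation) as a string.

    Hypothetical helper, for illustration only.
    """
    try:
        return str(isinstance(value, annotation))
    except TypeError as err:
        # typing rejects instance checks against subscripted generics.
        return f"TypeError: {err}"


# A plain class works as the second argument to isinstance().
plain_result = describe_instance_check([20, 21], list)

# A subscripted generic (List[int]) is not a real class, so typing raises.
generic_result = describe_instance_check([20, 21], typing.List[int])

print(plain_result)
print(generic_result)
```

So the error generally points at code that passes a type annotation like `List[int]` where a plain class was expected, which is why it can appear or disappear when a dependency changes how it dispatches on types.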
cjthorley added the question (Further information is requested) label Feb 20, 2024
smackesey added a commit to dagster-io/dagster that referenced this issue Feb 20, 2024
## Summary & Motivation

Some dependency of pandera is causing breakage of one of our
docs_snippets tests. The same issue is afflicting other users of
pandera: unionai-oss/pandera#1500

Since it is not clear what nth-order dependency of pandera needs to be
pinned, just skip this test and wait for the upstream pandera fix.
@tf75

tf75 commented Feb 20, 2024

We are also getting this issue; please see below:

import pandas as pd
import pandera as pa
from pandera.typing import Series

class ScanOutput(pa.DataFrameModel):
    rating: Series[int] = pa.Field(ge=0, le=4)

def validate_df_against_schema(df: pd.DataFrame, schema: pa.DataFrameModel):
    null_schema = pa.DataFrameSchema({
        "rating": pa.Column(int, nullable=False)
    })

    # This does not fail.
    null_schema.validate(df)
    # This raises TypeError("Subscripted generics cannot be used with class and instance checks").
    validation = schema.validate(df)

# df is the DataFrame being validated (its construction is not shown here).
validate_df_against_schema(df, schema=ScanOutput)

@PierreC1024

PierreC1024 commented Feb 20, 2024

I encountered a similar problem, which appeared to stem from the multimethod package.

Downgrading the package to version 1.11 resolved the issue.

@ushakrishna2k

I also encountered the same issue. I am using Python 3.9.5 in Databricks Runtime 12.2 (pandera 0.18.0).
It was working fine until yesterday.
When I run it on runtime 10.4 (Python 3.8.1), it still works fine today.

@ushakrishna2k

@PierreC1024, I have multimethod version 1.11.1. Did you downgrade it to 1.11.0?
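
One way to settle this kind of question is to ask the running environment which version it actually resolved. The sketch below uses only the standard library. The 1.11 threshold comes from the reports in this thread, and the helper names (`parse_version`, `is_newer_than_known_good`, `installed_multimethod_version`) are hypothetical. The version parsing is deliberately simplistic (numeric dotted versions only), so treat it as a rough check rather than a full version comparison.

```python
from importlib import metadata

# Assumption taken from this thread: multimethod 1.11 is reported to work.
KNOWN_GOOD = "1.11"


def parse_version(text):
    """Turn '1.11.1' into (1, 11, 1); trailing zeros are dropped so 1.11.0 == 1.11."""
    numbers = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    while numbers and numbers[-1] == 0:
        numbers.pop()
    return tuple(numbers)


def is_newer_than_known_good(installed, known_good=KNOWN_GOOD):
    """True when the installed version is strictly newer than the known-good one."""
    return parse_version(installed) > parse_version(known_good)


def installed_multimethod_version():
    """Return the installed multimethod version string, or None if it is absent."""
    try:
        return metadata.version("multimethod")
    except metadata.PackageNotFoundError:
        return None


version = installed_multimethod_version()
if version is None:
    print("multimethod is not installed in this environment")
elif is_newer_than_known_good(version):
    print(f"multimethod {version} is newer than {KNOWN_GOOD}; consider pinning")
else:
    print(f"multimethod {version} is at or below {KNOWN_GOOD}")
```

Running this on both the cluster and the local environment shows whether the two really resolved the same transitive dependencies, even when the pandas and pandera versions match.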

@PierreC1024

PierreC1024 commented Feb 20, 2024 via email

@cjthorley
Author

I can confirm that rolling back to multimethod==1.11, the version from before the Feb 19th update, has stopped the TypeError("Subscripted generics cannot be used with class and instance checks") schema error.

I am so pleased because I love the pandera library.

@tf75

tf75 commented Feb 21, 2024

Thanks, everyone, this solved my issue on AWS Glue. I just pinned multimethod==1.11 using Terraform when deploying (see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html if anyone gets stuck doing it).

@ushakrishna2k

I was able to fix the issue in Databricks by rolling back the multimethod library to 1.11.
Thank you very much, everyone!

PedramNavid pushed a commit to dagster-io/dagster that referenced this issue Mar 28, 2024