Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type checking with converters= on read_excel #849

Open
oliverbeagley-pgg opened this issue Jan 11, 2024 · 4 comments
Open

Type checking with converters= on read_excel #849

oliverbeagley-pgg opened this issue Jan 11, 2024 · 4 comments
Labels
good first issue IO Excel read_excel, to_excel

Comments

@oliverbeagley-pgg
Copy link

Describe the bug
Pyright doesn't seem to like something about using converters=... with read_excel

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.
from functools import partial

import pandas as pd

df_1 = pd.read_excel(
    "foo.csv",
    converters={"field_1": partial(pd.to_datetime, errors="coerce")},
)
df_2 = pd.read_excel(
    "foo.csv",
    sheet_name="sheet",
    converters={"field_1": partial(pd.to_datetime, errors="coerce")},
)

reveal_type(df_1)
reveal_type(df_2)
  1. Indicate which type checker you are using (mypy or pyright).: Both
  2. Show the error message received from that type checker while checking your example.

mypy:

$ mypy t.py
t.py:15: note: Revealed type is "pandas.core.frame.DataFrame"
t.py:16: note: Revealed type is "pandas.core.frame.DataFrame"
Success: no issues found in 1 source file

pyright:

/home/user/t.py
  /home/user/t.py:7:16 - error: Argument of type "dict[str, partial[Timestamp]]" cannot be assigned to parameter "converters" of type "dict[int | str, (object) -> object] | None" in function "read_excel"
    Type "dict[str, partial[Timestamp]]" cannot be assigned to type "dict[int | str, (object) -> object] | None"
      "dict[str, partial[Timestamp]]" is incompatible with "dict[int | str, (object) -> object]"
        Type parameter "_KT@dict" is invariant, but "str" is not the same as "int | str"
        Type parameter "_VT@dict" is invariant, but "partial[Timestamp]" is not the same as "(object) -> object"
      "dict[str, partial[Timestamp]]" is incompatible with "None" (reportGeneralTypeIssues)
  /home/user/t.py:9:8 - error: No overloads for "read_excel" match the provided arguments (reportGeneralTypeIssues)
  /home/user/t.py:12:16 - error: Argument of type "dict[str, partial[Timestamp]]" cannot be assigned to parameter "converters" of type "dict[int | str, (object) -> object] | None" in function "read_excel"
    Type "dict[str, partial[Timestamp]]" cannot be assigned to type "dict[int | str, (object) -> object] | None"
      "dict[str, partial[Timestamp]]" is incompatible with "dict[int | str, (object) -> object]"
        Type parameter "_KT@dict" is invariant, but "str" is not the same as "int | str"
        Type parameter "_VT@dict" is invariant, but "partial[Timestamp]" is not the same as "(object) -> object"
      "dict[str, partial[Timestamp]]" is incompatible with "None" (reportGeneralTypeIssues)
  /home/user/t.py:15:13 - information: Type of "df_1" is "DataFrame"
  /home/user/t.py:16:13 - information: Type of "df_2" is "Unknown"

Please complete the following information:

  • WSL - ubuntu 22.04
  • python version 3.10.12
  • version of type checker: pyright==1.1.345, mypy==1.8.0
  • version of installed pandas-stubs: pandas_stubs==1.4.4.220919 (also present on latest pandas_stubs version for most recent pandas version)

Additional context
Not sure if its something with the converters type or read_excel, but using a lambda instead of partial results in both type checkers not liking it even though read_csv has no issue with either ie

converters={"field": lambda s: pd.to_datetime(s, errors="coerce")}
#vs
converters={"field": partial(pd.to_datetime, errors="coerce")}
@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jan 11, 2024

Thanks for the report. I think this is a pyright issue, because the following code reports a similar error:

from functools import partial
from typing import Callable

basetwo = partial(int, base=2)


def foo(func: Callable[[object], object], value: str):
    print(func(value))


foo(basetwo, "1011")

Created an issue for pyright: microsoft/pyright#6961

Will leave this open pending feedback there.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jan 11, 2024

Based on the discussion with pyright, it looks like the proper fix is to change Callable[[object], object] to Callable[[Any], Any] in pandas-stubs/io/excel/_base.pyi for the converters argument.

PR with tests welcome.

@Dr-Irv Dr-Irv added good first issue IO Excel read_excel, to_excel labels Jan 11, 2024
@oliverbeagley-pgg
Copy link
Author

I can look to make that change, would it also be worth looking at the other io methods that use converters and see if they need to be changed also?

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jan 12, 2024

I can look to make that change, would it also be worth looking at the other io methods that use converters and see if they need to be changed also?

Sure, but you'll have to see if the API is consistent across the other API's. I did see that we have ConvertersArg defined in _typing.pyi, so it might be better to change that and use it throughout, but only if the API is consistent (i.e., the args passed to converters are similar across the different io methods)

Also, now that I think about it, for the Excel case, the right argument type might be Callable[[Any], Scalar] because the function should return something that would normally be stored inside of a DataFrame, i.e., a Scalar . Not sure if that will work with partial, and, if not, then Callable[[Any], Any] is probably the way to go. Feel free to open a PR either way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

2 participants