Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

groupby with MultiIndex not annotated correctly #703

Open
behrenhoff opened this issue May 22, 2023 · 5 comments
Open

groupby with MultiIndex not annotated correctly #703

behrenhoff opened this issue May 22, 2023 · 5 comments

Comments

@behrenhoff
Copy link

behrenhoff commented May 22, 2023

Consider a DF or Series with a MultiIndex M1, M2, M3.

When grouping as in
grp = df.groupby(level=["M1", "M3"])

You will get an error like

Argument of type "list[str]" cannot be assigned to parameter "level" of type "Level | None" in function "groupby"
  Type "list[str]" cannot be assigned to type "Level | None"
    "list[str]" is incompatible with protocol "Hashable"

This is due to the annotation of groupby as

        by: IntervalIndex[IntervalT],
        axis: AxisIndex = ...,
        level: Level | None = ...,

The type "Level" is an alias for "Hashable" and allows only one level. I think it should be Hashable|Sequence[Hashable] or, as pandas calls it, IndexLabel (+ the | None)

I've identified a few places (see attachment). However, I am not sure the patch is correct/complete. It is interesting to see that stack is annotated with level: Level | list[Level] = ... while unstack has level: Level = ... while IMHO both should be IndexLabel.

Also, I am not sure the groupby annotations are completey correct because I think you cannot use df.groupby(by=not_None, level=other_not_None) at the same time, yet the annotations seems to allow it.

Minimum example code:

import pandas as pd

df = pd.DataFrame(
    data={"a": [0]},
    index=pd.MultiIndex.from_product(
        [pd.Index(["i1a"], name="i1"), pd.Index(["i2a"], name="i2")]
    ),
)
grp = df.groupby(level=["i1", "i2"])

pyright outputs:

pyright pd-stub-test.py
pd-stub-test.py
  pd-stub-test.py:9:24 - error: Argument of type "list[str]" cannot be assigned to parameter "level" of type "Level | None" in function "groupby"
    Type "list[str]" cannot be assigned to type "Level | None"
      "list[str]" is incompatible with protocol "Hashable"
        "__hash__" is an incompatible type
          Type "None" cannot be assigned to type "(self: list[str]) -> int"
      "list[str]" is incompatible with "int"
      Type cannot be assigned to type "None" (reportGeneralTypeIssues)
1 error, 0 warnings, 0 informations 

0001-MultiIndex-in-groupby.patch.gz

(Warning: I haven't tested/verified the patch!)

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented May 22, 2023

Can you indicate which version of pandas-stubs you are using?

@behrenhoff
Copy link
Author

behrenhoff commented May 22, 2023

I am using pandas-stubs==1.5.3.230321 but you can find the issue in the main branch (pulled appox 1h ago) as well. The attached patch is against the current main.
See for example https://github.com/pandas-dev/pandas-stubs/blob/main/pandas-stubs/core/frame.pyi line 1077.

But as said, I am not sure the patch is correct - I am not exactly sure if/when label/by combinations are allowed in df.groupby. But you can definitely have multiple levels in level=.

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented May 22, 2023

OK, thanks. Can you create a PR (with tests) with your change?

I see now that the docs say that you can't use by and level, so I think the proper change here is to remove level from all the existing overloads, and then create a new overload that has level but not by (and probably a * to indicate that keyword arguments are required if you use level)

@behrenhoff
Copy link
Author

behrenhoff commented May 22, 2023

Sorry, I misread. Yes, I can. Probably tomorrow.

@janrito
Copy link

janrito commented Sep 6, 2023

This happens with a Series as well:

grp = series.groupby(level=["l1", "l2"])
...
No overload variant of "groupby" of "Series" matches argument type "list[str]"  [call-overload]

due to this

level: Level | None = ...,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants