Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ExtensionArray _accumulate and _reduce #850

Closed
MichaelTiemannOSC opened this issue Jan 13, 2024 · 7 comments · Fixed by #923
Closed

Implement ExtensionArray _accumulate and _reduce #850

MichaelTiemannOSC opened this issue Jan 13, 2024 · 7 comments · Fixed by #923
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@MichaelTiemannOSC
Copy link

Describe the bug
The stubs for ExtensionArray (in pandas-stubs/core/arrays/base.pyi) does not provide type signatures for _accumulate and _reduce. To properly add typing information to the Pint-Pandas project, these need to be defined.

To Reproduce

  1. Minimal Runnable Example:
import numpy as np
import pandas as pd
from typing import reveal_type
from pandas.arrays import IntegerArray
from pandas.api.extensions import ExtensionArray

_data: ExtensionArray = IntegerArray(values=np.array([1, 2, 3], dtype=int), mask=np.array([True, True, True], dtype=bool))
if isinstance(_data, ExtensionArray):
    reveal_type(_data)
    reveal_type(_data._accumulate)
    reveal_type(_data._reduce)
  1. Using mypy
  2. Show the error message received from that type checker while checking your example.
(pint-dev) % pre-commit run mypy --files foo.py
mypy.....................................................................Failed
- hook id: mypy
- duration: 1.41s
- exit code: 1

foo.py:9: note: Revealed type is "pandas.core.arrays.base.ExtensionArray"
foo.py:10: error: "ExtensionArray" has no attribute "_accumulate"  [attr-defined]
foo.py:10: note: Revealed type is "Any"
foo.py:11: error: "ExtensionArray" has no attribute "_reduce"  [attr-defined]
foo.py:11: note: Revealed type is "Any"
Found 2 errors in 1 file (checked 1 source file)

Note that running the script in python works, because it uses actual Pandas code, not Pandas-Stubs:

(pint-dev) % python foo.py
Runtime type is 'IntegerArray'
Runtime type is 'method'
Runtime type is 'method'

Please complete the following information:

  • OS: Mac OS
  • OS Version 14.1.2
  • python 3.11.4
  • mypy 1.8.0
  • version of installed pandas-stubs: 2.1.4.231227

Additional context
Add any other context about the problem here.

@twoertwein
Copy link
Member

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Jan 15, 2024

While they look very much private, they are documented: https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray._accumulate.html https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.ExtensionArray._reduce.html and could therefore probably be added to pandas-stubs? @Dr-Irv

Agreed. PR with tests welcome

@Dr-Irv Dr-Irv added good first issue ExtensionArray Extending pandas with custom dtypes or arrays. labels Jan 15, 2024
@MichaelTiemannOSC
Copy link
Author

I'm glad to see its a simple case, but alas, it's just beyond my level of python and mypy type algebras.

@mutricyl
Copy link
Contributor

I can not see any ExtensionArray specific test. @Dr-Irv can you advise on where they should be located ?

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented Apr 30, 2024

I can not see any ExtensionArray specific test. @Dr-Irv can you advise on where they should be located ?

I would add something to test_extension.py, but you can just add a test that asserts the types of _reduce() and _accumulate() to be Callable with appropriate arguments and return types.

@mutricyl
Copy link
Contributor

I have added in core/arrays/base.pyi

    def _reduce(self, name: str, *, skipna: bool=..., keepdims: bool=... , **kwargs) -> Scalar: ...
    def _accumulate(self, name: str, *, skipna: bool=..., **kwargs) -> Self: ...

But now I am facing issues with tests:

  • I am struggling assert-type of Callable with multiple arguments (including optional ones and kwargs). I can not find exemples where this is tested. mypy and pyright looks like to deals with arguments in slightly different ways.
    • mypy: error: Expression is of type "Callable[[str, DefaultNamedArg(bool, 'skipna'), DefaultNamedArg(bool, 'keepdims'), KwArg(Any)], str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex]", not "Callable[[], str | bytes | date | datetime | timedelta | datetime64 | timedelta64 | bool | int | float | Timestamp | Timedelta | complex]" [assert-type]
  • pyright also complains about
    def _reduce(self, name: str, *, skipna: bool = True, **kwargs):
    where _reduce is also defined for sub class of ExtensionArray

not sure about the good first issue tag 😃

@Dr-Irv
Copy link
Collaborator

Dr-Irv commented May 14, 2024

I had another recent case in dealing with Callable with odd arguments, and I think it will be hard to do the assert_type() based on what I've learned.

I'm fine if we don't include a test for this, and just add the declarations for the 2 functions.

As for the _reduce() issue with pyright, for extension arrays, the _reduce() operation could return an object of the dtype of the extension array, which could be anything, so use this instead:

    def _reduce(self, name: str, *, skipna: bool=..., keepdims: bool=... , **kwargs) -> object: ...

You may have to change tests/extension/decimal/array.py to return decimal.Decimal for _reduce() in there.

Agree this is not a good first issue any more, but I think you can do it!

mutricyl pushed a commit to mutricyl/pandas-stubs that referenced this issue May 14, 2024
Dr-Irv pushed a commit that referenced this issue May 14, 2024
)

* add reduce/accumulate methods stubs + minimal tests, resolves #850

* use np.integer

---------

Co-authored-by: Laurent Mutricy <laurent.mutricy@ekium.eu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants