Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: remove ignores in refine_percentiles #42389

Merged
merged 2 commits into from Jul 7, 2021

Conversation

simonjayhawkins
Copy link
Member

No description provided.

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Jul 5, 2021
@@ -50,7 +51,7 @@ def describe_ndframe(
include: str | Sequence[str] | None,
exclude: str | Sequence[str] | None,
datetime_is_numeric: bool,
percentiles: Sequence[float] | None,
percentiles: Sequence[float] | np.ndarray | None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't Sequence[float] cover the ndarray cases that we actually want? i.e. once it is supported, something like np.ndarray[float, ndim=1] | None would be what we really want?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type of refine_percentiles is a numpy array and we pass this onto the other functions, so we need to add np.ndarray to the acceptable types.

doesn't Sequence[float] cover the ndarray cases that we actually want?

numpy/numpy#2776, issue has been opened a while, doesn't look like a fix in the pipeline

something like np.ndarray[float, ndim=1] | None would be what we really want?

probably not float, since int is also allowed. but if we narrow using float

from __future__ import annotations

import numpy as np
from typing import Sequence, Any

arr = np.array([0.25, 0.5])
reveal_type(arr)
arr2: np.ndarray[Any, np.dtype[np.float64]] = np.array([0.25, 0.5])
reveal_type(arr2)
arr3: np.ndarray[Any, np.dtype[np.float32]] = np.array([0.25, 0.5], dtype="float64")
reveal_type(arr3)
arr4: np.ndarray[Any, np.dtype[np.int_]] = np.array([0.25, 0.5], dtype="float64")
reveal_type(arr4)
arr5: np.ndarray[Any, np.dtype[np.int_]] = np.array([1, 2, 3])
reveal_type(arr5)


def test(a: Sequence[float]):
    pass


test(arr)
test(arr2)
test(arr3)
test(arr4)
test(arr5)


def test2(a: Sequence[float] | np.ndarray[Any, np.dtype[np.float_]]):
    reveal_type(a)


test2(arr)
test2(arr2)
test2(arr3)
test2(arr4)
test2(arr5)
/home/simon/t1.py:7: note: Revealed type is "numpy.ndarray[Any, Any]"
/home/simon/t1.py:9: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.floating[numpy.typing._64Bit]]]"
/home/simon/t1.py:11: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.floating[numpy.typing._32Bit]]]"
/home/simon/t1.py:13: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.signedinteger[Any]]]"
/home/simon/t1.py:15: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.signedinteger[Any]]]"
/home/simon/t1.py:22: error: Argument 1 to "test" has incompatible type "ndarray[Any, Any]"; expected "Sequence[float]"  [arg-type]
/home/simon/t1.py:23: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[floating[_64Bit]]]"; expected "Sequence[float]"  [arg-type]
/home/simon/t1.py:24: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[floating[_32Bit]]]"; expected "Sequence[float]"  [arg-type]
/home/simon/t1.py:25: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Sequence[float]"  [arg-type]
/home/simon/t1.py:26: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Sequence[float]"  [arg-type]
/home/simon/t1.py:30: note: Revealed type is "Union[typing.Sequence[builtins.float], numpy.ndarray[Any, numpy.dtype[numpy.floating[Any]]]]"
/home/simon/t1.py:36: error: Argument 1 to "test2" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Union[Sequence[float], ndarray[Any, dtype[floating[Any]]]]"  [arg-type]
/home/simon/t1.py:37: error: Argument 1 to "test2" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Union[Sequence[float], ndarray[Any, dtype[floating[Any]]]]"  [arg-type]
Found 7 errors in 1 file (checked 1 source file)

I've added the dtype bound to the return type of refine_percentiles since for return types we should be as precise as possible.

For the arguments of describe we want to be as permissible as possible, so maybe best not to narrow till the numpy inference works properly to avoid false positives for end users, describe is public.

'int' is duck type compatible with float, see https://mypy.readthedocs.io/en/stable/duck_type_compatibility.html#duck-type-compatibility, so having Sequence[float] is equivalent to Sequence[float | int]

on master

import numpy as np
import pandas as pd

print(pd.DataFrame(pd.Series([1, 2, 3])).describe([0, 1]))
print(pd.DataFrame(pd.Series([1, 2, 3])).describe([0.5]))
print(pd.DataFrame(pd.Series([1, 2, 3])).describe(np.array([0, 1])))
print(pd.DataFrame(pd.Series([1, 2, 3])).describe(np.array([0.5])))

gives

         0
count  3.0
mean   2.0
std    1.0
min    1.0
0%     1.0
50%    2.0
100%   3.0
max    3.0
         0
count  3.0
mean   2.0
std    1.0
min    1.0
50%    2.0
max    3.0
         0
count  3.0
mean   2.0
std    1.0
min    1.0
0%     1.0
50%    2.0
100%   3.0
max    3.0
         0
count  3.0
mean   2.0
std    1.0
min    1.0
50%    2.0
max    3.0

but on master, users don't see false positives since the percentiles parameter of DataFrame.describe is not yet typed

to type that, we need to pass the user passed argument onto the lower level functions, and these therefore need to be typed in a such a way to avoid false positives, so we can't wait for downstream fixes and imo best not to narrow types for numpy arrays till it's more mature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thanks for walking me through that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, passing integers 0 and 1 is redundant anyway.

imo best not to narrow types for numpy arrays till it's more mature.

and for the internal functions, using the type parameters for np.ndarray is probably also fine. it's only the public api where we may want to be more cautious.

to be clear, while the constructors don't specialize and the dtype type parameter of np.ndarray is Any, then they will be type compatible with any specializations we do add. If we get these wrong, then there could be a batch of errors that need to be resolved when we get a future numpy release.

in numpy 1.21 (we use for type checking)...

def array(
    object: object,
    dtype: DTypeLike = ...,
    *,
    copy: bool = ...,
    order: _OrderKACF = ...,
    subok: bool = ...,
    ndmin: int = ...,
    like: ArrayLike = ...,
) -> ndarray: ...

so all arrays created with np.array will have an inferred type numpy.ndarray[Any, Any]

@jreback jreback added this to the 1.4 milestone Jul 7, 2021
@jreback jreback merged commit 00ceed4 into pandas-dev:master Jul 7, 2021
@simonjayhawkins simonjayhawkins deleted the ignore-2 branch July 7, 2021 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants