
Type hinting / annotation (PEP 484) for ndarray, dtype, and ufunc #7370

Open · InonS opened this issue Mar 1, 2016 · 86 comments

InonS commented Mar 1, 2016

Feature request: native support for PEP 484 type hints for NumPy data structures.

Has anyone implemented type hinting for the specific numpy.ndarray class?

Right now, I'm using typing.Any, but it would be nice to have something more specific.

For instance, the NumPy folks could add a type alias for their array_like object class. Better yet, support could be implemented at the dtype level, so that other objects, as well as ufunc, would be covered too.

original SO question
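
As a concrete illustration of the current workaround (a minimal sketch; the function below is made up for illustration), annotations today fall back to typing.Any where an ndarray-specific type would be preferable:

from typing import Any

import numpy as np

def normalize(x: Any) -> Any:  # ideally: (x: np.ndarray) -> np.ndarray
    # Coerce the input to an ndarray, then scale it to zero mean and unit variance.
    arr = np.asarray(x)
    return (arr - arr.mean()) / arr.std()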

njsmith (Member) commented Mar 2, 2016

I don't think anyone's thought about it. Perhaps you would like to? :-)

I'm also going to suggest that if you want to follow up on this, we close the GitHub issue and move the discussion to the mailing list, since the mailing list is better suited to open-ended design discussions.

InonS (Author) commented Mar 2, 2016

After getting this answer on SO, I've decided to close the issue.

InonS closed this as completed Mar 2, 2016

njsmith (Member) commented Mar 3, 2016

To be clear, we don't actually have any objection to supporting cool new Python features or anything (rather the opposite); it's just that we're a volunteer-run project without many resources, so stuff only happens if someone who's interested steps up to do it.

The mailing list is usually the best place if you're trying to start working on something or hoping to recruit some other interested folks to help.

InonS (Author) commented Mar 3, 2016

Thanks, @njsmith. I decided to start here because of the more orderly issue-tracking, as opposed to an unstructured mailing list (I was looking for a 'feature request' tag, among other features...)

Since the guy who answered me on SO got back to me with a viable solution, I decided to leave the matter there.
Maybe the NumPy documentation should be updated to include his answer (please make sure to give him credit if you do).

Thanks, again!


JulesGM commented Apr 27, 2017

Hello! I was just wondering whether there has been any progress on this issue. Thanks.

eric-wieser (Member) commented:

There is some discussion about it on the mailing list here.

shoyer (Member) commented May 8, 2017

I'm reopening this issue for those who are interested in discussing it further.

I think this would certainly be desirable for NumPy, but there are indeed a few tricky aspects of the NumPy API for typing to sort through, such as how NumPy currently accepts arbitrary objects in the np.array constructor (though we want to clean this up, see #5353).

shoyer reopened this May 8, 2017

engnadeau (Contributor) commented:

Some good work is being done here: https://github.com/machinalis/mypy-data

There's discussion about whether to push the work upstream to numpy or typeshed: machinalis/mypy-data#16

njsmith (Member) commented Jun 16, 2017

CC @mrocklin

henryJack commented:

This really would be a great addition to NumPy. What would be the next steps to push this up to typeshed or NumPy? Even an incomplete stub would be useful, and I'm happy to help given a bit of direction.

shoyer (Member) commented Sep 1, 2017

@henryJack The best place to start would probably be tooling: figure out how we can integrate basic type annotations into the NumPy repository (and ideally test them) in a way that works with mypy and supports adding them incrementally.

Then, start with extremely minimal annotations and we can go from there. In particular, I would skip dtype annotations for now since we don't have a good way to specify them (i.e., only do ndarray, not ndarray[int]).

If it's helpful, I have an alternative version of annotations that I've written for use at Google and could open source. But we have our own unique build system and do type checking with pytype, so there would likely be quirks porting it to upstream.
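
To make the "extremely minimal annotations" idea concrete, a hypothetical fragment of a first-pass numpy/__init__.pyi stub might look roughly like this (signatures are simplified for illustration and are not NumPy's actual stubs):

from typing import Any

class dtype: ...
class ndarray: ...
class ufunc: ...

def array(object: Any, dtype: Any = ..., copy: bool = ...) -> ndarray: ...
def empty(shape: Any, dtype: Any = ..., order: str = ...) -> ndarray: ...
def zeros(shape: Any, dtype: Any = ..., order: str = ...) -> ndarray: ...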


jwkvam commented Sep 1, 2017

I suppose the only way to test annotations is to actually run mypy on sample code snippets and check the output?

Would it be better to have the annotations integrated with the code or as separate stubs?

I suppose we should also learn from Dropbox and pandas and start with the leaves of the codebase rather than the core data structures?


JulesGM commented Sep 1, 2017

@shoyer: figure out how we can integrate basic type annotations

Wouldn't just putting https://github.com/machinalis/mypy-data/blob/master/numpy-mypy/numpy/__init__.pyi in the numpy module base directory do exactly that, in an experimental version of some kind at least?

shoyer (Member) commented Sep 1, 2017

Would it be better to have the annotations integrated with the code or as separate stubs?

Integrated with the code would be lovely, but I don't think it's feasible for NumPy yet. Even with the comment string version of type annotations, we would need to import from typing on Python 2, and adding dependencies to NumPy is pretty much off the table.
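
For illustration, the comment-string style looks like this (a sketch with a simplified signature, not NumPy's actual one); note that the typing import is a real runtime import even though only the comment uses it:

from typing import Any, Optional, Tuple  # needed at runtime on Python 2 as well

def zeros(shape, dtype=None, order='C'):
    # type: (Tuple[int, ...], Optional[type], str) -> Any
    ...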

Also, most of the core data structures and functions (things like ndarray and array) are defined in extension modules, so we'll need to use stubs there anyway.

Wouldn't just putting https://github.com/machinalis/mypy-data/blob/master/numpy-mypy/numpy/__init__.pyi in the numpy module base directory do exactly that, in an experimental version of some kind at least?

Yes, I think that would be enough for external code. But how does mypy handle libraries with incomplete type annotations?

If possible, we might annotate numpy.core.multiarray directly, rather than just at the top level. (multiarray is the extension module where NumPy's core types like ndarray are defined.) I think this would allow NumPy itself to make use of type checking for some of its pure-Python modules.

mrocklin (Contributor) commented Sep 1, 2017

I'm curious, what is the type of np.empty(shape=(5, 5), dtype='float32')?

What is the type of np.linalg.svd?


jwkvam commented Sep 1, 2017

I think @kjyv has taken a stab at defining those.

np.empty: https://github.com/kjyv/mypy-data/blob/master/numpy-mypy/numpy/__init__.pyi#L523
np.linalg.svd: https://github.com/kjyv/mypy-data/blob/master/numpy-mypy/numpy/linalg/__init__.pyi#L13

mrocklin (Contributor) commented Sep 1, 2017

It looks like the types are parametrized; is this by their dtype? Is it also feasible to parametrize by their dimension or shape? How much sophistication does Python's typing module support?


jwkvam commented Sep 1, 2017

Yeah, they are parameterized by their dtype. I'm no expert on the typing module, but I think you could just have the ndarray type inherit from Generic[dtype, int] to parameterize on ndim as well. I believe that's what Julia does. I'm not sure if you could easily parameterize on shape, nor am I sure what benefits that would bring or why it wasn't done that way in the first place.
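
A rough sketch of that idea (purely illustrative, not NumPy's actual stubs): parameterize a stand-in array type by both a scalar type and an ndim marker. As discussed further below, current type checkers can't express much with the second parameter:

from typing import Generic, TypeVar

D = TypeVar('D')  # scalar / dtype parameter
N = TypeVar('N')  # stand-in for the number of dimensions

class ndarray(Generic[D, N]):  # hypothetical two-parameter stub
    ...

class TwoDim: ...  # marker class standing in for ndim == 2

Matrix = ndarray[float, TwoDim]  # e.g. a 2-D array of floats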

mrocklin (Contributor) commented Sep 1, 2017 via email


jwkvam commented Sep 1, 2017

You can use numpy dtypes; we just need to define them. That was done here with floating for np.std:

https://github.com/kjyv/mypy-data/blob/24ea87d952a98ef62680e812440aaa5bf49753ae/numpy-mypy/numpy/__init__.pyi#L198

I'm not sure; I don't think it's possible. I don't think you can modify the output type based on an argument's value. I think the best we can do is overload the function with all the type specializations we would care about:

https://docs.python.org/3/library/typing.html#typing.overload
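
A sketch of that overload approach in a hypothetical dtype-parameterized .pyi stub (names and signatures are illustrative only; one overload per dtype of interest):

from typing import Generic, Tuple, Type, TypeVar, overload

import numpy as np

D = TypeVar('D')

class ndarray(Generic[D]): ...  # hypothetical parameterized stub

@overload
def empty(shape: Tuple[int, ...], dtype: Type[np.float64]) -> ndarray[np.float64]: ...
@overload
def empty(shape: Tuple[int, ...], dtype: Type[np.int64]) -> ndarray[np.int64]: ...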

eric-wieser (Member) commented:

Another option might be to introduce some strict-typed aliases, so np.empty[dtype] is a function with signature (ShapeType) -> ndarray[dtype].

There's already some precedent for this with the unusual np.cast[dtype](x) function
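
A sketch of how such a strict-typed alias could look in a stub (hypothetical; np.empty has no such alias): indexing with a dtype yields a callable whose return type carries that dtype:

from typing import Callable, Generic, Tuple, Type, TypeVar

D = TypeVar('D')
ShapeType = Tuple[int, ...]

class ndarray(Generic[D]): ...  # hypothetical parameterized stub

class _TypedEmpty:
    # empty[dtype] behaves as a function of type (ShapeType) -> ndarray[dtype]
    def __getitem__(self, dtype: Type[D]) -> Callable[[ShapeType], ndarray[D]]: ...

empty: _TypedEmpty  # usage, as seen by a type checker: empty[np.float64]((3, 4))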

shoyer (Member) commented Sep 2, 2017

@jwkvam OK, so maybe dtype annotations are doable -- I was just suggesting starting simple and going from there.

I think TypeVar could possibly be used instead of overloads, maybe:

D = TypeVar('D', np.float64, np.complex128, np.int64, ...)  # every numpy generic type
def empty(dtype: Type[D]) -> ndarray[Type[D]]: ...

If I understand this correctly, this would imply empty(np.float64) -> ndarray[np.float64].

It would also be awesome to be able to type check shape and dimensionality information, but I don't think current type checkers are up to the task. Generic[int] is an error, for example -- the arguments to Generic are required to be instances of TypeVar:
https://github.com/python/cpython/blob/868710158910fa38e285ce0e6d50026e1d0b2a8c/Lib/typing.py#L1131-L1133

We would also need to express signatures involving dimensions. For example, np.expand_dims maps ndim -> ndim+1.

I suppose one approach that would work is to define classes for each non-negative integer, e.g., Zero, One, Two, Three, ... and then define overloads for each. That would get tiring very quickly.
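
To spell that out, a stub-style sketch (marker classes and signatures invented for illustration) of the per-dimension-count approach applied to np.expand_dims:

from typing import Generic, TypeVar, overload

class Zero: ...
class One: ...
class Two: ...
# ... one marker class per supported dimension count

N = TypeVar('N')

class ndarray(Generic[N]): ...  # parameterized by ndim only, for this sketch

@overload
def expand_dims(a: ndarray[Zero], axis: int) -> ndarray[One]: ...
@overload
def expand_dims(a: ndarray[One], axis: int) -> ndarray[Two]: ...
# ... and so on, one overload per supported ndim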

In TensorFlow, tf.Dimension() and tf.TensorShape() let you statically express shapes. But it's not something that is done in the type system. Rather, each function has a helper associated with it that determines the static shape of the outputs from the shape of the inputs and any non-tensor arguments. I think we would need something similar if we hoped to do this with NumPy, but there's nothing in Python's typing system that supports this sort of flexibility.


jwkvam commented Sep 11, 2017

@shoyer I see, yeah, that's disappointing. I was able to hack together the following:

from typing import Generic, TypeVar

import numpy as np

_A = TypeVar('_A')
_B = TypeVar('_B', int, np.int64, np.int32)

class Abs(Generic[_A, _B]):
    pass

class Conc(Abs[_A, int]):
    pass

But I don't think that's leading anywhere...

It seems like your example works! It seemed to work better without the type constraints; that way I could test dtypes like str. I had to remove the default argument, though; I couldn't figure out how to get that to work.

D = TypeVar('D')
def empty(shape: ShapeType, dtype: Type[D], order: str='C') -> ndarray[D]: ...

and with this code:

def hello() -> np.ndarray[int]:
    return np.empty(5, dtype=float)

I get

error: Argument 2 to "empty" has incompatible type Type[float]; expected Type[int]

I'm a little confused because if I swap the types:

def hello() -> np.ndarray[float]:
    return np.empty(5, dtype=int)

I get no error, even though I don't think anything is marked as covariant.

Even though the type system isn't as sophisticated as we'd like, do you think it's still worth it? One benefit I would appreciate is better code completion through jedi.

shoyer (Member) commented Sep 11, 2017

I'm a little confused because if I swap the types:

I believe the issue here is that int instances are implicitly considered valid for float annotations. See the notes on the numeric tower in the typing PEP:
https://www.python.org/dev/peps/pep-0484/#the-numeric-tower

I think this could be avoided if we insist on NumPy scalar types instead of generic Python types for annotations, e.g., np.ndarray[np.integer] rather than np.ndarray[int].

This is actually a little easier than I thought because TypeVar has a bound argument. So revising my example:

D = TypeVar('D', bound=np.generic)
def empty(dtype: Type[D]) -> ndarray[D]: ...

I had to remove the default argument, couldn't figure out how to get that to work.

I'm not quite sure what you were getting at here?


jwkvam commented Sep 11, 2017

I just tried to encode the default value of dtype in the stub. They did that in the mypy-data repo.

def empty(shape: ShapeType, dtype: DtypeType=float, order: str='C') -> ndarray[Any]: ...

from https://github.com/kjyv/mypy-data/blob/master/numpy-mypy/numpy/__init__.pyi#L523

Following your example, I wasn't able to get mypy to work with a default argument for dtype. I tried dtype: Type[D]=float and dtype: Type[D]=Type[float].

shoyer (Member) commented Sep 12, 2017

I think dtype also needs to become a generic type, and then you need to set the default value to a numpy generic subclass like np.float64 rather than float, e.g.,

# totally untested!
from typing import Generic, Tuple, Type, TypeVar, Union
import numpy as np

D = TypeVar('D', bound=np.generic)

class dtype(Generic[D]):
    @property
    def type(self) -> Type[D]: ...

class ndarray(Generic[D]):
    @property
    def dtype(self) -> dtype[D]: ...

DtypeLike = Union[dtype[D], D]  # both are coercible to a dtype
ShapeLike = Tuple[int, ...]

def empty(shape: ShapeLike, dtype: DtypeLike[D] = np.float64) -> ndarray[D]: ...


sfolje0 commented Nov 22, 2020

@shoyer

There's lots of interest in this area, but more specific typing (dtypes and dimensions) for NumPy arrays isn't supported yet.

Hey community,
To encourage this work, I'd like to show my interest in this feature by posting some nostalgic (and harmless) links. I vaguely remember this kind of type from school, when we played around a bit (one exercise only) with vectors and matrices in the Lean theorem prover. It looked like something in this section, in the 11th paragraph where it talks about vectors ("Vector operations are handled similarly:"), but really more like the solutions of exercises 3 and 4 in the same document. I know it is overkill, but I am posting it just for inspiration.

eric-wieser (Member) commented Nov 22, 2020

Funny you should mention Lean; I've been working with it solidly for the last few months. While interesting in its own right, my impression is that the heavy dependent typing used by Lean would be a significant challenge for mypy to adopt, and arguably not a worthwhile one: at a certain point, these things are better as language features. For the case of NumPy, there are plenty of weaker type systems which are good enough role models.


ryanpeach commented Mar 17, 2021

Have we looked at using PEP 593 to improve numpy typing? For instance, we could use Annotated[np.ndarray, Shape[3, N, 5], DType[np.int64]], etc.
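
A sketch of what that could look like (Shape and DType below are invented marker classes, not an existing API; Annotated requires Python 3.9+ or typing_extensions). A static checker would simply see np.ndarray, while a runtime checker could inspect the extra metadata:

from typing import Annotated, Any

import numpy as np

class Shape:  # hypothetical marker carrying shape metadata
    def __class_getitem__(cls, item: Any) -> Any:
        return (cls, item)

class DType:  # hypothetical marker carrying dtype metadata
    def __class_getitem__(cls, item: Any) -> Any:
        return (cls, item)

N = 'N'  # symbolic axis name

Batches = Annotated[np.ndarray, Shape[3, N, 5], DType[np.int64]]

def process(batches: Batches) -> None: ...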


mrahtz commented Apr 13, 2021

For anyone still following along here: one blocker for this has been variadic generics, which we're trying to make some progress on with PEP 646, which is currently in review by the Python steering council.

Some other links that might be of interest are:

  • TensorAnnotations, a library of shape-aware stubs for TensorFlow and JAX, with NumPy support hopefully coming soon (disclaimer: I'm the main dev)
  • tsanley which does something similar but with a runtime checker.
  • torchtyping ditto, but only for PyTorch right now.
  • PyContracts ditto, but general-purpose and much more flexible.

Have we looked at using PEP 593 to improve numpy typing?

This is definitely one option, and it'd be pretty cool to see a runtime checker which employed these kinds of annotations. The reason I'm personally gunning for the approach suggested in PEP 646 is that it would allow existing tooling, such as current static type checkers, to verify the kinds of typing properties we care about, with (relatively) little extra effort. (OK, we'd have to implement support for PEP 646 in e.g. mypy, but that's probably simpler than writing a static analysis tool from scratch.)
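
For a flavour of what the PEP 646 route could enable, here is a sketch assuming a type checker and runtime with TypeVarTuple support (Python 3.11's typing or a recent typing_extensions); the Array class and axis labels below are invented for illustration:

from typing import Generic, NewType, TypeVarTuple, Unpack  # Python 3.11+; else typing_extensions

Shape = TypeVarTuple('Shape')
Batch = NewType('Batch', int)
Features = NewType('Features', int)

class Array(Generic[Unpack[Shape]]):  # hypothetical shape-generic array type
    ...

def mean_over_batch(x: Array[Batch, Features]) -> Array[Features]: ...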


mrahtz commented May 4, 2021

Another quick update: Pradeep Kumar Srinivasan and I will be giving a talk on the approach we've been experimenting with over the past 6 months at the PyCon 2021 Typing Summit next week: Catching Tensor Shape Errors Using the Type Checker (https://us.pycon.org/2021/summits/typing/). We'll be discussing how it works, what it looks like in practice, and a few of its current limitations. Hope to see you there!

BvB93 (Member) commented May 7, 2021

Hi all,

Some time ago, back in #17719, dtype support was introduced for np.ndarray (including a placeholder slot for shapes). As a follow-up, earlier today a new PR was submitted (#18935) adding a runtime-subscriptable alias for np.ndarray[Any, np.dtype[~Scalar]], which provides a convenient (and compact) way of annotating arrays with a given dtype and an unspecified shape:

>>> from typing import Any
>>> import numpy as np
>>> import numpy.typing as npt

>>> print(npt.NDArray)
numpy.ndarray[typing.Any, numpy.dtype[~ScalarType]]

>>> print(npt.NDArray[np.float64])
numpy.ndarray[typing.Any, numpy.dtype[numpy.float64]]

>>> NDArrayInt = npt.NDArray[np.int_]
>>> a: NDArrayInt = np.arange(10)

>>> def func(a: npt.ArrayLike) -> npt.NDArray[Any]:
...     return np.array(a)

NeilGirdhar (Contributor) commented Jul 7, 2021

@BvB93 This is awesome. I've been using your change since numpy 1.21 came out. Are you planning on adding runtime subscripting like np.ndarray[Any, Any], np.integer[Any], np.floating[Any], np.dtype[Any]? MyPy complains if I don't put Any since I have the flag for no implicit generics, and I end up having to work around numpy's missing __class_getitem__.


rggjan commented Jul 7, 2021

@NeilGirdhar Excellent idea. As a workaround in the meantime, you could try using from __future__ import annotations, which should make it work at runtime (since annotations are then no longer evaluated at runtime).
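
A small sketch of that workaround (the function is illustrative only): with postponed evaluation, the subscripted annotation is stored as a string and never evaluated at runtime, so a missing __class_getitem__ doesn't matter inside annotations:

from __future__ import annotations  # annotations are no longer evaluated at runtime

from typing import Any

import numpy as np

def as_float64(x: Any) -> np.ndarray[Any, np.dtype[np.float64]]:
    return np.asarray(x, dtype=np.float64)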

NeilGirdhar (Contributor) commented Jul 7, 2021

@rggjan Yup, thanks! That works in many cases, except when you want to define aliases like this. Also, pylint rightly complains that these type objects are not subscriptable.


gsakkis commented Aug 28, 2021

Is it possible to annotate structured arrays and if so how? I tried ndarray[Any, [('i', np.int16), ('q', np.uint16)]] but got Bracketed expression "[...]" is not valid as a type (likewise for other failed attempts).


Jasha10 commented Aug 28, 2021

Is it possible to annotate structured arrays and if so how?

I believe that as of NumPy 1.21 it is not yet possible.

BvB93 (Member) commented Aug 30, 2021

Is it possible to annotate structured arrays and if so how? I tried ndarray[Any, [('i', np.int16), ('q', np.uint16)]] but got Bracketed expression "[...]" is not valid as a type (likewise for other failed attempts).

Unfortunately not, and I very much doubt that list-of-tuples-syntax will ever be something that mypy will understand (not without some serious plugin magic, at least).

As for structured arrays in general, there are two main challenges here:

  1. How to type the necessary structure into the np.void dtype.

    Ideally we'd make it generic w.r.t. something like TypedDict, so field dtypes can be assigned to each key (i.e. field name). Making np.void generic is, however, complicated by its flexibility, as it can be used for representing opaque byte sequences ("V10"), dtypes with a field size ((np.float64, 8)), and structured dtypes with a set of keys and matching field dtypes ([("a", np.float64)] or [("a", np.float64, 8)]).

    Only the last category can reasonably be expressed via a TypedDict, so this raises the question of what to do about the other two. Make it generic w.r.t. a three-way Union? Treat all three categories as type-check-only subclasses? This is very much an open question.

  2. How to let ndarray.__getitem__ and __setitem__ access the named fields.

    Letting ndarray access and use the fields encoded within the dtype will be a challenge of its own. Namely, the only two types that currently deal with arbitrary named fields (NamedTuple and TypedDict) have, in my experience, proven to be less than cooperative when dispatching with the help of protocols. This is very much a mypy bug, but in this context it's probably going to be a detrimental one.

    For example:

    from typing import TypedDict, Protocol, TypeVar, TYPE_CHECKING
    
    KT = TypeVar("KT")
    VT = TypeVar("VT")
    KT_contra = TypeVar("KT_contra", contravariant=True)
    VT_co = TypeVar("VT_co", covariant=True)
    
    class SupportsGetItem(Protocol[KT_contra, VT_co]):
        def __getitem__(self, key: KT_contra, /) -> VT_co: ...
    
    class TestDict(TypedDict):
        a: int
        b: str
    
    def getitem(dct: SupportsGetItem[KT, VT], key: KT) -> VT: ...
    
    test_dict: TestDict
    if TYPE_CHECKING:
        reveal_type(getitem(test_dict, "a"))  # Revealed type is "builtins.object*"
        reveal_type(getitem(test_dict, "b"))  # Revealed type is "builtins.object*"
        reveal_type(getitem(test_dict, "c"))  # Revealed type is "builtins.object*"

BvB93 (Member) commented Sep 16, 2021

@NeilGirdhar there is currently a PR up for making number, dtype and ndarray runtime-subscriptable (#19879),
though note that this functionality has a hard dependency on Python >= 3.9.

The hope is to wrap things up before the next 1.22 release.


Jasha10 commented Nov 21, 2021

That works in many cases, except when you want to define aliases like this.

@NeilGirdhar The workaround I've been using in that case is to put quotes around the subscripted numpy type:

    RealArray = npt.NDArray["np.floating[Any]"]


Why-not-now commented Dec 6, 2022

(Quoting BvB93's comment of Aug 30, 2021 on structured arrays, above.)

Sorry for the random reply, but is this now supported now that we are beyond 1.22? If so, the NumPy typing documentation has not made it clear, either in the dev docs or in the stable release.

marcospgp commented:

Is there an update on this? This issue is referenced on Stack Overflow as evidence that ndarray typing is still a work in progress, but I suspect advances have been made, as some Python libraries have decent typing for ndarray-returning functions.
