Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add Protocol for checking if a dtype is structured #22286

Open
nstarman opened this issue Sep 16, 2022 · 5 comments
Open

ENH: Add Protocol for checking if a dtype is structured #22286

nstarman opened this issue Sep 16, 2022 · 5 comments

Comments

@nstarman
Copy link

nstarman commented Sep 16, 2022

Proposed new feature or change:

It would be very useful to be able to type annotate that a dtype is structured.

A runtime-checkable Protocol is insufficient to detect names and fields are not None in a structured dtype (https://docs.python.org/3/library/typing.html#typing.runtime_checkable).

@runtime_checkable
class StructuredDType(Protocol):
    @property
    def names(self) -> tuple: ...

    @property
    def fields(self) -> Mapping[str, tuple]: ...
@mhvk
Copy link
Contributor

mhvk commented Sep 16, 2022

Note that the dtypes have a hierarchy,

dt = np.dtype(('u2', [('x', 'u1'), ('y', 'u1')]))
np.issubdtype(dt, np.void)
# True

Also, since

type(dt)
# numpy.dtype[void]

it would make sense to make that work for typing (if it doesn't already). Of course, not every void is a structured dtype, but most are...

@GoddessLuBoYan
Copy link

well, that's cool if we use "isinstance" or "issubclass" instead of "np.issubdtype"

@nstarman
Copy link
Author

I opened python/typing#1257 that would allow isinstance to check into fields if they are themselves Protocols. This would allow us to create a Protocol that can distinguish between a normal and structured dtype.

# This is a Protocol for tuple, just to distinguish from None
@runtime_checkable
class _StructuredDTypeName(Protocol):
    def count(self, value: Any) -> Any: ...
    def index(self, start: int, stop: int) -> Any ...

# This is a Protocol for Mapping[str, tuple], just to distinguish from None
@runtime_checkable
class _StructuredDTypeFields(Protocol):
    def get(self, key: str, default: str | None) -> Any: ...
    ... # more Mapping stuff

@runtime_checkable
class StructuredDType(Protocol):
    @property
    def names(self) -> _StructuredDTypeName: ...

    @property
    def fields(self) -> _StructuredDTypeFields: ...

@seberg
Copy link
Member

seberg commented Sep 19, 2022

Yeah, you can use isinstance(dtype, type(np.dtype("i,i"))) where the dtype call is just a weird way to get to the class. I am not quite sure if those types are already exposed somewhere in the typing code @BvB93?

As Marten said, in theory, that is not correct because other dtypes can have fields. But even if that actually works in practice, I think we should get rid of it.

(The use-case I know would be to have a C style union dtype. I don't think anyone should do that though, because NumPy has the arr.view concept.)

EDIT: The isinstance check will work since NumPy 1.19, IIRC.

Whoops, of course the isinstance check will also match unstructured voids or subarray dtypes unfortunately...

@BvB93
Copy link
Member

BvB93 commented Sep 19, 2022

Yeah, you can use isinstance(dtype, type(np.dtype("i,i"))) where the dtype call is just a weird way to get to the class. I am not quite sure if those types are already exposed somewhere in the typing code @BvB93?

String literals for structured dtypes are particularly tricky to deal with when static type checking (as pythons static typing tools lack any form of regex-based validation), so I strongly suspect that the right argument will simply be inferred as type[np.dtype[Any]].

On the other hand, the dtype class is generic w.r.t. its dtype.type attribute, so you could instead use isinstance(dtype.type, np.void) and this should work right out of the box.

As was pointed out previously this unfortuantely does not differentiate between the various flavors of void-based dtypes (structured, byte sequences, etc.). In #7370 (comment) I do go into a bit more detail into the the challenges and potential solutions regarding the latter, but thus far I've yet to see anything that we could actually implement in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants