Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

0d arrays are recognised as collections.abc.Sized instances but don't actually support len #19833

Closed
neutrinoceros opened this issue Sep 6, 2021 · 15 comments

Comments

@neutrinoceros
Copy link
Contributor

neutrinoceros commented Sep 6, 2021

Reproducing code example:

from collections.abc import Sized
import numpy as np
a = np.array(1)

if isinstance(a, Sized):
    if len(a) > 1:
        ...

The first condition normally guards against calling len on an object that doesn't have a __len__ implementation, however this fails with 0d ndarrays.

Error message:

TypeError: len() of unsized object

NumPy/Python version information:

numpy 1.21.0, Python 3.9.6

somewhat related to #2776

@eric-wieser
Copy link
Member

eric-wieser commented Sep 6, 2021

The workaround here is to use

try:
    n = len(a)
except TypeError:
    pass
else:
    if n > 1:
        ...

That is: it's better to ask forgiveness than permission.

Fundamentally I don't think numpy has any power to fix this without creating more problems, and the above spelling is likely more idiomatic anyway.

@neutrinoceros
Copy link
Contributor Author

The EAFP pattern you described is precisely what I was initially trying to eliminate in yt where the construct is used so often that is has its own function.
What kind of problems would you expect to pop up if this was somehow fixed ?

@seberg
Copy link
Member

seberg commented Sep 7, 2021

@neutrinoceros, I expect the only way to fix this would be to make 0-D arrays subclasses of ndarray changing their type. That seems odd/problematic as well, though. Or are there approaches where subclassing would not be necessary?

@eric-wieser
Copy link
Member

eric-wieser commented Sep 7, 2021

Thanks for reminding me of the acronym EAFP. What's the reason for trying to avoid it in yt?

@neutrinoceros
Copy link
Contributor Author

What's the reason for trying to avoid it in yt?

mostly just trying to modernise the code base and start using typing and collections.abc for validation and static type checking. Because isinstance(obj, Sized) is meant to be equivalent to the try/except block proposed, I wasn't expecting anything to break by switching patterns, but numpy arrays being omnipresent in our code base, it didn't last long :)

I expect the only way to fix this would be to make 0-D arrays subclasses of ndarray changing their type

wait. Are they currently not subclasses of ndarray ? I though that was precisely the cause of this issue.

@eric-wieser
Copy link
Member

eric-wieser commented Sep 7, 2021

Because isinstance(obj, Sized) is meant to be equivalent to the try/except block proposed,

Sure, but it's slower for the success path, doesn't actually work in all cases (as seen here!), and IMO less idiomatic.

@neutrinoceros
Copy link
Contributor Author

Well if a big player like numpy is ok with breaking the language's intended rules, it also negates the chances of this becoming idiomatic. In any case it feels like this can only be discovered by trial and error, and I assume that the current behavior is 100% accidental and surely not intented.
I lost a couple hours because of it, and I can only assume I will not be the only one.
The other ˋcollections.abc`-related issue is long standing now but maybe the recent push for typing in numpy is also an opportunity to make this right. That said I don't have a clear vision of how to fix it.

@rkern
Copy link
Member

rkern commented Sep 7, 2021

Sorry, but those "language rules" were developed long after numpy arrays.

@rkern
Copy link
Member

rkern commented Sep 7, 2021

I expect the only way to fix this would be to make 0-D arrays subclasses of ndarray changing their type

wait. Are they currently not subclasses of ndarray ? I though that was precisely the cause of this issue.

No, they are ndarrays. You're right that subclassing ndarray for the 0-dim arrays wouldn't work; they would just inherit Sized. Rather, you'd need a subclass for the N>1-dim arrays that declares Sized.

@seberg
Copy link
Member

seberg commented Sep 7, 2021

Are they currently not subclasses of ndarray

They are ndarrays, being subclasses just for the sake of making them not sequences would be strange?

In my opinion, the truly idiomatic thing would probably be to either refuse being a sequence, or making len(arr) == arr.size. But frankly, neither seems doable to me, nor am I sure that it would actually be much better (I do believe it is more idiotmatic, but I am not sure it would be accessible to users to actually understand that it is idiomatic).

I would be happy to help petition Python to create an "idiotmatic way" that NumPy has a fighting chance to adhere to, but I am not exactly sure how it would look like in general, because I am more on the NumPy side of things, but this affects more the "user/library" side using NumPy.

@eric-wieser
Copy link
Member

it also negates the chances of this becoming idiomatic

I don't think changes to numpy influence whether LBYL vs EAFP is more idiomatic; if anything, this provides a clear demonstration that LBYL is fragile and requires active work from third-party packages, while EAFP works out of the box.

That's not to disagree that it would be nice if your code sample worked; just that I don't think your spelling is a particularly good idea in general.

@neutrinoceros
Copy link
Contributor Author

neutrinoceros commented Sep 8, 2021

Sorry, but those "language rules" were developed long after numpy arrays.

I'm well aware of that, and I mean no disrespect @rkern.

No, they are ndarrays. You're right that subclassing ndarray for the 0-dim arrays wouldn't work; they would just inherit Sized. Rather, you'd need a subclass for the N>1-dim arrays that declares Sized.

Thanks for the clarification ! I believe that Sized doesn't requires explicit subclassing, isinstance(obj, Sized) just checks for a __len__ method. Now I'm actually curious how 0d arrays inherit the general ndarray __len__ method and still fail an actual call to len().

In my opinion, the truly idiomatic thing would probably be to either refuse being a sequence, or making len(arr) == arr.size.

I think I need to emphasise that, at least in yt, having len(arr) == arr.size for ndarrays would most likely consitute an api breakage, as we are currently relying on the fact that len(arr) doesn't work for them, so I would advocate for the former alternative: 0d arrays shouldn't be validated as Sized objects. @seberg

I don't think changes to numpy influence whether LBYL vs EAFP is more idiomatic;

sorry about the confusion @eric-wieser, this isn't what I meant either.

if anything, this provides a clear demonstration that LBYL is fragile and requires active work from third-party packages, while EAFP works out of the box.

Truth be told my main motivation in yt was to move away from the EAFP form, not because of a preference of LBYL (if anything I'm more of an advocate for the EAFP pattern), but because I see this check in many places where the len support isn't actually the relevant check to do, but it's being somewhat cargo-culted as a sanity check. Now the LBYL form seemed like a convenient first step to replace the inappropriate EAFP checks (embedded in a function call), because it can easily be inlined (hence, decoupled from the misused function).

@eric-wieser
Copy link
Member

eric-wieser commented Sep 8, 2021

Now I'm actually curious how 0d arrays inherit the general ndarray __len__ method and still fail an actual call to len().

Numpy implements it with something equivalent to:

def __len__(self):
    if self.ndim == 0:
        raise TypeError
    else:
        return self.shape[0]  # this would be an IndexError if `ndim == 0`

@rkern
Copy link
Member

rkern commented Sep 8, 2021

I'm well aware of that, and I mean no disrespect @rkern.

Nor I. Sorry, I didn't intend those to be sneer-quotes, just quote-quotes. :-)

I believe that Sized doesn't requires explicit subclassing, isinstance(obj, Sized) just checks for a __len__ method.

"Inherit whatever makes it Sized" then. :-) Not sure there is a way to remove __len__ (or rather, the sq_length slot in the case of C types) implementations in subtypes.

One more wrinkle is that ndarray objects can be mutated to and from 0-dimness.

|1> x = np.array(10)                                                                                                                                                                                                 
                                                                                                                                                                                                          
|3> x.shape                                                                                                                                                                                                          
()
                                                                                                                                                                                             
|4> x.shape = (1,)                                                                                                                                                                                                   
                                                                                                                                                                                                           
|5> x                                                                                                                                                                                                                
array([10])
                                                                                                                                                                                                        
|6> len(x)                                                                                                                                                                                                           
1

So a LBYL type check per se just doesn't really fit with the dynamism numpy has implemented against.

I think there is a case to be made for deprecating and removing this amount of dynamism, if we decide to really clean up our act to really make these type checks express everything we want them to. Using reshape() to return a new array object (but not copying the memory) at least when moving from 0-dim to n>0-dim or vice versa should satisfy any present uses of this (and I doubt there are too many).

@seberg
Copy link
Member

seberg commented Dec 20, 2021

Closing. There is some interesting discussion in the thread, but I really don't think there is anything actionable. If anyone disagrees, please reopen/or speak up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants