Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ndarray should implement Sequence abc #7315

Closed
pitrou opened this issue Feb 23, 2016 · 21 comments
Closed

ndarray should implement Sequence abc #7315

pitrou opened this issue Feb 23, 2016 · 21 comments

Comments

@pitrou
Copy link
Member

pitrou commented Feb 23, 2016

Some third-party functions may explicit check for Sequence registration, e.g.:

>>> a = np.array([1, 2, 3, 42])
>>> random.sample(a, 2)
Traceback (most recent call last):
  File "<ipython-input-26-dc5a0ef4ca5e>", line 1, in <module>
    random.sample(a, 2)
  File "/home/antoine/35/lib/python3.5/random.py", line 311, in sample
    raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
TypeError: Population must be a sequence or set.  For dicts, use list(d).
@njsmith
Copy link
Member

njsmith commented Feb 23, 2016

IIRC there some discussion of this in the archives already, and there's some reason why this isn't totally trivial, like ndarray doesn't actually implement all the Sequence api.

@shoyer
Copy link
Member

shoyer commented Feb 24, 2016

Perhaps the concern was 0d arrays, which don't have a length?

@njsmith
Copy link
Member

njsmith commented Feb 24, 2016

I think we may also be missing reversed, index, and count

On Tue, Feb 23, 2016 at 8:19 PM, Stephan Hoyer notifications@github.com
wrote:

Perhaps the concern was 0d arrays, which don't have a length?


Reply to this email directly or view it on GitHub
#7315 (comment).

Nathaniel J. Smith -- https://vorpus.org http://vorpus.org

@rahuldave
Copy link

Is it so?

collections.abc.Sequence.__abstractmethods__
frozenset({'__getitem__', '__len__'})

@jakirkham
Copy link
Contributor

I think you are missing the mixin methods, @rahuldave, that come from inherited ABCs. ( https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes )

@njsmith
Copy link
Member

njsmith commented Mar 1, 2016

@rahuldave: a sequence has to provide more methods than that -- that's just the minimal set that are sufficient to implement the others, and the sequence ABC contains default implementations of the others in terms of those two. ndarray doesn't inherit from sequence and probably doesn't want those default implementations. (For example, I guess the index and count implementations assume that sequence elements support == turning a bool, which is not going to work for multidimensional arrays.)

@jakirkham
Copy link
Contributor

Unless I am missing something, most of these could be (or are already) implemented.

@rahuldave
Copy link

By traversing the parents (Sized, Iterable, Container) and their mro's I get {'__contains__', '__getitem__', '__iter__', '__len__'}. Where is __reversed__ coming from? And isnt implementing those two just a matter of traversing memory? Most code would never use it...

@jakirkham
Copy link
Contributor

I think it is from Sequence, but there is a default implementation. Though I could be wrong. If someone knows better, feel free to correct me.

@jakirkham
Copy link
Contributor

Yeah, that appears to be the case. Basically the same story for Python 2. ( https://github.com/python/cpython/blob/1fe0fd9feb6a4472a9a1b186502eb9c0b2366326/Lib/_collections_abc.py#L827-L829 )

@seberg
Copy link
Member

seberg commented Mar 1, 2016

Just to link it, this has been discussed here before, first sight sounds like it went into the direction of "just add the abc": gh-2776

@jakirkham
Copy link
Contributor

Just to throw some ideas out there to see if they sound reasonable, __contains__, index, and count should be element-wise (working with array slices is a bit odd and may not be so useful).

__reversed__ should just do what __iter__ does backwards.

Thoughts?

@njsmith
Copy link
Member

njsmith commented Mar 1, 2016

To fully comply with the Sequence ABC then __contains__, index and count cannot be element-wise -- they should refer to the same things you get with single-scalar-indexing or iteration, which are the subslices returned by arr[i]. Of course ndarray already has a __contains__ which implements the elementwise semantics...

In [1]: a = np.eye(2)

In [2]: a
Out[2]: 
array([[ 1.,  0.],
       [ 0.,  1.]])

In [3]: 1 in a
Out[3]: True

The handling of slices though is really..... weird:

In [4]: [1, 0] in a
Out[4]: True

In [5]: [1, 1] in a
Out[5]: True

In [6]: a.__contains__([1, 0])
Out[6]: True

In [7]: a.__contains__([1, 1])
Out[7]: True

In [8]: a.__contains__([2, 3])
Out[8]: False

In [9]: a.__contains__([2, 0])
Out[9]: True

@seberg
Copy link
Member

seberg commented Mar 1, 2016

Contains has long been broken, got a long standing issue open to at one point make it purely element wise for everyones sanity. EDIT: Of course it is true, that in some sense it shouldn't be element wise when comparing to the sequence ABC.

@jakirkham
Copy link
Contributor

...they should refer to the same things you get with single-scalar-indexing or iteration...

I was wonder if someone might say this. :)

I suppose this would make an argument towards making __iter__ more like flat. Though that would probably break many people's workflows. :/ Not sure the right way forward. Has there been any discussion on __iter__ in this regard before?

@njsmith
Copy link
Member

njsmith commented Mar 1, 2016

@seberg: that does seem like a good idea... right now I can't even tell what the semantics are :-)

@jakirkham: my personal feeling is that in a perfect world ndarray would not even attempt to implement the sequence protocol for non-1d-arrays -- e.g. if arr.ndim != 1 then arr[0] would be an error, etc. There are a whole bunch of weird consequences of the current system -- esp. students constantly getting confused and writing things like arr[i][j][k]. (I'd probably also make iter(arr) an error for non-1d arrays, and people who wanted flat iteration should write for x in arr.ravel(): ....) But doing anything about this at this point would be extremely hard -- there's a lot of code out there that does for row in arr: .... I guess the first step would be to provide attractive alternatives to this, e.g. providing a more pleasant and canonical way to iterate over rows. (Maybe something like for row in arr[iter, ...]: ..., where we literally accept iter as a sentinel meaning "please return an iterator over this dimension"?)

@seberg
Copy link
Member

seberg commented Mar 1, 2016

I was once thinking about arr.iteraxis(axis=None) to iterate like our reductions do in some sense. Anyway, it is besides the point here. It would be nice to implement the ABC, but there are some real issues. I would almost suggest to try it and revisit before the next release, though I am not sure trying it in the dev version will actually expose most oddities, since I guess some of them are a few levels downstream (and e.g. 0D arrays are very rare in practice).

@jakirkham
Copy link
Contributor

A totally different thought would be to provide a function in Numpy to replace random sample (unless I am completely overlooking an existing one). Considering both issues have been related to this problem in particular, this would provide a practical manner forward (possibly with a more efficient implementation than Python's) for users who want this feature. In the meantime, these larger issues of following the Sequence API can be hashed out over a possibly longer time frame.

@jakirkham
Copy link
Contributor

BTW, there is a random_sample function, which appears to mirror random exactly. Is there a deprecation plan here or am I missing some subtle distinction?

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Feb 25, 2019

Should this be closed? Reading over the discussion it looks like Sequence is not a good fit for all ndarrays (just some of them).

@seberg
Copy link
Member

seberg commented Feb 25, 2019

Going to close this one in favor of gh-2776 in any case, which has also a lot of discussion. We may want to close that as well, I suppose. I doubt we will add it, although there may be things that could be done in the general direction.

@seberg seberg closed this as completed Feb 25, 2019
drewejohnson pushed a commit to CORE-GATECH-GROUP/hydep that referenced this issue Nov 12, 2020
numpy arrays don't (and likely won't) fulfill the Sequence
interface
numpy/numpy#2776
numpy/numpy#7315

The array gets converted down to a tuple any way, and each element
must be an integer too.

Unit test modified to allow this behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants