ndarray should implement Sequence abc #7315

pitrou · 2016-02-23T10:59:18Z

Some third-party functions may explicit check for Sequence registration, e.g.:

>>> a = np.array([1, 2, 3, 42])
>>> random.sample(a, 2)
Traceback (most recent call last):
  File "<ipython-input-26-dc5a0ef4ca5e>", line 1, in <module>
    random.sample(a, 2)
  File "/home/antoine/35/lib/python3.5/random.py", line 311, in sample
    raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
TypeError: Population must be a sequence or set.  For dicts, use list(d).

The text was updated successfully, but these errors were encountered:

njsmith · 2016-02-23T16:38:55Z

IIRC there some discussion of this in the archives already, and there's some reason why this isn't totally trivial, like ndarray doesn't actually implement all the Sequence api.

shoyer · 2016-02-24T04:19:08Z

Perhaps the concern was 0d arrays, which don't have a length?

njsmith · 2016-02-24T08:01:59Z

I think we may also be missing reversed, index, and count

On Tue, Feb 23, 2016 at 8:19 PM, Stephan Hoyer notifications@github.com
wrote:

Perhaps the concern was 0d arrays, which don't have a length?

—
Reply to this email directly or view it on GitHub
#7315 (comment).

Nathaniel J. Smith -- https://vorpus.org http://vorpus.org

rahuldave · 2016-03-01T17:51:17Z

Is it so?

collections.abc.Sequence.__abstractmethods__

frozenset({'__getitem__', '__len__'})

jakirkham · 2016-03-01T19:07:41Z

I think you are missing the mixin methods, @rahuldave, that come from inherited ABCs. ( https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes )

njsmith · 2016-03-01T19:10:47Z

@rahuldave: a sequence has to provide more methods than that -- that's just the minimal set that are sufficient to implement the others, and the sequence ABC contains default implementations of the others in terms of those two. ndarray doesn't inherit from sequence and probably doesn't want those default implementations. (For example, I guess the index and count implementations assume that sequence elements support == turning a bool, which is not going to work for multidimensional arrays.)

jakirkham · 2016-03-01T19:28:03Z

Unless I am missing something, most of these could be (or are already) implemented.

rahuldave · 2016-03-01T19:36:32Z

By traversing the parents (Sized, Iterable, Container) and their mro's I get {'__contains__', '__getitem__', '__iter__', '__len__'}. Where is __reversed__ coming from? And isnt implementing those two just a matter of traversing memory? Most code would never use it...

jakirkham · 2016-03-01T19:41:32Z

I think it is from Sequence, but there is a default implementation. Though I could be wrong. If someone knows better, feel free to correct me.

jakirkham · 2016-03-01T19:49:36Z

Yeah, that appears to be the case. Basically the same story for Python 2. ( https://github.com/python/cpython/blob/1fe0fd9feb6a4472a9a1b186502eb9c0b2366326/Lib/_collections_abc.py#L827-L829 )

seberg · 2016-03-01T20:44:24Z

Just to link it, this has been discussed here before, first sight sounds like it went into the direction of "just add the abc": gh-2776

jakirkham · 2016-03-01T21:04:28Z

Just to throw some ideas out there to see if they sound reasonable, __contains__, index, and count should be element-wise (working with array slices is a bit odd and may not be so useful).

__reversed__ should just do what __iter__ does backwards.

Thoughts?

njsmith · 2016-03-01T21:14:08Z

To fully comply with the Sequence ABC then __contains__, index and count cannot be element-wise -- they should refer to the same things you get with single-scalar-indexing or iteration, which are the subslices returned by arr[i]. Of course ndarray already has a __contains__ which implements the elementwise semantics...

In [1]: a = np.eye(2)

In [2]: a
Out[2]: 
array([[ 1.,  0.],
       [ 0.,  1.]])

In [3]: 1 in a
Out[3]: True

The handling of slices though is really..... weird:

In [4]: [1, 0] in a
Out[4]: True

In [5]: [1, 1] in a
Out[5]: True

In [6]: a.__contains__([1, 0])
Out[6]: True

In [7]: a.__contains__([1, 1])
Out[7]: True

In [8]: a.__contains__([2, 3])
Out[8]: False

In [9]: a.__contains__([2, 0])
Out[9]: True

seberg · 2016-03-01T21:15:06Z

Contains has long been broken, got a long standing issue open to at one point make it purely element wise for everyones sanity. EDIT: Of course it is true, that in some sense it shouldn't be element wise when comparing to the sequence ABC.

jakirkham · 2016-03-01T21:19:23Z

...they should refer to the same things you get with single-scalar-indexing or iteration...

I was wonder if someone might say this. :)

I suppose this would make an argument towards making __iter__ more like flat. Though that would probably break many people's workflows. :/ Not sure the right way forward. Has there been any discussion on __iter__ in this regard before?

njsmith · 2016-03-01T21:40:03Z

@seberg: that does seem like a good idea... right now I can't even tell what the semantics are :-)

@jakirkham: my personal feeling is that in a perfect world ndarray would not even attempt to implement the sequence protocol for non-1d-arrays -- e.g. if arr.ndim != 1 then arr[0] would be an error, etc. There are a whole bunch of weird consequences of the current system -- esp. students constantly getting confused and writing things like arr[i][j][k]. (I'd probably also make iter(arr) an error for non-1d arrays, and people who wanted flat iteration should write for x in arr.ravel(): ....) But doing anything about this at this point would be extremely hard -- there's a lot of code out there that does for row in arr: .... I guess the first step would be to provide attractive alternatives to this, e.g. providing a more pleasant and canonical way to iterate over rows. (Maybe something like for row in arr[iter, ...]: ..., where we literally accept iter as a sentinel meaning "please return an iterator over this dimension"?)

seberg · 2016-03-01T22:22:43Z

I was once thinking about arr.iteraxis(axis=None) to iterate like our reductions do in some sense. Anyway, it is besides the point here. It would be nice to implement the ABC, but there are some real issues. I would almost suggest to try it and revisit before the next release, though I am not sure trying it in the dev version will actually expose most oddities, since I guess some of them are a few levels downstream (and e.g. 0D arrays are very rare in practice).

jakirkham · 2016-03-01T22:28:30Z

A totally different thought would be to provide a function in Numpy to replace random sample (unless I am completely overlooking an existing one). Considering both issues have been related to this problem in particular, this would provide a practical manner forward (possibly with a more efficient implementation than Python's) for users who want this feature. In the meantime, these larger issues of following the Sequence API can be hashed out over a possibly longer time frame.

jakirkham · 2016-03-01T22:31:00Z

BTW, there is a random_sample function, which appears to mirror random exactly. Is there a deprecation plan here or am I missing some subtle distinction?

NeilGirdhar · 2019-02-25T12:24:56Z

Should this be closed? Reading over the discussion it looks like Sequence is not a good fit for all ndarrays (just some of them).

seberg · 2019-02-25T15:38:56Z

Going to close this one in favor of gh-2776 in any case, which has also a lot of discussion. We may want to close that as well, I suppose. I doubt we will add it, although there may be things that could be done in the general direction.

numpy arrays don't (and likely won't) fulfill the Sequence interface numpy/numpy#2776 numpy/numpy#7315 The array gets converted down to a tuple any way, and each element must be an integer too. Unit test modified to allow this behavior.

seberg closed this as completed Feb 25, 2019

jorisvandenbossche mentioned this issue Oct 4, 2019

Align ExtensionArray with abc.Sequence pandas-dev/pandas#28770

Open

rgommers mentioned this issue Dec 5, 2019

Tensor.__reversed__ breaks protocol for reversible objects pytorch/pytorch#28577

Open

JS3xton mentioned this issue Jul 5, 2020

Add plot.violin() taborlab/FlowCal#335

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ndarray should implement Sequence abc #7315

ndarray should implement Sequence abc #7315

pitrou commented Feb 23, 2016

njsmith commented Feb 23, 2016

shoyer commented Feb 24, 2016

njsmith commented Feb 24, 2016

rahuldave commented Mar 1, 2016

jakirkham commented Mar 1, 2016

njsmith commented Mar 1, 2016

jakirkham commented Mar 1, 2016

rahuldave commented Mar 1, 2016

jakirkham commented Mar 1, 2016

jakirkham commented Mar 1, 2016

seberg commented Mar 1, 2016

jakirkham commented Mar 1, 2016

njsmith commented Mar 1, 2016

seberg commented Mar 1, 2016

jakirkham commented Mar 1, 2016

njsmith commented Mar 1, 2016

seberg commented Mar 1, 2016

jakirkham commented Mar 1, 2016

jakirkham commented Mar 1, 2016

NeilGirdhar commented Feb 25, 2019 •

edited

seberg commented Feb 25, 2019

ndarray should implement Sequence abc #7315

ndarray should implement Sequence abc #7315

Comments

pitrou commented Feb 23, 2016

njsmith commented Feb 23, 2016

shoyer commented Feb 24, 2016

njsmith commented Feb 24, 2016

rahuldave commented Mar 1, 2016

jakirkham commented Mar 1, 2016

njsmith commented Mar 1, 2016

jakirkham commented Mar 1, 2016

rahuldave commented Mar 1, 2016

jakirkham commented Mar 1, 2016

jakirkham commented Mar 1, 2016

seberg commented Mar 1, 2016

jakirkham commented Mar 1, 2016

njsmith commented Mar 1, 2016

seberg commented Mar 1, 2016

jakirkham commented Mar 1, 2016

njsmith commented Mar 1, 2016

seberg commented Mar 1, 2016

jakirkham commented Mar 1, 2016

jakirkham commented Mar 1, 2016

NeilGirdhar commented Feb 25, 2019 • edited

seberg commented Feb 25, 2019

NeilGirdhar commented Feb 25, 2019 •

edited