
reduceat cornercase (Trac #236) #834

Open
numpy-gitbot opened this issue Oct 19, 2012 · 51 comments · May be fixed by #25476
Comments

@numpy-gitbot

Original ticket http://projects.scipy.org/numpy/ticket/236 on 2006-08-07 by trac user martin_wiechert, assigned to unknown.

.reduceat does not handle repeated indices correctly. When an index is repeated the neutral element of the operation should be returned. In the example below [0, 10], not [1, 10], is expected.

In [1]:import numpy

In [2]:numpy.version.version
Out[2]:'1.0b1'

In [3]:a = numpy.arange (5)

In [4]:numpy.add.reduceat (a, (1,1))
Out[4]:array([ 1, 10])
@numpy-gitbot
Author

@teoliphant wrote on 2006-08-08

Unfortunately, perhaps, the reduceat method of NumPy follows the behavior of the reduceat method of Numeric for this corner case.

There is no facility for returning the "identity" element of the operation in cases of index-equality. The defined behavior is to return the element given by the first index if the slice returns an empty sequence. Therefore, the documented and actual behavior of reduceat in this case is to construct

[a[1], add.reduce(a[1:])]
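
For illustration, a minimal pure-Python sketch of that documented behaviour (the reduceat_documented helper is purely illustrative, not part of NumPy):

import numpy as np

def reduceat_documented(ufunc, a, indices):
    # documented semantics: an empty slice yields a[indices[i]],
    # not the ufunc's identity
    edges = list(indices) + [len(a)]
    out = [a[start] if start >= stop else ufunc.reduce(a[start:stop])
           for start, stop in zip(edges[:-1], edges[1:])]
    return np.array(out)

print(reduceat_documented(np.add, np.arange(5), (1, 1)))  # [ 1 10], matching reduceat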

This is a feature request.

@numpy-gitbot
Author

trac user martin_wiechert wrote on 2006-08-08

also see ticket #835

@numpy-gitbot
Author

Milestone changed to 1.1 by @alberts on 2007-05-12

@numpy-gitbot
Author

Milestone changed to Unscheduled by @cournape on 2009-03-02

@jnothman
Member

I think this is closely connected to #835: If one of the indices is len(a), reduceat cannot output the element at that index, which is needed if the index len(a) appears or is repeated at the end of the indices.

Some solutions:

  • an option to reduceat to not set any value in the output where end - start == 0
  • an option to set the output to a given fixed value where end - start == 0
  • a where parameter, like in ufunc(), which masks which outputs should be calculated at all.
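
The second of these options can already be emulated with a little post-processing (a sketch of the intended effect, not a proposal for the exact API):

import numpy as np

a = np.arange(5)
indices = np.array([1, 1])

out = np.add.reduceat(a, indices)
ends = np.append(indices[1:], len(a))
out[ends - indices == 0] = 0   # fixed value (here 0) wherever end - start == 0
print(out)                     # [ 0 10]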

@jayvius
Contributor

jayvius commented Jul 2, 2015

Has there been any more thought on this issue? I would be interested in having the option to set the output to the identity value (if it exists) where end - start == 0.

@divenex

divenex commented Nov 25, 2015

I strongly support the change of the reduceat behaviour as suggested in this long-standing open issue. It looks like a clear bug or obvious design mistake which hinders the usefulness of this great Numpy construct.

reduceat should behave consistently for all indices. Namely, for every index i, ufunc.reduceat(a, indices)[i] should equal ufunc.reduce(a[indices[i]:indices[i+1]]).

This should also be true for the case indices[i] == indices[i+1]. I cannot see any sensible reason why, in this case, reduceat should return a[indices[i]] instead of ufunc.reduce(a[indices[i]:indices[i+1]]).
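
In other words (a minimal sketch of the expected, consistent behaviour, written as an explicit loop over reduce):

import numpy as np

a = np.arange(5)
indices = [1, 1]
edges = list(indices) + [len(a)]

# out[i] = np.add.reduce(a[edges[i]:edges[i+1]]) for every i,
# so the empty slice a[1:1] reduces to add's identity, 0
out = np.array([np.add.reduce(a[start:stop])
                for start, stop in zip(edges[:-1], edges[1:])])
print(out)  # [ 0 10] -- rather than reduceat's [ 1 10]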

See also a similar comment by Pandas creator Wes McKinney.

@njsmith
Member

njsmith commented Nov 25, 2015

Wow, this is indeed terrible and broken.

We'd need some discussion on the mailing list, but I at least would be totally in favor of making that issue a FutureWarning in the next release and fixing the behavior a few releases later. We'd need someone to take the lead on starting that discussion and writing the patch. Perhaps that's you?

@divenex

divenex commented Nov 26, 2015

Thanks for the supportive response. I can start a discussion if this helps, but unfortunately am not up to patching the C code.

@jnothman
Member

jnothman commented Nov 26, 2015

What do you intend for ufuncs without an identity, such as np.maximum?

@njsmith
Member

njsmith commented Nov 26, 2015

For such functions, an empty reduction should be an error, as it already is when you use .reduce() instead of .reduceat().
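
For example (a small sketch; the exact error message may vary between NumPy versions):

import numpy as np

a = np.arange(5)

np.add.reduce(a[1:1])      # 0 -- add has an identity, so an empty reduce succeeds
np.maximum.reduce(a[1:1])  # ValueError: zero-size array to reduction operation
                           # maximum which has no identity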

@divenex

divenex commented Nov 26, 2015

Indeed, the behaviour should be driven by consistency with ufunc.reduce(a[indices[i]:indices[i+1]]), which is what every user would expect. So this does not require new design decisions. It really looks like a long-standing bug to me, unless anybody can justify the current inconsistent behaviour.

@divenex

divenex commented Dec 7, 2015

@njsmith I am unable to sign up to the Numpy list. I sent my address here https://mail.scipy.org/mailman/listinfo/numpy-discussion but I never get the "email requesting confirmation". Not sure whether one needs to meet special requirements to subscribe...

@njsmith
Member

njsmith commented Dec 7, 2015

@divenex: did you check your spam folder? (I always forget to do that...) Otherwise I'm not sure what could be going wrong. There definitely shouldn't be any special requirements to subscribe beyond "has an email address". If you still can't get it to work then speak up and we'll try to track down the relevant sysadmin... We definitely want to know if it's broken.

@WarrenWeckesser
Member

A version of reduceat that is consistent with ufunc.reduce(a[indices[i]:indices[i+1]]) would be really, really nice. It would be so much more useful! Either an argument to select the behavior or a new function (reduce_intervals? reduce_segments? ...?) would avoid breaking backwards compatibility.

@eric-wieser
Member

eric-wieser commented Apr 13, 2017

I'd perhaps be tempted to deprecate np.ufunc.reduceat altogether - it seems more useful to be able to specify a set of start and end indices, to avoid cases where indices[i] > indices[i+1]. Also, the name reduceat suggests a much greater similarity to at than actually exists.

What I'd propose as a replacement is np.reducebins (earlier draft name: np.piecewise_reduce), possibly pure-Python, which basically does:

import numpy as np

def reducebins(func, arr, start=None, stop=None, axis=-1, out=None):
    """
    Compute (in the 1d case) `out[i] = func.reduce(arr[start[i]:stop[i]])`

    If only `start` is specified, this computes the same reduction that `reduceat` does:

        `out[i]  = func.reduce(arr[start[i]:start[i+1]])`
        `out[-1] = func.reduce(arr[start[-1]:])`

    If only `stop` is specified, this computes:

        `out[0] = func.reduce(arr[:stop[0]])`
        `out[i] = func.reduce(arr[stop[i-1]:stop[i]])`

    """
    # convert to 1d index arrays
    if start is not None:
        start = np.array(start, ndmin=1, dtype=np.intp)
        assert start.ndim == 1
    if stop is not None:
        stop = np.array(stop, ndmin=1, dtype=np.intp)
        assert stop.ndim == 1

    # default arguments that do useful things
    if start is None and stop is None:
        raise ValueError('At least one of start and stop must be specified')
    elif stop is None:
        # start only: reduce from one index to the next, and from the last index to the end
        stop = np.empty_like(start)
        stop[:-1] = start[1:]
        stop[-1] = arr.shape[axis]
    elif start is None:
        # stop only: reduce from the start to the first index, then from one index to the next
        start = np.empty_like(stop)
        start[1:] = stop[:-1]
        start[0] = 0
    else:
        # TODO: possibly confusing?
        start, stop = np.broadcast_arrays(start, stop)

    # allocate output - not clear how to do this safely for subclasses
    if out is None:
        sh = list(arr.shape)
        sh[axis] = len(stop)
        out = np.empty(shape=tuple(sh), dtype=arr.dtype)

    # the loop below assumes axis=0 (i.e. the 1d case) for brevity
    for i, (si, ei) in enumerate(zip(start, stop)):
        func.reduce(arr[si:ei, ...], out=out[i, ...], axis=axis)
    return out

Which has the nice properties that:

  • np.add.reduce(arr) is the same as reducebins(np.add, arr, 0, len(arr))
  • np.add.reduceat(arr, inds) is the same as reducebins(np.add, arr, inds)
  • np.add.accumulate(arr) is the same as reducebins(np.add, arr, 0, np.arange(1, len(arr) + 1))
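
Applied to the corner case from the original report, this sketch gives the identity for the empty bin (output shown assuming the pure-Python reducebins above):

import numpy as np

a = np.arange(5)

print(np.add.reduceat(a, (1, 1)))     # [ 1 10] -- current reduceat behaviour
print(reducebins(np.add, a, (1, 1)))  # [ 0 10] -- the empty bin reduces to add's identity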

Now, does this want to go through the __array_ufunc__ machinery? Most of what needs to be handled should already be covered by func.reduce - the only issue is the np.empty line, which is a problem that np.concatenate shares.

@jnothman
Member

That sounds like a nice solution to me from an API perspective. Even just being able to specify two sets of indices to reduceat would suffice. From an implementation perspective? Well it's not very hard to change the current PyUFunc_Reduceat to support having two sets of inds, if that provides benefit. If we really see the advantage in supporting the accumulate-like use-case efficiently, it would not be hard to do that either.

@jaimefrio
Member

jaimefrio commented Apr 13, 2017 via email

@eric-wieser
Member

eric-wieser commented Apr 13, 2017

Use 'start' and 'stop'

Done

Should we make 'step' an option?

Seems like a pretty narrow use case

Does it make sense for the indices arrays to broadcast, or must they be 1D?

Updated. More than 1d is obviously bad, but I think we should allow 0d and broadcasting, for cases like accumulate.

Should this be an np function, or a ufunc method? (I think I prefer it as a method)

Every ufunc method is one more thing for __array_ufunc__ to handle.

@divenex

divenex commented Apr 13, 2017

The main motivation for reduceat is to avoid a loop over reduce for maximum speed. So I am not entirely sure a wrapper around a for loop over reduce would be a very useful addition to Numpy; it would go against reduceat's main purpose.

Moreover, the logic behind reduceat's existence and API, as a fast vectorized replacement for a loop over reduce, is clean and useful. I would not deprecate it, but rather fix it.

Regarding reduceat speed, let's consider a simple example, but similar to some real-world cases I have in my own code, where I use reduceat:

n = 10000
arr = np.random.random(n)
inds = np.random.randint(0, n, n//10)
inds.sort()

%timeit out = np.add.reduceat(arr, inds)
10000 loops, best of 3: 42.1 µs per loop

%timeit out = piecewise_reduce(np.add, arr, inds)
100 loops, best of 3: 6.03 ms per loop

This is a time difference of more than 100x and illustrates the importance of preserving reduceat efficiency.

In summary, I would prioritize fixing reduceat over introducing new functions.

Having start_indices and end_indices, although useful in some cases, is often redundant, and I would see it as a possible addition, but not as a fix for the current inconsistent reduceat behaviour.

@jnothman
Member

jnothman commented Apr 13, 2017 via email

@eric-wieser
Member

This is a time difference of more than 100x and illustrates the importance of preserving reduceat efficiency.

Thanks for that - I guess I underestimated the overhead associated with the first stage of a reduce call (that only happens once for reduceat).

Not an argument against a free function, but certainly an argument against implementing it in pure python

@eric-wieser
Member

eric-wieser commented Apr 13, 2017

but not as a fix for the current inconsistent reduceat behaviour.

The problem is that it's tricky to change the behaviour of code that's been around for so long.


Another possible extension: when start[i] > stop[i], compute the inverse:

    for i, (si, ei) in enumerate(zip(start, stop)):
        if si <= ei:
            func.reduce(arr[si:ei, ...], out=out[i, ...], axis=axis)
        else:
            # reversed slice: reduce forwards, then invert via the ufunc's inverse
            func.reduce(arr[ei:si, ...], out=out[i, ...], axis=axis)
            func.inverse(func.identity, out[i, ...], out=out[i, ...])

Where np.add.inverse = np.subtract, np.multiply.inverse = np.true_divide. This results in the nice property that

func.reduce(func.reduceat(x, inds_from_0)) == func.reduce(x)

For example

a = [1, 2, 3, 4]
inds = [0, 3, 1]
result = np.add.reduceat(a, inds) # [6, -5, 9] == [(1 + 2 + 3), -(3 + 2), (2 + 3 + 4)]

@mhvk
Contributor

mhvk commented Apr 13, 2017

The problem is that it's tricky to change the behaviour of code that's been around for so long.

This is partially why in the e-mail thread I suggested giving special meaning to a 2-D array of indices in which the extra dimension is 2 or 3: it is then (effectively) interpreted as a stack of slices. But I realise this is also somewhat messy, and of course one might as well have a reduce_by_slice, slicereduce, or reduceslice method.
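
A rough sketch of how such a 2-D index array might be read (my reading of the proposal, not an existing API):

import numpy as np

a = np.arange(10)
slices = np.array([[2, 4], [4, 9], [4, 4]])  # each row read as a (start, stop) pair

# what np.add.reduceat(a, slices) might then mean:
out = np.array([np.add.reduce(a[start:stop]) for start, stop in slices])
print(out)  # [ 5 30  0] -- the empty slice reduces to add's identity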

p.s. I do think anything that works on many ufuncs should be a method, so that it can be passed through __array_ufunc__ and be overridden.

@mhvk
Contributor

mhvk commented Apr 13, 2017

Actually, a different suggestion that I think is much better: rather than salvaging reduceat, why not add a slice argument (or start, stop, step) to ufunc.reduce? As @eric-wieser noted, any such implementation means we can just deprecate reduceat altogether, as it would just be

add.reduce(array, slice=slice(indices[:-1], indices[1:]))

(where now we are free to make the behaviour match what is expected for an empty slice)

Here, one would broadcast the slice if it were 0-d, and might even consider passing in tuples of slices if a tuple of axes was used.

EDIT: made the above slice(indices[:-1], indices[1:]) to allow for extension to a tuple of slices (slice can hold arbitrary data, so this would work fine).

@divenex

divenex commented Apr 13, 2017

I would still find fixing reduceat, to make it a proper, 100% vectorized version of reduce, the most logical design solution. Alternatively, to avoid breaking code (but see below), an equivalent method named something like reducebins could be created, which would simply be a corrected version of reduceat. In fact, I agree with @eric-wieser that the name reduceat conveys more of a connection to the at function than there really is.

I do understand the need not to break code. But I must say that I find it hard to imagine that much code depended on the old behavior, given that it simply did not make logical sense, and I would simply call it a long-standing bug. I would expect that code using reduceat just made sure indices were not duplicated, to avoid a nonsensical result, or fixed the output as I did using out[:-1] *= np.diff(indices) > 0. Of course I would be interested in a use case where the old behavior/bug was used as intended.
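
A small sketch of that workaround (it assumes 0 is the value wanted for empty bins, as it is for np.add):

import numpy as np

a = np.arange(5)
indices = np.array([1, 1, 3])

out = np.add.reduceat(a, indices)   # [1 3 7] -- out[0] is the bogus a[1]
out[:-1] *= np.diff(indices) > 0    # zero the bins whose indices are repeated
print(out)                          # [0 3 7]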

I am not fully convinced by @mhvk's slice solution, because it introduces a non-standard usage of the slice construct. Moreover, it would be inconsistent with the current design idea of reduce, which is to "reduce a's dimension by one, by applying ufunc along one axis."

I also do not see a compelling use case for both start and end indices. In fact, I see the design logic of the current reduceat method as conceptually similar to np.histogram, where bins, which "defines the bin edges," is replaced by indices, which also represent bin edges, but in index space rather than in value space. And reduceat applies a function to the elements contained between each pair of bin edges. The histogram is an extremely popular construct, but it does not need, and in Numpy does not include, an option to pass two vectors of left and right edges. For the same reason I doubt there is a strong need for both edges in reduceat or its replacement.

@shoyer
Member

shoyer commented Apr 13, 2017

The main motivation for reduceat is to avoid a loop over reduce for maximum speed. So I am not entirely sure a wrapper around a for loop over reduce would be a very useful addition to Numpy; it would go against reduceat's main purpose.

I agree with @divenex here. The fact that reduceat requires indices to be sorted and non-overlapping is a reasonable constraint to ensure that the loop can be computed in a cache-efficient manner with a single pass over the data. If you want overlapping bins, there are almost certainly better ways to compute the desired operation (e.g., rolling window aggregations).

I also agree that the cleanest solution is to define a new method such as reducebins with a fixed API (and deprecate reduceat), and not to try to squeeze it into reduce, which already does something different.

@jni
Contributor

jni commented Apr 14, 2017

Hi everyone,

I want to nip in the bud the discussion that this is a bug. This is documented behaviour, from the docstring:

For i in ``range(len(indices))``, `reduceat` computes
``ufunc.reduce(a[indices[i]:indices[i+1]])``, which becomes the i-th
generalized "row" parallel to `axis` in the final result (i.e., in a
2-D array, for example, if `axis = 0`, it becomes the i-th row, but if
`axis = 1`, it becomes the i-th column).  There are three exceptions to this:

* when ``i = len(indices) - 1`` (so for the last index),
  ``indices[i+1] = a.shape[axis]``.
* if ``indices[i] >= indices[i + 1]``, the i-th generalized "row" is
  simply ``a[indices[i]]``.
* if ``indices[i] >= len(a)`` or ``indices[i] < 0``, an error is raised.

As such, I oppose any attempt to change the behaviour of reduceat.

A quick github search shows many, many uses of the function. Is everyone here certain that they all use only strictly increasing indices?

Regarding the behaviour of a new function, I would argue that without separate start/stop arrays, the functionality is severely hampered. There are many situations where one would want to measure values in overlapping windows that are not regularly arrayed (so rolling windows would not work). For example, regions of interest determined by some independent method. And @divenex has shown that the performance difference over Python iteration can be massive.

@shoyer
Member

shoyer commented Apr 14, 2017

There are many situations where one would want to measure values in overlapping windows that are not regularly arrayed (so rolling windows would not work).

Yes, but you wouldn't want to use a naive loop such as the one implemented by reduceat. You'd want to implement your own rolling window calculation storing intermediate results in some way so it can be done in a single linear pass over the data. But now we're talking about an algorithm that is much more complicated than reduceat.

@eric-wieser
Member

eric-wieser commented Apr 18, 2017

So, what would be a natural use case for starts and stops in reducebins?

Achievable by other means, but a moving average of length k would be reducebins(np.add, arr, np.arange(n-k), k + np.arange(n-k)). I suspect that, ignoring the cost of allocating the indices, performance would be comparable to an as_strided approach.

Uniquely, reducebins would allow a moving average of varying duration, which is not possible with as_strided.
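
As an illustration (using the pure-Python reducebins sketch above; the window starts and lengths are made up):

import numpy as np

arr = np.arange(10.0)
start = np.array([0, 2, 5, 6])   # arbitrary, possibly overlapping, window starts
length = np.array([2, 3, 1, 4])  # per-window lengths of varying duration
print(reducebins(np.add, arr, start, start + length))  # [ 1.  9.  5. 30.]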

@eric-wieser
Member

eric-wieser commented Apr 18, 2017

Another use case - disambiguating between including the end or the start in the one-argument form.

For instance:

a = np.arange(10)
reducebins(np.add, a, start=[2, 4, 6]) == [2 + 3, 4 + 5, 6 + 7 + 8 + 9]  # what `reduceat` does
reducebins(np.add, a, stop=[2, 4, 6])  == [0 + 1, 2 + 3, 4 + 5]          # also useful

@shoyer
Member

shoyer commented Apr 19, 2017

Another use case - disambiguating between including the end or the start in the one-argument form.

I don't quite understand this one. Can you include the input tensor here? Also: what would be the default values for start/stop?

Anyways, I'm not strongly against the separate arguments, but it's not as clean of a replacement. I would love to be able to say "Don't use reduceat, use reducebins instead" but that's (slightly) harder when the interface looks different.

@jni
Contributor

jni commented Apr 19, 2017

Actually, I just realised that even a start/stop option does not cover the use-case of empty slices, which is one that has been useful to me in the past: when my properties/labels correspond to rows in a CSR sparse matrix, and I use the values of indptr to do the reduction. With reduceat, I can ignore the empty rows. Any replacement will require additional bookkeeping. So, whatever replacement you come up with, please leave reduceat around.

In [2]: A = np.random.random((4000, 4000))
In [3]: B = sparse.csr_matrix((A > 0.8) * A)
In [9]: %timeit np.add.reduceat(B.data, B.indptr[:-1]) * (np.diff(B.indptr) > 1)
1000 loops, best of 3: 1.81 ms per loop
In [12]: %timeit B.sum(axis=1).A
100 loops, best of 3: 1.95 ms per loop
In [16]: %timeit np.maximum.reduceat(B.data, B.indptr[:-1]) * (np.diff(B.indptr) > 0)
1000 loops, best of 3: 1.8 ms per loop
In [20]: %timeit B.max(axis=1).A
100 loops, best of 3: 2.12 ms per loop

Incidentally, the empty sequence conundrum can be solved the same way that Python does it: by providing an initial value. This could be a scalar or an array of the same shape as indices.
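
(For reference, that is how Python's own reduce behaves - a small illustration of the idea, not a proposal for the exact NumPy API:)

import functools
import operator

functools.reduce(operator.add, [], 0)  # 0 -- empty sequence, explicit initial value
functools.reduce(operator.add, [])     # TypeError -- empty sequence, no initial value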

@jnothman
Member

jnothman commented Apr 19, 2017 via email

@divenex

divenex commented Apr 19, 2017

I am fully with @shoyer about his last comment.

Let's simply define out=ufunc.reducebins(a, inds) as out[i]=ufunc.reduce(a[inds[i]:inds[i+1]]) for all i but the last, and deprecate reduceat.

Current use cases for start and end indices seem more naturally, and likely more efficiently, implemented with alternatives like as_strided or convolutions.

@eric-wieser
Member

eric-wieser commented Apr 19, 2017

@shoyer:

I don't quite understand this one. Can you include the input tensor here? Also: what would be the default values for start/stop?

Updated with the input. See the implementation of reducebins in the comment that started this for the default values. I've added a docstring there too. That implementation is feature-complete but slow (due to being pure Python).

but that's (slightly) harder when the interface looks different.

When only the start argument is passed, the interface is identical (ignoring the identity case that we set out to fix in the first place). These three lines mean the same thing:

np.add.reduceat(arr, inds)
reducebins(np.add, arr, inds)
reducebins(np.add, arr, start=inds)

(the method/function distinction is not something I care too much about, and I can't define a new ufunc method as a prototype in python!)


@jni:

Actually, I just realised that even a start/stop option does not cover the use-case of empty slices, which is one that has been useful to me in the past

You're wrong, it does - in the exact same way as ufunc.reduceat already does. It's also possible simply by passing start[i] == stop[i].

the empty sequence conundrum can be solved ... by providing an initial value.

Yes, we've already covered this, and ufunc.reduce already does that by filling with ufunc.identity. This is not hard to add to the existing ufunc.reduceat, especially if #8952 is merged. But as you said yourself, the current behaviour is documented, so we should probably not change it.


@divenex

Let's simply define out=ufunc.reducebins(a, inds) as out[i]=ufunc.reduce(a[inds[i]:inds[i+1]]) for all i but the last

So len(out) == len(inds) - 1? This is different to the current behaviour of reduceat, so @shoyer's argument about switching is stronger here


All: I've gone through earlier comments and removed quoted email replies, as they were making this discussion hard to read

@divenex

divenex commented Apr 19, 2017

@eric-wieser good point. In my sentence above I meant that for the last index the behaviour of reducebins would be different from the current reduceat. However, in that case, I am not sure what the value should be, as the last value formally does not make sense.

Ignoring compatibility concerns, the output of reducebins (in 1D) should have size inds.size-1, for the very same reason that np.diff(a) has size a.size-1 and np.histogram(a, bins) has size bins.size-1. However, this would go against the desire to have a drop-in replacement for reduceat.

@eric-wieser
Member

eric-wieser commented Apr 19, 2017

I don't think there's a convincing argument that inds.size-1 is the right answer - including index 0 and/or index n seems like pretty reasonable behaviour as well. All of them seem handy in some circumstances, but I think it is very important to have a drop-in replacement.

There's also another argument for stop/start hiding here - it allows you to build the diff-like behaviour if you want it, with very little cost, while still keeping the reduceat behaviour:

a = np.arange(10)
inds = [2, 4, 6]
reducebins(np.add, a, start=inds[:-1], stop=inds[1:])  # [2 + 3, 4 + 5]

# or less efficiently:
np.add.reduceat(a, inds)[:-1]
reducebins(np.add, a, start=inds)[:-1]
reducebins(np.add, a, stop=inds)[1:]

@shoyer
Member

shoyer commented Apr 19, 2017

@eric-wieser I would be OK with required start and stop arguments, but I do not like making one of them optional. It is not obvious that providing only start means out[i] = func.reduce(arr[start[i]:start[i+1]]) rather than out[i] = func.reduce(arr[start[i]:]), which is what I would have guessed.

My preferred API for reducebins is like reduceat but without the confusing "exceptions" noted in the docstring. Namely, just:

For i in range(len(indices)), reduceat computes ufunc.reduce(a[indices[i]:indices[i+1]]), which becomes the i-th generalized “row” parallel to axis in the final result (i.e., in a 2-D array, for example, if axis = 0, it becomes the i-th row, but if axis = 1, it becomes the i-th column).

I could go either way on the third "exception" which requires non-negative indices (0 <= indices[i] <= a.shape[axis]), which I view as more of a sanity check than an exception. But possibly that one could go, too -- I can see how negative indices might be useful to someone, and it's not hard to do the math to normalize such indices.

Not automatically adding an index at the end does imply that the result should have length len(indices)-1, like the result of np.histogram.
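
(A quick illustration of that length convention:)

import numpy as np

a = np.arange(10)
edges = [2, 4, 6]

counts, _ = np.histogram(a, bins=edges)
print(len(edges), len(counts))  # 3 2 -- three edges define two bins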

@jni Can you please give an example of what you actually want to calculate from arrays found in sparse matrices? Preferably with a concrete example with non-random numbers, and self contained (without depending on scipy.sparse).

@eric-wieser
Member

eric-wieser commented Apr 19, 2017

It is not obvious that providing only start means out[i] = func.reduce(arr[start[i]:start[i+1]]) rather than out[i] = func.reduce(arr[start[i]:]), which is what I would have guessed.

The reading I was going for is that "Each bin starts at these positions", with the implication that all bins are contiguous unless explicitly specified otherwise. Perhaps I should try and draft a more complete docstring. I think I can see a strong argument for forbidding passing neither argument, so I'll remove that from my proposed function.

which requires non-negative indices (0 <= indices[i] < a.shape[axis])

Note that there's also a bug here (#835) - the upper bound should be inclusive, since these are slices.

@shoyer
Member

shoyer commented Apr 19, 2017

Note that there's also a bug here - the upper bound should be inclusive, since these are slices.

Fixed, thanks.

@eric-wieser
Member

Not in the reduceat function itself, you haven't ;)

eric-wieser added a commit to eric-wieser/numpy that referenced this issue Oct 3, 2017
There didn't seem to be any value to an `assign_identity` function - all we actually care about is the value to assign.

This also fixes numpy#8860 as a side-effect, and paves the way for:

* easily adding more values (numpy#7702)
* using the identity in more places (numpy#834)
@eric-wieser
Member

Turns out that doc/neps/groupby_additions.rst contains an (IMO inferior) proposal for a reduceby function.

@martinling

Is this something that could be fixed in the upcoming 2.0 release?

@mhvk linked a pull request Dec 22, 2023 that will close this issue
@mhvk
Contributor

mhvk commented Dec 22, 2023

Triggered by this issue coming alive again, I wrote #25476 to see how hard it would be to allow passing in a 2-D array with start, stop values. Not too hard, it turns out. But API to be decided -- best discussed at #25476, probably!
