ENH: Implement initial kwarg for ufunc.add.reduce #10635

hameerabbasi · 2018-02-20T16:01:37Z

As discussed in #5032: Adds an initial kwarg for ufunc.reduce, plus changes in min, max, sum, prod.

This acts as the first item in a reduction. It's useful for empty reductions e.g. np.min([], initial=np.inf) reduces to inf. It's also useful for starting the reduction with a different value e.g. np.sum([10], initial=5) reduces to 15. (5 used as starting value rather than the identity 0).

Explicitly passing the initial value as None causes empty reductions to err.

Explicitly passing np._NoValue also keeps the default behaviour.

min, max, sum and prod only pass on the initial kwarg.

PyUFunc_Reduce is modified to accept the initial kwarg. It then replaces the identity by initial.

Changes are made to normalize_reduce_args so that it accepts and parses initial correctly.

Documentation has been added for all the changes.

Fixes #5032

hameerabbasi · 2018-02-20T16:21:36Z

@eric-wieser Looking forward to your feedback. 😄

hameerabbasi · 2018-02-20T18:14:40Z

Hmm. I have no idea why the CircleCI build failed, I didn't touch Cython. :/

eric-wieser · 2018-02-20T17:03:05Z

numpy/core/fromnumeric.py

+    identity : scalar, optional
+        The value to reduce to for an empty axis. Raises a ``ValueError``
+        by default.
+


Needs a version added

eric-wieser · 2018-02-20T18:28:10Z

numpy/core/src/umath/ufunc_object.c

-    identity = _get_identity(ufunc, &reorderable);
+    if (identity == NULL || identity == Py_None) {
+        identity = _get_identity(ufunc, &reorderable);
+    }


You need an else { Py_INCREF(identity); }, so that the decref at the end is safe.

I don't think so, it only checks for None and NULL. The object is getting passed through.

identity is gotten from the PyArg_ParseTupleAndKeywords, it is passed through directly from there, does this do the Py_INCREF call automatically or not?

Edit: Looks like you're right, from the docs:

Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!

And the other half of the puzzle is that _get_identity does increment the reference count - so you want to make them equivalent so the cleanup doesn't need special casing.

eric-wieser · 2018-02-20T18:29:36Z

It looks like you built this patch on an out-of-date master from before we had CircleCI working. If you rebase, the CircleCI error will go away.

eric-wieser · 2018-02-20T21:27:16Z

This is failing because we reparse the arguments for forwarding to __array_ufunc__ - but I don't remember where.

ahaldane · 2018-02-20T21:31:13Z

Somewhat tangent comment: How would this fit in with the dream of turning reduce into a gufunc (related to discussions in #8528 #8819)?

Currently gufuncs have a fixed set of common keyword arguments, axis, dtype, casting, extobj, where and so on. Should individual gufuncs be allowed to have additional args besides the default ones, as identity would be?

Or does the fact that reduce will have an extra identity argument mean that it cannot be implemented as a gufunc?

Edit: Related: #8773

ahaldane · 2018-02-20T22:52:59Z

Maybe the way to think of it is that a reduce_ufunc(a, out=b) gufunc treats its "out" parameter also as an input. So if you supply an out, that initial value gets read and added on to.

So maybe ultimately, we would have both a reduce-gufunc, and a reduce method. The reduce method's job is to properly set up the out parameter for the reduce-gufunc. The method would have an "identity" argument, while the ufunc itself would read that value from the out arg.

Sorry for the distraction, we can think about this later outside this PR.

charris · 2018-02-21T01:43:59Z

The explanation of these changes could be much more complete. And please explain it for all the functions changed.

eric-wieser · 2018-02-21T01:55:51Z

So, there are two things that worry me here:

The interface is different to the builtin max(..., default=0), even though it's roughly the same
ufunc.reduce(identity=...) only really makes sense for maximum. minimum, fmax, and fmin, where choosing a different domain lets you choose a different identity. For other ufuncs, it produces crazy results (I think np.add.reduce(np.ones(2,2), identity=10) will give a surprising result, but I haven't checked).
- We could add a marker indicating that the identity is configurable by kwarg, alongside the current allowable identity values - and throw an error when the kwarg is passed for a ufunc without that marker
- For object arrays, having an identity makes sense for lots of ufuncs - consider np.add.reduce(["hello", "world"], dtype=object, identity="")

hameerabbasi · 2018-02-21T06:54:55Z

The explanation of these changes could be much more complete. And please explain it for all the functions changed.

I'll add more complete docs with examples, if that's what you mean, incorporating @eric-wieser's excellent use cases and insights. :-)

The interface is different to the builtin max(..., default=0), even though it's roughly the same.

Yes, I thought of that as well, however, plenty of kwargs here are different (axis, keepdims, etc.). I'll go ahead and assume that we're not following Python's interface here, however, this can still be changed.

However, if it WAS to be changed, I'd say it's more similar to functools.reduce's initializer argument, which is much more instructive. This is also in line with your second worry:

For other ufuncs, it produces crazy results (I think np.add.reduce(np.ones(2,2), identity=10) will give a surprising result, but I haven't checked).

>>> np.add.reduce(np.ones((2, 2)), identity=10)
array([12., 12.])

The name initializer instead of identity is much more instructive and takes care of this worry.

So initializer probably makes more sense here. We could add a warning in the docs that non-default initializers could lead to unexpected results.

eric-wieser · 2018-02-21T06:57:12Z

will give a surprising result, but I haven't checked

There's a more surprising result than that one to be found, I think. Maybe np.add.reduce(np.ones((2, 2, 2)), axis=(0, 1), identity=10)?

hameerabbasi · 2018-02-21T06:58:37Z

It gives a quite reasonable result, actually.

>>> np.add.reduce(np.ones((2, 2, 2)), axis=(0, 1), identity=10)
array([14., 14.])

eric-wieser · 2018-02-21T06:59:21Z

The name initializer instead of identity is much more instructive and takes care of this worry.

That's maybe reasonable. Perhaps ufunc.reduce should take an initializer argument, but the np.min shorthand can alias it to default?

We'll definitely need to run this by the list before this gets merged, but may as well brainstorm here first

eric-wieser · 2018-02-21T06:59:57Z

Non-adjacent axes? axis=(0, 2)?

hameerabbasi · 2018-02-21T07:06:47Z

Non-adjacent axes? (0, 2)?

Still works beautifully.

That's maybe reasonable. Perhaps ufunc.reduce should take an initializer argument, but the np.min shorthand can alias it to default?

The slight problem I see with this is something like the following:

>>> np.minimum.reduce([0.0], identity=-np.inf)
-inf

As you can see, it gives a value not in the original set. I guess we could call it default but then there are two issues with this:

The one I mentioned above;
We're deviating from ufunc.reduce norms, which is a bad thing. It doesn't keep uniformity.
Problems when generalizing to other reductions.

eric-wieser · 2018-02-21T07:17:25Z

Still works beautifully.

Huh, I think my memory of the code is incorrect. I thought that multi-axis reduce fell back on repeated single-axis reduce, and so was expecting to get 24 as the result. It seems that's not the case, which is great!

As you can see, it gives a value not in the original set.

I'm now on board with the argument being called initializer, which makes that far less confusing.

The issue in that case is that the caller chose to pass in an initializer that was not the minimum of the set of allowed values. That looks perfectly fine to me.

eric-wieser · 2018-02-21T07:18:26Z

CI is failing because core/src/umath/override.c@normalize_reduce_args is missing this argument.

hameerabbasi · 2018-02-21T07:40:08Z

The issue in that case is that the caller chose to pass in an initializer that was not the minimum of the set of allowed values. That looks perfectly fine to me.

Exactly. We can document a use-case as being reduction of an empty axis.

On that note, I'm going to work and might not be available for the next hours. If you want to fix that on my branch, feel free.

hameerabbasi · 2018-02-21T08:08:59Z

On another note, if we're renaming this to initializer, it would make sense to add it to accumulate and reduceat. Is that best done in another PR or this one?

Edit: I'm willing to take on the task of making reduceat use identity.

eric-wieser · 2018-02-21T08:23:31Z

reduceat is best untouched - there's an issue about writing a replacement somewhere.

Accumulate would be a better fit for another PR, especially since the use case is less strong.

eric-wieser · 2018-02-22T07:44:14Z

Edit: I'm willing to take on the task of making reduceat use identity.

#834 should get you started on that rabbit hole. Lets try and wrap up this PR first though!

eric-wieser · 2018-02-22T21:08:03Z

numpy/add_newdocs.py

+
+    You can use the initializer argument to initialize the reduction to a
+    different value. It can be used to calculate empty reductions for
+    ufuncs that don't support it as well.


Move this second sentence to just above np.minimum.reduce so it's clear what it refers to

eric-wieser · 2018-02-22T21:09:20Z

numpy/core/fromnumeric.py

@@ -1827,6 +1827,11 @@ def sum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
        `ndarray`, however any non-default value will be.  If the
        sub-class' method does not implement `keepdims` any
        exceptions will be raised.
+    initializer : scalar, optional
+        The initializer for this maximum. Can also be used to compute the
+        minimum of an empty slice.


max? min? This is sum

I'd argue that this isn't useful enough to include in the public API here, or in any of these functions. It's niche enough that we probably want the user calling ufunc.reduce directly

Hmm. I went to a lot of effort to put it in. :-( Anyway, I'd argue it should stay for the NaN-skipping functions because there's no alternative there.

Additionally, it's useful for specifying the value of an All-NaN slice reduction (although I guess you could just do this):

x_max = np.nanmax(x, ...) x[np.isnan(x)] = value

Here's what I propose:

np.some_ufunc.reduce(..., initializer=...)

np.sum(...) - not useful enough to expose - force people to use the more explicit API above

np.prod(...)

~~np.min(..., default=...) - forward to initializer, match the py3 API~~

~~np.nanmin(..., default=...) as above~~

np.min(..., initializer=...)

np.nanmin(..., initializer=...)

I take it back, default and initializer mean different things. max([0], default=10) is different to np.max([0], initializer=10)

Exactly, as in the discussion above. But to keep uniformity, we should probably either keep it or remove it everywhere. I prefer users should not have to remember/recall which reductions had this and which didn't, they should just instinctively go for either the reduction or ufunc.reduce.

I guess to cater for the nanmax, etc. case, we could ask users to reshape, add an extra slice, then undo, but that would be excessive.

nanmax absolutely should have an initializer argument, since it needs it to deal with all-nan slices of object dtypes. I'm just not sure they're useful on sum and prod, but maybe leave them in for now, and consult the list.

Either way, my initial remark here is pointing out that this docstring is copied-and-pasted incorrectly

That's fixed. I see you noticed it was a copypasta.

eric-wieser · 2018-02-22T21:10:14Z

numpy/core/src/umath/ufunc_object.c

@@ -3275,7 +3284,7 @@ PyUFunc_Reduceat(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *ind,
                            op_axes_arrays[2]};
    npy_uint32 op_flags[3];
    int i, idim, ndim, otype_final;
-    int need_outer_iterator;
+    int need_outer_iterator = 0;


Why this change?

There was a warning in GCC, I wanted to get rid of it.

Are the warnings/errors because of this change? Some C++ magic going on? Ideally, adding an intitializer shouldn't change anything.

The warnings are showing up because there's a failure elsewhere - you're probably just not scrolling down far enough to see the error. I'd probably revert this change, and leave it for a future warning cleanup.

mhvk · 2018-04-11T15:10:17Z

I think what you, the behaviour for objects is close enough to what the final state should be than what would be the case without those extra commits. So, I would suggest to merge and do a follow-up PR on the details.

hameerabbasi · 2018-04-11T15:12:56Z

+1 for finally merging.

seberg · 2018-04-12T06:05:47Z

Well, I don't know and I do not want to think about it. Close enough for me is close enough, if assuming that someone will use the current behaviour, we can still get to a reasonable final state without problems at some point. I hate adding the next kludge that future us gets annoyed about.

Right now I am a bit confused what you are suggesting as final state for the object support....

seberg · 2018-04-12T06:27:54Z

That said, None as the "Error out" value, is maybe good even for object. Not ideal in some sense but the other solutions seem not much nicer. Plus, I am not sure it would even manoeuvre into a tricky to resolve issue for the future.

hameerabbasi · 2018-04-12T06:35:31Z

@seberg Right now, the issue is that if you pass initial, it will always include it in the list even for objects. That could break backwards compatibility, as for some functions in _methods.py, we do pass it. So objects will get 0 in the list when they try to do np.sum, for example.

Right now, having np._NoValue or None as an "ignore me" is what we want.

eric-wieser · 2018-04-12T06:47:30Z

I think the cause of our problems is how the .identity is defined, and we'd do better to free ourselves from that constraint. What I'd propose then is:

Use .reduce(initial=np._NoValue) to indicate "no identity"
Make .reduce(obj_array, initial=some_value) use the "initial" logic, not the "default" logic
Allow initial=None to be passed like any other value
Remove the initial=ufunc.identity kwarg defaults, and just remark in the documentation of initial that "If the ufunc has no identity or the dtype is object, this defaults to None - otherwise it defaults to ufunc.identity"

hameerabbasi · 2018-04-12T07:05:01Z

@eric-wieser Would you happen to know how I'd access np._NoValue inside the C code for this check? Doesn't seem terribly hard to do. I'd also need to do the same in override.c.

eric-wieser · 2018-04-12T07:24:25Z

Something like

#include "npy_import.h"

static PyObject *NoValue = NULL;
npy_cache_import("numpy", "_NoValue", &NoValue);

ought to work

eric-wieser · 2018-04-12T08:25:23Z

numpy/core/src/umath/override.c

@@ -152,6 +155,9 @@ normalize_reduce_args(PyUFuncObject *ufunc, PyObject *args,
            }
            obj = PyTuple_GetSlice(args, 3, 4);
        }
+        if (i == 5 && obj == NoValue) {
+            continue;


A comment saying this is the initial argument would be good.

Sure. Any idea why the Windows build is failing?

eric-wieser · 2018-04-12T08:33:30Z

Penultimate commit message is very wrong!

eric-wieser · 2018-04-12T08:35:28Z

numpy/core/src/umath/ufunc_object.c

@@ -3036,6 +3036,8 @@ PyUFunc_Reduce(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *out,
    const char *ufunc_name = ufunc_get_name_cstr(ufunc);
    /* These parameters come from a TLS global */
    int buffersize = 0, errormask = 0;
+    static PyObject *NoValue = NULL;
+    npy_cache_import("numpy", "_NoValue", &NoValue);


Ah, this should actually be if (... < 0) { return NULL; } (wrapped as normal), in case the import fails

npy_cache_import has a return type of void, so we can't do that.

Ah, in that case, I mean you should check that NoValue != NULL

eric-wieser · 2018-04-12T08:36:04Z

numpy/core/src/umath/override.c

@@ -137,6 +137,9 @@ normalize_reduce_args(PyUFuncObject *ufunc, PyObject *args,
        return -1;
    }

+    static PyObject *NoValue = NULL;


C90 requires this be declared at the top of the function with the other variables (which is why the test fails)

hameerabbasi · 2018-04-12T11:08:13Z

I think this now has the desired behavior for all cases. The np.NoIdentity thing is a matter for a separate PR, this PR keeps the convention that None means "no initial value" and np._NoValue means "ignore initial completely".

hameerabbasi · 2018-04-17T08:18:48Z

Any more changes required here? If not, is this good to merge? cc @eric-wieser @mhvk @seberg

seberg · 2018-04-18T17:08:10Z

Won't look through the code, at least not now. It seems it still needs release notes I guess? So I understand correctly it is now:

Initial value for the reduction for all (almost) cases, including object
None will cause an error (even for object)
default is NoValue and will cause old behaviour.

Seems good to me as a change.

hameerabbasi · 2018-04-18T17:38:35Z

That's correct:

initial behaves as an initial value in all cases now, even for dtype=object. Previously, it would behave as the default value for dtype=object and initial value otherwise.
None means "no initial value, err if empty, otherwise use the start of the array".
np._NoValue causes the old behavior.

I'll add release notes now.

hameerabbasi · 2018-04-24T08:57:09Z

Rebased and tests added for dtype=object. Tell me when merging is imminent so I can squash all the commits.

mhvk · 2018-04-24T13:38:05Z

@hameerabbasi - I'd gladly merge this (maybe after one more day for last comments), so do go ahead and squash!

hameerabbasi · 2018-04-26T14:02:46Z

@mhvk Ping. 😀

mhvk · 2018-04-26T15:35:13Z

Merged. Thanks, @hameerabbasi!

hameerabbasi · 2018-04-26T16:05:39Z

Thanks so much to @eric-wieser and @mhvk especially for babying me through my first big PR. I owe you!

hameerabbasi changed the title ~~Implement identity= for reduce + docs + tests.~~ Implement identity kwarg for ufunc.add.reduce Feb 20, 2018

eric-wieser reviewed Feb 20, 2018

View reviewed changes

hameerabbasi force-pushed the ufunc-reduce-identity branch from 6e24504 to 609f9b1 Compare February 20, 2018 20:16

charris changed the title ~~Implement identity kwarg for ufunc.add.reduce~~ ENH: Implement identity kwarg for ufunc.add.reduce Feb 21, 2018

charris added 01 - Enhancement component: numpy._core labels Feb 21, 2018

hameerabbasi changed the title ~~ENH: Implement identity kwarg for ufunc.add.reduce~~ ENH: Implement initializer kwarg for ufunc.add.reduce Feb 22, 2018

eric-wieser reviewed Feb 22, 2018

View reviewed changes

eric-wieser reviewed Apr 12, 2018

View reviewed changes

hameerabbasi force-pushed the ufunc-reduce-identity branch from 1b5bed3 to 0fb251e Compare April 12, 2018 08:49

hameerabbasi force-pushed the ufunc-reduce-identity branch 2 times, most recently from 7aca91b to 24e185a Compare April 24, 2018 08:54

hameerabbasi force-pushed the ufunc-reduce-identity branch 2 times, most recently from b3b7289 to 06d39e0 Compare April 24, 2018 09:35

ENH: Implement initial kwarg for ufunc.add.reduce

6b728d1

hameerabbasi force-pushed the ufunc-reduce-identity branch from 06d39e0 to 6b728d1 Compare April 24, 2018 13:44

mhvk merged commit 4ea36ff into numpy:master Apr 26, 2018

mhvk mentioned this pull request Jan 2, 2019

ENH: where for ufunc reductions #12644

Merged

ENH: Implement initial kwarg for ufunc.add.reduce #10635

ENH: Implement initial kwarg for ufunc.add.reduce #10635

Conversation

hameerabbasi commented Feb 20, 2018 • edited

hameerabbasi commented Feb 20, 2018

hameerabbasi commented Feb 20, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hameerabbasi Feb 20, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser commented Feb 20, 2018

eric-wieser commented Feb 20, 2018 • edited

ahaldane commented Feb 20, 2018 • edited

ahaldane commented Feb 20, 2018

charris commented Feb 21, 2018

eric-wieser commented Feb 21, 2018 • edited

hameerabbasi commented Feb 21, 2018

eric-wieser commented Feb 21, 2018

hameerabbasi commented Feb 21, 2018

eric-wieser commented Feb 21, 2018 • edited

eric-wieser commented Feb 21, 2018 • edited

hameerabbasi commented Feb 21, 2018 • edited

eric-wieser commented Feb 21, 2018

eric-wieser commented Feb 21, 2018

hameerabbasi commented Feb 21, 2018

hameerabbasi commented Feb 21, 2018 • edited

eric-wieser commented Feb 21, 2018

eric-wieser commented Feb 22, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser Feb 22, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhvk commented Apr 11, 2018

hameerabbasi commented Apr 11, 2018

seberg commented Apr 12, 2018

seberg commented Apr 12, 2018

hameerabbasi commented Apr 12, 2018

eric-wieser commented Apr 12, 2018 • edited

hameerabbasi commented Apr 12, 2018 • edited

eric-wieser commented Apr 12, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eric-wieser commented Apr 12, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hameerabbasi commented Apr 12, 2018 • edited

hameerabbasi commented Apr 17, 2018

seberg commented Apr 18, 2018

hameerabbasi commented Apr 18, 2018

hameerabbasi commented Apr 24, 2018

mhvk commented Apr 24, 2018

hameerabbasi commented Apr 26, 2018

mhvk commented Apr 26, 2018

hameerabbasi commented Apr 26, 2018

hameerabbasi commented Feb 20, 2018 •

edited

hameerabbasi Feb 20, 2018 •

edited

eric-wieser commented Feb 20, 2018 •

edited

ahaldane commented Feb 20, 2018 •

edited

eric-wieser commented Feb 21, 2018 •

edited

eric-wieser commented Feb 21, 2018 •

edited

eric-wieser commented Feb 21, 2018 •

edited

hameerabbasi commented Feb 21, 2018 •

edited

hameerabbasi commented Feb 21, 2018 •

edited

eric-wieser Feb 22, 2018 •

edited

eric-wieser commented Apr 12, 2018 •

edited

hameerabbasi commented Apr 12, 2018 •

edited

eric-wieser commented Apr 12, 2018 •

edited

hameerabbasi commented Apr 12, 2018 •

edited