ENH: Implement initial kwarg for ufunc.add.reduce #10635
Conversation
@eric-wieser Looking forward to your feedback. 😄
Hmm. I have no idea why the CircleCI build failed; I didn't touch Cython. :/
numpy/core/fromnumeric.py
Outdated
    identity : scalar, optional
        The value to reduce to for an empty axis. Raises a ``ValueError``
        by default.
Needs a ``versionadded`` directive.
numpy/core/src/umath/ufunc_object.c
Outdated
-    identity = _get_identity(ufunc, &reorderable);
+    if (identity == NULL || identity == Py_None) {
+        identity = _get_identity(ufunc, &reorderable);
+    }
You need an `else { Py_INCREF(identity); }`, so that the decref at the end is safe.
I don't think so; it only checks for `None` and `NULL`. The object is getting passed through. `identity` comes from `PyArg_ParseTupleAndKeywords` and is passed through directly from there. Does that do the `Py_INCREF` call automatically or not?
Edit: Looks like you're right. From the docs: "Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!"
And the other half of the puzzle is that `_get_identity` does increment the reference count, so you want to make the two paths equivalent so the cleanup doesn't need special-casing.
It looks like you built this patch on an out-of-date master from before we had CircleCI working. If you rebase, the CircleCI error will go away.
Force-pushed 6e24504 to 609f9b1.
This is failing because we reparse the arguments for forwarding to `__array_ufunc__`.
Somewhat tangential comment: how would this fit in with the dream of turning `reduce` into a gufunc? Currently gufuncs have a fixed set of common keyword arguments. Or does the fact that `reduce` will have an extra keyword argument get in the way of that?

Edit: Related: #8773
Maybe the way to think of it is that a reduce-gufunc would read its starting value from `out`. So maybe ultimately, we would have both a reduce-gufunc and a reduce method. The reduce method's job is to properly set up the `out` parameter for the reduce-gufunc. The method would have an "identity" argument, while the ufunc itself would read that value from the `out` arg.

Sorry for the distraction; we can think about this later, outside this PR.
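The idea floated above, that a reduce-gufunc could read its starting value from a preloaded `out` accumulator, can be sketched in plain NumPy. This is a hypothetical illustration only; `reduce_with_preloaded_out` is not a real NumPy function, just a model of the proposed split between the method (which seeds `out`) and the gufunc (which folds into it):

```python
import numpy as np

def reduce_with_preloaded_out(ufunc, a, start, axis=0):
    # Hypothetical sketch: the "reduce-gufunc" reads its starting value
    # from an ``out`` array that the reduce method preloaded for it.
    a = np.asarray(a, dtype=float)
    out_shape = a.shape[:axis] + a.shape[axis + 1:]
    out = np.full(out_shape, start, dtype=float)
    for piece in np.moveaxis(a, axis, 0):
        out = ufunc(out, piece)  # fold each slice into the accumulator
    return out

result = reduce_with_preloaded_out(np.add, np.ones((2, 2)), 10.0)
# result is array([12., 12.])
```

Note that the starting value participates exactly once per output element, which matches the behaviour discussed later in this thread.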
The explanation of these changes could be much more complete. And please explain it for all the functions changed.
So, there are two things that worry me here:
I'll add more complete docs with examples, if that's what you mean, incorporating @eric-wieser's excellent use cases and insights. :-)
Yes, I thought of that as well; however, plenty of kwargs here are different. If it WAS to be changed, though, I'd say it's more similar to an initializer than an identity:

>>> np.add.reduce(np.ones((2, 2)), identity=10)
array([12., 12.])

The name "identity" doesn't really fit here: 10 is not an identity for addition.
There's a more surprising result than that one to be found, I think. Maybe with multiple axes?
It gives a quite reasonable result, actually:

>>> np.add.reduce(np.ones((2, 2, 2)), axis=(0, 1), identity=10)
array([14., 14.])
That's maybe reasonable. Perhaps `initializer`, then? We'll definitely need to run this by the list before this gets merged, but we may as well brainstorm here first.
Non-adjacent axes?
Still works beautifully.

The slight problem I see with this is something like the following:

>>> np.minimum.reduce([0.0], identity=-np.inf)
-inf

As you can see, it gives a value not in the original set. I guess we could call it `initializer` instead.
Huh, I think my memory of the code is incorrect. I thought that multi-axis reductions were implemented as repeated single-axis reductions.
I'm now on board with the argument being called `initializer`.

The issue in that case is that the caller chose to pass in an initializer that was not the minimum of the set of allowed values. That looks perfectly fine to me.
CI is failing because
Exactly. We can document a use-case as being reduction of an empty axis. On that note, I'm heading to work and might not be available for the next few hours. If you want to fix that on my branch, feel free.
On another note, if we're renaming this to `initializer`, should `accumulate` get the same treatment?

Edit: I'm willing to take on the task of making that happen.
Accumulate would be a better fit for another PR, especially since the use case is less strong.
#834 should get you started on that rabbit hole. Let's try and wrap up this PR first, though!
numpy/add_newdocs.py
Outdated
    You can use the initializer argument to initialize the reduction to a
    different value. It can be used to calculate empty reductions for
    ufuncs that don't support it as well.
Move this second sentence to just above `np.minimum.reduce`, so it's clear what it refers to.
Done!
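The behaviour that docstring hunk describes, computing empty reductions for ufuncs that don't otherwise support them, can be demonstrated directly. A sketch assuming NumPy ≥ 1.15, where this PR's kwarg landed under its final name `initial`:

```python
import numpy as np

# maximum has no identity, so an empty reduction has no defined result...
try:
    np.maximum.reduce([])
    raised = False
except ValueError:
    raised = True  # "zero-size array to reduction operation" error

# ...but giving an explicit start value makes it well-defined
r = np.maximum.reduce([], initial=-np.inf)
# r is -inf
```

This is exactly the empty-reduction use case the docstring sentence is pointing at.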
numpy/core/fromnumeric.py
Outdated
@@ -1827,6 +1827,11 @@ def sum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
        `ndarray`, however any non-default value will be. If the
        sub-class' method does not implement `keepdims` any
        exceptions will be raised.
    initializer : scalar, optional
        The initializer for this maximum. Can also be used to compute the
        minimum of an empty slice.
`max`? `min`? This is `sum`!

I'd argue that this isn't useful enough to include in the public API here, or in any of these functions. It's niche enough that we probably want the user calling `ufunc.reduce` directly.
Hmm. I went to a lot of effort to put it in. :-( Anyway, I'd argue it should stay for the NaN-skipping functions because there's no alternative there.
Additionally, it's useful for specifying the value of an all-NaN slice reduction (although I guess you could just do this):

x_max = np.nanmax(x, ...)
x[np.isnan(x)] = value
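A runnable version of that patch-up idea is sketched below. Note this is an assumption-laden illustration: the NaN test is applied to the reduction result rather than to `x` itself, and `x` and `value` are stand-in sample data:

```python
import warnings
import numpy as np

x = np.array([np.nan, np.nan])   # an all-NaN slice (sample data)
value = 0.0                      # the stand-in result we want for it

with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    x_max = np.nanmax(x)         # nan, since every element is NaN

if np.isnan(x_max):              # patch up the all-NaN case afterwards
    x_max = value

# x_max is now 0.0
```

An `initializer`/`initial` argument would make this two-step dance unnecessary.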
Here's what I propose:

- `np.some_ufunc.reduce(..., initializer=...)`
- `np.sum(...)` - not useful enough to expose; force people to use the more explicit API above
- `np.prod(...)`
- `np.min(..., default=...)` - forward to initializer, match the py3 API
- `np.nanmin(..., default=...)` - as above
- `np.min(..., initializer=...)`
- `np.nanmin(..., initializer=...)`
I take it back: `default` and `initializer` mean different things. `max([0], default=10)` is different to `np.max([0], initializer=10)`.
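The distinction can be checked with builtins alone; here `functools.reduce` stands in for a reduction with an initializer (a sketch, not the NumPy API under discussion):

```python
from functools import reduce

# `default` is only consulted when the iterable is empty:
assert max([0], default=10) == 0    # the data wins
assert max([], default=10) == 10    # fallback for the empty case

# an initializer takes part in the reduction itself:
assert reduce(max, [0], 10) == 10   # 10 competes with the data
assert reduce(max, [], 10) == 10    # and also covers the empty case
```

So an initializer subsumes the empty-reduction behaviour of `default`, but changes non-empty results too.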
Exactly, as in the discussion above. But to keep uniformity, we should probably either keep it or remove it everywhere. I'd prefer that users not have to remember which reductions had this and which didn't; they should just instinctively go for either the reduction or `ufunc.reduce`.

I guess to cater for the `nanmax`, etc. case, we could ask users to reshape, add an extra slice, then undo it, but that would be excessive.
`nanmax` absolutely should have an `initializer` argument, since it needs it to deal with all-NaN slices of object dtypes. I'm just not sure they're useful on `sum` and `prod`, but maybe leave them in for now and consult the list.

Either way, my initial remark here is pointing out that this docstring was copied and pasted incorrectly.
That's fixed. I see you noticed it was a copypasta.
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -3275,7 +3284,7 @@ PyUFunc_Reduceat(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *ind,
                            op_axes_arrays[2]};
    npy_uint32 op_flags[3];
    int i, idim, ndim, otype_final;
-   int need_outer_iterator;
+   int need_outer_iterator = 0;
Why this change?
There was a GCC warning; I wanted to get rid of it.
Are the warnings/errors because of this change? Some C++ magic going on? Ideally, adding an initializer shouldn't change anything.
The warnings are showing up because there's a failure elsewhere; you're probably just not scrolling down far enough to see the error. I'd probably revert this change and leave it for a future warning cleanup.
I think that with what you have now, the behaviour for objects is closer to what the final state should be than it would be without those extra commits. So I would suggest merging and doing a follow-up PR on the details.
+1 for finally merging.
Well, I don't know, and I do not want to think about it. Close enough for me is close enough if, assuming someone will use the current behaviour, we can still get to a reasonable final state without problems at some point. I hate adding the next kludge that a future us gets annoyed about. Right now I am a bit confused about what you are suggesting as the final state for the object support…
That said, `None` as the "error out" value is maybe good even for object. Not ideal in some sense, but the other solutions seem not much nicer. Plus, I am not sure it would even manoeuvre us into a tricky-to-resolve issue in the future.
@seberg Right now, the issue is that if you pass `None` explicitly, it is indistinguishable from not passing anything at all. Right now, having `None` be the "error out" value seems like the best of the options discussed.
I think the cause of our problems is how the defaults are handled in `normalize_reduce_args`.
@eric-wieser Would you happen to know how I'd access `np._NoValue` from C?
Something like

#include "npy_import.h"

static PyObject *NoValue = NULL;
npy_cache_import("numpy", "_NoValue", &NoValue);

ought to work.
numpy/core/src/umath/override.c
Outdated
@@ -152,6 +155,9 @@ normalize_reduce_args(PyUFuncObject *ufunc, PyObject *args,
        }
        obj = PyTuple_GetSlice(args, 3, 4);
    }
+   if (i == 5 && obj == NoValue) {
+       continue;
A comment saying this is the `initial` argument would be good.
Sure. Any idea why the Windows build is failing?
Penultimate commit message is very wrong!
numpy/core/src/umath/ufunc_object.c
Outdated
@@ -3036,6 +3036,8 @@ PyUFunc_Reduce(PyUFuncObject *ufunc, PyArrayObject *arr, PyArrayObject *out,
    const char *ufunc_name = ufunc_get_name_cstr(ufunc);
    /* These parameters come from a TLS global */
    int buffersize = 0, errormask = 0;
+   static PyObject *NoValue = NULL;
+   npy_cache_import("numpy", "_NoValue", &NoValue);
Ah, this should actually be `if (... < 0) { return NULL; }` (wrapped as normal), in case the import fails.
`npy_cache_import` has a return type of `void`, so we can't do that.
Ah, in that case, I mean you should check that `NoValue != NULL`.
numpy/core/src/umath/override.c
Outdated
@@ -137,6 +137,9 @@ normalize_reduce_args(PyUFuncObject *ufunc, PyObject *args,
        return -1;
    }

+   static PyObject *NoValue = NULL;
C90 requires this be declared at the top of the function with the other variables (which is why the test fails)
Force-pushed 1b5bed3 to 0fb251e.
I think this now has the desired behavior for all cases.
Any more changes required here? If not, is this good to merge? cc @eric-wieser @mhvk @seberg
Won't look through the code, at least not now. It seems it still needs release notes, I guess? So, do I understand correctly that it is now:

Seems good to me as a change.
That's correct:

I'll add release notes now.
Force-pushed 7aca91b to 24e185a.
Rebased and tests added for
Force-pushed b3b7289 to 06d39e0.
@hameerabbasi - I'd gladly merge this (maybe after one more day for last comments), so do go ahead and squash!
Force-pushed 06d39e0 to 6b728d1.
@mhvk Ping. 😀
Merged. Thanks, @hameerabbasi!
Thanks so much to @eric-wieser and @mhvk especially for babying me through my first big PR. I owe you!
As discussed in #5032: adds an `initial` kwarg for `ufunc.reduce`, plus changes in `min`, `max`, `sum`, `prod`.

This acts as the first item in a reduction. It's useful for empty reductions, e.g. `np.min([], initial=np.inf)` reduces to `inf`. It's also useful for starting the reduction with a different value, e.g. `np.sum([10], initial=5)` reduces to `15` (`5` used as the starting value rather than the identity `0`).

Explicitly passing the initial value as `None` causes empty reductions to err. Explicitly passing `np._NoValue` keeps the default behaviour.

`min`, `max`, `sum` and `prod` only pass on the `initial` kwarg. `PyUFunc_Reduce` is modified to accept the `initial` kwarg; it then replaces the identity with `initial`. Changes are made to `normalize_reduce_args` so that it accepts and parses `initial` correctly.

Documentation has been added for all the changes.

Fixes #5032
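The semantics described above can be exercised directly (a sketch assuming NumPy ≥ 1.15, the release in which this PR's `initial` kwarg shipped):

```python
import numpy as np

# an empty reduction gets its value from `initial`
empty_min = np.min([], initial=np.inf)
# empty_min is inf

# a non-empty reduction starts from `initial` instead of the identity
seeded_sum = np.sum([10], initial=5)
# seeded_sum is 15

# at the ufunc level, `initial` is folded in once per output element
row_sums = np.add.reduce(np.ones((2, 2)), initial=10)
# row_sums is array([12., 12.])
```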