gufunc to test for all elements equal along axis #8513

mattharrigan · 2017-01-21T12:29:00Z

This is a follow up to this discussion. In certain use cases very large performance gains can be obtained with a gufunc of signature (i),(i)-.() to check if all elements along that dimension are equal. For large arrays which have any early non equal element, its dramatically faster (1000x) than the current alternative. For large arrays which are all equal, its ~10% faster due to eliminating the intermediate boolean array. For tiny arrays its much faster due to a single function call instead of at least two, but its debatable how relevant speed is for tiny problems.

Initial code is here. It includes other functions in this same family, such as all or any, and eq, ne, lt, le, gt, ge, for multiple dtypes.

On this discussion thread, there we no +1 or negative responses. I'm not sure what that means.

Assuming the functionality is desired for inclusion in numpy, then I would like advice on where this code should go.

shoyer · 2017-01-21T19:20:07Z

I think this is great. Checking arrays for equality is quite common, especially in test suites.

I'm not entirely sure it's worth adding all the shortcut operations. At a certain point it's fine to encourage users to write their own inner loops for fusing together fundamental operations (e.g., with Cython or Numba), or we would face a combinatorial explosion of numpy functions.

mattharrigan · 2017-01-21T20:10:49Z

Are you saying you support the "all_eq" function but not the family of related functions such as "any_lt"?

My best argument for including them all is that they fundamentally reduce the computational order (big O) of the function for certain inputs. For the any_** functions, if the first element is true, then all subsequent elements do not have to be checked. This means the function with that input would take similar time regardless of the array was size 2 or 1e8. It could result in 1000x improvement. The other fusing operations I can think of don't fundamentally change the big O, instead reduce overhead and optimize cache/memory. An example would be a ufunc to return A*B+C. Although a fairly common operation, its probably not worth including in numpy.

That being said, I agree the benefit of adding them all is debatable and there is definitely a cost to adding many new functions.

charris · 2017-01-21T21:50:06Z

This sounds like something that might could be optimized. @juliantaylor Thoughts?

shoyer · 2017-01-21T21:56:42Z

Are you saying you support the "all_eq" function but not the family of related functions such as "any_lt"?

Yeah, I'm not entirely sure. On the other hand, there are only 6 * 2 such functions in total, and the naming makes it pretty obvious exactly what they do, so it doesn't add too much of a conceptual burden.

If we do add these, I would definitely use full names for the operations (all_equal/all_not_equal/all_greater/all_greater_equal rather than all_eq/all_ne/all_gt/all_ge).

juliantaylor · 2017-01-21T22:14:21Z

our boolean functions are already pretty much as good as the get (on x86), the issue is that our ufunc + iterator implementation does not short circut right?
That is a problem since 1.7 see gh-3446. It can be worked around by increasing the buffer size of the nditer.
That would be very nice to fix, but probably relatively involved as it affects the nditer and the ufunc code.

Combining operations is something that comes up often and would be very valuable, there are some prototypes adding a couple, but none were completed or merged due to lack of time. Having them is probably also not that valuable as they would only help new code and in cases harm readability (like multiply and add)
Ideally we would have a way to do it automatically either via deferred ufuncs or with python3.6 we have another option of swapping the interpreter loop with something that does lite numba like operations.
Note combined comparison + reduction is a relatively minor improvement as the comparison results in a boolean array which is 4-8 times smaller and thus much faster to process. The gain will not be above 10-20%.

juliantaylor · 2017-01-21T22:26:36Z

Oh right, if we had short circuiting you could skip the comparison of the full array too, that would be valuable.

mattharrigan · 2017-01-21T22:47:08Z

That short circuit in the loop which compares the original arrays is the reason I wrote this.

…

On Jan 21, 2017 5:26 PM, "Julian Taylor" ***@***.***> wrote: Oh right, if we had short circuiting you could skip the comparison of the full array too, that would be valuable. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#8513 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AJ2jissTW8vKu_DoFTXqlczC8GhrySV6ks5rUoYggaJpZM4LqDjz> .

mhvk · 2017-01-21T23:20:36Z

I like this very much and would certainly use the equal and not_equal versions. From your code, it seems like the others just follow without any real further effort, so I think it makes sense to include them.

Off-topic: @juliantaylor - I'd be very happy myself if every ufunc had not just a where but also a factor optional argument... But I agree that it doesn't help readability, and I think your work on combining things at the interpreter level is the better way to go. Especially since for python 3.6 it might be a bit less difficult to follow than what you have in #7997).

mattharrigan mentioned this issue Jan 24, 2017

ENH: Adding logical gufuncs, ie all_equal #8528

Closed

WarrenWeckesser mentioned this issue Feb 24, 2021

ENH: Create a place for "optimization" ufuncs #18483

Open

seberg mentioned this issue Feb 13, 2023

Linear Search #23203

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gufunc to test for all elements equal along axis #8513

gufunc to test for all elements equal along axis #8513

mattharrigan commented Jan 21, 2017

shoyer commented Jan 21, 2017 •

edited

mattharrigan commented Jan 21, 2017

charris commented Jan 21, 2017

shoyer commented Jan 21, 2017

juliantaylor commented Jan 21, 2017

juliantaylor commented Jan 21, 2017

mattharrigan commented Jan 21, 2017 via email

mhvk commented Jan 21, 2017

gufunc to test for all elements equal along axis #8513

gufunc to test for all elements equal along axis #8513

Comments

mattharrigan commented Jan 21, 2017

shoyer commented Jan 21, 2017 • edited

mattharrigan commented Jan 21, 2017

charris commented Jan 21, 2017

shoyer commented Jan 21, 2017

juliantaylor commented Jan 21, 2017

juliantaylor commented Jan 21, 2017

mattharrigan commented Jan 21, 2017 via email

mhvk commented Jan 21, 2017

shoyer commented Jan 21, 2017 •

edited