
Can we have short-circuit equivalents to allclose, array_equal, isfinite and the like in numpy? #6909

Closed
congma opened this issue Dec 30, 2015 · 11 comments

Comments

@congma
Contributor

congma commented Dec 30, 2015

I did some (admittedly very dirty) benchmark of scipy.linalg.cho_solve(), and what surprised me was the magnitude of overhead due to check_finite=True, which is on by default. My notes can be read here.

The finiteness-check is useful, if not plain necessary for some scenarios, but can we do better? In many user cases such as the scipy.linalg functions, all that matters is that some element of an array is NaN or infinity. Now, what it has to do, is to call numpy.asarray_chkfinite, which in turn does something amounting to numpy.isfinite(a).all(). But this does not short-circuit; isfinite() checks each single element and returns a Boolean array. This whole test can be expensive, but can it be made less expensive in the best case?
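For illustration, the kind of short-circuiting check being asked for can be approximated in user code by scanning in blocks (a hypothetical helper, not a NumPy API; the name and chunk size are made up):

```python
import numpy as np

def isfinite_all(a, chunk=4096):
    """Hypothetical helper: finiteness check that can bail out early.

    np.isfinite still vectorizes within each chunk, but the scan
    stops at the first chunk containing a NaN or infinity, instead
    of building a boolean array for the whole input.
    """
    flat = np.asarray(a).ravel()
    for start in range(0, flat.size, chunk):
        if not np.isfinite(flat[start:start + chunk]).all():
            return False
    return True
```

In the best case (a non-finite value near the front of a large array) only one chunk is examined; in the worst case (all finite) it still scans everything.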

This issue is also present in the test of closeness and equality, in core/numeric.py. Neither allclose() nor array_equal() actually short-circuits when doing the real check. They only short-circuit in the all() function/method call, which is already too late. These two functions can be especially deceiving. Spoiled by Python's built-in all(), the user may think these functions do the short-circuit evaluation. The docs never mention they don't.

So the request is... Do you think it's worthwhile to add some short-circuit Boolean functions for the above tests (finiteness/equality/closeness)?

@njsmith
Member

njsmith commented Dec 30, 2015

IIRC there are some cases where np.all does abort early as an optimization, but in general it's somewhat difficult (consider calls that use a non-trivial axis= argument). And it's largely irrelevant to chkfinite in any case, because here the case where speed matters is the one where all values are finite, so we have to check the whole array. (If there are non-finite values, we will throw an error, and it doesn't matter much how quickly we throw an error.)

In general though if there's some code in numpy that can be made faster then we do like to receive patches to make things more efficient, and especially if this is creating a bottleneck for you (I can't tell from your post whether allclose is a practical problem, or just a theoretical one).

@congma
Contributor Author

congma commented Dec 30, 2015

The example in my notebook linked in the OP wasn't the most relevant one, although it's how I discovered the (shocking to me) fact that most of the time cho_solve is simply testing for NaNs. A more relevant example is testing arrays for exact equality for use in dictionaries.

Sometimes I need to use numpy arrays as dictionary keys (for example, caching). Of course, there are complications. For one thing, arrays must not change value during their lifetime as keys. We then need a hash function (to be applied on array buffers, again with complications) and a comparison function that tests exact equality. The latter function is used by Python for open addressing in case of hash collision.

If hash collision is fairly common, the comparison function's performance could be pretty critical. It needs to bail out as early as possible during probing, especially when the key is large.

Currently, numpy.array_equal evaluates too eagerly for this. The desirable behavior, for this particular case, is to short-circuit the comparison.
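A short-circuiting exact-equality test of the kind described here can be sketched as follows (illustrative only; the name and block size are made up):

```python
import numpy as np

def array_equal_early_exit(a, b, chunk=4096):
    """Hypothetical short-circuiting variant of np.array_equal.

    Compares the arrays block by block and returns at the first
    mismatching block, which is what the dictionary-probing use
    case wants when keys are large.
    """
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        return False
    fa, fb = a.ravel(), b.ravel()
    for start in range(0, fa.size, chunk):
        stop = start + chunk
        if not np.array_equal(fa[start:stop], fb[start:stop]):
            return False
    return True
```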

@seberg
Member

seberg commented Dec 30, 2015

I think Nathaniel is right: at least for your case, short-circuiting does not help at all. Possibly there is some vectorization/low-level optimization that can be done.
Your example seems to me mostly memory-bandwidth bound (since cfac is too large to fit into cache). From that point of view, any combination of isfinite + all or isfinite + any, even a non-short-circuiting one, might save a good bit on out-of-cache-sized arrays.
(On my CPU the factor is more like 1.5 I think, but it still shows how well optimized BLAS is.)

The most general way of implementing it might be adding .all and .any attributes to ufuncs, returning a single boolean (so that np.isfinite.all(arr) would work). It would probably be quite a chunk of work though, and "short-circuiting" is actually really tricky with the axis keyword argument, as Nathaniel said (it goes against the memory-order optimization NumPy usually does). Anyway, it could be a nice new feature, standalone or as part of the ufuncs.

@rgommers
Member

rgommers commented Dec 31, 2015

> I discovered the (shocking to me) fact that most of the time cho_solve is simply testing for NaNs. A more relevant example is testing array exact equality for use in dictionaries.

It can be expensive; this is why check_finite was added as a keyword, so you can disable it if you're sure that your input cannot contain NaNs. The keyword is ugly though, which is a bit of an annoyance. If the check were cheaper, the keyword could be removed.
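For reference, the keyword in question is used like this (the matrix values here are made up for illustration):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])     # symmetric positive definite
b = np.array([1.0, 2.0])

c, low = cho_factor(A)         # default: scans A for NaN/inf first

# If the caller can guarantee finite input, the scan is skipped:
x = cho_solve((c, low), b, check_finite=False)

assert np.allclose(A @ x, b)
```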

@nouiz
Contributor

nouiz commented Jan 8, 2016

In Theano we did a faster version of the NaN check.

Old NaN check (from memory): np.isnan(arr).any()

Faster version: np.isnan(np.min(arr))

Executing the reduction first is faster because we do the isnan check on fewer elements. This uses the property of NaN that it propagates.

For the inf check we use the following (I didn't write it myself, so I don't remember how fast it is):

np.isinf(np.nanmax(arr)) or np.isinf(np.nanmin(arr))

We didn't try combining both, but I think a version like the fast NaN check would do it. Operations between infs give inf or NaN, and operations between inf and NaN give NaN. I didn't test it, but I think this would do it:

numpy.isfinite(numpy.min(arr))
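These tricks can be verified directly. One caveat (the combined check above is flagged as untested): +inf does not propagate through np.min, so np.isfinite(np.min(arr)) catches NaN and -inf but misses a lone +inf.

```python
import numpy as np

a = np.ones(10)
a[3] = np.nan
assert np.isnan(np.min(a))            # NaN propagates through min
assert not np.isfinite(np.min(a))     # so the combined check sees it

b = np.ones(10)
b[3] = np.inf
assert np.isinf(np.nanmax(b))         # the dedicated inf check works
assert np.isfinite(np.min(b))         # but min() hides a lone +inf

c = np.ones(10)
c[3] = -np.inf
assert not np.isfinite(np.min(c))     # -inf, by contrast, is caught
```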


@juliantaylor
Contributor

Something like hasnan could be useful, as it skips creating a bool array, but it's not going to be faster than @nouiz's np.min trick, which also only gives you about 30% better speed.

Short-circuiting is probably not very useful in practice: in the common case of no NaN you can't short-circuit, and in the other case you have to add expensive operations to deal with the NaNs anyway.

The best you can do is make use of the fact that operations raise exceptions when using NaNs, so catch that and handle it. But that only works in cases where something is done before going into LAPACK that would otherwise deadlock or crash (which is the reason for the check's existence in the first place).

@juliantaylor
Contributor

That said, np.isfinite is not vectorized, so there could be a 1.5x-2x speedup available, at least for cached data.

@juliantaylor
Contributor

Vectorized isfinite in gh-6980.
It's not very significant for the above test case, as that one is mostly memory-bandwidth bound, but it's a nice speedup for cached arrays.

@congma
Contributor Author

congma commented Jan 9, 2016

Thanks for the updates. The min trick is interesting; it's even documented! ;)

Right now, for caching, I'm resorting to directly hashing the data attribute and, in case of collision, comparing the data attribute with Python's == operator. This seems to work fast enough (the bottleneck being the hash computation), but only for very restricted collections of arrays under comparison: for my usage, all the arrays being cached have the same type and shape, and there's no overlap of underlying memory.
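The scheme described (hash the buffer once, compare bytes on collision) might be wrapped up as follows; the class name is made up, and it assumes the caller never mutates an array while it is in use as a key:

```python
import numpy as np

class ArrayKey:
    """Hypothetical wrapper making a NumPy array usable as a dict key.

    Hashes the raw bytes once up front; on hash collision Python
    falls back to __eq__ during probing, which compares the bytes
    exactly.
    """
    def __init__(self, arr):
        self._arr = np.ascontiguousarray(arr)
        self._hash = hash((self._arr.shape,
                           self._arr.dtype.str,
                           self._arr.tobytes()))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        return (isinstance(other, ArrayKey)
                and self._arr.shape == other._arr.shape
                and self._arr.dtype == other._arr.dtype
                and self._arr.tobytes() == other._arr.tobytes())
```

Usage would look like `cache[ArrayKey(a)] = result`. Note that the byte comparison here is eager, not short-circuiting, which is exactly the gap this issue is about.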

@jakirkham
Contributor

So it looks like PR (#6980) went in, but was then reverted. It's unclear whether it was re-added later or not. Does anyone know?

@seberg
Member

seberg commented Feb 24, 2021

I am going to close this. There are two possibilities:

  1. Specialized gufuncs for performance optimization to replace some chained operations: ENH: Create a place for "optimization" ufuncs #18483
  2. A map_reduce-style chaining addition to the ufunc machinery (or an addition built on top of it): ENH: Create the ability for fused operations (fused ufuncs or map_reduce) style #11622 (and other issues like it)

Both probably make sense for certain functions.
