Can we have short-circuit equivalents to `allclose`, `array_equal`, `isfinite` and the like in numpy? #6909
Comments
IIRC there are some cases where In general though if there's some code in numpy that can be made faster then we do like to receive patches to make things more efficient, and especially if this is creating a bottleneck for you (I can't tell from your post whether |
The example in my notebook linked in OP wasn't the most relevant one, although by that one I discovered the (shocking to me) fact that most of the time Sometimes I need to use numpy arrays as dictionary keys (for example, caching). Of course, there are complications. For one thing, arrays must not change value during their lifetime as keys. We then need a hash function (to be applied on array buffers, again with complications) and a comparison function that tests exact equality. The latter function is used by Python for open addressing in case of hash collision. If hash collision is fairly common, the comparison function's performance could be pretty critical. It needs to bail out as early as possible during probing, especially when the key is large. Currently, |
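The caching scenario described above can be sketched as follows. This is only an illustration, not anything NumPy provides: `ArrayKey` and its `block` parameter are hypothetical names, and the sketch assumes plain numeric (non-object) dtypes. It freezes a private copy of the array, hashes its shape, dtype and raw buffer once, and compares the raw bytes block by block so that a mismatch near the start returns early.

```python
import numpy as np


class ArrayKey:
    """Hypothetical wrapper that lets a numeric NumPy array act as a dict key."""

    def __init__(self, arr, block=4096):
        # Take a private C-contiguous copy and freeze it, so the key
        # cannot change value while it lives in a dictionary.
        self._arr = np.array(arr, copy=True, order="C")
        self._arr.setflags(write=False)
        self._block = block
        # Flat byte view of the buffer, used for exact (bitwise) comparison.
        self._bytes = self._arr.reshape(-1).view(np.uint8)
        # Hash once: shape, dtype and raw bytes together identify the key.
        self._hash = hash(
            (self._arr.shape, self._arr.dtype.str, self._arr.tobytes())
        )

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, ArrayKey):
            return NotImplemented
        if self._arr.shape != other._arr.shape:
            return False
        if self._arr.dtype != other._arr.dtype:
            return False
        a, b = self._bytes, other._bytes
        # Compare block by block and bail out at the first mismatch.
        for start in range(0, a.size, self._block):
            stop = start + self._block
            if not np.array_equal(a[start:stop], b[start:stop]):
                return False
        return True


cache = {ArrayKey(np.arange(5.0)): "cached result"}
print(cache[ArrayKey(np.arange(5.0))])
```

Comparing bytes rather than values keeps `__eq__` consistent with the byte-based hash (for instance, `0.0` and `-0.0` are treated as different keys). Whether the Python-level block loop actually beats a single whole-array comparison depends on array size and on where the first difference sits, so it would need to be measured.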
I think Nathaniel is right, at least for your case, short circuiting does not help at all. Possibly there is some vectorization/low-level optimization that can be done. The most general way of implementing it, might be adding |
It can be expensive, this is why |
In Theano we did a faster version of the NaN check. Old NaN check (from memory): `np.isnan(arr).any()`. Faster version: `np.isnan(np.min(arr))`. Executing the reduction first is faster because the `isnan` check then runs on less data. For the inf check (I didn't write it myself, so I don't remember the speed) we use `np.isinf(np.nanmax(arr)) or np.isinf(np.nanmin(arr))`. We also use `numpy.isfinite(numpy.min(arr))`. |
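A minimal sketch of the reduce-first checks mentioned in this comment, next to the element-wise version. The function names are made up for illustration, and the sketch assumes a non-empty floating-point array (`np.min` raises on empty input). One caveat: `np.min` alone propagates NaN and catches `-inf`, but it cannot see `+inf`, so the finiteness variant here also checks `np.max`.

```python
import numpy as np


def has_nan_elementwise(arr):
    # Builds a full Boolean array, then reduces it.
    return bool(np.isnan(arr).any())


def has_nan_reduce_first(arr):
    # np.min propagates NaN, so a single scalar isnan check suffices.
    return bool(np.isnan(np.min(arr)))


def all_finite_reduce_first(arr):
    # min catches NaN and -inf; max is needed as well to catch +inf.
    return bool(np.isfinite(np.min(arr)) and np.isfinite(np.max(arr)))


clean = np.linspace(0.0, 1.0, 1000)
dirty = clean.copy()
dirty[500] = np.nan

print(has_nan_elementwise(dirty), has_nan_reduce_first(dirty))
print(all_finite_reduce_first(clean), all_finite_reduce_first(dirty))
```

Neither variant short-circuits; the reduce-first form simply avoids allocating the intermediate Boolean array.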
Something like hasnan could be useful as it skips creating a bool array, but it's not going to be faster than @nouiz's np.min trick, which also only gives you about 30% better speed. Short circuiting is probably not very useful in practice: in the common case of no NaN you can't short-circuit, and in the other case you have to add expensive operations to get rid of them. The best you can do is make use of the fact that operations raise exceptions when using NaNs, so catch that and handle it. But that only works in cases where something is done before going into LAPACK, which will deadlock or crash (which is the reason for the checks' existence in the first place) |
that said np.isfinite is not vectorized so there could be a 1.5x-2x speedup available, at least for cached data |
vectorized isfinite in gh-6980 |
Thanks for the updates. The Right now, for caching, I'm resorting to directly hashing the |
So it looks like PR ( #6980 ) went in, but was then reverted. Unclear as to whether it was readded later or not. Does anyone know? |
I am going to close this. There are two possibilities:
Both probably make sense for certain functions. |
I did some (admittedly very dirty) benchmarking of `scipy.linalg.cho_solve()`, and what surprised me was the magnitude of the overhead due to `check_finite=True`, which is on by default. My notes can be read here.

The finiteness check is useful, if not plainly necessary, in some scenarios, but can we do better? In many use cases, such as the `scipy.linalg` functions, all that matters is whether some element of an array is `NaN` or infinity. What the check has to do now is call `numpy.asarray_chkfinite`, which in turn does something amounting to `numpy.isfinite(a).all()`. But this does not short-circuit; `isfinite()` checks every single element and returns a Boolean array. This whole test can be expensive, but can it be made less expensive in the best case?

This issue is also present in the tests of closeness and equality in `core/numeric.py`. Neither `allclose()` nor `array_equal()` actually short-circuits when doing the real check. They only short-circuit in the `all()` function/method call, which is already too late. These two functions can be especially deceiving. Spoiled by Python's built-in `all()`, the user may think these functions do short-circuit evaluation. The docs never mention that they don't.

So the request is... Do you think it's worthwhile to add some short-circuit Boolean functions for the above tests (finiteness/equality/closeness)?
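To make the request concrete, here is a rough pure-Python sketch of what short-circuiting could look like: process the array in blocks and stop at the first block that fails. The names `all_finite_blockwise` and `array_equal_blockwise` and the `block` parameter are hypothetical, not NumPy API, and the sketch assumes numeric dtypes.

```python
import numpy as np


def all_finite_blockwise(a, block=65536):
    """Return False as soon as one block contains NaN or infinity."""
    flat = np.asarray(a).reshape(-1)  # may copy if `a` is not contiguous
    for start in range(0, flat.size, block):
        if not np.isfinite(flat[start:start + block]).all():
            return False  # bail out early; later blocks are never touched
    return True


def array_equal_blockwise(a, b, block=65536):
    """Return False as soon as one block of elements differs."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        return False
    fa, fb = a.reshape(-1), b.reshape(-1)
    for start in range(0, fa.size, block):
        stop = start + block
        if not (fa[start:stop] == fb[start:stop]).all():
            return False
    return True


x = np.ones(1_000_000)
x[0] = np.nan
print(all_finite_blockwise(x))           # stops after the first block
print(array_equal_blockwise(x, x + 1))   # stops after the first block
```

This only helps when a bad element appears early; for clean input every block is still visited, and the Python loop adds overhead of its own, so any gain would have to be confirmed by benchmarking.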