possible performance regression from 1.6.2 --> 1.7.0: np.any() and np.all() are unexpectedly slow over large arrays #3446
Hmm, I'd guess a change in the handling of
Looks like the reduce detection should still work for
I think the problem is that the reduce always uses a buffered iterator, which may be fine when called from Python, but within C it just wastes time on unnecessary copies. As a workaround in 1.7 you can increase the (very small by default) buffer to reduce the overhead a bit:
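The snippet referred to here did not survive in this extract; a sketch of the workaround, assuming `np.setbufsize` (the knob for the ufunc buffer, default 8192 elements) is what was meant:

```python
import numpy as np

# Sketch of the workaround (assumption: np.setbufsize is the buffer knob
# meant above; the default ufunc buffer is 8192 elements). A larger buffer
# means fewer copy round-trips inside the buffered reduce iterator.
old = np.setbufsize(2**20)    # returns the previous buffer size
x = np.zeros(10**6, dtype=bool)
print(np.any(x))              # the reduction now runs over larger chunks
np.setbufsize(old)            # restore the default afterwards
```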
This will of course only help if the data actually fits in the buffer and the memcpy is much faster than the logical_or (which it is).
Ah, that could be. Increasing the buffer size helps a bit, but I suspect the deeper problem is that it doesn't know to short-circuit the
One further point: I would also expect to see constant loop times w.r.t. array size for all-zero float arrays as well as boolean ones, since the first element is zero either way. With `x = np.zeros(10**ii, dtype=np.float32)`:
```
Numpy v1.6.2
Array size: 1E0, 100000 loops, best of 3: 3.503 us/loop
Array size: 1E1, 100000 loops, best of 3: 3.597 us/loop
Array size: 1E2, 100000 loops, best of 3: 3.742 us/loop
Array size: 1E3, 100000 loops, best of 3: 4.745 us/loop
Array size: 1E4, 100000 loops, best of 3: 14.533 us/loop
Array size: 1E5, 10000 loops, best of 3: 112.463 us/loop
Array size: 1E6, 1000 loops, best of 3: 1.101 ms/loop
Array size: 1E7, 100 loops, best of 3: 11.724 ms/loop
Array size: 1E8, 10 loops, best of 3: 116.924 ms/loop
Array size: 1E9, 1 loops, best of 3: 1.168 s/loop
```
Yes, only the boolean loop has a short circuit.
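The point can be checked directly; this is a sketch (assumed setup, not the original benchmark) comparing `np.all` over all-zero arrays of both dtypes. The boolean loop can stop at the first `False`, so its time should stay roughly flat, while the float loop scans everything and grows with the array size:

```python
import numpy as np
import timeit

# Sketch: time np.all over all-zero arrays. Only the boolean inner loop
# short-circuits, so its per-call time should be roughly independent of n,
# while the float32 loop traverses the whole array.
for dtype in (np.bool_, np.float32):
    for n in (10**3, 10**6):
        x = np.zeros(n, dtype=dtype)
        t = min(timeit.repeat(lambda: np.all(x), number=20, repeat=3))
        print(f"{np.dtype(dtype).name:8s} n={n:>8}  {t / 20 * 1e6:9.2f} us/call")
```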
Sorry for my ignorance - is there some particular reason why?
Probably the mixed types: boolean output, float inputs. The usual reduce idea has trouble with that.
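For context, `np.any` and `np.all` are thin wrappers around the logical ufunc reductions, which are exactly this mixed-type case — float inputs, boolean output:

```python
import numpy as np

# np.any/np.all correspond to logical_or.reduce/logical_and.reduce.
# The inputs here are float32 but the reduction result is a boolean,
# which is the mixed-type situation mentioned above.
x = np.array([0.0, 0.0, 3.0], dtype=np.float32)
print(np.logical_or.reduce(x))   # same result as np.any(x)
print(np.logical_and.reduce(x))  # same result as np.all(x)
```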
Yeah, I know, I didn't want to hang out here ;). Someone might want to check nditer construction, because for an unbuffered loop (and this should be unbuffered as far as I can tell), I think I remember seeing code that should expand the innermost dimension to the maximum possible size (i.e. the whole array here). So this might be failing. Other than that I would say this is merely an observation. Something like gh-2269 is more appropriate, because usually we want to find the any/first occurrence after a calculation (though I would rather implement it using nditer; I believe I saw a snippet by Mark that shows how to do such things with it). I am not even sure the ufunc machinery currently supports mixed-type reductions, which would be necessary to optimize the non-bool case.
Haha, here is your explanation:
Resistance is futile ;) But thanks for the pointers.... |
Why do we even have a float loop for this? It's just doing cast-to-bool
Or wait, the ufunc machinery thinks that you can't cast floats to bools, no?
It does cast for the reduction (I think it only uses the out dtype to
Apart from the lack of a simple optimization, it might not matter too much in practice. Many of the use cases of any/all are likely run in expectation of an all-False/all-True result, in which case the entire data set needs to be traversed anyway.
In numpy 1.9 you can do
@juliantaylor How did you expect numpy 1.10 to change your statement? I just tried on numpy 1.10 and your workaround (appreciated btw) is still much slower than all() if the condition is true. |
Are you on Windows or on a system with an old C library? The change I was referring to is c12c31f, which requires a good
I am indeed on Windows... |
Closing this, since the core of the issue is identical to gh-17471 |
When `np.all` encounters a zero element it should return `False` immediately without testing any further elements. Therefore, the time taken by `np.all` should not increase with increasing array size, provided that the first element in the array is always zero. The same should be true for `np.any` if the first element in the array is nonzero.

Test script:
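The script itself did not survive in this extract; a minimal reconstruction consistent with the report (assumed, not the original code) is:

```python
import numpy as np
import timeit

# Reconstruction (assumption: the original used %timeit over zero-filled
# boolean arrays of growing size; only the output format differs here).
# If np.all short-circuits on the leading zero, the time per call should
# not grow with the array size.
for ii in range(7):
    x = np.zeros(10**ii, dtype=bool)
    n = 1000
    best = min(timeit.repeat(lambda: np.all(x), number=n, repeat=3))
    print(f"Array size: 1E{ii}, best of 3: {best / n * 1e6:.3f} us/loop")
```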
Results:
Bug was initially reported in relation to this SO question.