
Why is nanmin so slow relative to numpy? #256

Open
max-sixty opened this issue Dec 20, 2023 · 5 comments


@max-sixty
Collaborator

I recently added numpy to the benchmarks, and numbagg overall does quite well.

But it does very badly on nanmin & nanmax — about 85% slower. Bottleneck performs similarly to numbagg.

Even when I strip out anything that's non-essential to calculating a minimum, it doesn't help performance:

```python
@ndreduce.wrap(
    [int64(int32), int64(int64), float32(float32), float64(float64)],
    # https://github.com/numba/numba/issues/7350
    # supports_parallel=False,
)
def nanmin(a):
    # if not a.size:
    #     raise ValueError(
    #         "zero-size array to reduction operation fmin which has no identity"
    #     )
    amin = np.inf  # (np.infty is a deprecated alias, removed in NumPy 2.0)
    # all_missing = True
    for ai in a.flat:
        # if ai <= amin:
        if ai < amin:
            amin = ai
    #         all_missing = False
    # if all_missing:
    #     amin = np.nan
    return amin
```
```sh
pytest -vv --benchmark-enable -k 'benchmark_main and [nanmin and shape1' --run-nightly
```

Check out the `Mean` column[^1]:

```
-------------------------------------------------------------------------------- benchmark 'numbagg.nanmin|(10000000,)': 4 tests ---------------------------------------------------------------------------------
Name (time in ms)                                     Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_main[nanmin-shape1-numpy]           1.3247 (1.0)       1.8171 (1.0)       1.3850 (1.0)      0.0894 (1.0)       1.3428 (1.0)      0.0694 (1.0)         89;65  722.0375 (1.0)         689           1
test_benchmark_main[nanmin-shape1-bottleneck]     12.4117 (9.37)     13.3866 (7.37)     12.5970 (9.10)     0.1924 (2.15)     12.5252 (9.33)     0.1252 (1.80)        14;13   79.3839 (0.11)         80           1
test_benchmark_main[nanmin-shape1-numbagg]        12.5735 (9.49)     14.2261 (7.83)     12.7716 (9.22)     0.2331 (2.61)     12.6986 (9.46)     0.1566 (2.26)         12;9   78.2988 (0.11)         79           1
test_benchmark_main[nanmin-shape1-pandas]         12.5795 (9.50)     13.3724 (7.36)     12.7389 (9.20)     0.1636 (1.83)     12.6984 (9.46)     0.0965 (1.39)         8;10   78.4997 (0.11)         77           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

Without solving this, we can't really recommend numbagg as a replacement for aggregation functions, as we can for its grouping & moving-window functions. I'm not even sure we should keep these functions in numbagg; probably we should at least demote them out of the top-level namespace.


@shoyer if you happen to know off-hand given your experience here, let me know. No need to reply if not...

[^1]: Part of me is wondering whether it's even doing the same operation. But we also have tests on the correctness relative to numpy, so it is returning the same result...
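A minimal sanity check in the spirit of that footnote (a sketch, not numbagg's actual test suite): the stripped-down loop, with NaN handling removed, should agree with `np.min` on NaN-free input.

```python
import numpy as np


def nanmin_loop(a):
    # Pure-Python version of the stripped-down kernel above.
    amin = np.inf
    for ai in a.flat:
        if ai < amin:
            amin = ai
    return amin


a = np.random.default_rng(0).standard_normal((100, 7))
assert nanmin_loop(a) == np.min(a)
```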

@dcherian
Contributor

ping numba/numba#2196?

@max-sixty
Collaborator Author

> ping numba/numba#2196?

Very possibly, nice find.

Though the fact that bottleneck performs similarly to us makes it a bit less likely to be an LLVM issue...

@shoyer
Collaborator

shoyer commented Jan 17, 2024

My guess is that NumPy is better at using vectorized CPU instructions for some reason. No idea why Numba can't do this, though...
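One way to see the gap this guess describes (an illustrative sketch, not numbagg's code): `np.minimum.reduce` runs NumPy's vectorized reduction loop, which can use SIMD min instructions, while an element-at-a-time loop with a data-dependent branch is much harder for a compiler to auto-vectorize.

```python
import numpy as np

a = np.random.default_rng(0).standard_normal(10_000)

# Vectorized path: NumPy's reduction can process several floats
# per SIMD instruction (e.g. a packed min on x86).
vectorized = np.minimum.reduce(a)

# Scalar path: one comparison per element, with a branch whose
# outcome depends on the data.
scalar = np.inf
for ai in a:
    if ai < scalar:
        scalar = ai

assert vectorized == scalar == a.min()
```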

@dcherian
Contributor

dcherian commented Feb 3, 2024

Well if you want to go down a rabbit hole: https://tbetcke.github.io/hpc_lecture_notes/simd.html ;)
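The core trick from those notes, sketched in NumPy terms (illustrative only): keep several independent running minima, "lanes", and combine them at the end, which is what a SIMD min loop does in hardware. The per-lane reduction has no cross-iteration dependency chain, so it vectorizes.

```python
import numpy as np


def lanewise_min(a, lanes=8):
    # Keep `lanes` independent running minima (like SIMD registers),
    # then combine them at the end.
    n = (a.size // lanes) * lanes
    partial = a[:n].reshape(-1, lanes).min(axis=0)  # vertical min per lane
    result = partial.min()                          # horizontal reduce
    if n < a.size:  # tail that doesn't fill a full vector
        result = min(result, a[n:].min())
    return result


a = np.random.default_rng(2).standard_normal(1_003)
assert lanewise_min(a) == a.min()
```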

@max-sixty
Collaborator Author

Nice! Very interesting. So hopefully this will improve in future numba versions...
