
Why is nanmin so slow relative to numpy? #256

Open
max-sixty opened this issue Dec 20, 2023 · 5 comments


@max-sixty
Collaborator

I recently added numpy to the benchmarks, and numbagg overall does quite well.

But it does very badly on nanmin & nanmax — about 85% slower. Bottleneck performs similarly to numbagg.

Even when I strip out anything that's non-essential to calculating a minimum, it doesn't help performance:

```python
@ndreduce.wrap(
    [int64(int32), int64(int64), float32(float32), float64(float64)],
    # https://github.com/numba/numba/issues/7350
    # supports_parallel=False,
)
def nanmin(a):
    # if not a.size:
    #     raise ValueError(
    #         "zero-size array to reduction operation fmin which has no identity"
    #     )
    amin = np.inf  # (np.infty is a deprecated alias, removed in NumPy 2.0)
    # all_missing = True
    for ai in a.flat:
        # if ai <= amin:
        if ai < amin:
            amin = ai
    #         all_missing = False
    # if all_missing:
    #     amin = np.nan
    return amin
```
```sh
pytest -vv --benchmark-enable -k 'benchmark_main and [nanmin and shape1' --run-nightly
```

Check out the `Mean` column[^1]:

```
-------------------------------------------------------------------------------- benchmark 'numbagg.nanmin|(10000000,)': 4 tests ---------------------------------------------------------------------------------
Name (time in ms)                                     Min                Max               Mean            StdDev             Median               IQR            Outliers       OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_main[nanmin-shape1-numpy]           1.3247 (1.0)       1.8171 (1.0)       1.3850 (1.0)      0.0894 (1.0)       1.3428 (1.0)      0.0694 (1.0)         89;65  722.0375 (1.0)         689           1
test_benchmark_main[nanmin-shape1-bottleneck]     12.4117 (9.37)     13.3866 (7.37)     12.5970 (9.10)     0.1924 (2.15)     12.5252 (9.33)     0.1252 (1.80)        14;13   79.3839 (0.11)         80           1
test_benchmark_main[nanmin-shape1-numbagg]        12.5735 (9.49)     14.2261 (7.83)     12.7716 (9.22)     0.2331 (2.61)     12.6986 (9.46)     0.1566 (2.26)         12;9   78.2988 (0.11)         79           1
test_benchmark_main[nanmin-shape1-pandas]         12.5795 (9.50)     13.3724 (7.36)     12.7389 (9.20)     0.1636 (1.83)     12.6984 (9.46)     0.0965 (1.39)         8;10   78.4997 (0.11)         77           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

Without solving this, we can't really recommend numbagg as a replacement for aggregation functions, as we can for its grouping & moving-window functions. I'm not even sure we should keep these functions in numbagg; probably we should at least demote them out of the top-level namespace.


@shoyer if you happen to know off-hand given your experience here, let me know. No need to reply if not...

[^1]: Part of me is wondering whether it's even doing the same operation. But we also have tests on the correctness relative to numpy, so it is returning the same result...
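A minimal sanity check in the spirit of that footnote (a sketch, not numbagg's actual test suite): the stripped-down loop, with NaN handling removed, should agree with `np.min` on NaN-free input.

```python
import numpy as np


def nanmin_loop(a):
    # Pure-Python version of the stripped-down kernel above.
    amin = np.inf
    for ai in a.flat:
        if ai < amin:
            amin = ai
    return amin


a = np.random.default_rng(0).standard_normal((100, 7))
assert nanmin_loop(a) == np.min(a)
```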

@dcherian
Contributor

ping numba/numba#2196?

@max-sixty
Collaborator Author

> ping numba/numba#2196?

Very possibly, nice find.

Though the fact that bottleneck performs similarly to us makes it a bit less likely to be an LLVM issue...

@shoyer
Collaborator

shoyer commented Jan 17, 2024

My guess is that NumPy is better at using vectorized CPU instructions for some reason. No idea why Numba can't do this, though...
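One way to see the gap this guess describes (an illustrative sketch, not numbagg's code): `np.minimum.reduce` runs NumPy's vectorized reduction loop, which can use SIMD min instructions, while an element-at-a-time loop with a data-dependent branch is much harder for a compiler to auto-vectorize.

```python
import numpy as np

a = np.random.default_rng(0).standard_normal(10_000)

# Vectorized path: NumPy's reduction can process several floats
# per SIMD instruction (e.g. a packed min on x86).
vectorized = np.minimum.reduce(a)

# Scalar path: one comparison per element, with a branch whose
# outcome depends on the data.
scalar = np.inf
for ai in a:
    if ai < scalar:
        scalar = ai

assert vectorized == scalar == a.min()
```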

@dcherian
Contributor

dcherian commented Feb 3, 2024

Well if you want to go down a rabbit hole: https://tbetcke.github.io/hpc_lecture_notes/simd.html ;)
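The core trick from those notes, sketched in NumPy terms (illustrative only): keep several independent running minima, "lanes", and combine them at the end, which is what a SIMD min loop does in hardware. The per-lane reduction has no cross-iteration dependency chain, so it vectorizes.

```python
import numpy as np


def lanewise_min(a, lanes=8):
    # Keep `lanes` independent running minima (like SIMD registers),
    # then combine them at the end.
    n = (a.size // lanes) * lanes
    partial = a[:n].reshape(-1, lanes).min(axis=0)  # vertical min per lane
    result = partial.min()                          # horizontal reduce
    if n < a.size:  # tail that doesn't fill a full vector
        result = min(result, a[n:].min())
    return result


a = np.random.default_rng(2).standard_normal(1_003)
assert lanewise_min(a) == a.min()
```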

@max-sixty
Collaborator Author

Nice! Very interesting. So hopefully this will improve in future numba versions...
