[BUG]: thrust::count_if and copy_if performance on Grace and x86 10x+ / 20x+ slower than libstdc++ #1709

Open
1 task done
gonzalobg opened this issue May 6, 2024 · 0 comments
Labels
bug Something isn't working right.

Comments

Collaborator

gonzalobg commented May 6, 2024

Is this a duplicate?

Type of Bug

Performance

Component

Thrust

Describe the bug

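As the title says, the performance of thrust::count_if and thrust::copy_if is very low on the NVIDIA Grace CPU, and on x86 as well: roughly 10x+ and 20x+ slower than libstdc++, respectively.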

How to Reproduce

This self-contained file reproduces the issue: https://github.com/gonzalobg/cpp_hpc_tutorial/blob/master/labs/cpp/lab1_select/solutions/exercise1.cpp

This one isolates copy_if: https://github.com/gonzalobg/cpp_hpc_tutorial/blob/master/labs/cpp/lab1_select/solutions/copy_if.cpp

The performance of transform_inclusive_scan is also quite low (~3x lower than libstdc++); see:
https://github.com/gonzalobg/cpp_hpc_tutorial/blob/master/labs/cpp/lab1_select/solutions/exercise2.cpp
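For readers who do not want to open the links, here is a minimal sketch of the kind of libstdc++ baseline being compared against. It is not the content of the linked files: the `IsEven` predicate and the helper names `time_seconds`, `baseline_count_if`, and `baseline_copy_if` are made up for illustration, and the sketch uses only the standard library. The Thrust side of the comparison would presumably call thrust::count_if and thrust::copy_if on the same data with a CPU execution policy; that part is an assumption and is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical predicate, for illustration only: select even values.
struct IsEven {
    bool operator()(int x) const { return x % 2 == 0; }
};

// Runs `f` once and returns the elapsed wall-clock time in seconds.
template <class F>
double time_seconds(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Baseline count: a single std::count_if pass over the input.
std::size_t baseline_count_if(const std::vector<int>& in) {
    return static_cast<std::size_t>(
        std::count_if(in.begin(), in.end(), IsEven{}));
}

// Baseline select: std::copy_if into a preallocated output buffer.
// Returns the number of elements that were kept.
std::size_t baseline_copy_if(const std::vector<int>& in, std::vector<int>& out) {
    assert(out.size() >= in.size());  // output must be able to hold every element
    auto last = std::copy_if(in.begin(), in.end(), out.begin(), IsEven{});
    return static_cast<std::size_t>(last - out.begin());
}
```

Wrapping each call in `time_seconds` (for example `time_seconds([&] { kept = baseline_copy_if(in, out); })`) on a large input, and doing the same for the Thrust calls, gives the two timings whose ratio the title refers to.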

Expected behavior

Performance close to peak CPU memory bandwidth.
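One way to make "close to peak" measurable is to convert a timing into effective bandwidth and compare it with the machine's peak. The sketch below assumes the minimal traffic model for each algorithm (count_if reads every element once; copy_if reads every element once and writes each kept element once). The helper names are hypothetical and this is not necessarily the metric used in the linked files.

```cpp
#include <cassert>
#include <cstddef>

// Minimum bytes moved by count_if: each of the n elements is read once.
std::size_t count_if_bytes(std::size_t n, std::size_t elem_size) {
    return n * elem_size;
}

// Minimum bytes moved by copy_if: n elements read, `kept` elements written.
std::size_t copy_if_bytes(std::size_t n, std::size_t kept, std::size_t elem_size) {
    assert(kept <= n);  // cannot keep more elements than were read
    return (n + kept) * elem_size;
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a measured run.
double effective_gb_per_s(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

An implementation that makes several passes over the data moves more bytes than this model counts, so its effective bandwidth comes out lower even if each pass runs at full speed.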

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

gonzalobg added the bug label on May 6, 2024
gonzalobg changed the title from "[BUG]: thrust::count_if performance on Grace and x86 10x+ slower than libstdc++" to "[BUG]: thrust::count_if and copy_if performance on Grace and x86 10x+ / 20x+ slower than libstdc++" on May 6, 2024
Projects
Status: Todo