gonzalobg changed the title from "[BUG]: thrust::count_if performance on Grace and x86 10x+ slower than libstdc++" to "[BUG]: thrust::count_if and copy_if performance on Grace and x86 10x+ / 20x+ slower than libstdc++" on May 6, 2024
Is this a duplicate?
Type of Bug
Performance
Component
Thrust
Describe the bug
As the title says, thrust::count_if and thrust::copy_if perform very poorly on the NVIDIA Grace CPU and on x86: roughly 10x+ and 20x+ slower than libstdc++, respectively.
How to Reproduce
This self-contained file reproduces: https://github.com/gonzalobg/cpp_hpc_tutorial/blob/master/labs/cpp/lab1_select/solutions/exercise1.cpp
This one isolates copy_if: https://github.com/gonzalobg/cpp_hpc_tutorial/blob/master/labs/cpp/lab1_select/solutions/copy_if.cpp
The performance of transform_inclusive_scan is also quite low (~3x lower than libstdc++), see:
https://github.com/gonzalobg/cpp_hpc_tutorial/blob/master/labs/cpp/lab1_select/solutions/exercise2.cpp
Expected behavior
Throughput close to peak CPU memory bandwidth for count_if, copy_if, and transform_inclusive_scan.
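For reference, here is a minimal sketch of how effective bandwidth could be estimated for the libstdc++ baseline. It is not the linked reproducer: the predicate `is_even`, the `Measurement` struct, and the helpers `measure_count_if` / `measure_copy_if` are hypothetical names introduced for illustration, and the sketch uses the sequential standard algorithms only. It assumes count_if reads every input element once, and that copy_if additionally writes each selected element once.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical predicate; the linked reproducers may select elements differently.
inline bool is_even(int x) { return x % 2 == 0; }

struct Measurement {
  std::size_t selected;  // number of elements that satisfied the predicate
  double seconds;        // wall-clock time of the algorithm call
  double gb_per_s;       // estimated effective bandwidth in GB/s
};

// Times std::count_if and estimates bandwidth from the bytes read.
Measurement measure_count_if(const std::vector<int>& in) {
  auto t0 = std::chrono::steady_clock::now();
  auto n = std::count_if(in.begin(), in.end(), is_even);
  auto t1 = std::chrono::steady_clock::now();
  double s = std::chrono::duration<double>(t1 - t0).count();
  double bytes = static_cast<double>(in.size()) * sizeof(int);
  return {static_cast<std::size_t>(n), s, bytes / s / 1e9};
}

// Times std::copy_if and estimates bandwidth from bytes read plus bytes written.
Measurement measure_copy_if(const std::vector<int>& in, std::vector<int>& out) {
  out.resize(in.size());  // allocate outside the timed region
  auto t0 = std::chrono::steady_clock::now();
  auto end = std::copy_if(in.begin(), in.end(), out.begin(), is_even);
  auto t1 = std::chrono::steady_clock::now();
  double s = std::chrono::duration<double>(t1 - t0).count();
  std::size_t kept = static_cast<std::size_t>(end - out.begin());
  out.resize(kept);  // shrink to the selected elements
  double bytes = static_cast<double>(in.size() + kept) * sizeof(int);
  return {kept, s, bytes / s / 1e9};
}
```

Calling these on a large input (well beyond the last-level cache size) and comparing `gb_per_s` against the machine's peak memory bandwidth gives a rough picture of how far an implementation is from the expected behavior. A single timed call is noisy, so a real measurement would repeat the call and discard warm-up iterations.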
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response