[PERF][BUG]: thrust::transform does not saturate bandwidth on newer hardware architectures (down to 62% SoL on H200 for int)
#1673
Labels
bug
Is this a duplicate?
Type of Bug
Performance
Component
Thrust
Describe the bug
Using thrust::transform on newer hardware platforms can result in subpar performance.
How to Reproduce
See godbolt link for exact reproducer.
Output:
Expected behavior
The benchmarks with the int32 data type should be able to saturate memory bandwidth (~90% of speed of light, SoL). The benchmarks with the int16 and int8 data types should achieve reasonable performance (>60% SoL). The int64 mul benchmark should reach 90% SoL.
The int128 and the remaining int64 benchmarks are included for reference; their performance is acceptable.
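For readers unfamiliar with the SoL percentages quoted above, here is a minimal sketch of how such a figure could be derived from a timed run. It is not taken from the godbolt reproducer; the function names, the `num_inputs` parameter, and the assumption that every element is read once per input and written once are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers, not part of Thrust or of the linked reproducer.

// Bytes moved by an elementwise transform, assuming each element is read
// once per input range and written once to the output range.
inline double bytes_moved(std::size_t num_elements, std::size_t element_size,
                          std::size_t num_inputs) {
    return static_cast<double>(num_elements) *
           static_cast<double>(element_size) *
           static_cast<double>(num_inputs + 1);
}

// Achieved bandwidth in GB/s, using 1 GB = 1e9 bytes.
inline double achieved_gbps(double bytes, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return bytes / elapsed_seconds / 1e9;
}

// Achieved bandwidth as a percentage of the peak (speed-of-light) bandwidth.
// The peak value must come from the hardware specification.
inline double percent_of_sol(double achieved, double peak) {
    assert(peak > 0.0);
    return 100.0 * achieved / peak;
}
```

As an illustration only (these are not measurements from this issue): assuming a peak of 4800 GB/s, an achieved 2976 GB/s would give `percent_of_sol(2976.0, 4800.0)`, which is 62%. The peak figure should be checked against the specification of the device under test.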
Reproduction link
https://godbolt.org/z/K7EW5freK
Operating System
No response
nvidia-smi output
NVCC version
NA