[PERF][BUG]: Thrust uses cudaMemcpy for Device->Device copies (66% SoL on H200) #1672
Open
1 task done
Labels
bug
Is this a duplicate?
Type of Bug
Performance
Component
Thrust
Describe the bug
thrust::copy uses cudaMemcpy to implement the copy, which saturates at most 66% of memory bandwidth on H200.
nvbug 4207603
How to Reproduce
See godbolt link for exact reproducer.
Observed output:
Expected behavior
thrust::copy should be able to saturate bandwidth.
Reproduction link
https://godbolt.org/z/foPG4ox53
Operating System
No response
nvidia-smi output
NVCC version
NA