[PERF][BUG]: Thrust uses cudaMemcpy for Device->Device copies (66% SoL on H200) #1672

Open
1 task done
ahendriksen opened this issue Apr 29, 2024 · 1 comment
Labels
bug Something isn't working right.


@ahendriksen
Contributor

ahendriksen commented Apr 29, 2024

Is this a duplicate?

Type of Bug

Performance

Component

Thrust

Describe the bug

thrust::copy implements device-to-device copies with cudaMemcpy, which reaches at most about 66% of peak (speed-of-light, SoL) memory bandwidth on an H200.
nvbug 4207603

How to Reproduce

See the Godbolt link under "Reproduction link" for the exact reproducer.

Observed output:

$ ./01_thrust_copy 
     cp_gb  elapsed_ms     bw_gbps  pct_of_sol
      8.59      2.6090      3292.4       67.0%
      8.59      2.6073      3294.5       67.0%
      8.59      2.6061      3296.1       67.0%

Expected behavior

thrust::copy should be able to saturate device memory bandwidth for device-to-device copies.

Reproduction link

https://godbolt.org/z/foPG4ox53

Operating System

No response

nvidia-smi output

$ nvidia-smi 
Mon Apr 29 05:40:23 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:45:00.0 Off |                    0 |
| N/A   27C    P0             73W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

NVCC version

NA

@ahendriksen ahendriksen added the bug Something isn't working right. label Apr 29, 2024
@bernhardmgruber bernhardmgruber self-assigned this Apr 29, 2024
@ahendriksen
Contributor Author

There is a related issue in RAPIDS, where smaller copies are serialized behind larger ones because the copy engines are busy.

@gevtushenko: does thrust::copy_n use a kernel to perform the copy? If so, perhaps that path should be used instead.

Projects
Status: Todo
Development

No branches or pull requests

2 participants