About RDMA Scatter/ Gather & RC QP's max_rd_atomic #24

Open
eedalong opened this issue May 26, 2022 · 1 comment
Assignees
Labels
documentation Improvements or additions to documentation Doing

Comments

@eedalong
Member

eedalong commented May 26, 2022

RDMA Scatter/Gather is a nice way to consolidate data transfers. For example, the verbs API allows data at multiple local locations to be written into a remote buffer with a SINGLE RDMA write operation (gather), and data in a remote buffer can be read into multiple local locations with a SINGLE RDMA read operation (scatter). This is attractive, but it seems that nobody has reported measurable benefits from this feature.
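To make the gather and scatter semantics concrete, here is a minimal host-memory sketch in C. It does not talk to an RNIC. `struct sge_model`, `gather_write`, and `scatter_read` are hypothetical names introduced only for illustration. The struct mirrors just the `addr` and `length` fields of the real `struct ibv_sge`, and the `memcpy` calls stand in for the DMA the NIC would perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the addr/length fields of struct ibv_sge.
 * The real struct also carries an lkey; it is omitted because this
 * sketch only models data movement in host memory, not a real RNIC. */
struct sge_model {
    void    *addr;
    uint32_t length;
};

/* Gather: concatenate n local segments into one contiguous "remote"
 * buffer, the way a single RDMA write with n SGEs lays data out.
 * Returns bytes written, or 0 if the segments exceed remote_cap. */
size_t gather_write(void *remote, size_t remote_cap,
                    const struct sge_model *sges, int n)
{
    assert(remote != NULL && sges != NULL && n >= 0);
    size_t total = 0;
    for (int i = 0; i < n; i++)
        total += sges[i].length;
    if (total > remote_cap)
        return 0;

    size_t off = 0;
    for (int i = 0; i < n; i++) {
        memcpy((char *)remote + off, sges[i].addr, sges[i].length);
        off += sges[i].length;
    }
    return total;
}

/* Scatter: split one contiguous "remote" buffer across n local
 * segments, the way a single RDMA read with n SGEs fills them.
 * Returns the number of bytes placed into local segments. */
size_t scatter_read(const void *remote, size_t remote_len,
                    const struct sge_model *sges, int n)
{
    assert(remote != NULL && sges != NULL && n >= 0);
    size_t off = 0;
    for (int i = 0; i < n && off < remote_len; i++) {
        size_t left = remote_len - off;
        size_t len = sges[i].length < left ? sges[i].length : left;
        memcpy(sges[i].addr, (const char *)remote + off, len);
        off += len;
    }
    return off;
}
```

With real verbs the same layout would be described by an array of `struct ibv_sge` attached to one work request, so one posted operation covers all segments.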

One possible reason is the limited SRAM on the RNIC, which may cause two problems:

  1. The remote and local HCAs cannot store much data, which limits SGE_NUM. But recent RNICs have much larger SRAM than earlier generations: the Mellanox CX4 RNIC has about 2MB of SRAM, which is big enough for us because a node feature in a GNN is only about 1-4KB.

  2. Storing too much data in the HCA may leave little memory budget for the MTT/MPT (the address translation and protection tables), which may cause severe MTT shoot-downs and excessive PCIe/DMA overhead for address translation. But this can be solved if we use large physically contiguous memory (using Linux CMA) together with physical memory regions, a new feature in Mellanox CX5.
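The two points above come down to simple arithmetic, sketched below in C. The 2MB SRAM size and the 1-4KB feature size come from this comment. The helper names, the 1GiB region used in the usage note, and the idea of one translation entry per page are illustrative assumptions, not statements about how any particular RNIC lays out its tables.

```c
#include <assert.h>
#include <stdint.h>

/* How many fixed-size items fit in an SRAM budget (integer division).
 * Hypothetical helper; real RNIC SRAM is shared with other state. */
uint64_t items_that_fit(uint64_t sram_bytes, uint64_t item_bytes)
{
    assert(item_bytes > 0);
    return sram_bytes / item_bytes;
}

/* Page-granular translation entries needed to map a region, rounded up.
 * Assumes one entry per page, which is a simplification. */
uint64_t translation_entries(uint64_t region_bytes, uint64_t page_bytes)
{
    assert(page_bytes > 0);
    return (region_bytes + page_bytes - 1) / page_bytes;
}
```

Under these assumptions, `items_that_fit(2 << 20, 4096)` gives 512 and `items_that_fit(2 << 20, 1024)` gives 2048 node features in 2MB. `translation_entries(1ULL << 30, 4096)` gives 262144 entries for a 1GiB region mapped with 4KB pages, while the same region treated as a single physically contiguous block needs one.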

@eedalong
Member Author

eedalong commented May 26, 2022

max_rd_atomic is a crucial QP attribute for performance. It is the number of outstanding RDMA Read and atomic operations that an RC QP can handle at any time as an initiator. However, I still do not understand why setting this attribute to a value larger than 1 helps so much. We need to find out the reasons behind this.
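One plausible explanation, offered here as an assumption and not something confirmed in this thread, is latency hiding: an RDMA Read needs a full round trip, so with only one read in flight the link sits idle while the initiator waits for the response. The C sketch below is a back-of-the-envelope model of that effect. The function name and all numbers in the usage note are hypothetical, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: modeled read throughput in bytes per microsecond when
 * `outstanding` reads of msg_bytes each can be in flight per round
 * trip of rtt_us, capped by the link rate. Ignores NIC processing
 * cost, PCIe limits, and responder-side resources. */
uint64_t modeled_read_bytes_per_us(uint64_t msg_bytes, uint64_t rtt_us,
                                   uint64_t link_bytes_per_us,
                                   uint64_t outstanding)
{
    assert(rtt_us > 0);
    uint64_t pipelined = outstanding * msg_bytes / rtt_us;
    return pipelined < link_bytes_per_us ? pipelined : link_bytes_per_us;
}
```

For example, with assumed values of 4096-byte reads, a 4 us round trip, and a 12500 bytes/us link (100 Gb/s), the model gives 1024 bytes/us for one outstanding read, 4096 bytes/us for four, and the link cap of 12500 bytes/us for sixteen. In libibverbs the initiator-side limit is the `max_rd_atomic` field of `struct ibv_qp_attr`, set through `ibv_modify_qp` with the `IBV_QP_MAX_QP_RD_ATOMIC` mask.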

@eedalong eedalong changed the title from "Verification and analysis of RDMA's Scatter-Gather feature and analysis of the MAX_RD_ATOMIC parameter" to "About RDMA Scatter/ Gather & RC QP's max_rd_atomic" May 26, 2022
@eedalong eedalong self-assigned this May 26, 2022
@eedalong eedalong added documentation Improvements or additions to documentation Doing labels May 26, 2022