
How can I identify level1 nvswitch and level2 nvswitch in NCCL #1286

Open
Ryan201802 opened this issue May 14, 2024 · 11 comments
Comments

@Ryan201802

The different NVSwitches are not visible to NCCL. Not for NVLink communication, not for NVLink SHARP. Traffic is spread on all switches transparently.

Originally posted by @sjeaugey in #1006 (comment)

I am confused about how to use two levels of NVSwitch to do an allreduce in NCCL.

@AddyLaddy
Collaborator

It's all handled in the NVSwitch HW and Fabric manager and is opaque to the NCCL and CUDA SW stack.

Also, there are no multi-level NVLink switch system products available from Nvidia currently.

@Ryan201802
Author

Thanks for your help. But @AddyLaddy, how should I understand "inter-node NVLink SHARP since 2.18" mentioned in #895 (comment)?
How are the nodes connected in that case?

@Ryan201802
Author

Ryan201802 commented May 15, 2024

Additionally, I found this architecture in the GH200 white paper. It has two levels of NVSwitch.
[image: two-level NVSwitch topology from the GH200 white paper]
In this architecture, can the second level do NVLink SHARP? If it can, how do I use it from NCCL?

@AddyLaddy
Collaborator

We have developed so-called Multi-Node NVLink (MNNVL) systems, and the first publicly available system will be called GB200 (sometimes referred to as NVL72).
It will have 72 Blackwell-generation GPUs connected to a single NVLink domain.

@AddyLaddy
Collaborator

The full features of NVLink will be available to all GPUs in such a system, including NVLink SHARP.
NCCL already supports NVLink SHARP on 8x H100 systems, so it's just the same but with a larger NVLink domain.
It's all opaque to NCCL; it just sees 72 GPUs, all accessible via NVLink.
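
For a sense of what that looks like from the application side, here is a minimal single-process sketch. It assumes one node with all GPUs visible to the process; the buffer size and device handling are illustrative, and forcing NCCL_ALGO=NVLS is only an experiment to make the algorithm choice visible, since NCCL normally picks NVLS on its own when the platform supports it.

```c
// Minimal single-process allreduce sketch. NCCL selects NVLS (NVLink SHARP)
// automatically on capable systems; NCCL_DEBUG/NCCL_ALGO below just make
// that choice visible or force it for experimentation.
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdlib.h>
#include <stdio.h>

#define MAX_GPUS 64

int main(void) {
    setenv("NCCL_DEBUG", "INFO", 1);  // log what NCCL sets up
    setenv("NCCL_ALGO",  "NVLS", 1);  // assumption: restrict to NVLink SHARP for the test

    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > MAX_GPUS) ndev = MAX_GPUS;

    int devs[MAX_GPUS];
    ncclComm_t comms[MAX_GPUS];
    cudaStream_t streams[MAX_GPUS];
    float *sendbuf[MAX_GPUS], *recvbuf[MAX_GPUS];
    const size_t count = 1 << 20;     // 1M floats per GPU (illustrative)

    for (int i = 0; i < ndev; ++i) devs[i] = i;
    ncclCommInitAll(comms, ndev, devs);

    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaStreamCreate(&streams[i]);
        cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
        cudaMemset(sendbuf[i], 0, count * sizeof(float));
    }

    // One allreduce per GPU, grouped so NCCL launches them as a single collective.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    printf("allreduce completed on %d GPUs\n", ndev);
    return 0;
}
```

With NCCL_DEBUG=INFO the log should indicate whether NVLS resources were set up; on a platform without NVLink SHARP support, forcing NCCL_ALGO=NVLS may leave NCCL with no usable algorithm, so it is only useful as an experiment.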

@Ryan201802
Author

Thanks, I understand.
Now I only have one question about the GH200 architecture. How does the second-level NVSwitch work, and how can I use its SHARP units?

@AddyLaddy
Collaborator

I don't believe we've announced the NVLink topology of GB200 yet.
But NVLink SHARP works in both single-level and two-level NVSwitch networks.
Again, it's all opaque to NCCL. It just works like any other NVLink-connected machine that is NVLink SHARP capable.

@hennry205

Hi @AddyLaddy, I have a question. How does the multimem.ld_reduce instruction in a kernel trigger the NVSwitch to perform the reduce operation? I don't see any direct control of the NVSwitch in the kernel. Thanks.

@Ryan201802
Author

Yes @AddyLaddy, but the topology I mentioned is GH200, not GB200.

@AddyLaddy
Collaborator

There are no publicly released Nvidia products that use multi-level NVSwitches currently. The first publicly available product will be GB200.
NVLink SHARP works with both single-level and multi-level NVLink fabrics.

@AddyLaddy
Collaborator

Hi @AddyLaddy, I have a question. How does the multimem.ld_reduce instruction in a kernel trigger the NVSwitch to perform the reduce operation? I don't see any direct control of the NVSwitch in the kernel. Thanks.

The NVLink SHARP implementation is based in HW and is configured by mapping special "Multicast"-enabled buffer addresses into the GPU virtual address space. See CUDA Multicast.
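
Concretely, the flow at the driver level looks roughly like the sketch below, assuming the CUDA 12.1+ multicast object API, a single process that owns all GPUs, and illustrative sizes (error checking omitted). Each GPU joins a multicast object, binds local device memory to it, and maps the multicast handle into its virtual address space; the resulting pointer is what a kernel can target with multimem.* instructions such as multimem.ld_reduce, and the reduction happens in the fabric rather than under explicit kernel control.

```c
// Sketch of setting up a multicast (NVLink SHARP capable) mapping with the
// CUDA driver API. Device count, sizes, and single-process layout are
// illustrative assumptions; error checking is omitted for brevity.
#include <cuda.h>
#include <stdio.h>

int main(void) {
    cuInit(0);

    int ndev = 8;                 // assumed number of GPUs in the NVLink domain
    size_t size = 1 << 22;        // 4 MiB payload, rounded up to granularity below

    // Describe the multicast object that all GPUs will share.
    CUmulticastObjectProp mcProp = {0};
    mcProp.numDevices  = ndev;
    mcProp.size        = size;
    mcProp.handleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    size_t gran = 0;
    cuMulticastGetGranularity(&gran, &mcProp, CU_MULTICAST_GRANULARITY_RECOMMENDED);
    mcProp.size = ((size + gran - 1) / gran) * gran;

    CUmemGenericAllocationHandle mcHandle;
    cuMulticastCreate(&mcHandle, &mcProp);

    // Every participating GPU joins the multicast object.
    for (int d = 0; d < ndev; ++d) {
        CUdevice dev;
        cuDeviceGet(&dev, d);
        cuMulticastAddDevice(mcHandle, dev);
    }

    // Each GPU then binds local physical memory to the object and maps the
    // multicast handle into its own virtual address space.
    for (int d = 0; d < ndev; ++d) {
        CUdevice dev;  CUcontext ctx;
        cuDeviceGet(&dev, d);
        cuDevicePrimaryCtxRetain(&ctx, dev);
        cuCtxSetCurrent(ctx);

        CUmemAllocationProp prop = {0};
        prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
        prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
        prop.location.id   = d;
        prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

        CUmemGenericAllocationHandle memHandle;
        cuMemCreate(&memHandle, mcProp.size, &prop, 0);
        cuMulticastBindMem(mcHandle, 0, memHandle, 0, mcProp.size, 0);

        CUdeviceptr mcPtr;
        cuMemAddressReserve(&mcPtr, mcProp.size, gran, 0, 0);
        cuMemMap(mcPtr, mcProp.size, 0, mcHandle, 0);

        CUmemAccessDesc access = {0};
        access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
        access.location.id   = d;
        access.flags         = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
        cuMemSetAccess(mcPtr, mcProp.size, &access, 1);

        // A kernel given mcPtr can now use inline PTX such as
        //   multimem.ld_reduce.relaxed.sys.global.add.f32  %val, [mcPtr];
        // (sm_90+) and receives the value reduced across all bound copies.
        printf("GPU %d multicast VA: %p\n", d, (void *)mcPtr);
    }
    return 0;
}
```

This is presumably the kind of setup NCCL performs internally when it enables its NVLS algorithm, which is why applications that just call NCCL never see these steps.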
