HGX 2-node test with different NIC topologies different network card names hangs, no results #1277

Open
superLiben opened this issue May 8, 2024 · 4 comments


@superLiben

superLiben commented May 8, 2024

I have two HGX H100 nodes and am running an inter-node bandwidth test. After I run the command, it hangs. I am on the latest NCCL version and OpenMPI 4.1.7. I found that the NIC topology differs between the two nodes, which may be causing the hang: when I test two nodes with the same IB card topology, there is no issue. My run command is as follows:

/root/nccl_apps/openmpi-4.1.7a1/bin/mpirun --allow-run-as-root -np 16 -H 100.64.24.75:8,100.64.24.76:8 --timestamp-output --mca btl_tcp_if_include enp25s0np0 --mca oob_tcp_if_include enp25s0np0 -x NCCL_IB_GID_INDEX=3 -x NCCL_DEBUG=WARN -x NCCL_DEBUG_SUBSYS=INIT,NET,GRAPH -x NCCL_IB_QPS_PER_CONNECTION=4 -x NCCL_PXN_DISABLE=0 -x NCCL_CROSS_NIC=1 -x LD_LIBRARY_PATH=/root/nccl_apps/nccl/lib:/root/nccl_apps/openmpi-4.1.7a1/lib /root/nccl_apps/nccl-test/all_reduce_perf -b 1M -e 20G -g 1 -f 2

Configuration of each host:
OpenMPI version: 4.1.7a1
NCCL version: 2.21.5
H100 GPUs: 8
400G CX7 NICs: 4 (inter-GPU communication through the switch)
Other NICs: 1 x 200G, 4 x 25G (management and storage traffic)

There are two images: one node has mlx5_0/3/5/8 and the other has mlx5_0/1/4/5. Both sets are 400Gb single-port IB cards, and my network is RoCE v2.
Note: mlx5 is the name of the Mellanox (NVIDIA) driver for these NICs, and the numbers (e.g. 0/3/5/8) are the indices of the RDMA devices as enumerated on each host, not ports or lanes on a card. RoCE v2 stands for RDMA over Converged Ethernet version 2, a protocol for remote direct memory access over Ethernet.

Can the different NIC topology between the two nodes cause the hang?
20240508121515
20240508121536
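
One way to compare the two nodes in text form, rather than from screenshots, is to list every RDMA device with its link layer, port rate, and backing network interface. The following is a minimal sketch, assuming the standard Linux sysfs layout under /sys/class/infiniband. The function name `list_rdma_devs` and the `SYSFS_ROOT` override are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# List RDMA devices with link layer, port rate, and netdev name(s).
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; by default the real sysfs path is used.
list_rdma_devs() {
    local root="${SYSFS_ROOT:-/sys/class/infiniband}"
    local dev port name rate layer netdevs
    for dev in "$root"/*; do
        [ -d "$dev" ] || continue
        name=$(basename "$dev")
        # Network interfaces backed by this RDMA device, comma-separated.
        netdevs=$(ls "$dev/device/net" 2>/dev/null | tr '\n' ',' | sed 's/,$//')
        for port in "$dev"/ports/*; do
            [ -d "$port" ] || continue
            rate=$(cat "$port/rate" 2>/dev/null)
            layer=$(cat "$port/link_layer" 2>/dev/null)
            printf '%s port=%s layer=%s rate="%s" netdev=%s\n' \
                "$name" "$(basename "$port")" "${layer:-?}" "${rate:-?}" "${netdevs:-none}"
        done
    done
}
```

Running `list_rdma_devs` on each host and diffing the two outputs shows which device names carry the 400G links on each node.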

@superLiben
Author

superLiben commented May 11, 2024

image
I would like to ask about the NICs marked in red and pink, which are both 400G network cards. Can network cards with different names be tested together? Will NCCL become unresponsive if the network card names are different?
image
There is no response after executing the command.

image

@superLiben
Author

superLiben commented May 11, 2024

Regarding the question above: when two nodes have the same topology, the same NIC slot positions, and the same NIC names, they can run the NCCL test and reach a bus bandwidth of 360 GB/s. If the two nodes have different NIC slot positions, so that the NIC names change, can the NCCL cluster still communicate and run the test? The command I am running currently does not respond, and I am unsure how to resolve this.

@superLiben superLiben changed the title H100 2-node test with different NIC topologies hangs, no results in NCCL testing. H100 2-node test with different NIC topologies different network card names hangs, no results May 11, 2024
@superLiben superLiben changed the title H100 2-node test with different NIC topologies different network card names hangs, no results HGX 2-node test with different NIC topologies different network card names hangs, no results May 11, 2024
@sjeaugey
Member

Configuring multiple RoCE NICs is complicated, because the IP subnets may prevent NICs from communicating with each other. The fact that the NICs are not numbered the same way on both nodes raises the complexity by another order of magnitude.

Making this setup work should be possible, but would require a lot of time and effort (which we can't provide over GitHub issues). I would suggest you ensure all nodes are exactly identical.
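
For readers who still want to experiment with non-identical nodes, one possible starting point is to give NCCL an explicit per-node device list through the `NCCL_IB_HCA` environment variable, set by a small wrapper that each rank runs locally. This is only a sketch under assumptions: it is not a fix confirmed in this thread, the helper names `hca_list_by_rate` and `run_with_hca` are hypothetical, it assumes the 400G devices can be recognized by the port 1 rate in sysfs, and it does nothing about the IP subnet issue mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper helpers: select the local RDMA devices by link rate
# and hand them to NCCL through NCCL_IB_HCA before starting the real program.

# Print a comma-separated list of devices whose port 1 rate starts with "$1 ".
# SYSFS_ROOT is a hypothetical override used for trying this on a fake tree.
hca_list_by_rate() {
    local want="$1" root="${SYSFS_ROOT:-/sys/class/infiniband}"
    local dev rate list=""
    for dev in "$root"/*; do
        [ -r "$dev/ports/1/rate" ] || continue
        rate=$(cat "$dev/ports/1/rate")
        case "$rate" in
            "$want "*) list="${list:+$list,}$(basename "$dev")" ;;
        esac
    done
    printf '%s\n' "$list"
}

# Export NCCL_IB_HCA for the local 400G devices, then replace this shell
# with the given command. A leading "=" requests exact name matches, so
# that mlx5_1 does not also match mlx5_10.
run_with_hca() {
    local hcas
    hcas=$(hca_list_by_rate 400)
    [ -n "$hcas" ] && export NCCL_IB_HCA="=$hcas"
    exec "$@"
}

# In a wrapper script, the last line would be:  run_with_hca "$@"
```

With such a wrapper saved on both nodes, the mpirun command would launch the wrapper followed by the all_reduce_perf path and its arguments, so each node computes its own device list instead of inheriting one value through `-x`.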

@superLiben
Author

Thanks. Can changing the name of the NIC using "rdma set name" solve this problem?
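
For reference, the iproute2 `rdma` tool renames a device with `rdma dev set <dev> name <newname>`. Whether renaming alone resolves the hang is not established in this thread, since the PCI placement and IP subnets stay the same. The sketch below only prints the rename commands for review rather than running them. The function name `print_rdma_renames` and the target names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Print (do not execute) rdma rename commands for "old:new" pairs,
# so the plan can be reviewed before anything is changed on the host.
print_rdma_renames() {
    local pair old new
    for pair in "$@"; do
        old=${pair%%:*}   # text before the first ':'
        new=${pair#*:}    # text after the first ':'
        printf 'rdma dev set %s name %s\n' "$old" "$new"
    done
}
```

For example, `print_rdma_renames mlx5_3:roce_gpu1 mlx5_5:roce_gpu2` prints one command per pair. Renaming typically requires root and can fail while a device is in use or if the new name already exists.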
