Describe the bug
This is the last output line before it hangs forever:
[0] MPI_Startup(): libfabric provider: verbs;ofi_rxm
No error is reported, and no traffic flows between the nodes.
To Reproduce
I have a fresh environment with Intel MPI 2021.11, Libfabric 1.18.1-ipmi, Slurm 21.08.8-2, and a RoCEv2 network.
Intel MPI and Open MPI using mpirun, single node and multi node: ok
Open MPI using srun, single node and multi node: ok
Intel MPI using srun, single node: ok
Intel MPI using srun, multi node: not ok (hangs)
Environment:
Rocky Linux 8.6
If you set FI_LOG_LEVEL=warn I would expect to see some warning messages about connection failure. There may be something wrong in the network setup that prevented rdma-cm from working properly.
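For reference, a minimal sketch of how the logging could be turned on for the failing srun case. FI_LOG_LEVEL is the libfabric variable mentioned above and I_MPI_DEBUG is Intel MPI's debug-output variable; the node count, task layout, and the binary name ./mpi_app are placeholders I am assuming, not details from this report.

```shell
# Ask libfabric to print warnings (for example, connection failures).
export FI_LOG_LEVEL=warn

# Ask Intel MPI to print startup details such as the selected provider.
export I_MPI_DEBUG=5

# srun passes the caller's environment to the tasks by default, so the
# variables above should reach every rank. Placeholder launch line
# (hypothetical 2-node job and binary name), left commented out:
# srun -N 2 --ntasks-per-node=1 ./mpi_app
```

If the hang is a connection problem, the warnings should show up on stderr of the ranks shortly after the provider line.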