New Issue for sst-elements
1 - Detailed description of problem or enhancement
Hi,
As far as I understood SST-Firefly, the following method defines how Firefly executes EmberAlltoallMotif and EmberAlltoallvMotif:
sst-elements/src/sst/elements/firefly/funcSM/alltoallv.cc
Line 58 in 54843c2
In this method, the receive requests (Irecv) issued by one NIC are pipelined such that each request must wait for the completion of the previous one. The consequence I observed (via debug output) is that the network tends to sit idle during these waits, leaving time gaps in which the NICs do nothing.
However, is this really how an MPI alltoall collective behaves?
MPI_Alltoall should consist of independent point-to-point communications among all ranks, according to the official MPI documentation (https://docs.open-mpi.org/en/v5.0.x/man-openmpi/man3/MPI_Alltoall.3.html @ 17.2.16.4. DESCRIPTION).
Therefore, I reckon that a NIC should issue receive requests to as many other NICs as possible at the same time, since the transfers are independent. Do you agree?
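To illustrate why the pipelining matters, here is a toy timing model (my own sketch, not SST or MPI code). It assumes each message costs a fixed startup latency L, which could overlap across messages, plus a transfer time T that is serialized on the single NIC; the function names and parameters are hypothetical.

```python
# Toy timing model (illustration only, not SST-Firefly code):
# compare pipelined receives, where each Irecv waits for the previous
# one, against posting all receives concurrently.

def pipelined_alltoall_time(n_ranks, latency, transfer):
    # Each Irecv waits for the previous one to complete,
    # so the per-message latency is paid n_ranks - 1 times.
    return (n_ranks - 1) * (latency + transfer)

def concurrent_alltoall_time(n_ranks, latency, transfer):
    # All Irecvs are posted up front: the latencies overlap and only
    # the NIC-serialized transfer times remain back to back.
    return latency + (n_ranks - 1) * transfer

if __name__ == "__main__":
    n, L, T = 128, 1.0, 0.25  # 128 ranks, arbitrary time units
    print(pipelined_alltoall_time(n, L, T))   # 158.75
    print(concurrent_alltoall_time(n, L, T))  # 32.75
```

Under these assumptions the pipelined variant pays the message latency once per peer, which is consistent with the idle gaps observed in the debug output.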
Thanks!
Best regards,
Z.
2 - Describe how to reproduce
Run sst with /sst-elements/sst-elements-src/src/sst/elements/ember/tests/dragon_128_allreduce.py, but change the motif to 'Alltoall' or 'Alltoallv'.
3 - What Operating system(s) and versions
4 - What version of external libraries (Boost, MPI)
5 - Provide sha1 of all relevant sst repositories (sst-core, sst-elements, etc)
Latest official repositories.
6 - Fill out Labels, Milestones, and Assignee fields as best possible
SST-Firefly; SST-Ember; enhancement; help_wanted
It does seem to be true that the requests are sent at the same time and that each rank then waits for all replies together. So I think this is a necessary (?) enhancement to SST-Firefly, because some important motifs (such as FFT3D) are heavily based on alltoall and alltoallv.
An update on this issue:
I measured realistic Open MPI MPI_Alltoall traffic on four remote servers connected via Ethernet. They communicate over TCP/IP, so I used tcpdump to monitor the traffic between the nodes.
The result is surprisingly similar to what the sst-ember + firefly + merlin simulation produces: the inter-node traffic shifts from the first diagonal to the last.
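For reference, the shifting-diagonal pattern is what a ring-style alltoall schedule produces: at step s, rank r exchanges with rank (r + s) mod n, so the traffic matrix lights up one diagonal per step. This is my own illustration, not the actual Firefly or Open MPI scheduler:

```python
# Ring-style alltoall schedule (illustration only): at step s, rank r
# exchanges with rank (r + s) % n. Each step is one diagonal of the
# traffic matrix, shifting from the first diagonal to the last.

def alltoall_schedule(n_ranks):
    steps = []
    for s in range(1, n_ranks):
        steps.append([(r, (r + s) % n_ranks) for r in range(n_ranks)])
    return steps

if __name__ == "__main__":
    for step in alltoall_schedule(4):
        print(step)
    # [(0, 1), (1, 2), (2, 3), (3, 0)]
    # [(0, 2), (1, 3), (2, 0), (3, 1)]
    # [(0, 3), (1, 0), (2, 1), (3, 2)]
```

If tcpdump shows the diagonals appearing one after another rather than all at once, that would be consistent with a stepwise schedule of this shape.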