mca/coll: Add any radix k for alltoall bruck algorithm #12453
base: main
Conversation
Looks good @jiaxiyan. A few small suggestions.
Have you checked that the default faninout will be 2 for the dynamic rules, so that performance does not change?
Thanks
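For reference, a minimal way to exercise the tuned Bruck alltoall path. The two MCA parameters below exist in coll_tuned, and algorithm 3 is the Bruck variant in its alltoall list (check ompi_info --all for the exact numbering in your build); whether the radix k rides on the dynamic-rules faninout field, as the comment above suggests, or on a dedicated parameter should be confirmed against the PR itself:

mpirun -np 64 --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_alltoall_algorithm 3 ./osu_alltoall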
if (err != MPI_SUCCESS) { line = __LINE__; goto err_hndl; }

/* Sendreceive */
err = ompi_coll_base_sendrecv ( tmpbuf, 1, new_ddt, sendto,
These can be replaced with non-blocking send/recv for better performance.
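A rough sketch of that replacement, reusing the tmpbuf/new_ddt/sendto/recvfrom context from the diff excerpt above; the request array and the pairwise wait are illustrative, not the PR's actual code:

ompi_request_t *reqs[2];

/* post the receive before the send to avoid unexpected messages */
err = MCA_PML_CALL(irecv(rbuf, 1, new_ddt, recvfrom,
                         MCA_COLL_BASE_TAG_ALLTOALL, comm, &reqs[0]));
if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

err = MCA_PML_CALL(isend(tmpbuf, 1, new_ddt, sendto,
                         MCA_COLL_BASE_TAG_ALLTOALL,
                         MCA_PML_BASE_SEND_STANDARD, comm, &reqs[1]));
if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

/* tmpbuf must stay untouched until both requests complete; repacking it
 * too early is a plausible source of segfaults */
err = ompi_request_wait_all(2, reqs, MPI_STATUSES_IGNORE);
if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

Note that the payoff only materializes when all k-1 partner exchanges of a round are posted before a single wait_all, rather than waiting pairwise as sketched here.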
I tried replacing it with non-blocking send/recv, but it segfaults in osu_alltoall.
You won't get any performance boost with a large fanout if you are using blocking send/receives. We can set up a meeting next week to see why it crashes.
You do get some improvement because you decrease the depth of the tree, but you don't get the multi-rail benefit.
Talking about multi-rail: I would expect the p2p communication layer to take advantage of multi-rail, at least past a certain message size. If that's the case, when the upper and lower levels of the communication stack both try to generate multi-rail traffic, things can't go well. How do we protect against that? Is the plan to enable multi-rail in collectives only for some ranges of message sizes?
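To make the depth point concrete: radix-k Bruck needs ceil(log_k(P)) rounds, each with up to k-1 concurrent exchanges, so on 64 ranks raising k from 2 to 4 halves the round count from 6 to 3. A minimal sketch of the count:

static int bruck_rounds(int nprocs, int radix)
{
    /* number of rounds = ceil(log_radix(nprocs));
     * e.g. nprocs = 64: radix 2 -> 6 rounds, radix 4 -> 3 rounds */
    int rounds = 0;
    for (int span = 1; span < nprocs; span *= radix) {
        rounds++;
    }
    return rounds;
}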
Jiaxiyan is on an international trip. I will follow up with her to resolve the crash she mentioned.
I was planning to review it after they figure out the issue with replacing the blocking ompi_coll_base_sendrecv with non-blocking communications.
This method extends ompi_coll_base_alltoall_intra_bruck to handle any radix k. Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
@bosilca I don't follow this comment:
Are you suggesting that collectives shouldn't be making PML isend/irecv calls directly?
No, I was wondering how a multi-rail enabled collective component (which will indeed post multiple isend requests) would interact with a multi-rail enabled low-level communication library (which will split all large messages across multiple lanes).