
mca/coll: Add any radix k for alltoall bruck algorithm #12453

Open · wants to merge 1 commit into main

Conversation

@jiaxiyan (Contributor) commented Apr 5, 2024

This method extends ompi_coll_base_alltoall_intra_bruck to handle any radix k.
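
For illustration, here is a minimal standalone sketch of the radix-k Bruck communication schedule (not the PR's code; the function and variable names are invented for the example). With P ranks and radix k the algorithm runs ceil(log_k(P)) phases, and in each phase a rank exchanges with up to k - 1 partners at distances j * k^phase; the blocks sent to the partner at distance j * k^phase are those whose block index has base-k digit j in that phase's position.

    #include <stdio.h>

    /* Print the send/recv partners for one rank in a radix-k Bruck
     * alltoall. Illustrative only: for k = 2 this reduces to the
     * classic schedule of ompi_coll_base_alltoall_intra_bruck. */
    static void bruck_radix_k_schedule(int rank, int nprocs, int k)
    {
        /* the distance grows by a factor of k each phase */
        for (int distance = 1; distance < nprocs; distance *= k) {
            /* up to k - 1 exchanges per phase */
            for (int j = 1; j < k && j * distance < nprocs; j++) {
                int sendto   = (rank + j * distance) % nprocs;
                int recvfrom = (rank - j * distance + nprocs) % nprocs;
                printf("rank %d: distance %d, digit %d: send to %d, recv from %d\n",
                       rank, distance, j, sendto, recvfrom);
            }
        }
    }

    int main(void)
    {
        bruck_radix_k_schedule(0, 8, 3); /* 8 ranks, radix 3: 2 phases */
        return 0;
    }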

@jiaxiyan marked this pull request as draft April 5, 2024 23:05
@jiaxiyan marked this pull request as ready for review April 8, 2024 18:55
@jiaxiyan requested a review from lrbison April 8, 2024 18:55
@lrbison (Contributor) left a comment


Looks good @jiaxiyan. A few small suggestions.

Have you checked that the default faninout will be 2 for the dynamic rules, so as not to change performance?

Thanks

Review comments on ompi/mca/coll/base/coll_base_alltoall.c:
if (err != MPI_SUCCESS) { line = __LINE__; goto err_hndl; }

/* Sendreceive */
err = ompi_coll_base_sendrecv ( tmpbuf, 1, new_ddt, sendto,
Contributor:

These can be replaced with non-blocking send/recv for better performance.
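
For illustration, a minimal sketch of the suggested pattern, written against the public MPI API rather than Open MPI's internal point-to-point calls (the helper name and argument layout are assumptions, not code from this PR):

    #include <mpi.h>
    #include <stdlib.h>

    /* Hypothetical helper: replace the k - 1 blocking sendrecvs of one
     * Bruck phase with overlapped isend/irecv pairs and one waitall. */
    static int phase_exchange(void *sendbufs[], void *recvbufs[], int count,
                              MPI_Datatype dtype, const int sendto[],
                              const int recvfrom[], int npartners, MPI_Comm comm)
    {
        MPI_Request *reqs = malloc(2 * npartners * sizeof(*reqs));
        int err = MPI_SUCCESS;

        if (NULL == reqs) {
            return MPI_ERR_NO_MEM;
        }
        for (int j = 0; j < npartners && MPI_SUCCESS == err; j++) {
            err = MPI_Irecv(recvbufs[j], count, dtype, recvfrom[j],
                            0, comm, &reqs[2 * j]);
            if (MPI_SUCCESS == err) {
                err = MPI_Isend(sendbufs[j], count, dtype, sendto[j],
                                0, comm, &reqs[2 * j + 1]);
            }
        }
        if (MPI_SUCCESS == err) {
            err = MPI_Waitall(2 * npartners, reqs, MPI_STATUSES_IGNORE);
        }
        free(reqs);
        return err;
    }

One general caveat with this conversion: every send and receive buffer must remain distinct and untouched until the waitall completes, whereas a blocking sendrecv loop can safely reuse a single temporary buffer between exchanges.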

@jiaxiyan (author):

I tried replacing with non-blocking send/recv but it segfaults in osu_alltoall

Contributor:

You won't get any performance boost with a large fanout if you are using blocking send/receives. We can set up a meeting next week to see why it crashes.

Member:

You do get some improvement because you decrease the depth of the tree, but you don't get the multi-rail benefit.

Speaking of multi-rail, I would expect the p2p communication layer to take advantage of multi-rail, at least above a certain message size. If that's the case, when the upper and lower levels of the communication stack both try to generate multi-rail traffic, things can't go well. How do we protect against that? Is the plan to enable multi-rail collectives only for some message-size ranges?
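
To put rough numbers on the depth-versus-message-count tradeoff (an illustration, not part of the thread): radix-k Bruck needs ceil(log_k(P)) phases but up to k - 1 exchanges per phase, so a larger k shortens the critical path while increasing the total number of messages per rank.

    #include <stdio.h>

    /* Phases and worst-case per-rank message count for radix-k Bruck:
     * ceil(log_k(P)) phases, up to (k - 1) exchanges in each. */
    int main(void)
    {
        const int nprocs = 64;
        for (int k = 2; k <= 8; k *= 2) {
            int phases = 0;
            for (int distance = 1; distance < nprocs; distance *= k) {
                phases++;
            }
            printf("P=%d, k=%d: %d phases, up to %d messages per rank\n",
                   nprocs, k, phases, (k - 1) * phases);
        }
        return 0;
    }

With 64 ranks this prints 6 phases / 6 messages for k=2, 3 / 9 for k=4, and 2 / 14 for k=8, which is why overlapping the per-phase messages with non-blocking operations matters as k grows.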

Contributor:

Jiaxiyan is on an international trip. I will follow up with her to resolve the crash she mentioned.

@bosilca (Member) commented Apr 29, 2024

I was planning to review it after they figure out the issue with replacing the blocking ompi_coll_base_sendrecv with non-blocking communications.

Commit: "This method extends ompi_coll_base_alltoall_intra_bruck to handle any radix k."

Signed-off-by: Jessie Yang <jiaxiyan@amazon.com>
@lrbison (Contributor) commented May 7, 2024

@bosilca I don't follow this comment:

when the upper and lower levels of the communication stack both try to generate multi-rail traffic, things can't go well.

Are you suggesting that collectives shouldn't be making PML isend/irecv directly?

@bosilca (Member) commented May 8, 2024

No, I was wondering how a multi-rail-enabled collective component (which will indeed post multiple isend requests) would interact with a multi-rail-enabled low-level communication library (which will split all large messages across multiple lanes).
