Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ch4/isend-irecv: add parent reques completion in isend/irecv #6640

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

thomasgillis
Copy link
Contributor

Pull Request Description

[Partitioned communication] Add support for meta request completion as part of the isend/irecv path.

Meta requests are common in partitioned communication and continuation API.
This PR allows a request to complete an isend/irecv call and notify a parent's completion counter when doing so.

This is needed for tracking down the completion of partitioned communication requests at low cost

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

adds support for meta request (parent request relying on the completion of
subrequest) in the Isend/Irecv path.

A parent request is becoming a used pattern (continuation, partitioned
communication). Here we add a completion mechanism to the isend/irecv
path to notify the parent request of completion of one of the child
request.
The parent function are similar to the MPID_Isend/Irecv function except
that they have a completion counter pointer as an additional argument.

Those functions are needed by the partitioned communications to track
progress of children requests
goto fn_exit;
}

MPL_STATIC_INLINE_PREFIX int MPID_Isend_coll(const void *buf,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accidentally added?

int rank,
int tag,
MPIR_Comm * comm, int attr,
MPIR_cc_t * parent_cc_ptr, MPIR_Request ** request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we pass the parent request instead of the cc_ptr?

Comment on lines +33 to +41
if (parent_cc_ptr) {
if (MPIR_Request_is_complete(*request)) {
/* if the request is already completed, decrement the parent counter */
MPIR_cc_dec(parent_cc_ptr);
} else {
/* if the request is not done yet, assign the completion pointer to the parent one and it will be decremented later */
(*request)->dev.completion_notification = parent_cc_ptr;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done by the caller instead? it might be cleaner, assuming we have access to the parent request when each operation is issued. something like:

MPIDI_NM_mpi_isend(..., &childreq);
if (childreq && !MPIR_Request_is_complete(childreq)) {
    childreq->dev.completion_notification = parentreq->cc_ptr;
} else {
    MPID_Request_complete(parentreq);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your suggestion correctly, we would exit the lock around the child request creation which might not work when async progress is enabled (as you suggested a few months ago :-) )
The reason is that if you are not in the same lock section, you have no guarantee the completion notification will be triggered: you might complete the request between the release of the lock at creation and the assignment of childreq->dev.completion_notification = parentreq->cc_ptr;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants