
Fix error handling for generalized requests #12392

Open · wants to merge 4 commits into main

Conversation

@dalcinl (Contributor) commented Mar 5, 2024:

This is a follow-up to #11683 from @bosilca, addressing outstanding issues related to error handling with generalized requests. Some of these issues were reported in #11681 (comment). I did my best to produce separate commits setting the stage for the fixes specific to generalized requests. I believe this PR is easier to review if you go commit by commit.

Thorough testing is available in this mpi4py branch, to be merged once this PR lands in v5.0.x.
A test run of that mpi4py branch against this PR was submitted here.

@dalcinl (Contributor, Author) commented Mar 6, 2024:

@jsquyres Given that you reviewed, approved, and merged #11683, may I ask for your review of this one? I will also need help with putting it into v5.0.x; I'm not really sure how that works (just a cherry-pick of the commits from main?).

@jsquyres (Member) left a comment:

Sorry for the delay in responding; I had to go refresh myself on all these details.

This is unfortunately harping on some grey areas in the MPI standard. ☹️ I think there are three areas of question:

  1. Why does Open MPI require 2 calls to actually free the resources? MPICH apparently does not.
    • I think that this is an area of ambiguity in the standard. Which of Open MPI or MPICH is right? I'm not sure.
    • For better or worse, Open MPI's current intended usage is like this:
    // Or the equivalent in sessions
    MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);
    
    int err = MPI_Wait(&req, MPI_STATUS_IGNORE);
    if (MPI_SUCCESS != err && req_is_generalized_request) {
        // Try again, because we might be in the case 
        // where the user-defined free function failed
        err = MPI_Request_free(&req);
    }
    if (MPI_SUCCESS != err) {
        // handle error
    }
  2. In the above code sample, why return an error from MPI_REQUEST_FREE if there was no error in freeing the internal MPI data structures?
    • Yeah, you're right: that's not intended. If MPI_REQUEST_FREE is able to free everything successfully, it should return MPI_SUCCESS. It shouldn't be affected by the previous error returned by the user's free function.
  3. In the above code sample, why doesn't MPI_REQUEST_FREE set the request to MPI_REQUEST_NULL?
    • Yeah, you're right: that's not intended. If MPI_REQUEST_FREE is able to free everything successfully, it should set the INOUT request parameter to MPI_REQUEST_NULL.

I.e., a fix should probably ensure that when the user's free function returns an error (see the sketch after this list):

  • We set the user's free function pointer to NULL
  • We return that error back up out of the initial MPI function that was invoked (e.g., MPI_WAIT, in the sample code above). Depending on the first MPI function that was invoked, this may mean returning the value from the function or setting the value on the appropriate status.MPI_ERROR (if not MPI_STATUS_IGNORE).
  • When MPI_REQUEST_FREE is invoked, free the rest of the internal OMPI data structures. Assuming that happens properly, return MPI_SUCCESS. This might be a little complicated if we set the initial error -- from the user's free function -- on status.MPI_ERROR. I think in this case, we should reset status.MPI_ERROR back to MPI_SUCCESS. Or, at least, the non-SUCCESS value on status.MPI_ERROR shouldn't cause us to go into error handling code in the internals of MPI_REQUEST_FREE.
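For concreteness, here is a minimal self-contained sketch of the calling pattern implied by the points above. This is illustrative code for the review discussion, not part of the PR: the free callback fails on purpose, and the comments describe the intended (not necessarily current) behavior.

#include <mpi.h>

static int query_fn (void *ctx, MPI_Status *s) { return MPI_SUCCESS;   }
static int free_fn  (void *ctx)                { return MPI_ERR_OTHER; } /* fails on purpose */
static int cancel_fn(void *ctx, int c)         { return MPI_SUCCESS;   }

int main(int argc, char *argv[])
{
  MPI_Request req;
  MPI_Init(&argc, &argv);
  MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);

  MPI_Grequest_start(query_fn, free_fn, cancel_fn, NULL, &req);
  MPI_Grequest_complete(req);

  /* Intended: MPI_Wait surfaces free_fn's error, and the failed
     free_fn is not invoked again afterwards. */
  int err = MPI_Wait(&req, MPI_STATUS_IGNORE);
  if (MPI_SUCCESS != err && MPI_REQUEST_NULL != req) {
    /* Intended: this call frees the remaining internal structures,
       returns MPI_SUCCESS, and sets req to MPI_REQUEST_NULL. */
    err = MPI_Request_free(&req);
  }

  MPI_Finalize();
  return 0;
}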

Finally, I'll note that the intended behavior didn't make it into documentation anywhere. Sigh. We definitely need to add this to the docs somewhere -- the individual TEST/WAIT man pages are probably the right place to document this behavior.

@@ -532,7 +532,7 @@ static inline int ompi_request_complete(ompi_request_t* request, bool with_signal)

     ompi_wait_sync_t *tmp_sync = (ompi_wait_sync_t *) OPAL_ATOMIC_SWAP_PTR(&request->req_complete,
                                                                            REQUEST_COMPLETED);
-    if( REQUEST_PENDING != tmp_sync ) {
+    if (REQUEST_PENDING != tmp_sync && REQUEST_COMPLETED != tmp_sync) {
Review comment from a Member:
I think that a core premise of ompi_request_complete() is that it should only be invoked once for a given request. It wasn't intended to be safe to be called multiple times -- e.g., the first time, it actually completes the request, and subsequent times it effectively does nothing.

@dalcinl (Contributor, Author) replied Mar 13, 2024:

ompi_request_complete() called a second time should either fail cleanly, returning an error, or happily continue as a no-op because the request has already completed; but it should NEVER segfault, as currently happens if the user calls MPI_Grequest_complete() twice. The MPI standard does not say whether the routine can be called multiple times on the same request handle. With this fix, I'm just playing it safe and allowing multiple calls to MPI_Grequest_complete(). For reference, IIRC, MPICH errors not on the second but on the third call; this is something I was planning to report and/or eventually fix.

IMHO, it is probably easier to just allow multiple MPI_Grequest_complete() calls on a request, just as I'm doing here. If you do not like those semantics and prefer to raise an error, I believe the fix is trivial.
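To make the double-complete scenario concrete, here is a minimal illustrative program (not part of the PR) with trivial no-op callbacks:

#include <mpi.h>

static int q_fn(void *ctx, MPI_Status *s) { return MPI_SUCCESS; }
static int f_fn(void *ctx)                { return MPI_SUCCESS; }
static int c_fn(void *ctx, int c)         { return MPI_SUCCESS; }

int main(int argc, char *argv[])
{
  MPI_Request request;
  MPI_Init(&argc, &argv);
  MPI_Grequest_start(q_fn, f_fn, c_fn, NULL, &request);
  MPI_Grequest_complete(request); /* completes the request */
  MPI_Grequest_complete(request); /* with this fix: a harmless no-op;
                                     previously this could segfault */
  MPI_Wait(&request, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}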

@@ -57,6 +57,10 @@ int MPI_Request_free(MPI_Request *request)
     }

     rc = ompi_request_free(request);
+    if (OPAL_UNLIKELY(OMPI_SUCCESS != rc && MPI_REQUEST_NULL != *request)) {
+        (*request)->req_status.MPI_ERROR = rc;
Review comment from a Member:

Did ompi_request_free(request) already set this?

@dalcinl (Contributor, Author) replied:

This is the generic user-level MPI_Request_free() for all kinds of requests, not only generalized requests. I'm not expert enough in the Open MPI implementation to be 100% sure that ompi_request_free() will set the status error field on errors. Without that guarantee, my use of ompi_errhandler_request_invoke() may not work as I intended. Note that ompi_errhandler_request_invoke() has the side effect of cleaning up the request and deallocating the object. Sorry, it's a bit complicated, but I had to work within the constraints of what I was given.

For example, look at ompi_comm_request_free()... I don't see that routine setting the status error field. Therefore, I think being overzealous here is justified. IMHO, this line is a good one and should stay. Otherwise, all internal request types used in Open MPI (and any future ones) would have to be audited and eventually fixed to make sure a failing free sets the error field.

@dalcinl (Contributor, Author) commented Mar 13, 2024:

@jsquyres I think I need to clarify what this PR is actually doing, which IMHO is an improvement over the previous behavior.

When MPI_Wait/MPI_Test fail on a generalized request (because of free_fn), the following now happens:

  1. The MPI_Wait/MPI_Test/MPI_Request_free call returns an error code.
  2. All the internal resources of the request object are released; no memory leaks.
  3. The wait/test/request_free call sets the request to MPI_REQUEST_NULL.

Therefore, from (3), there is no need at all to call MPI_Request_free() again to dispose of the failed request. All disposal has already occurred internally, and the request handle is handed back to the user as MPI_REQUEST_NULL. What's the point of requiring users to deallocate a failed request if OMPI can do it properly after the changes in this PR?
From the user's point of view, all they should care about is that the request failed and that the error has to be handled somehow; releasing resources is no longer a concern.

You can play with the following simple reproducer. I used a much more convoluted example (with multiple calls to MPI_Grequest_complete, calls to MPI_Request_get_status, etc.) to make sure everything worked as expected. I ran all of that against MPICH to double-check that the behavior is identical (modulo the MPICH issue with multiple complete calls I mentioned before).

#include <assert.h>
#include <stdio.h>
#include <mpi.h>

static int query_fn  (void *ctx, MPI_Status *s) { return MPI_SUCCESS;   }
static int free_fn   (void *ctx)                { return MPI_ERR_OTHER; }
static int cancel_fn (void *ctx, int c)         { return MPI_SUCCESS;   }

int main(int argc, char *argv[])
{
  int ierr;
  MPI_Request request = MPI_REQUEST_NULL;
  MPI_Init(&argc, &argv);

  MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  MPI_Grequest_start(query_fn, free_fn, cancel_fn, NULL, &request);

  ierr = MPI_Grequest_complete(request);
  assert (MPI_SUCCESS == ierr);

  //int flag = 0;
  //ierr = MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
  ierr = MPI_Wait(&request, MPI_STATUS_IGNORE);
  //ierr = MPI_Request_free(&request);
  assert (MPI_SUCCESS != ierr); // check that the call above actually failed
  assert (MPI_REQUEST_NULL == request); // check that the request handle was deallocated.
  
  MPI_Finalize();
  return 0;
}
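For reference, the reproducer builds and runs on a single process with the standard wrappers, e.g. mpicc reproducer.c -o reproducer followed by mpiexec -n 1 ./reproducer (file name chosen here for illustration); with this PR applied, both assertions should pass.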

@dalcinl (Contributor, Author) commented Mar 20, 2024:

Gentle request to keep this PR moving forward.

FYI, mpi4py has very good testing of error handling for generalized requests, albeit currently disabled while waiting for the fixes from this PR. See the following link:
https://github.com/mpi4py/mpi4py/blob/master/test/test_grequest.py#L49
I have a commit waiting for this PR to be merged, adding even more testing:
mpi4py/mpi4py@88f6064
BTW, in that test code, lines like

if greq:
    greq.Free()

are no longer required, as the previous greq.Wait() call already freed the request (while properly reporting the MPI error, which mpi4py converts to a Python exception).

@dalcinl (Contributor, Author) commented Apr 3, 2024:

@jsquyres Gentle reminder to move this PR forward and update #12253.

Commits

A second call to ompi_request_complete() segfaults. Make it mostly a
no-op. Alternatively, the second invocation could generate an error, but
let's play it safe for now.

Signed-off-by: Lisandro Dalcin <dalcinl@gmail.com>
If ompi_request_free() fails, use the ompi_errhandler_request_invoke()
machinery used in the wait/test routines. This will have the side-effect
of attempting to free the request a second time.

Signed-off-by: Lisandro Dalcin <dalcinl@gmail.com>
Signed-off-by: Lisandro Dalcin <dalcinl@gmail.com>
Signed-off-by: Lisandro Dalcin <dalcinl@gmail.com>
@dalcinl (Contributor, Author) commented May 8, 2024:

@jsquyres @wenduwan One month later, another reminder to get this PR moving forward and eventually backport everything to v5.0.x. If Jeff is not able to do the review, maybe you guys can assign an alternate reviewer?

@wenduwan (Contributor) commented May 8, 2024:

Apologies for the delay.

This change appears to touch a common critical path. I would prefer to get more experienced eyes on this.

@devreal @bosilca I noticed that you were engaged in #12525. I'm not sure if it's related, but could you kindly review this change too?
