Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v5.0.x: Propagate the error up to the user. #12253

Open
wants to merge 1 commit into
base: v5.0.x
Choose a base branch
from

Conversation

jsquyres
Copy link
Member

@jsquyres jsquyres commented Jan 19, 2024

Make sure to reset the generalized request to guarantee not to call the free callback a second time.

Signed-off-by: George Bosilca bosilca@icl.utk.edu
(cherry picked from commit ac3647e)

This is the v5.0.x PR corresponding to main PR #11683
Refs #11681.

FYI @dalcinl

@dalcinl
Copy link
Contributor

dalcinl commented Jan 19, 2024

@jsquyres I run this PR: https://github.com/mpi4py/mpi4py-testing/actions/runs/7588222142/job/20670268020
My test is failing, but that's not necessarily Open MPI's fault, but my assumptions. This is how things go:

  1. I call Wait, then the error in the free_fn callback triggers, and MPI reports back the error. All good so far.
  2. The call to Wait does set the request to NULL, so it is not yet deallocated.
  3. I call Free to fully delete the request. But this call fails with MPI_ERR_INTERN. Are you sure that's the proper behavior? Technically, there is no error this time, as the free_fn routine is not invoked again, and I believe there is no error this time to report, as the actual error was already reported at Wait.

PS: For reference, this test is passing with MPICH, but that's simply because of their implementation choices.

@wenduwan
Copy link
Contributor

@dalcinl @jsquyres what is the plan for this change? do we want to carry it in the release next week?

@dalcinl
Copy link
Contributor

dalcinl commented Jan 23, 2024

I'll perform a new test today with ompi@main, I'm not convinced the current behavior is the right one (not that my opinion matters, of course).

@janjust
Copy link
Contributor

janjust commented Jan 24, 2024

@dalcinl Did you get a chance to do a test, our goal was to get an rc out today.

@dalcinl
Copy link
Contributor

dalcinl commented Jan 24, 2024

@janjust Yes, I added my comments in #11681 with a request for the issue to be reopened. IMHO, things are still broken. If it were my call, I would not merge this PR.

@janjust
Copy link
Contributor

janjust commented Jan 24, 2024

Thanks - let me discuss with RMs, thank you very much for the reproducers!

@jsquyres jsquyres modified the milestones: v5.0.2, v5.0.3 Feb 13, 2024
Make sure to reset the generalized request to guarantee not to call the
free callback a second time.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit ac3647e)
@wenduwan wenduwan force-pushed the pr/v5.0.x/fix-grequest-free branch from ac271d3 to 7c1a10a Compare March 21, 2024 21:27
@wenduwan
Copy link
Contributor

Now that @hppritcha has added mpi4py for v5.0.x, I rebased this PR and will check CI result. I assume it's safe to merge if CI passes.

@dalcinl
Copy link
Contributor

dalcinl commented Mar 22, 2024

@wenduwan Please take into account #12392, that's IMHO the definitive fix, the changes there should be added here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants