Always raise a received result-as-error in spawn tasks #288

Open: goodboy wants to merge 3 commits into master

Conversation

goodboy (Owner) commented on Jan 3, 2022

Hopefully fixes #287.

This small change ensures that any actor spawn task which receives an error from its remote target will raise that error locally, thus triggering a surrounding nursery error and possible teardown.
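
For context, a minimal sketch of the behaviour this aims for (assuming the current public `tractor` API, an implicitly opened root actor, and that a single failing child surfaces as a bare `RemoteActorError`; the child fn and names below are made up for illustration, not from the test suite):

```python
import trio
import tractor


async def fail_fast() -> None:
    # the remote task errors out immediately
    raise RuntimeError('boom from the child actor')


async def main() -> None:
    try:
        async with tractor.open_nursery() as n:
            # the spawn task for this child should now re-raise the remote
            # error locally, tearing the nursery down instead of hanging
            await n.run_in_actor(fail_fast, name='failer')
    except tractor.RemoteActorError as err:
        print(f'nursery re-raised the child error: {err}')


if __name__ == '__main__':
    trio.run(main)
```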

ping @didimelli

goodboy (Owner, Author) commented on Jan 4, 2022

Ahah, so the (main) issue with this solution is that we no longer get errors from all child actors accumulated as before, since some actors may get cancelled and their results (even when they are errors) abandoned.

I've been trying some other things, but likely the way to solve the underlying issue and get as much error collecting as possible is to do a 2nd pass over all child portals at nursery close, once all subprocs are known to have terminated, and then fill any late-received errors into the `errors: dict`.
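
A rough sketch of that 2nd pass, just to pin down the idea (the helper name, the portal mapping and the timeout below are made up for illustration, not from the patch):

```python
from __future__ import annotations

from typing import Dict, Tuple

import trio
import tractor


async def recheck_portals_for_late_errors(
    portals: Dict[Tuple[str, str], tractor.Portal],  # subactor uid -> portal
    errors: Dict[Tuple[str, str], BaseException],
    timeout: float = 0.5,
) -> None:
    # run this once all subprocs are known to have terminated
    for uid, portal in portals.items():

        # only rescan children whose first-pass retrieval got cancelled
        if not isinstance(errors.get(uid), trio.Cancelled):
            continue

        with trio.move_on_after(timeout):
            try:
                await portal.result()
            except Exception as err:
                # a late remote error arrived after cancellation: back-fill it
                errors[uid] = err
```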

In an effort to support `.run_in_actor()` error raising by our nursery
we ideally collect as many child errors as possible during nursery
teardown and error collection/propagation.

Here we try a couple of things:
- factor the per-actor error retrieval into a new
  `pack_and_report_errors()` (rough sketch after this message)
- when a result retrieval via `exhaust_portal()` is cancelled, pack the
  `trio.Cancelled` into the `errors: dict`, expecting to rescan any such
  entries for errors after process termination.
- at the end of the spawn task, conduct a timed-out 2nd retrieval of any
  late-delivered error from the child task for each entry in `errors`
  containing a cancellation.

This causes a bunch of cancellation tests to still fail, seemingly due to
the race where the OCA nursery may have requested cancellation of
children *before* they can remote-error, and thus the `MultiError`
matching expectations aren't (always) going to be correct. Previously we
were always waiting for all `.run_in_actor()` results to arrive and
**not** raising any errors early (which in turn triggers local
cancellation).
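
Roughly the shape intended for the `pack_and_report_errors()` factoring (the signature below and the simplified `exhaust_portal()` stand-in are illustrative assumptions, not the literal patch):

```python
from __future__ import annotations

from typing import Optional

import trio
import tractor


async def exhaust_portal(portal: tractor.Portal) -> object:
    # simplified stand-in for the real helper: drain the final value from
    # the portal and return any remote error as a value instead of raising
    try:
        return await portal.result()
    except Exception as err:
        return err


async def pack_and_report_errors(
    portal: tractor.Portal,
    subactor_uid: tuple[str, str],
    errors: dict[tuple[str, str], BaseException],
) -> Optional[BaseException]:
    # retrieve the child's final result and pack any error into ``errors``
    try:
        result = await exhaust_portal(portal)
    except trio.Cancelled as err:
        # the retrieval itself got cancelled: pack the cancellation so a
        # post-termination pass knows to rescan this child for a late error
        errors[subactor_uid] = err
        raise

    if isinstance(result, BaseException):
        # the remote task errored: record it (caller decides whether to raise)
        errors[subactor_uid] = result
        return result

    return None
```
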
goodboy added the bug and supervision labels on Jan 5, 2022
goodboy (Owner, Author) commented on Jan 5, 2022

Oh yeah, ignore the mp runs there 😂
The code hasn't been changed for that backend!

@@ -369,6 +397,11 @@ async def new_proc(
f"{subactor.uid}")
nursery.cancel_scope.cancel()

# if errors:
goodboy (Owner, Author) commented:

This was the alternative approach mentioned in #287: cancelling the
`ria_nursery` instead of raising any final error retrieved from the portal.

I think this approach is a bit harder to extend wrt supervisor strategies
being overloaded by a user.
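
Roughly, the contrast between the two strategies (names below are hypothetical stand-ins for the spawn-task locals, not code from either patch):

```python
import trio


async def raise_in_spawn_task(err: BaseException) -> None:
    # this PR: re-raise the retrieved remote error right in the spawn task,
    # so the surrounding nursery errors and tears down its children
    raise err


async def cancel_ria_nursery_instead(ria_nursery: trio.Nursery) -> None:
    # the commented-out alternative from #287: don't raise here, just cancel
    # the nursery tracking `.run_in_actor()` children and collect errors
    # later; gives user-overridden supervisor strategies less to hook into
    ria_nursery.cancel_scope.cancel()
```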

Labels: bug, supervision

Successfully merging this pull request may close these issues:

Nursery does not re-raise error from `.run_in_actor()` remote task