Graceful shutdown is not graceful #2118

Open
AqlaSolutions opened this issue Apr 18, 2024 · 6 comments

Comments

@AqlaSolutions
Contributor

When Cluster.ShutdownAsync(true) is called, grain actors don't receive Stopping/Stopped messages. When PID.Stop is called during the ActorSystem shutdown process, the user message is created but never reaches the actor code. The shutdown cancellation token is already cancelled at that point because the Stop method is used without awaiting. Maybe it would be better to use StopAsync for child actors here?

ActorContext: [three screenshots, images not preserved]

Expected behavior: all grains and actors receive Stopping/Stopped events.
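
The difference between a fire-and-forget stop and an awaited stop can be illustrated with a toy model. This is a Python asyncio sketch, not Proto.Actor code: `ToyActor`, `STOP`, and `shut_down` are hypothetical names, and the `shutdown` event only stands in for the shutdown cancellation token. It assumes the mailbox loop stops dispatching as soon as that token is cancelled.

```python
import asyncio

STOP = object()  # hypothetical stand-in for a stop request


class ToyActor:
    """Toy actor for illustration; not Proto.Actor's ActorContext."""

    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.events = []                # lifecycle events the actor code saw
        self.stopped = asyncio.Event()

    async def run(self, shutdown: asyncio.Event):
        while not shutdown.is_set():
            msg = await self.mailbox.get()
            if shutdown.is_set():
                break                   # enqueued, but never dispatched
            if msg is STOP:
                self.events += ["Stopping", "Stopped"]
                break
        self.stopped.set()


async def shut_down(await_stop: bool):
    shutdown = asyncio.Event()          # models the shutdown cancellation token
    actor = ToyActor()
    task = asyncio.create_task(actor.run(shutdown))
    await asyncio.sleep(0)              # let the actor start waiting on its mailbox

    actor.mailbox.put_nowait(STOP)      # request the stop
    if await_stop:
        await actor.stopped.wait()      # StopAsync-like: wait until it has stopped
    shutdown.set()                      # the system cancels its token
    await task
    return actor.events


if __name__ == "__main__":
    print(asyncio.run(shut_down(await_stop=False)))  # []
    print(asyncio.run(shut_down(await_stop=True)))   # ['Stopping', 'Stopped']
```

In this model the stop request is always enqueued, but the actor only observes it when the caller waits before cancelling the token.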

@AqlaSolutions
Contributor Author

Also, is there a way to use a poison pill instead of Stop when shutting down?

@rogeralsing
Contributor

Graceful shutdown relates only to how the member leaves the cluster, meaning it will try to properly deregister from the cluster provider and gossip to other members that it is leaving.

That being said, it would be perfectly possible to make the IIdentityLookup also wait for all actors to stop.
And maybe that is the conceptually correct thing to do here.

Open for discussion here

@AqlaSolutions
Contributor Author

AqlaSolutions commented Apr 22, 2024

According to the IntelliSense docs for Cluster.ShutdownAsync, the graceful parameter is meant to gracefully shut down all grains.

@rogeralsing
Contributor

This is now present in this merged PR #2121

This is clearly an area that could use some more thought, e.g. should the graceful stop poison the actors, or just hard stop?
cc @mhelleborg

@mhelleborg
Member

> This is now present in this merged PR #2121
>
> This is clearly an area that could use some more thought, e.g. should the graceful stop poison the actors, or just hard stop? cc @mhelleborg

I think both ways can make sense. Stop does give the actors the opportunity to save state, so it might be "graceful enough", while I can imagine situations where you would want to allow the actors to complete their current messages, although that could potentially be slow.

@rogeralsing We could give the caller the ability to choose which strategy to use, potentially with a hard deadline after which it does the hard stop?
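
A "poison first, hard stop after a deadline" strategy could look roughly like the following. This is a Python asyncio sketch of the idea only, not a proposal for the actual Proto.Actor API: `POISON`, `actor`, `stop_gracefully`, and `demo` are hypothetical names, and task cancellation stands in for the hard stop.

```python
import asyncio

POISON = object()  # hypothetical poison marker, queued like any other message


async def actor(mailbox, processed, work_s):
    """Handle messages in order; exit when the poison marker is reached."""
    while True:
        msg = await mailbox.get()
        if msg is POISON:
            return
        await asyncio.sleep(work_s)     # simulated message handling
        processed.append(msg)


async def stop_gracefully(task, mailbox, deadline_s):
    """Poison the actor; fall back to a hard stop once the deadline passes."""
    mailbox.put_nowait(POISON)
    try:
        await asyncio.wait_for(task, timeout=deadline_s)
        return "poisoned"
    except asyncio.TimeoutError:
        # wait_for cancels the task on timeout; that is the hard stop here
        return "hard-stopped"


async def demo(work_s, deadline_s):
    mailbox, processed = asyncio.Queue(), []
    for i in range(3):
        mailbox.put_nowait(i)           # messages already queued at shutdown
    task = asyncio.create_task(actor(mailbox, processed, work_s))
    outcome = await stop_gracefully(task, mailbox, deadline_s)
    return outcome, processed


if __name__ == "__main__":
    print(asyncio.run(demo(work_s=0.01, deadline_s=1.0)))
    print(asyncio.run(demo(work_s=0.2, deadline_s=0.05)))
```

With fast handlers the queue drains and the actor stops at the poison marker; with slow handlers the deadline expires first and the remaining messages are lost, which is the trade-off the caller would be choosing.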

@AqlaSolutions
Contributor Author

AqlaSolutions commented May 3, 2024

There could be intermediate state in the queue that is no longer stored in the sender and hasn't been processed by the receiving actor yet. In that case it's necessary to poison. The real question is how to prevent new requests from being put into the queue, especially when another node doesn't know anything about the shutdown. What if some actors need to send requests to others in their Stopped handler? We also can't disconnect from the cluster, because the same grain instance may spawn on another node while the previous instance is still finishing its shutdown.
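
Why queued state survives a poison but not a stop can be shown with a toy mailbox. This Python sketch assumes a stop is delivered as a system message that is taken ahead of user messages, while a poison pill is queued behind the user messages already waiting; `deliver` and the message strings are hypothetical and only model that ordering.

```python
from collections import deque


def deliver(user_messages, system_messages):
    """Return what the toy actor handles, in order."""
    system, user = deque(system_messages), deque(user_messages)
    handled = []
    while system or user:
        if system:                      # system messages are taken first
            if system.popleft() == "stop":
                handled.append("Stopped")
                break                   # queued user messages are dropped
            continue
        msg = user.popleft()
        if msg == "poison":             # reached only after earlier user messages
            handled.append("Stopped")
            break
        handled.append(msg)
    return handled


if __name__ == "__main__":
    print(deliver(["save A", "save B"], ["stop"]))      # ['Stopped']
    print(deliver(["save A", "save B", "poison"], []))  # ['save A', 'save B', 'Stopped']
```

The sketch does not address the open questions above: messages that arrive after the poison marker, or from nodes unaware of the shutdown, are outside what it models.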
