Graceful shutdown is not graceful #2118

AqlaSolutions · 2024-04-18T22:02:43Z

When Cluster.ShutdownAsync(true) is called, grain actors don't receive Stopping/Stopped messages. When PID.Stop is called during the ActorSystem shutdown process, the user message is created but never reaches the actor code. The shutdown cancellation token is already cancelled at that point because the Stop method is used without awaiting. Maybe it would be better to use StopAsync for child actors here?

ActorContext:

Expected behavior: all grains and actors receive Stopping/Stopped events.
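
To make the difference between a fire-and-forget stop and an awaited stop concrete, here is a minimal toy model in Python asyncio. It is not Proto.Actor code: `ToyActor`, `stop`, `stop_async`, and `shutdown` are hypothetical names used only for illustration, and task cancellation stands in for the shutdown cancellation token.

```python
import asyncio


class ToyActor:
    """Hypothetical stand-in for an actor with a mailbox (not Proto.Actor)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.mailbox = asyncio.Queue()
        self.stopped = asyncio.Event()
        self.task = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            msg = await self.mailbox.get()
            if msg == "stop":
                # Lifecycle hooks the user code expects to see.
                self.log.append(f"{self.name}: Stopping")
                self.log.append(f"{self.name}: Stopped")
                self.stopped.set()
                return

    def stop(self):
        # Fire and forget: the message is queued, nobody waits for it.
        self.mailbox.put_nowait("stop")

    async def stop_async(self):
        # Queue the stop, then wait until the actor has really handled it.
        self.stop()
        await self.stopped.wait()


async def shutdown(await_children):
    log = []
    actors = [ToyActor(f"grain-{i}", log) for i in range(2)]
    if await_children:
        await asyncio.gather(*(a.stop_async() for a in actors))
    else:
        for a in actors:
            a.stop()
    # System teardown: anything that has not run yet is cancelled.
    for a in actors:
        a.task.cancel()
    await asyncio.gather(*(a.task for a in actors), return_exceptions=True)
    return log


if __name__ == "__main__":
    print("not awaited:", asyncio.run(shutdown(await_children=False)))
    print("awaited:    ", asyncio.run(shutdown(await_children=True)))
```

In this model the non-awaited variant tears the actors down before they ever process the queued stop, so no lifecycle lines are logged, while the awaited variant logs Stopping/Stopped for every actor.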

AqlaSolutions · 2024-04-19T07:23:22Z

Also, is there a way to use a poison pill instead of Stop when shutting down?

rogeralsing · 2024-04-21T17:07:43Z

Graceful shutdown only relates to how the member leaves the cluster, meaning it will try to properly deregister from the cluster provider and gossip to the other members that it is leaving.

That being said, it would be perfectly possible to make the IIdentityLookup also wait for all actors to stop.
And maybe that is the conceptually correct thing to do here.

Open for discussion here

AqlaSolutions · 2024-04-22T07:58:47Z

According to the IntelliSense docs for Cluster.ShutdownAsync, the graceful parameter is meant to gracefully shut down all grains.

rogeralsing · 2024-05-03T05:42:50Z

This is now present in this merged PR #2121

This is clearly an area that could use some more thought. e.g. should the graceful stop poison the actors, or just hard stop?
cc @mhelleborg

mhelleborg · 2024-05-03T06:06:39Z

> This is now present in this merged PR #2121
>
> This is clearly an area that could use some more thought. e.g. should the graceful stop poison the actors, or just hard stop? cc @mhelleborg

I think both ways can make sense. Stop does give the actors the opportunity to save state, so it might be "graceful enough", while I can imagine situations where you would want to allow the actors to complete their current messages, although that could potentially be slow.

@rogeralsing We could give the caller the ability to choose which strategy to use, potentially with a hard deadline after which it does the hard stop?
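
A rough sketch of that idea, poison first and hard stop after a deadline, as a toy Python asyncio model. Nothing here is Proto.Actor API: `ModelActor`, `graceful_stop`, `demo`, and the `POISON` sentinel are hypothetical. The model assumes that a stop is a system message checked before queued user messages, while a poison pill is an ordinary user message that waits its turn.

```python
import asyncio
from collections import deque

POISON = object()  # hypothetical sentinel standing in for a poison pill


class ModelActor:
    """Toy actor: system messages are checked before user messages."""

    def __init__(self, seconds_per_message):
        self.system = deque()
        self.user = deque()
        self.handled = []
        self.seconds_per_message = seconds_per_message
        self.stopped = asyncio.Event()
        self._wake = asyncio.Event()
        self._task = asyncio.create_task(self._run())

    def send(self, msg):
        self.user.append(msg)
        self._wake.set()

    def poison(self):
        self.send(POISON)  # queued behind everything already sent

    def stop(self):
        self.system.append("stop")  # jumps ahead of queued user messages
        self._wake.set()

    async def _run(self):
        while True:
            if self.system:  # hard stop wins; pending user messages are dropped
                break
            if self.user:
                msg = self.user.popleft()
                if msg is POISON:  # reached only after earlier messages ran
                    break
                await asyncio.sleep(self.seconds_per_message)  # simulated work
                self.handled.append(msg)
                continue
            self._wake.clear()
            await self._wake.wait()
        self.stopped.set()


async def graceful_stop(actor, deadline_seconds):
    """Poison first; fall back to a hard stop once the deadline passes."""
    actor.poison()
    try:
        await asyncio.wait_for(actor.stopped.wait(), timeout=deadline_seconds)
        return "poisoned"
    except asyncio.TimeoutError:
        actor.stop()
        await actor.stopped.wait()  # the in-flight message still completes
        return "hard-stopped"


async def demo(deadline_seconds):
    actor = ModelActor(seconds_per_message=0.1)
    for n in range(5):
        actor.send(n)
    outcome = await graceful_stop(actor, deadline_seconds)
    return outcome, actor.handled


if __name__ == "__main__":
    print(asyncio.run(demo(deadline_seconds=5.0)))   # generous deadline
    print(asyncio.run(demo(deadline_seconds=0.15)))  # tight deadline
```

With a generous deadline the actor drains all five queued messages before the poison pill stops it. With a tight deadline it finishes only the message it is working on when the hard stop arrives, and the rest of the queue is dropped, which is the trade-off between the two strategies.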

AqlaSolutions · 2024-05-03T08:26:59Z

There could be intermediate state in the queue that is no longer stored in the sender and hasn't been processed by the receiving actor yet. In such a case it's necessary to poison. The real question is how to prevent new requests from being put into the queue, especially when another node doesn't know anything about the shutdown. What if some actors need to perform requests to others in their Stopped handler? We also can't disconnect from the cluster, because the same grain instance may spawn on another node while the previous instance is still finishing its shutdown.
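
One way to picture the "no new requests while draining" part is a mailbox that closes its intake before it drains. This is a local toy model only: `DrainingMailbox` and its methods are hypothetical names, not Proto.Actor types, and it deliberately leaves open the harder question of how remote nodes learn about the shutdown.

```python
from collections import deque


class DrainingMailbox:
    """Hypothetical mailbox that rejects new messages once draining starts."""

    def __init__(self):
        self._queue = deque()
        self._accepting = True

    def try_send(self, msg):
        # A rejected message stays with the sender, which can retry elsewhere.
        if not self._accepting:
            return False
        self._queue.append(msg)
        return True

    def begin_drain(self):
        # Close the intake; messages already queued are kept.
        self._accepting = False

    def drain(self, handler):
        # Process everything that was accepted before the intake closed.
        count = 0
        while self._queue:
            handler(self._queue.popleft())
            count += 1
        return count


if __name__ == "__main__":
    processed = []
    mailbox = DrainingMailbox()
    mailbox.try_send("a")
    mailbox.try_send("b")
    mailbox.begin_drain()
    accepted = mailbox.try_send("late")  # rejected, nothing is lost silently
    mailbox.drain(processed.append)
    print(accepted, processed)
```

Because a rejected send returns False instead of silently queuing, the intermediate state stays with the sender, which is the property the poison-pill argument above depends on.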
