
Really allow graceful shutdown #30

Open
flub opened this issue Jan 10, 2023 · 4 comments

@flub (Contributor) commented Jan 10, 2023

Ok, that's a pretty rubbish title. But what we really need to be able to do is:

  • Signal shutdown to the server.
  • Server stops accepting new connections.
  • Server keeps handling existing connections.
  • Once there are no more connections the server finishes, an event you can await (sketched below).

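In tokio terms that sequence could look roughly like the sketch below; the `run` function, the `watch` channel for signalling, and the plain TCP listener are all assumptions for illustration, not the quic-rpc server itself:

```rust
use tokio::net::TcpListener;
use tokio::sync::watch;
use tokio::task::JoinSet;

/// Accept connections until shutdown is signalled, then wait for all
/// in-flight connections to finish before returning.
async fn run(listener: TcpListener, mut shutdown: watch::Receiver<bool>) {
    let mut conns = JoinSet::new();
    loop {
        tokio::select! {
            // Shutdown signalled: stop accepting new connections.
            _ = shutdown.changed() => break,
            // Otherwise keep accepting, tracking each connection as a task.
            Ok((stream, _addr)) = listener.accept() => {
                conns.spawn(async move {
                    // Existing connections are handled to completion here.
                    let _ = stream; // placeholder for real connection handling
                });
            }
        }
    }
    // Drain: this resolves once every tracked connection has finished,
    // which is the "server finished" event the caller can await.
    while conns.join_next().await.is_some() {}
}
```
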
The way the server is used in iroh-memstore would let it use this to terminate cleanly. And I'm pretty sure the hyper and quinn transports allow us to implement this here.

This is kind of already done for the http2 transport, but because there was no use yet for the explicit waiting API, that part was never wired up and needs extending.
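
For reference, hyper 0.14 exposes this directly on its server builder via `with_graceful_shutdown`. A minimal sketch, with the address, handler, and oneshot signal as placeholder choices rather than the actual http2 transport code:

```rust
use std::convert::Infallible;
use std::net::SocketAddr;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};
use tokio::sync::oneshot;

/// Serve until `shutdown` fires, finish in-flight requests, then return.
async fn serve(shutdown: oneshot::Receiver<()>) -> Result<(), hyper::Error> {
    let make_svc = make_service_fn(|_conn| async {
        Ok::<_, Infallible>(service_fn(|_req: Request<Body>| async {
            Ok::<_, Infallible>(Response::new(Body::from("ok")))
        }))
    });

    let addr = SocketAddr::from(([127, 0, 0, 1], 3000));
    Server::bind(&addr)
        .serve(make_svc)
        // Stops the listener, lets in-flight requests complete, and then
        // resolves the returned future.
        .with_graceful_shutdown(async {
            let _ = shutdown.await;
        })
        .await
}
```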

@rklaehn (Collaborator) commented Jan 13, 2023

So there are two parts to this:

  1. Implement graceful shutdown for the remaining transports.
  2. Expose a consistent API (one possible shape sketched below).
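
For point 2, one possible shape of such an API, purely as a hypothetical sketch (this trait does not exist in quic-rpc):

```rust
use std::future::Future;

/// Hypothetical cross-transport shutdown interface, for illustration only.
pub trait GracefulShutdown {
    /// Future that resolves once all in-flight requests have completed.
    type Finished: Future<Output = ()>;

    /// Stop accepting new connections; in-flight requests keep running.
    fn begin_shutdown(&self);

    /// Await this to learn when the server has fully drained.
    fn finished(&self) -> Self::Finished;
}
```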

Question: do we need this for the memory transport? Probably not, right?

@flub (Contributor, Author) commented Jan 13, 2023

I think the memory channel is also affected. You could have an RPC running while the ServerChannel is being closed. You still want that RPC channel to stay alive until it is done.

Though this does pose a question with respect to how to handle an RPC which streams infinitely, like watch. I hadn't considered that yet. I guess you need a grace period or some other way to handle those (see the sketch below).
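
A bounded drain might look like the following; the grace period and the `JoinSet` of RPC tasks are assumptions for the sketch, not existing quic-rpc machinery:

```rust
use std::time::Duration;
use tokio::task::JoinSet;

/// Wait for in-flight RPC tasks, but only up to a grace period; anything
/// still running after that (e.g. an infinite `watch` stream) is aborted.
async fn drain_with_grace(mut tasks: JoinSet<()>, grace: Duration) {
    let drain = async {
        while tasks.join_next().await.is_some() {}
    };
    let timed_out = tokio::time::timeout(grace, drain).await.is_err();
    if timed_out {
        // Grace period elapsed: cut off whatever is still streaming.
        tasks.abort_all();
    }
}
```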

@rklaehn (Collaborator) commented Feb 15, 2023

Maybe that is a stupid question, but what is the actual benefit of a graceful shutdown? I expect services to handle all sorts of non-graceful scenarios, such as network outages and power outages, without getting into a weird state. I also want to be able to kill a process with kill -9 without anything bad happening.

So why not always shut down non-gracefully, if it makes things simpler?

I guess that is me being influenced by the Erlang "let it crash" philosophy, which also made it into Akka, and which I found quite appealing...

https://medium.com/@vamsimokari/erlang-let-it-crash-philosophy-53486d2a6da

@flub (Contributor, Author) commented Feb 15, 2023

For completeness (and because I'm waiting on CI... 😉) I'll add some reasons here from the out-of-band discussion:

  • When doing rolling restarts of a group of servers you want clients not to notice. Their existing requests should finish without error, while any new requests go to a different server. This can significantly reduce the error rate.
  • It lets you see regular maintenance separately from failures in your metrics, on the client side as well. It is useful to have your client- and server-side metrics agree on error ratios etc.
  • This is in addition to clients having to handle server errors; both contribute to a reliable system.
  • Any ports in use by the server should be freed for re-use to aid fast restarts. After a crash the OS might not release ports right away.

The Erlang let-it-crash philosophy is about how to handle errors. It's a good philosophy in the face of decent process supervision, but that doesn't have to mean the only way to shut down a server is to crash it.
