Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add graceful shutdown #10701

Draft
wants to merge 3 commits into
base: 4.5.x
Choose a base branch
from
Draft

Add graceful shutdown #10701

wants to merge 3 commits into from

Conversation

yawkat
Copy link
Member

@yawkat yawkat commented Apr 11, 2024

This PR adds a new interface, GracefulShutdownLifecycle, that parts of the framework can implement to provide graceful shutdown functionality. It is designed to complement LifeCycle.stop(), though the implementations are unrelated.

The netty HTTP server, on graceful shutdown, will stop accepting new connections, and wind down existing ones. HTTP/1 connections are closed as soon as in-progress requests have been served, while HTTP/2 and /3 connections send a GOAWAY packet to indicate to the client that no further requests will be processed.

There is also a readiness health indicator that turns DOWN when a graceful shutdown begins. To keep the health endpoint accessible, there is a new micronaut.server.netty.listeners.*.support-graceful-shutdown config option to disable graceful shutdown for a management port.

Graceful shutdown is triggered by application code through a new GracefulShutdownManager API.

Graceful shutdown is best-effort. In-progress requests may take too long, or HTTP/2 clients may refuse to shut down their connection. For this reason, GracefulShutdownLifecycle returns a CompletionStage. The stage will complete when a graceful shutdown has been achieved, but this may be never, so a user should always add a timeout. After a graceful shutdown (or timeout), a normal LifeCycle.stop should be performed.

Docs are still pending, I want Graeme to review the API before I write it up.

This PR adds a new interface, GracefulShutdownLifecycle, that parts of the framework can implement to provide graceful shutdown functionality. It is designed to complement LifeCycle.stop(), though the implementations are unrelated.

The netty HTTP server, on graceful shutdown, will stop accepting new connections, and wind down existing ones. HTTP/1 connections are closed as soon as in-progress requests have been served, while HTTP/2 and /3 connections send a GOAWAY packet to indicate to the client that no further requests will be processed.

There is also a readiness health indicator that turns DOWN when a graceful shutdown begins. To keep the health endpoint accessible, there is a new `micronaut.server.netty.listeners.*.support-graceful-shutdown` config option to disable graceful shutdown for a management port.

Graceful shutdown is triggered by application code through a new GracefulShutdownManager API.

Graceful shutdown is best-effort. In-progress requests may take too long, or HTTP/2 clients may refuse to shut down their connection. For this reason, GracefulShutdownLifecycle returns a CompletionStage. The stage will complete when a graceful shutdown has been achieved, but this may be never, so a user should always add a timeout. After a graceful shutdown (or timeout), a normal LifeCycle.stop should be performed.

Docs are still pending, I want Graeme to review the API before I write it up.
@yawkat yawkat added the type: enhancement New feature or request label Apr 11, 2024
@yawkat yawkat added this to the 4.5.0 milestone Apr 11, 2024
Copy link

sonarcloud bot commented Apr 11, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
1 New Bugs (required ≤ 0)
1 New Critical Issues (required ≤ 0)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

* @author Jonas Konrad
* @since 4.5.0
*/
public interface GracefulShutdownLifecycle {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure lifecycle is the right name here, since this is only graceful shutdown and not startup. Rename to GracefulShutdownCapable

* @param stages The input lifecycles
* @return A future that completes when all inputs have completed shutdown
*/
static CompletionStage<?> shutdownAll(Stream<? extends GracefulShutdownLifecycle> stages) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about timeouts? The idea is a window where graceful shutdown is allowed but then it still has to shutdown. I would expect some duration parameter here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the caller adds a timeout, and after the timeout calls normal Lifecycle.stop.


@Override
public Publisher<HealthResult> getResult() {
return Mono.just(HealthResult.builder("gracefulShutdown")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to constant rename to graceful-shutdown

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other health indicators are camel case, eg diskSpace and compositeDiscoveryClient()

* down
*/
public CompletionStage<?> shutdownGracefully() {
return GracefulShutdownLifecycle.shutdownAll(delegates.stream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need better error handling here. What happens when there are multiple implementations and one of them fails to shutdown? How do we know and log which one failed?

Say other modules like the JMS, Kafka etc. modules implement this interface I would want to know which one failed and why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface is not really designed to fail. If an implementation cannot gracefully shut down for whatever reason, it should just signal completion and Lifecycle.stop will shut it down hard.

@Override
public Publisher<HealthResult> getResult() {
return Mono.just(HealthResult.builder("gracefulShutdown")
.status(shuttingDown ? HealthStatus.DOWN : HealthStatus.UP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to return some detail showing the state of what GracefulShutdownLifecycle is currently shutting down and hence we can know what is blocking the shutdown process

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ive added a shutdown state mechanism

* @return A future that completes when all {@link GracefulShutdownLifecycle} beans have shut
* down
*/
public CompletionStage<?> shutdownGracefully() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what calls GracefulShutdownManager.shutdownGracefully()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR nothing. could make it a management endpoint, or just leave it to application code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is already a shutdown endpoint so if enabled it should be called from that, but also we would likely need to integrate the shutdown hooks we add already to handle SIGTERM

See https://dratler-shay.medium.com/how-to-optimize-graceful-shutdown-in-kubernetes-and-avoid-customer-impact-8c93ee3a6483

We need to be able to specify a termination grace period as well, in Spring this spring.lifecycle.timeout-per-shutdown-phase=20s

@graemerocher
Copy link
Contributor

Perhaps we should try and address #6493 in this PR as well

Copy link
Collaborator

@sdelamo sdelamo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to include documentation about this new API in the main documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request
Projects
Status: No status
3 participants