GRPC server graceful shutdown #1110

Closed
o-shevchenko opened this issue May 14, 2024 · 15 comments
Labels
question A question about this library or its usage

Comments

@o-shevchenko
Contributor

The context
We deploy our service in K8s and provide a gRPC streaming API so the server can hold open connections for a period of time.
We use CD to redeploy new versions of the service, but we want to prevent K8s from killing the service while a gRPC stream is still open.

The question
Does the library support a graceful shutdown that stops the service only once there are no open connections?
I see this: https://github.com/grpc-ecosystem/grpc-spring/blob/master/grpc-server-spring-boot-starter/src/main/java/net/devh/boot/grpc/server/serverfactory/GrpcServerLifecycle.java#L58
but I don't see any check of the state of the service itself.

@o-shevchenko o-shevchenko added the question A question about this library or its usage label May 14, 2024
@ST-DDT
Collaborator

ST-DDT commented May 14, 2024

Have you tried this config:

private Duration shutdownGracePeriod = Duration.of(30, ChronoUnit.SECONDS);
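The semantics assumed here, based on the property's Javadoc, are: a negative value waits indefinitely, `0` forces an immediate shutdown, and a positive value bounds the wait. The `shutdown-grace-period: -1` used later in the thread relies on the first case. The sketch below models that decision with plain JDK types. `Stoppable` and `GracefulStopper` are hypothetical names. `Stoppable`'s methods mirror `io.grpc.Server`'s `shutdown()`, `awaitTermination(...)` and `shutdownNow()`, but this is not the library's actual `GrpcServerLifecycle` code.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in whose methods mirror the shutdown API of io.grpc.Server.
interface Stoppable {
    void shutdown();                                   // stop accepting new calls
    boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException;
    void awaitTermination() throws InterruptedException;
    void shutdownNow();                                // cancel whatever is still running
}

final class GracefulStopper {
    private GracefulStopper() {}

    /**
     * Sketch of grace-period handling (assumed semantics):
     * negative = wait until all calls finish, zero = do not wait, positive = wait at most that long.
     * Returns true if the server terminated on its own before the forced shutdown.
     */
    static boolean stop(Stoppable server, Duration gracePeriod) {
        server.shutdown();
        boolean terminated = false;
        try {
            long millis = gracePeriod.toMillis();
            if (millis > 0) {
                terminated = server.awaitTermination(millis, TimeUnit.MILLISECONDS);
            } else if (millis < 0) {
                server.awaitTermination(); // blocks until every open call/stream completes
                terminated = true;
            }
            // millis == 0: no waiting, fall through to the forced shutdown
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } finally {
            server.shutdownNow(); // force-cancel anything still running
        }
        return terminated;
    }
}
```

With a real `io.grpc.Server`, `shutdown()` only rejects new calls while existing ones continue, so an open stream keeps the server from terminating until it completes or the wait ends.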

@o-shevchenko
Contributor Author

Thanks! I haven't tried it yet. I will test it with K8s and let you know the result.

@o-shevchenko
Contributor Author

o-shevchenko commented May 16, 2024

It looks like it works. At least I now see that K8s can't kill the service during the configured period of time.
In addition to shutdownGracePeriod = -1, I set terminationGracePeriodSeconds to 24h (just for testing).
I also tried adjusting various other settings:

grpc:
  server:
    port: 6565
    reflection-service-enabled: true
    shutdown-grace-period: -1
    enable-keep-alive: true
    keep-alive-time: 86400
    keep-alive-timeout: 86400
    permit-keep-alive-without-calls: true
    permit-keep-alive-time: 86400

But after 5 minutes the app gets killed anyway, and I can't find the setting responsible for that.

[SpringApplicationShutdownHook] [trace_id=, span_id=]n.d.b.g.s.s.GrpcServerLifecycle          : Completed gRPC server shutdown

It looks like a Spring setting. I will experiment with it some more.

@ST-DDT
Collaborator

ST-DDT commented May 16, 2024

You could add a log line or a debug breakpoint here to check whether the waiting gets interrupted somehow.
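To tell an interrupted wait apart from a normal return or a timeout, the wait can be wrapped and logged. This is a minimal sketch. `TimedWait`, `WaitOutcome` and `ShutdownDiagnostics` are hypothetical names. `TimedWait` has the same shape as `io.grpc.Server#awaitTermination(long, TimeUnit)` and `ExecutorService#awaitTermination`, so either can be passed as a method reference.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical functional interface matching awaitTermination(long, TimeUnit).
@FunctionalInterface
interface TimedWait {
    boolean await(long timeout, TimeUnit unit) throws InterruptedException;
}

enum WaitOutcome { TERMINATED, TIMED_OUT, INTERRUPTED }

final class ShutdownDiagnostics {
    private ShutdownDiagnostics() {}

    /** Runs the wait, logs how and when it ended, and reports the outcome. */
    static WaitOutcome awaitAndLog(TimedWait wait, long timeout, TimeUnit unit) {
        long start = System.nanoTime();
        WaitOutcome outcome;
        try {
            outcome = wait.await(timeout, unit) ? WaitOutcome.TERMINATED : WaitOutcome.TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            outcome = WaitOutcome.INTERRUPTED;
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.err.printf("shutdown wait ended: %s after %d ms (thread=%s)%n",
                outcome, elapsedMs, Thread.currentThread().getName());
        return outcome;
    }
}
```

Logging the elapsed time alongside the outcome makes it easy to see whether the wait returned because the server really terminated or ended early for some other reason.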

@o-shevchenko
Contributor Author

Thanks, I'm already looking into that logic. Debugging everything in K8s isn't easy. I'll try adding more DEBUG logging, or use Telepresence or something similar, to understand why the service gets killed after 5 minutes.

@ST-DDT
Collaborator

ST-DDT commented May 16, 2024

Depending on your setup, debugging in K8s can be easy.
Just expose an additional port, or tunnel (port-forward?) into the container, and then connect as usual.

@o-shevchenko
Contributor Author

o-shevchenko commented May 21, 2024

The connection is closed from the K8s side. When I run the server without K8s and send kill -TERM to the Java process, it waits for all connections to close properly. In K8s, the connection is closed, the service shuts down, and K8s kills the container. I need to check the ingress timeouts, or maybe adjust the keep-alive settings as well.

@ST-DDT
Collaborator

ST-DDT commented May 21, 2024

Thanks for the update.

@o-shevchenko
Contributor Author

o-shevchenko commented May 22, 2024

When running a Java process inside a Docker container, sending a SIGTERM signal (kill -TERM 1) results in immediate termination rather than a graceful shutdown. This does not happen when I run the same Java process locally and send the same signal.

kubectl exec -it pod_id -- /bin/bash
kill -TERM 1

Locally it works fine, but when the server runs inside a Docker container the graceful shutdown doesn't work, and I can't understand why the server is killed immediately at localServer.awaitTermination(); instead of waiting. I don't see any InterruptedException when I connect a debugger via port 5005.

@o-shevchenko
Contributor Author

I'm running out of ideas. Do you have any suggestions for investigating further or narrowing down the scope?
Thanks!

@ST-DDT
Collaborator

ST-DDT commented May 22, 2024

Sorry, unfortunately not.

@o-shevchenko
Contributor Author

I think localServer.awaitTermination(); doesn't work the way I expect :( . I'm working on a custom shutdown hook that checks the number of active streams before terminating.
This article describes a similar problem: https://fedor.medium.com/shutting-down-grpc-services-gracefully-961a95b08f8
Just FYI. Thanks for the help!
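A custom shutdown hook that checks for active streams could look roughly like this. It is a sketch under assumptions. `ActiveStreamTracker` is a hypothetical class, and the wiring that calls `onStreamOpened()`/`onStreamClosed()` as calls start and finish (for example from a gRPC `ServerInterceptor`) is not shown. Only JDK APIs are used.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical tracker: some call-level hook (not shown) reports stream start/finish.
final class ActiveStreamTracker {
    private int active; // guarded by this

    synchronized void onStreamOpened() { active++; }

    synchronized void onStreamClosed() {
        if (active > 0) active--;
        if (active == 0) notifyAll(); // wake up anyone waiting for the server to go idle
    }

    synchronized int activeStreams() { return active; }

    /** Waits until no streams are open or the timeout elapses; returns true if idle. */
    synchronized boolean awaitIdle(Duration timeout) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (active > 0) { // loop also guards against spurious wakeups
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) return false;
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos); // releases the monitor while waiting
        }
        return true;
    }

    /** Registers a JVM shutdown hook (triggered e.g. by SIGTERM) that waits for open streams to drain. */
    void installShutdownHook(Duration maxWait) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                boolean idle = awaitIdle(maxWait);
                System.err.printf("shutdown hook: idle=%s, still open=%d%n", idle, activeStreams());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "drain-active-streams"));
    }
}
```

One design caveat: JVM shutdown hooks are started in an unspecified order and run concurrently, so a hook like this does not by itself delay Spring's own shutdown hook (the one that stops the gRPC server). It would need to be coordinated with that shutdown path. K8s also still sends SIGKILL once terminationGracePeriodSeconds expires, regardless of what the hook is doing.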

@ST-DDT
Collaborator

ST-DDT commented May 22, 2024

Maybe also create an issue upstream in grpc-java and link it here.
Maybe they can add a built-in variant as well, because I can't imagine you are the only one having this problem.

@o-shevchenko
Contributor Author

Yes, I expected this to be handled downstream already. Creating an issue in grpc-java is a good idea.

@o-shevchenko
Contributor Author

I've created an issue: grpc/grpc-java#11229

Projects
None yet
Development

No branches or pull requests

2 participants