
Rolling update for shinyproxy deployment causes orphan pods #169

Closed
ramkumarg1 opened this issue Aug 19, 2019 · 7 comments

@ramkumarg1

Hi, when there is a change in application.yaml and a rolling update is performed (with replicas set to 0 and then back to 1, mainly because the new shinyproxy image needs to be downloaded from the artifactory), all the pods that were spun up by the previous shinyproxy instance get left behind as zombies.

To reproduce:

  • kubectl get all

NAME READY STATUS RESTARTS AGE
pod/shinyproxy-7f76d48c79-8x9hs 2/2 Running 0 41m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/shinyproxy NodePort 172.30.85.191 8080:32094/TCP 40m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/shinyproxy 1 1 1 1 41m

NAME DESIRED CURRENT READY AGE
replicaset.apps/shinyproxy-7f76d48c79 1 1 1 41m

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/shinyproxy shinyproxy-aap.apps.cpaas.service.test shinyproxy None

  • Log on to the app (in my case I am using LDAP auth) and open a Shiny application via /app_direct/; a new pod for the application is spun up, as expected:

NAME READY STATUS RESTARTS AGE
pod/shinyproxy-7f76d48c79-8x9hs 2/2 Running 0 43m
pod/sp-pod-e7603441-03ba-470b-925a-22cfba1716de 1/1 Running 0 12s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/shinyproxy NodePort 172.30.85.191 8080:32094/TCP 43m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/shinyproxy 1 1 1 1 43m

NAME DESIRED CURRENT READY AGE
replicaset.apps/shinyproxy-7f76d48c79 1 1 1 43m

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/shinyproxy shinyproxy-aap.apps.cpaas.service.test shinyproxy None

  • After the new shinyproxy image is built, scale the deployment down and back up:

kubectl scale --replicas=0 deployment/shinyproxy
deployment.extensions/shinyproxy scaled

kubectl scale --replicas=1 deployment/shinyproxy
deployment.extensions/shinyproxy scaled

  • The new shinyproxy image has been downloaded and the container is being created:

NAME READY STATUS RESTARTS AGE
pod/shinyproxy-7f76d48c79-l5fvw 0/2 ContainerCreating 0 4s
pod/sp-pod-e7603441-03ba-470b-925a-22cfba1716de 1/1 Running 0 1m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/shinyproxy NodePort 172.30.85.191 8080:32094/TCP 44m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/shinyproxy 1 1 1 0 45m

NAME DESIRED CURRENT READY AGE
replicaset.apps/shinyproxy-7f76d48c79 1 1 0 45m

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/shinyproxy shinyproxy-aap.apps.cpaas.service.test shinyproxy None

  • At this stage my web application is unresponsive; the only thing to do is close the tab/window. The pod for the R application continues to run unless it is deleted manually.

  • The pod, which keeps consuming resources, is no longer accessible, because the service now points to the updated deployment and the application can only be accessed through a route over the service.

  • It is also very difficult to identify which pods are stale and to delete them manually.

@dseynaev

Hi @ramkumarg1

When shinyproxy receives a SIGTERM signal (when the deployment is scaled down), it should terminate gracefully by stopping all application pods first. You may have to increase the grace period terminationGracePeriodSeconds in the pod spec (the default is 30s). If shinyproxy is unable to terminate within this period, it will receive a SIGKILL and be terminated immediately, leaving behind orphan pods. More info here: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
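
For illustration, a minimal sketch of where that setting lives in the deployment's pod template (the container name and image below are placeholders, not taken from this issue):

spec:
  template:
    spec:
      # give shinyproxy time to stop its application pods before it is SIGKILLed (default 30s)
      terminationGracePeriodSeconds: 180
      containers:
      - name: shinyproxy
        image: openanalytics/shinyproxy   # placeholder image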

@ramkumarg1
Author

ramkumarg1 commented Aug 20, 2019

Thanks @dseynaev, I changed the deployment spec to include terminationGracePeriodSeconds, but it didn't make a difference. The pod was killed immediately. Perhaps this issue is linked to kubernetes/kubernetes#47576, where Spring Boot needs to handle SIGTERM gracefully?

spec:
  terminationGracePeriodSeconds: 180
  containers:
  - name: shinyproxy

@muscovitebob

We observe the same issue with zombie pods, and for us the termination grace period setting also does not resolve this.

@fmannhardt

I have the same issue, and this is what shinyproxy/containerproxy logs upon termination:

2020-01-30 10:56:56.785  INFO 1 --- [           main] e.o.c.ContainerProxyApplication          : Started ContainerProxyApplication in 39.115 seconds (JVM running for 43.619)
2020-01-30 10:57:01.374  INFO 1 --- [  XNIO-2 task-1] io.undertow.servlet                      : Initializing Spring FrameworkServlet 'dispatcherServlet'
2020-01-30 10:57:01.375  INFO 1 --- [  XNIO-2 task-1] o.s.web.servlet.DispatcherServlet        : FrameworkServlet 'dispatcherServlet': initialization started
2020-01-30 10:57:01.507  INFO 1 --- [  XNIO-2 task-1] o.s.web.servlet.DispatcherServlet        : FrameworkServlet 'dispatcherServlet': initialization completed in 131 ms
2020-01-30 10:57:26.275  INFO 1 --- [ XNIO-2 task-16] e.o.containerproxy.service.UserService   : User logged in [user: **]
2020-01-30 10:57:35.802  INFO 1 --- [  XNIO-2 task-3] e.o.containerproxy.service.ProxyService  : Proxy activated [user: ***] [spec: insight] [id: 9274ad33-665a-4d47-bab5-6c4b39a618b8]
2020-01-30 10:59:02.376  INFO 1 --- [       Thread-2] ConfigServletWebServerApplicationContext : Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@2b2948e2: startup date [Thu Jan 30 10:56:24 GMT 2020]; root of context hierarchy
2020-01-30 10:59:02.377 ERROR 1 --- [pool-4-thread-1] java.io.InputStreamReader                : Error while pumping stream.
java.io.EOFException: null
	at okio.RealBufferedSource.require(RealBufferedSource.java:61) ~[okio-1.15.0.jar!/:na]
	at okio.RealBufferedSource.readHexadecimalUnsignedLong(RealBufferedSource.java:303) ~[okio-1.15.0.jar!/:na]
	at okhttp3.internal.http1.Http1Codec$ChunkedSource.readChunkSize(Http1Codec.java:469) ~[okhttp-3.12.0.jar!/:na]
	at okhttp3.internal.http1.Http1Codec$ChunkedSource.read(Http1Codec.java:449) ~[okhttp-3.12.0.jar!/:na]
	at okio.RealBufferedSource$1.read(RealBufferedSource.java:439) ~[okio-1.15.0.jar!/:na]
	at java.io.InputStream.read(InputStream.java:101) ~[na:1.8.0_171]
	at io.fabric8.kubernetes.client.utils.BlockingInputStreamPumper.run(BlockingInputStreamPumper.java:49) ~[kubernetes-client-4.2.2.jar!/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_171]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_171]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
2020-01-30 10:59:02.394  INFO 1 --- [       Thread-2] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown
2020-01-30 10:59:02.403  INFO 1 --- [       Thread-2] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans
2020-01-30 10:59:02.514  WARN 1 --- [       Thread-2] .s.c.a.CommonAnnotationBeanPostProcessor : Invocation of destroy method failed on bean with name 'proxyService': eu.openanalytics.containerproxy.ContainerProxyException: Failed to stop container
2020-01-30 10:59:02.525  INFO 1 --- [       Thread-2] io.undertow.servlet                      : Destroying Spring FrameworkServlet 'dispatcherServlet'

@fmannhardt

I found a solution for this issue. This is not actually a problem in shinyproxy or containerproxy, as the Spring Boot app is shut down correctly and gracefully.

The problem is the kubectl proxy sidecar container. Kubernetes has no way of knowing that containerproxy relies on the sidecar container to communicate with the Kubernetes API. So, on a new deployment, Kubernetes sends SIGTERM to both the proxy and the sidecar container in all the old pods; the sidecar terminates immediately, and containerproxy can no longer reach Kubernetes to stop its application pods.

I read that Kubernetes is about to solve these startup and shutdown dependencies in v1.18 as documented here:
kubernetes/enhancements#753
https://banzaicloud.com/blog/k8s-sidecars/

Until then there is a simple workaround: add the following preStop lifecycle hook to the sidecar container:

          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"] # wait 5 seconds to let shinyproxy remove the pods on graceful shutdown

fmannhardt added a commit to fmannhardt/shinyproxy-config-examples that referenced this issue Feb 15, 2020
Allow graceful shutdown by delaying the SIGTERM to the sidecar container by some time, for example, 5s. This solves the issue here:
openanalytics/shinyproxy#169
@muscovitebob

I can confirm @fmannhardt's fix resolves this. Thank you so much!

@LEDfan
Member

LEDfan commented Mar 3, 2021

Hi all

With recent versions of ShinyProxy (I'm not sure which version exactly, but at least ShinyProxy 2.3.1) there is no need to use a kubectl proxy sidecar. ShinyProxy automatically detects the location and authentication of the Kubernetes API.
Therefore I think this problem is automatically solved.
Nevertheless, thank you for your time and investigation!
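
For anyone simplifying their setup, a rough sketch of the resulting container list (assuming ShinyProxy 2.3.1 or later running in-cluster; the image tag is a placeholder):

containers:
- name: shinyproxy
  image: openanalytics/shinyproxy:2.3.1   # placeholder tag
  # no kubectl proxy sidecar needed: ShinyProxy detects the in-cluster API endpoint
  # and authenticates with the pod's service account credentials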

LEDfan closed this as completed Mar 3, 2021