
Rolling update for shinyproxy deployment causes orphan pods #169

Closed
ramkumarg1 opened this issue Aug 19, 2019 · 7 comments

@ramkumarg1

Hi, when there is a change in application.yaml and a rolling update is performed (with replicas set to 0 and then back to 1, mainly because the new shinyproxy image needs to be downloaded from the artifactory), all the pods that were spun up by the previous shinyproxy instance get left behind as zombies.

To reproduce:

  • kubectl get all

NAME READY STATUS RESTARTS AGE
pod/shinyproxy-7f76d48c79-8x9hs 2/2 Running 0 41m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/shinyproxy NodePort 172.30.85.191 8080:32094/TCP 40m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/shinyproxy 1 1 1 1 41m

NAME DESIRED CURRENT READY AGE
replicaset.apps/shinyproxy-7f76d48c79 1 1 1 41m

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/shinyproxy shinyproxy-aap.apps.cpaas.service.test shinyproxy None

  • Log on to the app (in my case I am using LDAP auth) and open a Shiny application via /app_direct/; a new pod for the application is spun up, as expected:

NAME READY STATUS RESTARTS AGE
pod/shinyproxy-7f76d48c79-8x9hs 2/2 Running 0 43m
pod/sp-pod-e7603441-03ba-470b-925a-22cfba1716de 1/1 Running 0 12s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/shinyproxy NodePort 172.30.85.191 8080:32094/TCP 43m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/shinyproxy 1 1 1 1 43m

NAME DESIRED CURRENT READY AGE
replicaset.apps/shinyproxy-7f76d48c79 1 1 1 43m

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/shinyproxy shinyproxy-aap.apps.cpaas.service.test shinyproxy None

  • After the new shinyproxy image is built, scale the deployment down and back up:

kubectl scale --replicas=0 deployment/shinyproxy
deployment.extensions/shinyproxy scaled

kubectl scale --replicas=1 deployment/shinyproxy
deployment.extensions/shinyproxy scaled

  • The new shinyproxy image has been downloaded and the container is being created:

NAME READY STATUS RESTARTS AGE
pod/shinyproxy-7f76d48c79-l5fvw 0/2 ContainerCreating 0 4s
pod/sp-pod-e7603441-03ba-470b-925a-22cfba1716de 1/1 Running 0 1m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/shinyproxy NodePort 172.30.85.191 8080:32094/TCP 44m

NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/shinyproxy 1 1 1 0 45m

NAME DESIRED CURRENT READY AGE
replicaset.apps/shinyproxy-7f76d48c79 1 1 0 45m

NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/shinyproxy shinyproxy-aap.apps.cpaas.service.test shinyproxy None

  • At this stage my web application is unresponsive; the only thing to do is close the tab/window. The pod for the R application continues to run unless it is deleted manually.

  • The pod, which keeps consuming resources, is no longer accessible, because the service now points to the updated deployment and the application can only be accessed through a route over the service.

  • It is also very difficult to identify which pods are stale and to delete them manually.

@dseynaev

Hi @ramkumarg1

When shinyproxy receives a SIGTERM signal (when the deployment is scaled down), it should terminate gracefully by stopping all application pods first. You may have to increase the grace period terminationGracePeriodSeconds in the pod spec (the default is 30s). If shinyproxy is unable to terminate within this period, it will receive a SIGKILL and be terminated immediately, leaving behind orphan pods. More info here: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
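
For illustration, a minimal sketch of where that setting lives in the deployment's pod template (the container name and image below are placeholders, not taken from this issue):

spec:
  template:
    spec:
      # give shinyproxy time to stop its application pods before it is SIGKILLed (default 30s)
      terminationGracePeriodSeconds: 180
      containers:
      - name: shinyproxy
        image: openanalytics/shinyproxy   # placeholder image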

@ramkumarg1
Author

ramkumarg1 commented Aug 20, 2019

Thanks @dseynaev, I changed the deployment spec to include terminationGracePeriodSeconds, but it didn't make a difference. The pod was killed immediately. Perhaps this issue is linked to kubernetes/kubernetes#47576, where Spring Boot needs to handle SIGTERM gracefully?

spec:
  terminationGracePeriodSeconds: 180
  containers:
  - name: shinyproxy

@muscovitebob

We observe the same issue with zombie pods, and for us the termination grace period setting also does not resolve this.

@fmannhardt

I have the same issue, and this is what shinyproxy/containerproxy logs upon termination:

2020-01-30 10:56:56.785  INFO 1 --- [           main] e.o.c.ContainerProxyApplication          : Started ContainerProxyApplication in 39.115 seconds (JVM running for 43.619)
2020-01-30 10:57:01.374  INFO 1 --- [  XNIO-2 task-1] io.undertow.servlet                      : Initializing Spring FrameworkServlet 'dispatcherServlet'
2020-01-30 10:57:01.375  INFO 1 --- [  XNIO-2 task-1] o.s.web.servlet.DispatcherServlet        : FrameworkServlet 'dispatcherServlet': initialization started
2020-01-30 10:57:01.507  INFO 1 --- [  XNIO-2 task-1] o.s.web.servlet.DispatcherServlet        : FrameworkServlet 'dispatcherServlet': initialization completed in 131 ms
2020-01-30 10:57:26.275  INFO 1 --- [ XNIO-2 task-16] e.o.containerproxy.service.UserService   : User logged in [user: **]
2020-01-30 10:57:35.802  INFO 1 --- [  XNIO-2 task-3] e.o.containerproxy.service.ProxyService  : Proxy activated [user: ***] [spec: insight] [id: 9274ad33-665a-4d47-bab5-6c4b39a618b8]
2020-01-30 10:59:02.376  INFO 1 --- [       Thread-2] ConfigServletWebServerApplicationContext : Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@2b2948e2: startup date [Thu Jan 30 10:56:24 GMT 2020]; root of context hierarchy
2020-01-30 10:59:02.377 ERROR 1 --- [pool-4-thread-1] java.io.InputStreamReader                : Error while pumping stream.
java.io.EOFException: null
	at okio.RealBufferedSource.require(RealBufferedSource.java:61) ~[okio-1.15.0.jar!/:na]
	at okio.RealBufferedSource.readHexadecimalUnsignedLong(RealBufferedSource.java:303) ~[okio-1.15.0.jar!/:na]
	at okhttp3.internal.http1.Http1Codec$ChunkedSource.readChunkSize(Http1Codec.java:469) ~[okhttp-3.12.0.jar!/:na]
	at okhttp3.internal.http1.Http1Codec$ChunkedSource.read(Http1Codec.java:449) ~[okhttp-3.12.0.jar!/:na]
	at okio.RealBufferedSource$1.read(RealBufferedSource.java:439) ~[okio-1.15.0.jar!/:na]
	at java.io.InputStream.read(InputStream.java:101) ~[na:1.8.0_171]
	at io.fabric8.kubernetes.client.utils.BlockingInputStreamPumper.run(BlockingInputStreamPumper.java:49) ~[kubernetes-client-4.2.2.jar!/:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_171]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_171]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
2020-01-30 10:59:02.394  INFO 1 --- [       Thread-2] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans on shutdown
2020-01-30 10:59:02.403  INFO 1 --- [       Thread-2] o.s.j.e.a.AnnotationMBeanExporter        : Unregistering JMX-exposed beans
2020-01-30 10:59:02.514  WARN 1 --- [       Thread-2] .s.c.a.CommonAnnotationBeanPostProcessor : Invocation of destroy method failed on bean with name 'proxyService': eu.openanalytics.containerproxy.ContainerProxyException: Failed to stop container
2020-01-30 10:59:02.525  INFO 1 --- [       Thread-2] io.undertow.servlet                      : Destroying Spring FrameworkServlet 'dispatcherServlet'

@fmannhardt

I found a solution for this issue. This is not actually a problem in shinyproxy or containerproxy, as the Spring Boot app is shut down correctly and gracefully.

The problem is the kubectl proxy sidecar container. Kubernetes has no way of knowing that containerproxy relies on the sidecar container to communicate with the Kubernetes API. So, on a new deployment, Kubernetes sends SIGTERM to both the proxy and the sidecar container in all the old pods; the sidecar terminates immediately, and containerproxy can no longer reach Kubernetes to stop its application pods.

I read that Kubernetes is about to solve these startup and shutdown dependencies in v1.18 as documented here:
kubernetes/enhancements#753
https://banzaicloud.com/blog/k8s-sidecars/

Until then there is a simple workaround: add the following preStop lifecycle hook to the sidecar container:

          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"] # wait 5 seconds to let shinyproxy remove the pods on graceful shutdown

fmannhardt added a commit to fmannhardt/shinyproxy-config-examples that referenced this issue Feb 15, 2020
Allow graceful shutdown by delaying the SIGTERM to the sidecar container by some time, for example, 5s. This solves the issue here:
openanalytics/shinyproxy#169
@muscovitebob

I can confirm @fmannhardt's fix resolves this. Thank you so much!

@LEDfan
Member

LEDfan commented Mar 3, 2021

Hi all

With recent versions of ShinyProxy (I'm not sure which version exactly, but at least ShinyProxy 2.3.1) there is no need to use a kubectl proxy sidecar. ShinyProxy automatically detects the location and authentication of the Kubernetes API.
Therefore I think this problem is automatically solved.
Nevertheless, thank you for your time and investigation!
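
For anyone simplifying their setup, a rough sketch of the resulting container list (assuming ShinyProxy 2.3.1 or later running in-cluster; the image tag is a placeholder):

containers:
- name: shinyproxy
  image: openanalytics/shinyproxy:2.3.1   # placeholder tag
  # no kubectl proxy sidecar needed: ShinyProxy detects the in-cluster API endpoint
  # and authenticates with the pod's service account credentials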

LEDfan closed this as completed Mar 3, 2021