containerd-shim process isn't reaped for some killed containers #5708
Comments
I'm facing something that seems really closely related (and IMO it doesn't feel like it can be pure coincidence), although maybe not exactly the same? When running Docker in Docker (or even just raw containerd-in-Docker), I'm seeing 100% reliable behavior where every invocation of a container leaves behind a zombie containerd-shim process:

```console
$ docker run -dit --privileged --name test --volume /var/lib/containerd --entrypoint containerd docker:20-dind
2fa1f7a0b543808572a7a2da7ad28fd165d783f1ac8f3e9c59ebb30417f43b9f
$ docker exec test ps faux
PID   USER     TIME  COMMAND
    1 root      0:00 containerd
   44 root      0:00 ps faux
$ docker exec test ctr i pull docker.io/tianon/true:latest
...
$ docker exec test ctr run --rm docker.io/tianon/true:latest foo
$ docker exec test ps faux
PID   USER     TIME  COMMAND
    1 root      0:00 containerd
  110 root      0:00 [containerd-shim]
  152 root      0:00 ps faux
```

With `--init` added (so PID 1 is something that reaps orphans), the shim gets cleaned up:

```console
$ docker run -dit --privileged --name test --volume /var/lib/containerd --entrypoint containerd --init docker:20-dind
5d2d6ac195d6fdbb0646b6df8d64de3ac00c4ae3fc0dce62bdd8eb59ac20a322
$ docker exec test ps faux
PID   USER     TIME  COMMAND
    1 root      0:00 /sbin/docker-init -- containerd
    8 root      0:00 containerd
   32 root      0:00 ps faux
$ docker exec test ctr i pull docker.io/tianon/true:latest
...
$ docker exec test ctr run --rm docker.io/tianon/true:latest foo
$ docker exec test ps faux
PID   USER     TIME  COMMAND
    1 root      0:00 /sbin/docker-init -- containerd
    8 root      0:00 containerd
  142 root      0:00 ps faux
```

(See also docker-library/docker#318.)
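A quick way to confirm the leftover shim really is a zombie (rather than just idle) is to check its process state. A minimal sketch, assuming the busybox `ps` in the `docker:20-dind` image supports these `-o` fields; a `Z` in the STAT column marks a zombie:

```console
# a STAT of "Z" on the containerd-shim entry (reparented to PID 1)
# confirms the shim has exited but was never reaped
$ docker exec test ps -o pid,ppid,stat,comm
```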
Maybe related to opencontainers/runc#2575. Will take a look.
@dany74q I'm not sure that the containerd-shim mentioned here is still alive - could you check?
Hey @fuweid - the containerd-shim mentioned here is already dead (I killed it manually), but I did manage to find another container in the same state; here's its state.json:

I have access to this machine, and to other machines in a similar state, so feel free to request any additional information - thanks a lot, I appreciate your time! In the meantime, to test the hypothesis that this behavior is triggered by dockerd (and not by containerd/runc), I'll soon move our EKS AMIs to the AMI launched yesterday, which supports using containerd as the CRI without dockerd.
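For reference, a sketch of what such a switch can look like in the node's bootstrap; the `--container-runtime` flag is my assumption about the Amazon EKS AMI's `bootstrap.sh` and may differ across AMI versions:

```console
# hypothetical EKS node user data: ask the AMI's bootstrap script to use
# the containerd CRI directly instead of dockerd (flag name is an assumption)
$ /etc/eks/bootstrap.sh my-cluster --container-runtime containerd
```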
@tianon ctr uses containerd-shim-runc-v2 by default right now. The shim v2 binary re-execs itself to start the long-running shim server, which makes PID 1 the parent of the running shim server. But containerd is not the reaper for those exited child processes - that is why you get a zombie shim in dind. When the v1 containerd-shim binary is used instead, the shim handles SIGCHLD and reaps its exited children itself (see containerd/cmd/containerd-shim/main_unix.go, lines 287 to 318, and containerd/runtime/v1/shim/service.go, lines 509 to 541, both at a963242).

If you run the dind container with an init process that reaps orphans - as in your second example with `--init` - the zombie shim gets cleaned up.
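To compare the two shims directly, `ctr` lets you pick the runtime per container; a minimal sketch, assuming the v1 Linux runtime is still available in this containerd build:

```console
# run the same image under the v1 shim instead of the default runc v2 shim,
# then check whether a zombie shim is left behind
$ docker exec test ctr run --rm --runtime io.containerd.runtime.v1.linux docker.io/tianon/true:latest foo-v1
$ docker exec test ps faux
```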
@tianon it seems your case is not related to this one~
My bad, thanks for checking! (and the detailed reply)
Update: I didn't find any further clues about this issue. It seems that the docker daemon lost the exit event, so the container's status is still "running".
Based on the OP's description, this sounds like docker not handling errors from kill correctly; it may be that the error from runc is obfuscated by containerd, so docker is not able to understand the current state. We squashed at least one issue like this before.

I notice the OP is using a really old version of docker and containerd, which could be why the fix isn't present for them, but I haven't yet tracked down the last fix we made to know whether it's in that release.
We experience something similar in our EKS clusters.
We detect this by checking for logs that are similar to this one:
We manually have to kill the stuck containerd-shim process ourselves.
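For anyone hitting the same state, a sketch of that manual cleanup, assuming the stuck container's ID shows up on the shim's command line (`<container-id>` and `<shim-pid>` are placeholders):

```console
# find the shim that belongs to the stuck container
$ ps aux | grep containerd-shim | grep <container-id>
# reap it by hand; SIGKILL since the shim is already wedged
$ kill -9 <shim-pid>
```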
Hey @Kyslik, are you using Amazon Linux or Bottlerocket on your EKS clusters? We have experienced similar behavior on Bottlerocket, and I want to rule out Bottlerocket as the source of the problem here.
We bake the AMI ourselves, based on Amazon Linux.
Description
We have several EKS clusters which autoscale throughout the day - they handle burst workloads,
and in a given day the underlying ASGs may scale down to 0 nodes or up to tens of nodes.
We've noticed that once in a while we have nodes with pods stuck in a 'Terminating' status
for days on end, until we manually intervene and force-delete them.
I SSH-ed into one of the nodes that exhibited this behavior and tried to introspect it;
here's what I've gathered, hopefully covering most of the abstraction layers. Unfortunately I could not find the root cause,
and I'd love to know what I can do to debug this further.
Quick summary (more info below):
The above leads me to conclude that something is fishy in the distributed orchestration of killing a container. I'd assume it's somewhere between containerd and runc, but I'm not entirely sure, and would love to know how to better pinpoint the exact cause.
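One way to pinpoint the layer where the states diverge is to ask each layer about the same container; a sketch using the `moby` namespace and runc root that docker uses by default (`<container-id>` is a placeholder, and socket paths may differ per setup):

```console
# what docker believes
$ docker inspect --format '{{.State.Status}} {{.State.Pid}}' <container-id>
# what containerd believes (docker runs containers in the "moby" namespace)
$ ctr --address /run/containerd/containerd.sock --namespace moby tasks ls | grep <container-id>
# what runc believes
$ runc --root /var/run/docker/runtime-runc/moby state <container-id>
```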
Steps to reproduce the issue:
I'm not entirely sure how to reproduce this behavior yet, as it happens sporadically on arbitrary nodes.
Describe the results you received:
The containers are dead, and containerd is aware they are stopped, but the shim isn't reaped; docker thinks the container is still running and misleads kubelet into keeping the pod 'Terminating' until we manually intervene (force-deleting the pod).
Describe the results you expected:
The shim should go down with the container, and docker should be notified that the container stopped, so that kubelet can update the pod's status accordingly.
What version of containerd are you using:
Any other relevant information (runC version, CRI configuration, OS/Kernel version, etc.):
Deeper dive per abstraction layer -
Kubernetes / kubelet:
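The kubectl side of the investigation can be sketched as follows (`<pod-name>` and `<ns>` are placeholders):

```console
# list pods stuck in Terminating across all namespaces
$ kubectl get pods --all-namespaces | grep Terminating
# the manual intervention mentioned above: force-delete a stuck pod
$ kubectl delete pod <pod-name> --namespace <ns> --grace-period=0 --force
```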
Docker:
Containerd:
cd7ed93ae2d106564609055e17b24679860bc6cfbfdb5c845f3644815387a37a 7064 STOPPED
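For reference, that STOPPED entry is containerd's own view of docker's containers; a sketch of the query (the address may differ per setup):

```console
# list tasks in docker's "moby" namespace; a STOPPED task whose container
# docker still reports as running is exactly the inconsistency in question
$ ctr --address /run/containerd/containerd.sock --namespace moby tasks ls
```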
containerd-shim:
Then I looked at the containerd journal entries for the shim id (6e99a6...) and the only thing that came up was:
Jul 07 00:11:36 ip-10-0-73-87.ec2.internal containerd[3538]: time="2021-07-07T00:11:36.521930693Z" level=info msg="shim containerd-shim started" address="unix:///run/containerd/s/6e99a634bfa5b915cbeade50e47384f6087 4a9358e5e96cb59523a46339c138b" debug=false pid=7023
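A sketch of how those entries can be pulled from the journal (the unit name may vary by distro):

```console
# search containerd's journal for the shim socket ID
$ journalctl -u containerd --no-pager | grep 6e99a634
```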
I've managed to retrieve the shim's Go stack trace by strace-ing it to a file and sending a `kill -USR1` to it, but I don't see anything of particular interest there:
shim stacktrace
Attempts to kill the container go through to runc, which returns "container not running"; in turn, the shim reports "process already finished: not found" to containerd.
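The stack-dump retrieval described above can be sketched like this, assuming the v1 shim dumps its goroutine stacks on SIGUSR1 and that the dump goes out via write() calls strace can capture (`<shim-pid>` is a placeholder):

```console
# attach to the shim and log its writes to a file
$ strace -f -e trace=write -s 10000 -o /tmp/shim-stack.txt -p <shim-pid> &
# ask the shim to dump its goroutine stacks
$ kill -USR1 <shim-pid>
```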
runc:
not sure how to introspect this layer after the fact.
- `runc --root /var/run/docker/runtime-runc/moby events cd7ed93ae2d106564609055e17b24679860bc6cfbfdb5c845f3644815387a37a` shows "container with id ... is not running"
- `runc list` shows the container as stopped
- `runc state` shows the following:

os:
What you expected to happen:
Pods should terminate once their underlying container had died.
How to reproduce it (as minimally and precisely as possible):
Not actually sure how to reproduce it consistently - it happens when creating and destroying nodes
rapidly, I'd assume.
Environment
runc --version
uname -a