
Scheduled pods cannot be started in docker. Container name already in use. #25359

Closed · rodcloutier opened this issue Feb 11, 2020 · 3 comments

@rodcloutier

What kind of request is this (question/bug/enhancement/feature request):

Bug (on Rancher 2.2.9, haven't tried 2.3.x)

Steps to reproduce (least amount of steps as possible):

  • Note: The issue is sporadic and hard to reproduce. We are working on reliable reproduction steps.
  • Create a cluster through Rancher
  • Reschedule pods several times

Expected Result:

  • Under normal conditions, with resources available, all containers from the pods should be scheduled and started in the docker daemon.

Actual Result:

  • Observed pods that are unable to start, with the following event pattern:
    Warning FailedCreatePodSandBox Failed create pod sandbox: rpc error: code = Unknown desc = failed to create a sandbox for pod "test-64cd57b5c4-rk5bs": Error response from daemon: Conflict. The container name "/k8s_POD_test-64cd57b5c4-rk5bs_default_7c8ebf47-42a1-11ea-855b-fa163e9c2fd4_0" is already in use by container "be4f2ae1acbc90a7ce6d06a978c9080993d7fae6c6954e46646c149bb3d4755f". You have to remove (or rename) that container to be able to reuse that name.
    
  • Running docker ps -a on the targeted node does not list the offending container
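
For diagnosis, the daemon can also be queried directly for the conflicting ID and name taken from the event above (an illustrative sketch, not part of the original report):

$ docker inspect be4f2ae1acbc90a7ce6d06a978c9080993d7fae6c6954e46646c149bb3d4755f
$ docker ps -a --filter "name=k8s_POD_test-64cd57b5c4-rk5bs"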

Current workaround

  1. Drain the node (see the combined sketch after this list)
  2. Restart Docker on the node or, as an alternative, reboot the node
    $ rancher ssh <node>
    $ systemctl restart docker.service

  3. Uncordon the node (draining it also cordons it) from the UI or with the following command:
    $ kubectl uncordon <node>
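
Putting the steps together from a workstation with kubectl and the rancher CLI (a sketch; the exact drain flags are an assumption, since the report only names the step):

$ kubectl drain <node> --ignore-daemonsets --delete-local-data
$ rancher ssh <node>
$ systemctl restart docker.service   # run on the node, inside the ssh session
$ exit
$ kubectl uncordon <node>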
    

Other details that may be helpful:

Environment information

  • Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI):
Rancher: v2.2.9
User Interface: v2.2.98
Helm: v2.10.0-rancher11
Machine: v0.15.0-rancher8-1
  • Installation option (single install/HA): HA deployment with 2 replicas on K8s.

Rancher Cluster information

  • Cluster type: Hosted
  • Machine type and specifications (CPU/memory): VM
  • Kubernetes version (use kubectl version): 1.13.5
  • Docker version (use docker version):
$ docker version
Client:
 Version:           18.06.3-ce
 API version:       1.38
 Go version:        go1.10.8
 Git commit:        d7080c1
 Built:             Tue Feb 19 23:07:53 2019
 OS/Arch:           linux/amd64
 Experimental:      false
Server:
 Engine:
  Version:          18.06.3-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       d7080c1
  Built:            Tue Feb 19 23:07:53 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Target Cluster information (spawned by Rancher)

  • Kubernetes version (use kubectl version): 1.13.4, 1.13.5
  • Host OS: Seen with CoreOS 2079.4.0 and 2132.6.0
  • Docker version (use docker version):
$ docker version
Client:
 Version:           18.06.3-ce
 API version:       1.38
 Go version:        go1.10.8
 Git commit:        d7080c1
 Built:             Tue Feb 19 23:07:53 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.3-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       d7080c1
  Built:            Tue Feb 19 23:07:53 2019
  OS/Arch:          linux/amd64
  Experimental:     false
  • Output of the kubelet and docker logs will be provided once we can catch or reproduce the error.

@zaggash commented Feb 18, 2020

I think we can close this issue.

The issue was introduced by a change in Docker 17.04.
I see you are using k8s 1.13.5. Is that the downstream cluster version?
The PR has been cherry-picked to k8s 1.13 too:
kubernetes/kubernetes#79623
kubernetes/kubernetes#80758

It looks like it was merged on Sep 11, 2019, so the first fixed release is v1.13.11 (published 2019-09-18T16:24:07Z).
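
To check whether a cluster already carries the fix, the kubelet version reported by each node can be compared against v1.13.11 (an illustrative command, not from the original comment):

$ kubectl get nodes -o wide   # the VERSION column reports each node's kubelet version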

@rodcloutier (Author)

Yes, we can close this issue.
It was fixed in versions 1.13.11, 1.14.7, and 1.15.4.

@Teja1126 commented Dec 7, 2020

@zaggash

We are observing the same issue with the versions below:

k8s version 1.19.0

Docker version 19.03.13

"Scheduled pods cannot be started in docker. Container name already in use."
This issue is frequently observed when I reboot nodes/masters in the k8s cluster.

Warning Failed 3m48s (x4 over 5m51s) kubelet, master-3 Error: Error response from daemon: Conflict. The container name "/k8s_kube-apiserver_kube-apiserver-master-3_kube-system_1d25fb42cda5d90beda502e06a30a585_4" is already in use by container "04be72e367e5e30be717c10e9ef33dc6be7510653777af300aa67c1714b666fe". You have to remove (or rename) that container to be able to reuse that name.
Warning BackOff 3m33s (x10 over 5m50s) kubelet, master-3 Back-off restarting failed container
Normal Pulled 36s (x11 over ) kubelet, master-3 Container image "artifactory.radisys.com:8088/k8s.gcr.io/kube-apiserver:v1.19.0" already present on machine
Normal SandboxChanged kubelet, master-3 Pod sandbox changed, it will be killed and re-created.
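
Since the event names the conflicting container ID, and the daemon message itself suggests removal, one manual recovery on the affected node would be to remove that container directly (a sketch using the ID from the event above; forcing removal with -f is an assumption):

$ docker rm -f 04be72e367e5e30be717c10e9ef33dc6be7510653777af300aa67c1714b666fe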
