Can't stop docker container #35933

Closed
Timunas opened this issue Jan 4, 2018 · 146 comments
Labels
area/runtime, kind/bug, status/more-info-needed, status/needs-attention, version/17.12

Comments

@Timunas

Timunas commented Jan 4, 2018

Description

Can't stop container.

I'm starting and removing containers concurrently using docker-compose.
Sometimes it fails to remove the containers.

I checked, and I can't docker stop the container: the command hangs. After switching the docker daemon to debug mode, this is the only line I see when I run the command:
dockerd[101922]: time="2018-01-04T15:54:07.406980654Z" level=debug msg="Calling POST /v1.35/containers/4c2b5e7f466c/stop"

Steps to reproduce the issue:

  1. Run tests in Jenkins.
  2. Eventually it fails to remove containers.

Describe the results you received:

Can't stop container.

Describe the results you expected:

The container should have been stopped and then removed.

Additional information you deem important (e.g. issue happens only occasionally):

Issue happens only occasionally

Output of docker version:

Client:
 Version:	17.12.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	c97c6d6
 Built:	Wed Dec 27 20:10:14 2017
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	c97c6d6
  Built:	Wed Dec 27 20:12:46 2017
  OS/Arch:	linux/amd64
  Experimental:	false

Output of docker info:

Containers: 6
 Running: 1
 Paused: 0
 Stopped: 5
Images: 75
Server Version: 17.12.0-ce
Storage Driver: devicemapper
 Pool Name: docker-253:0-33643212-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 31.43GB
 Data Space Total: 107.4GB
 Data Space Available: 75.95GB
 Metadata Space Used: 35.81MB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.112GB
 Thin Pool Minimum Free Space: 10.74GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 1
 Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.11.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 36
Total Memory: 117.9GiB
Name: jenkins-node.com
ID: 5M6L:G2KF:732H:Y7RF:QHNO:3XM4:U6RV:U5QR:ANPA:7XRZ:M3S4:GUZC
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 37
 Goroutines: 51
 System Time: 2018-01-04T16:02:36.54459153Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.

@thaJeztah
Member

This really needs more information and reproduction steps.

dockerd[101922]: time="2018-01-04T15:54:07.406980654Z" level=debug msg="Calling POST /v1.35/containers/4c2b5e7f466c/stop"
  • The message above is only showing that the call was made to stop the container; are there any messages after that?
  • How is docker set up? Are you running docker-in-docker?
  • Have you verified the container is still running? What does docker inspect of the container show? Is there a PID in the output? And is that process still running? (ps auxf on the host)
  • Can you reproduce the issue without Jenkins? Can you provide exact steps to reproduce?
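
The PID check asked for in the list above could be scripted roughly as follows. This is only a minimal sketch and not something posted in the thread: the helper names `container_pid`, `pid_is_alive` and `check_container` are hypothetical, and the liveness probe assumes the script runs on the same Linux host as the daemon (with Docker for Mac the container process lives inside the VM, so the probe would not apply).

```python
import json
import os
import subprocess


def container_pid(inspect_output):
    """Extract State.Pid from the JSON array printed by `docker inspect`."""
    containers = json.loads(inspect_output)
    return int(containers[0]["State"]["Pid"])


def pid_is_alive(pid):
    """Return True if a process with this PID exists on the local host."""
    if pid <= 0:
        # A stopped container reports Pid 0; there is nothing to probe.
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def check_container(name):
    """Run `docker inspect NAME` and return (pid, alive).

    Note: this call can itself hang if the daemon is in the stuck state.
    """
    out = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    ).stdout
    pid = container_pid(out)
    return pid, pid_is_alive(pid)
```

Calling `check_container("4c2b5e7f466c")` on the affected host would show whether the PID the daemon has on record still corresponds to a live process.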

@Timunas
Author

Timunas commented Jan 5, 2018

No more messages are logged.

Meanwhile, I ran some more tests. Once the container enters this state:

  • I can't stop the container
  • I can't docker exec to bash.
  • I can start and stop other containers

To exit this state I have to:

  • service docker stop
  • kill the container processes (otherwise docker doesn't start)
  • service docker start

I think I have reproduced this outside Jenkins one time but thought it was another problem.

Since this is easier to reproduce with Jenkins, I'll wait for the next occurrence and then run docker inspect on the container.

The setup (running in a CentOS VM):

  • Starting containers with certain images using docker-compose up
  • Performing some tests using the applications started in containers
  • Stopping containers using docker-compose down

And these steps are done for each test, and I'm running tests concurrently.
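
For anyone trying to reproduce this without Jenkins, the workflow described above (one `docker-compose up` / `docker-compose down` cycle per test, several tests at once) might be approximated with a sketch like the one below. It is an assumption-laden illustration: the project names are made up, a `docker-compose.yml` is assumed to exist in the working directory, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def compose_cycle(project, run=subprocess.run):
    """Bring one compose project up and down again; return both exit codes."""
    base = ["docker-compose", "-p", project]
    up = run(base + ["up", "-d"])
    # The real tests would exercise the application here.
    down = run(base + ["down"])
    return up.returncode, down.returncode


def run_concurrently(projects, run=subprocess.run, workers=4):
    """Run one up/down cycle per project name, several at a time."""
    projects = list(projects)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: compose_cycle(p, run), projects)
        return dict(zip(projects, results))
```

Something like `run_concurrently(["test_a", "test_b", "test_c"])` would start three independent projects. No timeout is applied on purpose, so a hung `down` shows up as a call that never returns.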

I'm sorry for not giving more information, but this is what I could collect so far.

@Timunas
Author

Timunas commented Jan 8, 2018

I now have a similar problem with a different docker version: I can't stop any container that is created.

And this is logged for all containers.

Jan 08 16:53:10  dockerd[7012]: time="2018-01-08T16:53:10.984024605Z" level=debug msg="Sending kill signal 15 to container 9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94"
Jan 08 16:53:12  dockerd[7012]: time="2018-01-08T16:53:12.985034572Z" level=info msg="Container failed to stop after sending signal 15 to the process, force killing"
Jan 08 16:53:12  dockerd[7012]: time="2018-01-08T16:53:12.985087603Z" level=debug msg="Sending kill signal 9 to container 9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94"
Jan 08 16:53:12  dockerd[7012]: time="2018-01-08T16:53:12.986759908Z" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!: not found\ngithub.com/docker/docker/vendor/github.com/containerd/containerd/errdefs.init\n\t/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/errdefs/errors.go:25\ngithub.com/docker/docker/vendor/github.com/containerd/containerd/content.init\n\t/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/content/helpers.go:141\ngithub.com/docker/docker/vendor/github.com/containerd/containerd.init\n\t/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task_opts.go:78\ngithub.com/docker/docker/container.init\n\t/go/src/github.com/docker/docker/container/view.go:496\ngithub.com/docker/docker/builder.init\n\t/go/src/github.com/docker/docker/builder/builder.go:108\ngithub.com/docker/docker/api/server/backend/build.init\n\t/go/src/github.com/docker/docker/api/server/backend/build/tag.go:85\nmain.init\n\t/go/src/github.com/docker/docker/cmd/dockerd/service_unsupported.go:15\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:173\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2197" error_type="*errors.fundamental" module=api
Jan 08 16:53:12  dockerd[7012]: time="2018-01-08T16:53:12.986856140Z" level=error msg="Handler for POST /v1.34/containers/9cdc36c44340/stop returned error: cannot stop container: 9cdc36c44340: Cannot kill container 9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94: process 9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94 not found: not found"
Jan 08 16:53:12  dockerd[7012]: time="2018-01-08T16:53:12.987051906Z" level=debug msg="FIXME: Got an API for which error does not match any expected type!!!: not found\ngithub.com/docker/docker/vendor/github.com/containerd/containerd/errdefs.init\n\t/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/errdefs/errors.go:25\ngithub.com/docker/docker/vendor/github.com/containerd/containerd/content.init\n\t/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/content/helpers.go:141\ngithub.com/docker/docker/vendor/github.com/containerd/containerd.init\n\t/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/task_opts.go:78\ngithub.com/docker/docker/container.init\n\t/go/src/github.com/docker/docker/container/view.go:496\ngithub.com/docker/docker/builder.init\n\t/go/src/github.com/docker/docker/builder/builder.go:108\ngithub.com/docker/docker/api/server/backend/build.init\n\t/go/src/github.com/docker/docker/api/server/backend/build/tag.go:85\nmain.init\n\t/go/src/github.com/docker/docker/cmd/dockerd/service_unsupported.go:15\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:173\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2197" error_type="*errors.fundamental" module=api
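
The log lines above show an escalation: signal 15 is sent, and two seconds later the daemon logs "force killing" and sends signal 9. The sketch below illustrates that general pattern on an ordinary child process. It is not dockerd's implementation, and the names `stop_process`, `demo` and `IGNORES_SIGTERM` are hypothetical; it assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child program that ignores SIGTERM, standing in for a container process
# that does not exit when asked politely.
IGNORES_SIGTERM = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_process(proc, grace_seconds):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns Popen.returncode; on POSIX a value of -N means the process
    was terminated by signal N.
    """
    proc.send_signal(signal.SIGTERM)   # signal 15
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                    # signal 9
        proc.wait()
    return proc.returncode


def demo(grace_seconds=2.0):
    """Start the stubborn child, stop it, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", IGNORES_SIGTERM], stdout=subprocess.PIPE
    )
    child.stdout.readline()  # block until the child has installed SIG_IGN
    try:
        return stop_process(child, grace_seconds)
    finally:
        child.stdout.close()
```

In this sketch the SIGKILL always lands because the process exists. In the logs above the second step is the one that fails: the kill request is rejected with "not found" even though the container is still listed as running.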

docker info:

 Running: 6
 Paused: 0
 Stopped: 0
Images: 61
Server Version: 17.11.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 992280e8e265f491f7a624ab82f3e238be086e49
runc version: 0351df1c5a66838d0c392b4ac4cf9450de844e2d
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.10.0-42-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31GiB
Name: Laptop-749
ID: WZVE:HR5Q:3GYH:WNS6:FJCQ:TGHD:UMU5:PPWM:7Z77:QSBV:G2SW:HI77
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 26
 Goroutines: 48
 System Time: 2018-01-08T16:58:47.457072503Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

docker version:

Client:
 Version:      17.11.0-ce
 API version:  1.34
 Go version:   go1.8.3
 Git commit:   1caf76c
 Built:        Mon Nov 20 18:37:39 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.11.0-ce
 API version:  1.34 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   1caf76c
 Built:        Mon Nov 20 18:36:09 2017
 OS/Arch:      linux/amd64
 Experimental: false

docker inspect:

[
    {
        "Id": "9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94",
        "Created": "2018-01-08T16:32:30.716158282Z",
        "Path": "/opt/entrypoint.sh",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 477,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2018-01-08T16:32:31.370353796Z",
            "FinishedAt": "0001-01-01T00:00:00Z",
            "Health": {
                "Status": "healthy",
                "FailingStreak": 0,
                "Log": [
                    {
                        "Start": "2018-01-08T16:40:52.760255527Z",
                        "End": "2018-01-08T16:40:52.814916997Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2018-01-08T16:41:12.821209911Z",
                        "End": "2018-01-08T16:41:12.872327217Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2018-01-08T16:41:32.879017542Z",
                        "End": "2018-01-08T16:41:32.932394782Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2018-01-08T16:41:52.938598813Z",
                        "End": "2018-01-08T16:41:52.993106466Z",
                        "ExitCode": 0,
                        "Output": ""
                    },
                    {
                        "Start": "2018-01-08T16:42:12.998820005Z",
                        "End": "2018-01-08T16:42:13.056301771Z",
                        "ExitCode": 0,
                        "Output": ""
                    }
                ]
            }
        },
        "Image": "sha256:71843cc0ac81d2a365553dd5b69f6643dab212fd8b45d498c6a92614352ed75f",
        "ResolvConfPath": "/var/lib/docker/containers/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94/hostname",
        "HostsPath": "/var/lib/docker/containers/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94/hosts",
        "LogPath": "/var/lib/docker/containers/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94/9cdc36c44340cd23a5cbfb884c1fab4d47b173552dd992f392d4398603b46a94-json.log",
        "Name": "/kegfngsmzx_component_1",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/dev/null:/tmp/conf/4:rw",
                "/home/joao.suzana/gitprojects/superComponent/docker/configurations/default/component/common:/tmp/conf/1:rw",
                "/home/joao.suzana/gitprojects/superComponent/docker/configurations/default/component/basic:/tmp/conf/0:rw",
                "/home/joao.suzana/gitprojects/superComponent/docker/configurations/system-tests/component:/tmp/conf/3:rw",
                "/home/joao.suzana/gitprojects/superComponent/docker/configurations/custom/component:/tmp/conf/2:rw"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "kegfngsmzx_default",
            "PortBindings": {
                "1099/tcp": [
                    {
                        "HostIp": "",
                        "HostPort": ""
                    }
                ],
                "7000/tcp": [
                    {
                        "HostIp": "",
                        "HostPort": ""
                    }
                ],
                "8080/tcp": [
                    {
                        "HostIp": "",
                        "HostPort": ""
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": [],
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "shareable",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/5fbfde1f36fe0da7cee8ed92b728f7b92ddd4a1b8d9aeafe44b7e8cde581aeb3-init/diff:/var/lib/docker/overlay2/a9c4a86986bf84eff4d3156580e986daed91c7a37d937c5e4f608cd90b78f50a/diff:/var/lib/docker/overlay2/566bb33f0a3140bdb3726e3581bc703557f729010d2fb5b76ba21ac04157e5eb/diff:/var/lib/docker/overlay2/92302187d5633c0e6f3577edf93e2f1fbc133ccfcd11c6ce4a2b0fd06eb33db4/diff:/var/lib/docker/overlay2/3ac16dcca78ec2202d9af5e2e1ca50053612b75247d685c66418516aa7a1f91e/diff:/var/lib/docker/overlay2/3c2bef86bfac98dace20fb5ad4461601d444797454a5561bb543e4478d3aed25/diff:/var/lib/docker/overlay2/82de5471b51e7a55f8d9ff61983b36e9302b2fc7f4ba3fcc6ce5bde9f426ac9b/diff:/var/lib/docker/overlay2/7103da23a70519f91ae53950b6da99797d75104815ff43a1662efc92a933dc45/diff:/var/lib/docker/overlay2/70d522784351b087ee139f429dd041e1966308365e222f9022ab33f1f6da5089/diff:/var/lib/docker/overlay2/05d68822eebc4564c7e4597ee7c3d2bece406703e2e042bdf2ec35061a178f3a/diff:/var/lib/docker/overlay2/cc4fbcefd6fc474463d00d55d708988fc68f6eca5534675992e157743cb04af7/diff:/var/lib/docker/overlay2/50a363caa96c54de6cf17bfa477e384694f0fdf15a81c27cb92b830c0a8782b1/diff:/var/lib/docker/overlay2/ee1dadb2c4a98b37896eeb4e97f0715d97485bd10ef2b70d3b279d7fb93a4b18/diff:/var/lib/docker/overlay2/a66b6a45869ab5484cc04259ee7e11d32526a1fa1c91748f71754b57a87b69d9/diff:/var/lib/docker/overlay2/58472f6337dd2f95a5bda690e630fc6ddf4f661b6e965cfa798c666cde72457a/diff:/var/lib/docker/overlay2/22657f15e2d1411269f3201e63705babaaa7a04275f6c91ca5df4dc167abd93f/diff:/var/lib/docker/overlay2/5483cd1fad2a005e68e2656c5fcee54b8844576743288c06e49f40f6a4381a63/diff:/var/lib/docker/overlay2/ba02a2666cd21a254805404d1757f8ed90e28089e4a924e15a524c1e09265d0a/diff:/var/lib/docker/overlay2/07359ba2f66ba314629b1a6df441a7b96470e5d55ec22b88a48cc7c93b34f515/diff:/var/lib/docker/overlay2/99ecef114a5db24e123e4f5d9a8a01c3a79fa6aaed1af1095669f374a689294d/diff:/var/lib/docker/overlay2/7cfa73084c807c05112368f9c60627622b807b5ad932ace14541994f95209329/diff:/var/lib/docker/overlay2/b8e4cd0ea2811b61210129cc97ef4d10489bcb61b3b1dbe64d5a7af65bc284e2/diff:/var/lib/docker/overlay2/5cb7c00c701b24ca232c773eff803b0ca26a4bb137a5960920f5f3e9c96cfe7b/diff:/var/lib/docker/overlay2/6e722e736fb0acf96c2bbd2b29cd10e79955fe4b5fd8bf862a17ffa241b68a1b/diff:/var/lib/docker/overlay2/160835aace0cb1e2f4b9360934188b99ca9a65c74ee8d100f613275024e9d811/diff:/var/lib/docker/overlay2/5c7ba1cf63c83cda117ef0eca2bfd65d9bd44669e0e80933e351620bce546354/diff:/var/lib/docker/overlay2/c58b587a8318b57dc1f39c2aa2df68fa86295280fc007650a16008d05685b356/diff",
                "MergedDir": "/var/lib/docker/overlay2/5fbfde1f36fe0da7cee8ed92b728f7b92ddd4a1b8d9aeafe44b7e8cde581aeb3/merged",
                "UpperDir": "/var/lib/docker/overlay2/5fbfde1f36fe0da7cee8ed92b728f7b92ddd4a1b8d9aeafe44b7e8cde581aeb3/diff",
                "WorkDir": "/var/lib/docker/overlay2/5fbfde1f36fe0da7cee8ed92b728f7b92ddd4a1b8d9aeafe44b7e8cde581aeb3/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/home/joao.suzana/gitprojects/superComponent/docker/configurations/default/component/common",
                "Destination": "/tmp/conf/1",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/home/joao.suzana/gitprojects/superComponent/docker/configurations/custom/component",
                "Destination": "/tmp/conf/2",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/home/joao.suzana/gitprojects/superComponent/docker/configurations/system-tests/component",
                "Destination": "/tmp/conf/3",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/dev/null",
                "Destination": "/tmp/conf/4",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/home/joao.suzana/gitprojects/superComponent/docker/configurations/default/component/basic",
                "Destination": "/tmp/conf/0",
                "Mode": "rw",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "9cdc36c44340",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "1099/tcp": {},
                "7000/tcp": {},
                "8080/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "LC_ALL=en_US.UTF-8",
                "JDK_RPM=jdk-8u131-linux-x64.rpm",
                "JAVA_HOME=/usr/java/jdk1.8.0_131/",
                "COMPONENT_HOME=/opt/component"
            ],
            "Cmd": [
                "/opt/entrypoint.sh"
            ],
            "Healthcheck": {
                "Test": [
                    "CMD-SHELL",
                    "grep -q \"App Service is ready.\""
                ],
                "Interval": 20000000000,
                "Retries": 30
            },
            "ArgsEscaped": true,
            "Image": "docker.privateimage.com/private:latest",
            "Volumes": {
                "/tmp/conf/0": {},
                "/tmp/conf/1": {},
                "/tmp/conf/2": {},
                "/tmp/conf/3": {},
                "/tmp/conf/4": {}
            },
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "build-date": "20171128",
                "com.docker.compose.config-hash": "51a3c3781142fce6292e53a5a42dd804a41e5c6e81b02b2dab14647d5f3fe774",
                "com.docker.compose.container-number": "1",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "kegfngsmzx",
                "com.docker.compose.service": "private-component",
                "com.docker.compose.version": "1.17.1",
                "com.super.component": "Super",
                "license": "GPLv2",
                "name": "CentOS Base Image",
                "vendor": "CentOS"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "4a6a1b4492dce570a42cb735915c76fab4c0e92dd712bf81ae323df8eec1d0a3",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "1099/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "32863"
                    }
                ],
                "7000/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "32862"
                    }
                ],
                "8080/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "32861"
                    }
                ]
            },
            "SandboxKey": "/var/run/docker/netns/4a6a1b4492dc",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "kegfngsmzx_default": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "component",
                        "9cdc36c44340"
                    ],
                    "NetworkID": "19e6624e9254883228576ad289770611fd066ed7fc1c847eb0dd25899b240d07",
                    "EndpointID": "850780c0914d118382913f0ff287433e88c01a56d3e42fa95ce890c737027b76",
                    "Gateway": "172.18.0.1",
                    "IPAddress": "172.18.0.7",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:12:00:07",
                    "DriverOpts": null
                }
            }
        }
    }
]

@thaJeztah
Member

@Timunas can you try updating to 17.12?

@Timunas
Author

Timunas commented Jan 8, 2018

The original issue was with 17.12.

Regarding the original issue, I reproduced it once again and I cannot run docker inspect: it just hangs, as do all other commands.

@EliRibble

I get the same issue, though without using docker-compose; I'm using docker swarm. Same thing, though: I occasionally get containers that neither docker swarm nor I, using the docker CLI, can stop. As a result, docker swarm ends up collecting more replicas than desired and can't scale them down. Sometimes these replicas can still service requests and receive traffic. The only way to remove the containers is to restart docker on the affected node.

@hallvar

hallvar commented Jan 9, 2018

I have the same issue with docker swarm. I remove one of multiple docker stacks, but only some of the containers in the stack are removed, while others hang around. Running docker inspect or docker rm on the hung containers just hangs on the command line until I press Ctrl-C. I need to reboot to get the containers removed. I did not have the issue in 17.09, only after upgrading to 17.12.0-ce (I also had the problem on 17.12.0-ce-rc4).

I have the issue on an Azure VM: docker info

 Running: 83
 Paused: 0
 Stopped: 12
Images: 579
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: hy0kx44q5m9jg0lc1n5ylxkw6
 Is Manager: true
 ClusterID: ordhsz694y98k3r4604ksc937
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 2
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.0.0.10
 Manager Addresses:
  10.0.0.10:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-104-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 27.47GiB
Name: build-agent-vm001
ID: S7WY:RCKF:G3P7:TI3H:MJ2F:UXZ3:C5DS:YQG3:OPF4:V4RS:5EQ7:AWG4
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

I also have the same issue on Docker for Mac (Edge: 17.12): docker info

 Running: 65
 Paused: 0
 Stopped: 45
Images: 607
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: qfzh0tqkchl2m42uhju7k3ml4
 Is Manager: true
 ClusterID: q14zy6epqkpx0w112wusdtd3u
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 2
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.65.3
 Manager Addresses:
  192.168.65.3:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.60-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 6
Total Memory: 5.817GiB
Name: linuxkit-025000000001
ID: DSXX:YVTO:DLFW:MN3X:MTJC:3EGK:MUYT:6JMN:C2NC:TQMW:BE44:3P6H
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 260
 Goroutines: 491
 System Time: 2018-01-09T00:13:09.053688513Z
 EventsListeners: 28
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3128
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

@mborejdo

mborejdo commented Jan 11, 2018

We are also seeing the docker daemon become unresponsive to some commands.

Currently I cannot run:

docker rmi
docker system prune -f
docker exec
docker logs

This happens on multiple engines, all running 17.12.

Seems related to #35408.

@achekulaev

achekulaev commented Jan 11, 2018

I experience the same bug. It is not consistent though. I don't see a pattern yet but it does happen.

I am running Docker for Mac Version 17.12.0-ce-mac46 (21698). I am not running Docker in Docker.

Container is created by docker-compose up.

Yes, I can see that the container is still running, but stop or kill just hangs and does nothing.

10:13:13 Alexei-Workstation /Users/alexei.chekulaev/Projects/SBD-MASTER
$ docker ps
CONTAINER ID        IMAGE                     COMMAND                  CREATED             STATUS                    PORTS                                                    NAMES
f0e36d3589d3        docksal/cli:1.3-php7      "/opt/startup.sh sup…"   44 hours ago        Up 28 minutes (healthy)   22/tcp, 9000/tcp                                         sbdmaster_cli_1
b93c84c9a3a3        docksal/ssh-agent:1.0     "/run.sh ssh-agent"      44 hours ago        Up 29 minutes                                                                      docksal-ssh-agent
91ce00eb35fa        docksal/dns:1.0           "/opt/entrypoint.sh …"   44 hours ago        Up 29 minutes             192.168.64.100:53->53/udp                                docksal-dns
ae867cca0f21        docksal/vhost-proxy:1.1   "docker-entrypoint.s…"   44 hours ago        Up 29 minutes             192.168.64.100:80->80/tcp, 192.168.64.100:443->443/tcp   docksal-vhost-proxy
10:13:17 Alexei-Workstation /Users/alexei.chekulaev/Projects/SBD-MASTER
$ docker stop f0e36d3589d3
^C
10:16:03 Alexei-Workstation /Users/alexei.chekulaev/Projects/SBD-MASTER
$ docker kill f0e36d3589d3
^C
10:30:51 Alexei-Workstation /Users/alexei.chekulaev/Projects/SBD-MASTER

(You can see that minutes passed before I pressed Ctrl-C)
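
A hang like the one above (a stop that never returns until Ctrl-C) can be caught from a script by putting a deadline on the CLI call. This is only a minimal Python sketch, assuming the docker CLI is on PATH. The helper names run_with_timeout and stop_container and the 60-second deadline are illustrative choices, not part of Docker.

```python
import subprocess


def run_with_timeout(argv, timeout_s):
    """Run a command and return (finished, returncode).

    finished is False when the command did not exit within timeout_s seconds.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return False, None
    return True, proc.returncode


def stop_container(container_id, timeout_s=60.0):
    """Hypothetical wrapper: try `docker stop` and report whether it hung."""
    # `docker stop` waits 10 seconds by default before sending SIGKILL,
    # so the deadline should be comfortably longer than that.
    return run_with_timeout(["docker", "stop", container_id], timeout_s)
```

If stop_container returns (False, None), only the CLI call was abandoned. The container itself is most likely still in the stuck state described in this thread.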

In another terminal I tried to start another docker-compose project. This is what I saw in the output the first time:

$ docker-compose up
rm: can't remove '/.ssh/id_rsa.pub': Stale file handle
rm: can't remove '/.ssh/authorized_keys': Stale file handle
rm: can't remove '/.ssh/id_rsa2.pub': Stale file handle
rm: can't remove '/.ssh/known_hosts': Stale file handle
rm: can't remove '/.ssh/id_test': Stale file handle
rm: can't remove '/.ssh/id_test.pub': Stale file handle
rm: can't remove '/.ssh/id_rsa2': Stale file handle
rm: can't remove '/.ssh/id_dsa': Stale file handle
rm: can't remove '/.ssh/id_boot2docker': Stale file handle
rm: can't remove '/.ssh/id_sbd.pub': Stale file handle
rm: can't remove '/.ssh/id_sbd': Stale file handle
rm: can't remove '/.ssh/id_rsa': Stale file handle
rm: can't remove '/.ssh/id_boot2docker.pub': Stale file handle
rm: can't remove '/.ssh': Directory not empty
Starting services...
Creating network "demonodb_default" with the default driver
Creating demonodb_cli_1 ... done
Creating demonodb_cli_1 ... 
Creating demonodb_web_1 ... done

The other project started fine, apart from the stale file handle errors above. Subsequent stops and starts of that project did not throw any errors and worked fine.

These files are on a named volume. The volume is mounted as ro in docker-compose, so I'm not sure why there are "can't remove" messages.

Restarting the Docker daemon solves the issue... temporarily. I forgot to run docker inspect before restarting the daemon, but I think inspect would just hang like stop and kill do.

UPDATE: I wanted to note that the container with issues has a healthcheck on it. It looks like this might be the culprit.

@thaJeztah thaJeztah added this to backlog in maintainers-session Jan 11, 2018
@crossdot

I get the same issue. I can reproduce it every time in different environments:
Docker for Mac Version 17.12.0-ce-mac46 (started hanging after the update)
or Docker running natively on Arch Linux (kernel 4.14.14-1-ARCH). There, systemctl restart docker.service hangs too, so I cannot restart the Docker service. docker info

Client:
 Version:       18.01.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    03596f51b1
 Built: Sun Jan 14 23:10:39 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm
Server:
 Engine:
  Version:      18.01.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   03596f51b1
  Built:        Sun Jan 14 23:11:14 2018
  OS/Arch:      linux/amd64
  Experimental: false

journalctl shows

dockerd[26382]: time="2018-01-25T12:39:22.289082720+03:00" level=error msg="stream copy error: reading from a closed fifo"

@enkoder

enkoder commented Jan 25, 2018

Also seeing this on 18.01. Hang on container inspect.

Client:
 Version:	18.01.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	03596f51b1
 Built:	Sun Jan 14 23:10:39 2018
 OS/Arch:	linux/amd64
 Experimental:	false
 Orchestrator:	swarm

Server:
 Engine:
  Version:	18.01.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	03596f51b1
  Built:	Sun Jan 14 23:11:14 2018
  OS/Arch:	linux/amd64
  Experimental:	false
compose.cli.command.get_client: Docker version: Platform={'Name': ''}, Components=[{'Name': 'Engine', 'Version': '18.01.0-ce', 'Details': {'ApiVersion': '1.35', 'Arch': 'amd64', 'BuildTime': '2018-01-14T23:11:14.000000000+00:00', 'Experimental': 'false', 'GitCommit': '03596f51b1', 'GoVersion': 'go1.9.2', 'KernelVersion': '4.14.15-1-ARCH', 'MinAPIVersion': '1.12', 'Os': 'linux'}}], Version=18.01.0-ce, ApiVersion=1.35, MinAPIVersion=1.12, GitCommit=03596f51b1, GoVersion=go1.9.2, Os=linux, Arch=amd64, KernelVersion=4.14.15-1-ARCH, BuildTime=2018-01-14T23:11:14.000000000+00:00
compose.cli.verbose_proxy.proxy_callable: docker containers <- (all=False, filters={'label': ['com.docker.compose.project=discord']})
urllib3.connectionpool._make_request: http://localhost:None "GET /v1.24/containers/json?limit=-1&all=0&size=0&trunc_cmd=0&filters=%7B%22label%22%3A+%5B%22com.docker.compose.project%3Ddiscord%22%5D%7D HTTP/1.1" 200 1762
compose.cli.verbose_proxy.proxy_callable: docker containers -> (list with 1 items)
compose.cli.verbose_proxy.proxy_callable: docker inspect_container <- ('59760b63049318f7b0bef2605e63d0fd8b13f4e134a7aea435db9eb1bdf2b389')

@rfay

rfay commented Jan 30, 2018

We have stopped using 17.12 completely and rolled back to 17.09 because of this problem on 17.12 (macOS and apparently Linux as well).

This is a critical, persistent problem.

And unfortunately I have not found a way to recreate it except by using Docker a lot.

@ay0o

ay0o commented Jan 30, 2018

I'm experiencing the same issue on multiple servers using 17.12. As @rfay said, it didn't happen on 17.09.

Checking the changelog, a major difference between 17.12 and 17.09 is that, since 17.11, Docker is based on containerd. Since the evidence seems to point to an issue in the runtime, it may be worth investigating along this path.

@achekulaev

Yup, same here. I stick with 17.09 and recommend everyone using docker-compose or swarm to stick with it until the issue is resolved.

@cpuguy83
Member

If you can grab a stacktrace from the running daemon it would be very helpful.
You can get this by hitting GET /debug/pprof/goroutine?debug=2

I suspect, though, that this is the recent bug that was found in runc that is a race in handling the container I/O... which has been around since forever, apparently.
If so, we suspect this is exposed by changes in the kernel, since everyone has been upgrading their kernel recently for the Spectre/Meltdown patches.

@cpuguy83
Member

The relevant runc patch is here, which you can try if you don't want to wait for a patched docker release: opencontainers/runc#1698

@achekulaev

achekulaev commented Jan 30, 2018

@cpuguy83

You can get this by hitting GET /debug/pprof/goroutine?debug=2

Please provide commands. I don't understand how to "hit" a relative URL, or what it is relative to. I use Docker for Mac. What should I hit?

@cpuguy83
Member

@achekulaev
Assuming you have docker listening on a unix socket at /var/run/docker.sock (the default):

curl --unix-socket /var/run/docker.sock http:/./debug/pprof/goroutine?debug=2

or a TCP socket

curl http://<ip>:<port>/debug/pprof/goroutine?debug=2
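
For anyone who prefers a script to curl, the same request can be sketched in Python with only the standard library. This assumes the daemon listens on the default Unix socket mentioned above and that the calling user is allowed to read it. UnixHTTPConnection and fetch_goroutine_dump are hypothetical helper names, not part of any Docker client library.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10.0):
        # "localhost" only fills in the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch_goroutine_dump(socket_path="/var/run/docker.sock", timeout=10.0):
    """Return the text of GET /debug/pprof/goroutine?debug=2 from the daemon."""
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    try:
        conn.request("GET", "/debug/pprof/goroutine?debug=2")
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        if resp.status != 200:
            raise RuntimeError(f"unexpected HTTP status {resp.status}")
        return body
    finally:
        conn.close()
```

Calling fetch_goroutine_dump() and writing the result to a file gives something that can be attached to the issue. The timeout is there because a badly stuck daemon may not answer this request either.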

@ay0o

ay0o commented Jan 30, 2018

The following file is the output of that command run on an AWS Ubuntu 16.04 instance using Docker version 17.12.0-ce, build c97c6d6

moby 35933.txt

@cpuguy83
Member

@ay0o Thanks!
Is there something blocked on the system right now?
I don't see any in-progress stop/kills; it just looks like a bunch of running containers, unfortunately.

@AlterEgo7

I took the logs on a MacBook Pro running macOS High Sierra 10.13.3, running docker 18.01.0-ce-mac48, channel: edge ee2282129d.

docker_output.log

@cpuguy83
Member

@AlterEgo7 Thanks! This looks like docker is blocked in a syscall to write to disk, and even read from disk at least in one place. Seems like something is very wrong with the disk that is allocated for that docker VM in docker4mac.

@cpuguy83
Member

A number of i/o bound syscalls blocked for ~1 minute, actually.
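
To look for long-blocked goroutines like these without reading the whole dump by eye, the text returned by /debug/pprof/goroutine?debug=2 can be filtered. The following is a rough Python sketch, assuming the usual Go header form "goroutine N [state, M minutes]:" at the start of each stack. The names Goroutine, parse_goroutines and long_blocked are made up for this example.

```python
import re
from typing import List, NamedTuple

# Matches headers such as:
#   goroutine 42 [syscall, 1 minutes, locked to thread]:
#   goroutine 7 [running]:
HEADER = re.compile(
    r"^goroutine (\d+) \[([^\],]+)(?:, (\d+) minutes)?(?:, locked to thread)?\]:$"
)


class Goroutine(NamedTuple):
    gid: int
    state: str
    minutes: int  # 0 when the header carries no wait time
    stack: str


def parse_goroutines(dump: str) -> List[Goroutine]:
    """Split a debug=2 goroutine dump into one record per goroutine."""
    result = []
    current = None
    lines = []
    for line in dump.splitlines():
        m = HEADER.match(line)
        if m:
            if current is not None:
                result.append(current._replace(stack="\n".join(lines).strip()))
            current = Goroutine(int(m.group(1)), m.group(2), int(m.group(3) or 0), "")
            lines = []
        elif current is not None:
            lines.append(line)
    if current is not None:
        result.append(current._replace(stack="\n".join(lines).strip()))
    return result


def long_blocked(dump: str, min_minutes: int = 1, needle: str = "") -> List[Goroutine]:
    """Goroutines waiting at least min_minutes whose stack contains needle."""
    return [
        g for g in parse_goroutines(dump)
        if g.minutes >= min_minutes and needle in g.stack
    ]
```

For example, long_blocked(dump, 1, "syscall") would list goroutines that have been waiting a minute or more with a syscall frame in their stack. Which frames actually matter is a judgment call for whoever reads the dump.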

@cpuguy83
Member

cpuguy83 commented May 5, 2018

18.03.1 is out with some mitigations for this. Please let us know if it's still a problem on that release.

@vce-xx

vce-xx commented May 5, 2018

@cpuguy83 Docker for AWS was still on 18.03.0 last time I checked, and the last version listed in the release notes is 18.03.0. I am eager to check. Any idea when Docker for AWS stable will upgrade to 18.03.1?

@marcomsousa

Be careful when upgrading your swarm cluster: because of bug #36961, your cluster can become dead.

@tnguyenns1

@cpuguy83 18.03.1 is not there yet at the release page: https://docs.docker.com/release-notes/docker-ce/ or am I blind?

@marcomsousa

marcomsousa commented May 7, 2018

18.03.1 is not there yet at the release page: https://docs.docker.com/release-notes/docker-ce/ or am I blind?

Those docs are out of date. You can see the release here: https://github.com/docker/docker-ce/releases/tag/v18.03.1-ce
It was released 11 days ago.

@thaJeztah
Member

@marcomsousa thanks for noticing that; release-notes are now also added on the docs website; https://docs.docker.com/release-notes/docker-ce/#18031-ce-2018-04-26

@timdau

timdau commented May 9, 2018

@cpuguy83 Is there a list somewhere of all of the issues related to this problem? That way we can know for sure when this issue is resolved and it's safe to upgrade.

@cpuguy83
Member

cpuguy83 commented May 9, 2018

@timdau This is mitigated by containerd/containerd@d235ae9

@marcomsousa

marcomsousa commented May 9, 2018

This commit containerd/containerd@d235ae9 was released in containerd 1.0.3.
Docker-ce 18.03.1 includes this version of containerd.

So we need to test whether this error is fixed in the 18.03.1 version.
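
Before testing, it may help to confirm that an engine is at least on the release that ships the mitigation. A small Python sketch, assuming version strings of the form seen in this thread (for example 17.12.0-ce, 18.03.1-ce, v17.12.1). parse_version and has_mitigation are hypothetical helper names, and the check only compares version numbers. It says nothing about whether the hang is actually gone.

```python
import re


def parse_version(text):
    """Turn '18.03.1-ce' or 'v17.12.1' into a comparable tuple like (18, 3, 1)."""
    m = re.match(r"^v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component (as in '18.09') is treated as 0.
    return tuple(int(part or 0) for part in m.groups())


def has_mitigation(version, minimum=(18, 3, 1)):
    """True when version is at or above the release named in this thread."""
    return parse_version(version) >= minimum
```

The version string itself can be taken from the Server section of docker version output, as pasted earlier in this issue.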

@achekulaev

18.03.1 seems to have fixed the issue for me. I have been using it locally for a week without hitting the issue, which was otherwise easy to reproduce within a day.

stayclassychicago added a commit to stayclassychicago/teamcity-docker-agent that referenced this issue May 21, 2018
Update Docker to stable version which contains fix for moby/moby#35933

Docker-ce 18.03.1 includes the commit which fixes this containerd issue, per
moby/moby#35933 (comment)

And feedback based on comment:
moby/moby#35933 (comment)
@marcomsousa

marcomsousa commented May 22, 2018

The 18.03.1 version seems to have fixed this issue (or mitigated it, as @cpuguy83 said).

I tested in 4 clusters.

@thaJeztah
Member

Thank you all for confirming; I'll go ahead and close this issue.

If you still run into this on Docker 18.03.1 or above, please open a new issue with details.

@casperWWW

@mavogel I had the same problem with freezing docker containers. What solved it for me was moving logging from /dev/stderr to a file inside the container; after that the problem was gone. There is probably some disk issue when a container logs to /dev/stderr, and that is probably the cause in most of these cases.

@EduJGURJC

EduJGURJC commented Nov 21, 2018

My (temporary) solution in both version 18.06.1-ce and 18.09 was similar to @casperWWW's. In my case I lowered the log level of the applications running inside the containers and they stopped hanging.
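
Both workarounds (logging to a file inside the container instead of /dev/stderr, and lowering the log level) depend on the application and its logging library. As an illustration only, here is how they might look for a Python application using the standard logging module. The logger name "app", the function name configure_file_logging and the WARNING default are assumptions made for this sketch.

```python
import logging


def configure_file_logging(path, level=logging.WARNING):
    """Send the 'app' logger to a file at a reduced verbosity."""
    logger = logging.getLogger("app")  # hypothetical logger name
    logger.setLevel(level)
    # Stop records from also reaching root handlers that may write to stderr.
    logger.propagate = False
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
    file_handler = logging.FileHandler(path)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(file_handler)
    return logger
```

Note that output written this way no longer shows up in docker logs, so it is a trade-off rather than a fix.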

@loretoparisi

So it seems that the container cannot release its allocated I/O resources.

@wh0am111

wh0am111 commented May 6, 2019

I get the same issue, though without using docker-compose. I'm using docker swarm. Same thing though, I occasionally get containers that neither docker swarm nor I with the docker CLI can stop. This causes docker swarm to end up collecting more replicas than desired that it can't scale down. Sometimes these replicas can still service requests and receive traffic. The only way to remove the containers is to restart docker on the affected node.

Same here. Besides restarting Docker on the affected node, is there any other way to solve the problem?

@casperWWW

I get the same issue, though without using docker-compose. I'm using docker swarm. Same thing though, I occasionally get containers that neither docker swarm nor I with the docker CLI can stop. This causes docker swarm to end up collecting more replicas than desired that it can't scale down. Sometimes these replicas can still service requests and receive traffic. The only way to remove the containers is to restart docker on the affected node.

Same here. Besides restarting Docker on the affected node, is there any other way to solve the problem?

See the comment I posted earlier here - #35933 (comment)
Hopefully that will help you as well.

desimaniac added a commit to Cloudbox/Cloudbox that referenced this issue May 29, 2019
- Should prevent issues mentioned here (moby/moby#35933) from happening.
- Will update the version to latest once a stable release is out.
@sunkeysun

sunkeysun commented Jun 18, 2019

I get the same issue, though without using docker-compose. I'm using docker swarm. Same thing though, I occasionally get containers that neither docker swarm nor I with the docker CLI can stop. This causes docker swarm to end up collecting more replicas than desired that it can't scale down. Sometimes these replicas can still service requests and receive traffic. The only way to remove the containers is to restart docker on the affected node.

My Docker version is v17.12.1.
I get the same issue. It causes my service to load-balance across different image versions, and the container count exceeds the configured replicas. I think this is a big bug in Docker. It seriously affects my service in production. Please help resolve it. @thaJeztah

@thaJeztah
Member

Docker 17.12 reached EOL over a year ago; are you able to reproduce on a current version?

desimaniac added a commit to Cloudbox/Cloudbox that referenced this issue Aug 31, 2019
- Should prevent issues mentioned here (moby/moby#35933) from happening.
- Will update the version to latest once a stable release is out.
charleskorn added a commit to batect/batect that referenced this issue Nov 28, 2019
17.12.1 has an issue where it will randomly freeze (see
moby/moby#35933).

If this resolves the issue running the tests on CI, we'll bump the
minimum version required by batect itself as well.