
Driver devicemapper failed to remove root filesystem. Device is busy #27381

Closed
ceecko opened this issue Oct 14, 2016 · 153 comments · Fixed by containerd/containerd#348
Labels: area/storage/devicemapper, status/needs-attention, version/1.12

@ceecko commented Oct 14, 2016

Description
Containers cannot be removed; docker reports Driver devicemapper failed to remove root filesystem. Device is busy. This leaves containers in the Dead state.

Steps to reproduce the issue:

  1. docker rm container_id

Describe the results you received:
Error message is displayed: Error response from daemon: Driver devicemapper failed to remove root filesystem ce2ea989895b7e073b9c3103a7312f32e70b5ad01d808b42f16655ffcb06c535: Device is Busy

Describe the results you expected:
Container should be removed.

Additional information you deem important (e.g. issue happens only occasionally):
This started to occur after upgrading from 1.11.2 to 1.12.2 and happens occasionally (10% of removals).

Output of docker version:

Client:
 Version:      1.12.2
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   bb80604
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.2
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   bb80604
 Built:
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 83
 Running: 72
 Paused: 0
 Stopped: 11
Images: 49
Server Version: 1.12.2
Storage Driver: devicemapper
 Pool Name: data-docker_thin
 Pool Blocksize: 65.54 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: ext4
 Data file:
 Metadata file:
 Data Space Used: 33.66 GB
 Data Space Total: 86.72 GB
 Data Space Available: 53.06 GB
 Metadata Space Used: 37.3 MB
 Metadata Space Total: 268.4 MB
 Metadata Space Available: 231.1 MB
 Thin Pool Minimum Free Space: 8.672 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null overlay host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.305 GiB
Name: us-2.c.evennode-1234.internal
ID: HVU4:BVZ3:QYUQ:IJ6F:Q2FP:Z4T3:MBKH:I4KC:XFIF:W5DV:4HZW:45NJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):
All environments we run servers in - AWS, gcloud, physical, etc.

@thaJeztah (Member)

Is this happening with any container? What is running in the container, and what options do you use to start the container? (e.g. are you using bind-mounted directories, are you using docker exec to start additional processes in the container?)

@ceecko (Author) commented Oct 15, 2016

We run all containers in pretty much the same way, and it happens randomly on any one of them.
We don't use docker exec and don't bind-mount any directories.
Here's the config of one of the dead containers:

[
    {
        "Id": "ce2ea989895b7e073b9c3103a7312f32e70b5ad01d808b42f16655ffcb06c535",
        "Created": "2016-10-13T09:14:52.069916456Z",
        "Path": "/run.sh",
        "Args": [],
        "State": {
            "Status": "dead",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": true,
            "Pid": 0,
            "ExitCode": 143,
            "Error": "",
            "StartedAt": "2016-10-13T18:05:50.839079884Z",
            "FinishedAt": "2016-10-14T01:49:22.133922284Z"
        },
        "Image": "sha256:df8....4f4",
        "ResolvConfPath": "/var/lib/docker/containers/ce2ea989895b7e073b9c3103a7312f32e70b5ad01d808b42f16655ffcb06c535/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/ce2ea989895b7e073b9c3103a7312f32e70b5ad01d808b42f16655ffcb06c535/hostname",
        "HostsPath": "/var/lib/docker/containers/ce2ea989895b7e073b9c3103a7312f32e70b5ad01d808b42f16655ffcb06c535/hosts",
        "LogPath": "",
        "Name": "/d9a....43",
        "RestartCount": 0,
        "Driver": "devicemapper",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "fluentd",
                "Config": {
                    "fluentd-address": "127.0.0.1:24224",
                    "fluentd-async-connect": "true",
                    "labels": "app_id",
                    "tag": "docker.{{if (.ExtraAttributes nil).app_id}}{{(.ExtraAttributes nil).app_id}}{{else}}{{.Name}}{{end}}"
                }
            },
            "NetworkMode": "default",
            "PortBindings": {
                "3000/tcp": [
                    {
                        "HostIp": "127.0.0.2",
                        "HostPort": ""
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "always",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": [
                "mongodb:10.240.0.2"
            ],
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "CgroupParent": "mygroup/d9...43",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "devicemapper",
            "Data": {
                "DeviceId": "29459",
                "DeviceName": "docker-8:1-34634049-8e884a263c75cfb042ac02136461c8e8258cf693f0e4992991d5803e951b3dbb",
                "DeviceSize": "107374182400"
            }
        },
        "Mounts": [],
        "Config": {
            "Hostname": "ce2ea989895b",
            "Domainname": "",
            "User": "app",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "3000/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PORT=3000",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/run.sh"
            ],
            "Image": "eu.gcr.io/reg/d9...43:latest",
            "Volumes": null,
            "WorkingDir": "/data/app",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "app_id": "d9...43"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "65632062399b8f9f011fdebcd044432c45f068b74d24c48818912a21e8036c98",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": null,
            "SandboxKey": "/var/run/docker/netns/65632062399b",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "59d8aa11b92aaa8ad9da7f010e8689c158cad7d80ec4b9e4e4688778c49149e0",
                    "EndpointID": "",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": ""
                }
            }
        }
    }
]

@ceecko (Author) commented Oct 15, 2016

I've just noticed that this happens only on servers with Backing Filesystem: ext4.
The issue does not seem to occur on servers running xfs as the backing filesystem.

@thaJeztah (Member)

@ceecko thanks, that's interesting

@rhvgoyal is this a known issue on your side?

@ceecko (Author) commented Oct 17, 2016

This hits us hard in production :/ Any hints on how to remove the dead containers?

@rhvgoyal (Contributor)

@thaJeztah Strange that this happens only with ext4 and not xfs. I am not aware of any such issue.

In general, people have reported the device being busy, and there can be many reasons for that.

@ceecko first of all, make sure the docker daemon is running in a slave mount namespace of its own and not in the host mount namespace, so that mount points don't leak and the chances of getting such errors are lower. If you are using a systemd-managed docker, there should be a docker unit file and it should have MountFlags=slave.
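For anyone unfamiliar with systemd, here is a minimal sketch of setting that flag via a drop-in file rather than editing the packaged unit (the drop-in path and file name follow standard systemd conventions and are not something prescribed in this thread):

# Add MountFlags=slave without touching the vendor unit file
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/slave-mounts.conf <<'EOF'
[Service]
MountFlags=slave
EOF

# Reload unit definitions and restart the daemon so it starts
# in its own slave mount namespace
systemctl daemon-reload
systemctl restart docker

You can confirm the setting took effect with systemctl show docker | grep Mount.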

@ceecko (Author) commented Oct 17, 2016

@rhvgoyal MountFlags=slave seems to resolve the issue so far. The containers created before the change are still a problem, but new containers have not triggered the error yet. I'll get in touch in case anything changes.

Btw, it may be worth updating the storage driver docs to recommend this as a best practice in production, since I couldn't find any reference to it.

Thank you for your help.

@thaJeztah (Member)

This was changed a while back: 2aee081#diff-ff907ce70a8c7e795bde1de91be6fa68 (#22806). Per the discussion, this may be an issue if deferred removal is not enabled; see #22806 (comment).

Should we change the default back? @rhvgoyal

@rhvgoyal (Contributor)

@thaJeztah I think it might be a good idea to change the default back to MountFlags=slave. We have done that.

Ideally, the deferred removal and deferred deletion features should have taken care of this, with no need for MountFlags=slave. But deferred deletion alone is not sufficient: old kernels are missing a feature that allows removing a directory from one mount namespace even while it is mounted in a different mount namespace, and that's one reason container removal can fail.

So until old kernels offer that feature, it might be a good idea to run the docker daemon in a slave mount namespace.
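For reference, a sketch of enabling those deferred features through /etc/docker/daemon.json (dm.use_deferred_removal and dm.use_deferred_deletion are the documented devicemapper storage options; the surrounding thin-pool configuration is omitted and assumed to already exist):

{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.use_deferred_removal=true",
    "dm.use_deferred_deletion=true"
  ]
}

After restarting the daemon, docker info should report Deferred Removal Enabled: true and Deferred Deletion Enabled: true, as in the output posted further down this thread.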

@ceecko (Author) commented Oct 20, 2016

@rhvgoyal the errors started to appear again even with MountFlags=slave. We'll try deferred removal and deletion and get back to you.

@ceecko (Author) commented Oct 22, 2016

We have just experienced the same error on xfs as well.
Here's the docker info:

Containers: 52
 Running: 52
 Paused: 0
 Stopped: 0
Images: 9
Server Version: 1.12.2
Storage Driver: devicemapper
 Pool Name: data-docker_thin
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 13 GB
 Data Space Total: 107.1 GB
 Data Space Available: 94.07 GB
 Metadata Space Used: 19.19 MB
 Metadata Space Total: 268.4 MB
 Metadata Space Available: 249.2 MB
 Thin Pool Minimum Free Space: 10.71 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2016-06-09)
Logging Driver: journald
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host overlay bridge null
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.10.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.389 GiB
Name: ip-172-31-25-29.eu-west-1.compute.internal
ID: ZUTN:S7TL:6JRZ:HG52:LDLZ:VR5Q:RWVV:IP7E:HOQ4:R55X:Z7AI:P63R
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8

@ceecko (Author) commented Oct 27, 2016

I confirm that the error still occurs on 1.12.2 even with MountFlags=slave and dm.use_deferred_deletion=true and dm.use_deferred_removal=true, though less frequently than before.

@thaJeztah added the status/needs-attention label Oct 27, 2016
@ceecko (Author) commented Oct 29, 2016

Here's more info from the logs regarding one container which could not be removed:

libcontainerd: container 4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543 restart canceled
error locating sandbox id c9272d4830ba45e03efda777a14a4b5f7f94138997952f2ec1ba1a43b2c4e1c5: sandbox c9272d4830ba45e03efda777a14a4b5f7f94138997952f2ec1ba1a43b2c4e1c5 not found
failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543/shm: invalid argument
devmapper: Error unmounting device ed06c57080b8a8f25dc83d4afabaccb26d72009dad23a8e87310b873c226b905: invalid argument
Error unmounting container 4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543: invalid argument
Handler for DELETE /containers/4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543 returned error: Unable to remove filesystem for 4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543: remove /var/lib/docker/containers/4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543/shm: device or resource busy

@rhvgoyal (Contributor)

The following message suggests that directory removal failed:

remove /var/lib/docker/containers/4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543/shm: device or resource busy

On older kernels it can fail because the directory is mounted in some other mount namespace. If you disable the deferred deletion feature, this message will stop appearing, but it will be replaced by some other error message.

The core of the issue here is that the container is either still running or some of its mount points have leaked into another mount namespace. If we can figure out which mount namespace they leaked into and how they got there, we could try fixing it.

Once you run into this issue, you can try running find /proc/*/mounts | xargs grep "4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543"

and then see which pids have container-related mounts leaked into them. That might give some idea.
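A small elaboration on that command, as a sketch: the loop below also prints which process owns each matching mount (plain /proc traversal; the container id is the one from the logs above):

cid=4d9bbd9b4da95f0ba1947055fa263a059ede9397bcf1456e6795f16e1a7f0543
for f in /proc/[0-9]*/mounts; do
    if grep -q "$cid" "$f" 2>/dev/null; then
        # Extract the pid from the /proc/<pid>/mounts path
        pid=${f#/proc/}; pid=${pid%/mounts}
        echo "pid $pid ($(cat /proc/$pid/comm 2>/dev/null)) holds a mount for this container"
    fi
done

The 2>/dev/null guards are there because short-lived processes can disappear between the glob expansion and the read, which is also where "No such file or directory" errors like the ones below come from.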

@ceecko (Author) commented Oct 31, 2016

I have tried four containers, all dead and impossible to remove due to the device being busy, and got nothing :/

# find /proc/*/mounts | xargs grep -E "b3070ef60def|62777ad2994f|923a6d20506d|f3e079a9721c"
grep: /proc/9659/mounts: No such file or directory

@ceecko (Author) commented Oct 31, 2016

Now I'm actually getting a slightly different error message:

# docker rm b3070ef60def
Error response from daemon: Driver devicemapper failed to remove root filesystem b3070ef60deffc0e496631ed6e058c4569d6233bb6947b27072a70c663d9e579: remove /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd: device or resource busy

@rhvgoyal (Contributor)

Same thing: this directory can't be deleted because it is mounted in some other mount namespace. Try searching in /proc/*/mounts and grep for the id 527ae5 to see which pid is seeing this mount point. We need to figure out why, in your setup, the container rootfs mount point is leaking into another mount namespace.

@ceecko (Author) commented Oct 31, 2016

Here we go:

# find /proc/*/mounts | xargs grep -E "527ae5"
grep: /proc/10080/mounts: No such file or directory
/proc/15890/mounts:/dev/mapper/docker-253:1-1050933-527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
/proc/23584/mounts:/dev/mapper/docker-253:1-1050933-527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
/proc/31591/mounts:/dev/mapper/docker-253:1-1050933-527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
/proc/4194/mounts:/dev/mapper/docker-253:1-1050933-527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
/proc/4700/mounts:/dev/mapper/docker-253:1-1050933-527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
/proc/4701/mounts:/dev/mapper/docker-253:1-1050933-527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
/proc/8858/mounts:/dev/mapper/docker-253:1-1050933-527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
/proc/8859/mounts:/dev/mapper/docker-253:1-1050933-527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd /var/lib/docker/devicemapper/mnt/527ae5985b1b730a05a667d147ce15abcbfb950a334aea4b673a413b6b21c4dd xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
nginx     4194  0.0  0.0  55592 10520 ?        S    11:55   0:06 nginx: worker process is shutting down
nginx     4700  2.3  0.0  55804 10792 ?        S    11:58   3:52 nginx: worker process is shutting down
nginx     4701  1.8  0.0  55800 10784 ?        S    11:58   3:04 nginx: worker process is shutting down
nginx     8858  2.4  0.0  55560 10720 ?        S    14:05   0:59 nginx: worker process
nginx     8859  3.1  0.0  55560 10700 ?        S    14:05   1:15 nginx: worker process
root     15890  0.0  0.0  55004  9524 ?        Ss   Oct29   0:05 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx    23584  0.0  0.0  55576 10452 ?        S    09:17   0:00 nginx: worker process is shutting down
nginx    31591  0.9  0.0  63448 18820 ?        S    09:46   2:53 nginx: worker process is shutting down

@rhvgoyal (Contributor)

Which processes do these pids map to? Try cat /proc/<pid>/comm or ps -eaf | grep <pid>

@ceecko (Author) commented Oct 31, 2016

These are all nginx worker processes shutting down after a config reload (see the edited comment above). I'm wondering why they hold the mounts, since the containers don't bind any volumes.

@rhvgoyal (Contributor)

So is the nginx process running in another container, or is it running on the host?

@rhvgoyal (Contributor)

Can you do the following:

  • ls -l /proc/<docker-daemon-pid>/ns/mnt
  • ls -l /proc/<nginx-pid>/ns/mnt
  • Run a bash shell on host and run ls -l /proc/$$/ns/mnt

and paste the output here.

@ceecko (Author) commented Oct 31, 2016

nginx runs on the host.

docker-pid

# ls -l /proc/13665/ns/mnt
lrwxrwxrwx. 1 root root 0 Oct 31 15:01 /proc/13665/ns/mnt -> mnt:[4026531840]

nginx-pid

# ls -l /proc/15890/ns/mnt
lrwxrwxrwx. 1 root root 0 Oct 31 15:01 /proc/15890/ns/mnt -> mnt:[4026533289]
# ls -l /proc/$$/ns/mnt
lrwxrwxrwx. 1 root root 0 Oct 31 15:02 /proc/10063/ns/mnt -> mnt:[4026531840]

@rhvgoyal (Contributor)

Your docker-pid and host shell both seem to be sharing the same mount namespace, which means the docker daemon is running in the host mount namespace. That probably means nginx started at some point after the container started and is running in its own mount namespace; at that time the mount points leaked into the nginx mount namespace, and that's what is preventing deletion of the container.

Please make sure MountFlags=slave is working for you. Once it is working, /proc/<pid>/ns/mnt will give different output for the docker daemon and for a bash shell running in the host mount namespace.
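A quick sketch of that verification (the daemon binary is dockerd on 1.12 and later; adjust the pidof argument if yours differs):

# Mount namespace of the docker daemon
readlink /proc/$(pidof dockerd)/ns/mnt

# Mount namespace of the current host shell
readlink /proc/$$/ns/mnt

With MountFlags=slave in effect, the two mnt:[...] values should differ; identical values mean the daemon still shares the host mount namespace, as in the output above.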

@cpuguy83 (Member) commented Sep 6, 2017

@NeckBeardPrince Please don't waste our time with such pointless commentary.
If you'd like to help solve it, great. If you'd like to report some more data about the problem, great.

Other than that, there are a couple of ways of getting around this issue that have been posted here.

@chasebolt (Contributor)

The systemd unit file does not ship with MountFlags=slave:

Server Version: 17.06.1-ce
CentOS Linux release 7.3.1611 (Core)

[root@dokken /]# systemctl show docker | grep Private
PrivateTmp=no
PrivateNetwork=no
PrivateDevices=no
[root@dokken /]# systemctl show docker | grep Mount
MountFlags=0

@xeor commented Sep 8, 2017

Last time I had this problem, it was ntpd that was holding the mounts.
Today I got the same problem, and this time it was a mariadb instance running on the host that was the cause.

  • docker-engine-17.05.0.ce-1.el7.centos.x86_64
  • mariadb-server-5.5.56-2.el7.x86_64

Example of finding the process holding the mounts:

# container with the problem
docker rm efad7...
Error response from daemon: Driver devicemapper failed to remove root filesystem efad7...: remove /var/lib/docker/devicemapper/mnt/9bd66290ee...: device or resource busy

# Grep for parts of the mountpoint
grep docker /proc/*/mountinfo | grep 9bd66290ee
/proc/9736/mountinfo:776 427 253:24 / /var/lib/docker/devicemapper/mnt/9bd66290e...
/proc/9910/mountinfo:776 427 253:24 / /var/lib/docker/devicemapper/mnt/9bd66290e...

# Find who the pids belong to
ps aux | grep -E "9736|9910"
mysql     9736  0.0... /usr/bin/mysqld_safe --basedir=/usr
mysql     9910  9.8 ... /usr/libexec/mysqld --base...

# Do some extra research on one of the pids
grep docker /proc/9736/mountinfo | wc -l
70

grep docker /proc/9736/mountinfo | grep -o "/run/docker/netns/" | wc -l
17

grep docker /proc/9736/mountinfo | grep -o "/var/lib/docker/containers/" | wc -l
18

grep docker /proc/9736/mountinfo | grep -o "/var/lib/docker/devicemapper/mnt/" | wc -l
33

After restarting mariadb it let go of the mountpoints; however, it grabbed a lot of them again when it started.

grep docker /proc/16367/mountinfo | wc -l
52

@rhvgoyal (Contributor) commented Sep 8, 2017

Most of the removal failures are due to the mount point (and hence the device) being busy in some other mount namespace. I think the following proposed PR will help with this problem if the kernel is new enough.

#34573

If you are running an old kernel, we have written a plugin called oci-umount to reduce mount-leaking problems.

https://github.com/projectatomic/oci-umount

@bjonen commented Oct 12, 2017

@rhvgoyal Do you have a plan for which docker release will include this PR? We are still dealing with driver "devicemapper" failed to remove root filesystem on a regular basis.

@fitz123 commented Oct 24, 2017

CentOS Linux release 7.4.1708 (Core)
3.10.0-693.5.2.el7.x86_64
17.06.2-ce

LOOKS LIKE IT IS FINALLY FIXED

@bjonen commented Oct 25, 2017

We are running Docker version 17.09.0-ce and still face the same issue.

@the-nw1-group

We are occasionally hitting this issue on Oracle Linux, with docker version 17.03.1-ce (from Oracle's repos):

Linux server 4.1.12-103.3.8.1.el7uek.x86_64 #2 SMP Fri Sep 15 17:23:08 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux

The above is all fixed by the project's TDA, so we can't change any of it for the time being.

90% of our other environments are CentOS 7.3/7.4, and we've not seen the issue there.

@JulianKlug

Just managed to solve an instance of this issue with Docker 17.05 on Arch Linux on kernel 4.11.9 by:

  1. docker rm -f [myContainer] (failing with the driver "devicemapper" failed to remove root filesystem as usual)
  2. ls /var/lib/docker/devicemapper/mnt/

This made the container finally disappear (not sure why though).

@Xophe commented Nov 3, 2017

@MonsieurWave as incredible as it looks, the "ls" trick worked perfectly for me when everything else did not!

@bmitchboxboat

docker rm -f [container] will report a failure but eventually clean up the container and filesystem. The ls command is a red herring; all you really need is to wait a few seconds. Better than that is to use MountFlags=slave, and best of all is to switch away from devicemapper and use overlay2 instead.
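For anyone making that switch, the storage driver is set in /etc/docker/daemon.json; a minimal sketch (note that images and containers created under devicemapper are not visible after the switch, so save anything you need first):

{
  "storage-driver": "overlay2"
}

Then restart the daemon and verify with docker info | grep "Storage Driver".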

@esabol commented Nov 4, 2017

And best of all is to switch away from devicemapper and use overlay2 instead.

We've been using Docker on CentOS 7.x (currently 7.4) for over a year now. When we first installed Docker, everything and everyone said you had to use devicemapper with direct-lvm for the best performance and stability. https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/ still says you have to use devicemapper on CentOS with Docker EE. Fortunately, we use Docker CE, so we could switch to overlay2. I feel like the Docker folks slipped the change of the default from devicemapper to overlay2 on CentOS into v1.13.0/1 with little fanfare or discussion. Is there any solid information on the performance/stability of overlay2 versus devicemapper (direct-lvm) on CentOS 7? My googling hasn't found much...

@rsanders commented Nov 4, 2017

We had a very bad time with stock CentOS 7.2 kernels (their 3.10.x frankenstein): lots of crashes. We were running Kubernetes in a dev environment, so the churn of our containers was very high, but even in relatively quiet installations we found the stock CentOS + overlay combo very unstable. Running a 4.10+ upstream kernel with overlay2 is much better. We haven't tried a newer CentOS release.

You will need an underlying filesystem that is either ext4 or XFS formatted with "-n ftype=1". Docker will run on an improperly formatted XFS, but the results will be unpredictable.
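A quick way to check that, as a sketch (xfs_info ships with xfsprogs; the device name below is a placeholder):

# Should print a line containing ftype=1 for the filesystem backing /var/lib/docker
xfs_info /var/lib/docker | grep ftype

# When formatting a new volume for overlay2, pass the option explicitly
mkfs.xfs -n ftype=1 /dev/sdX1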

@SEAPUNK commented Nov 4, 2017

Yeah, I've long since switched to overlay2, and I recommend that anyone still using devicemapper who can use overlay2 switch, since even setting this issue aside, I've read that devicemapper is a very poor storage driver for docker in general.

@sharksforarms

Restarting ntpd fixed the issue I was having... so confusing. Is there any "recommended" daemon.json configuration for docker on CentOS 7?

@cpuguy83 (Member) commented Nov 6, 2017

Some improvements are coming down the pipeline.

Specifically, the issue with these other system services appears to be a race condition between setting up mount namespaces (for those other system services) and docker's attempt to keep its own mounts private... The intention is for Docker to keep its mounts from leaking into containers; unfortunately this is causing leakages elsewhere, and those services actually end up holding private references to the mountpoints, which means they can't be unmounted in those namespaces except either manually or when the process restarts.

In addition, there have been some recent changes to deal with race conditions when using MS_PRIVATE mount propagation, in both runc and docker.
Will the next version be perfect? Probably not... but I do expect this to get better.

@FelikZ commented Nov 14, 2017

I got exactly the same thing as @ceecko with docker 12.1.1; no chance to update now. Is it fixed somewhere later? A quick fix is to kill the processes and restart the docker service, but...

@fitz123 commented Nov 14, 2017

These versions completely fix the issue for me, including with --live-restore:

CentOS 7.4.1708 (3.10.0-693.5.2.el7.x86_64)
Docker 17.09.0-ce

@jcberthon (Contributor)

@esabol we evaluated switching to overlay2 after we upgraded to CentOS 7.4. Sadly it is too much work. The partitions we could use for storing the data are XFS, and before 7.4 the CentOS default XFS formatting options missed one parameter (I forget which one) needed to support overlay2 on top. So we would have to reformat the partition in order to use overlay2 on top of XFS. That's where switching to overlay2 would cost us too much work to avoid downtime; the latest 7.4 kernel + Docker 17.09, plus the above recommendations for the LVM configuration, have helped a lot in avoiding the problem most of the time.

Note: docker info shows a big fat warning that running overlay2 over XFS without this specific option is not supported and will be removed in a future release. That did not sound too enticing for us...

@srinivassurishetty commented Mar 1, 2018

The #34573 fix was released in versions 17.09.1-ce and 17.12.0-ce.

@esabol commented Mar 1, 2018

@jcberthon We recently bit the bullet and made the transition to overlay2, and I'm so glad we did! Performance improved 40% in the benchmarks of our unit tests that do docker run --rm. The final straw for us with devicemapper was issue #20401. Switching to overlay2 wasn't very hard, but we have plenty of free disk space. I wrote a script to docker save all of our images to tarballs and another script to docker load all of the tarballs. We were done in 2-3 hours. I know it seems like a hassle, and it can be if you don't have enough disk space, but it will be worth it in the long run, I think. Good luck!
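For anyone planning the same migration, a rough sketch of the save/load approach described above (these are not @esabol's actual scripts; the backup path is a placeholder, and you need enough free disk space for the tarballs):

# Before switching drivers: export every tagged image to a tarball
mkdir -p /backup/images
docker images --format '{{.Repository}}:{{.Tag}}' | grep -v '<none>' |
while read -r img; do
    # Flatten repo/tag separators so each image gets a valid file name
    docker save -o "/backup/images/$(echo "$img" | tr '/:' '__').tar" "$img"
done

# After switching to overlay2 and restarting the daemon: import them again
for tar in /backup/images/*.tar; do
    docker load -i "$tar"
done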

@cpuguy83 (Member)

This is fixed in 17.12.1

Thanks all.

@elonmia commented Jul 11, 2018

Before the fixed release, rebooting the physical node solves the problem.

@MohdAhmad

@ravilr @KevinTHU regarding your comments #27381 (comment) and #27381 (comment), I've observed that changing the docker unit file on RHEL to PrivateTmp=true fixes the issue as well. Any chance you've seen something similar?

@KevinTHU (Contributor)

@MohdAhmad I have never tried that, but I think it may be fine; since PrivateTmp=true in the docker unit file applies only to docker, it may even fix this problem better.
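If anyone wants to test that, a sketch analogous to the MountFlags drop-in earlier in the thread (the drop-in path and file name are illustrative, not prescribed here):

mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/private-tmp.conf <<'EOF'
[Service]
PrivateTmp=true
EOF

# Apply the change
systemctl daemon-reload
systemctl restart docker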

@libaojie

I hit the same issue. In my case it was because I had the folder open; closing the window solved it.
