Docker does not free up disk space after container, volume and image removal #32420
I should also mention that, based on #24023, I've switched to running overlay2 (instead of the default of overlay on CentOS 7). The issue exists against both overlay and overlay2, so I think it's in Docker internals and not storage-driver specific. I've configured overlay2 by modifying
and yes, I cleaned docker before starting it up with the new driver by running
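For readers following along: the file edited above isn't quoted in this thread, but the usual way to select the overlay2 driver is the daemon configuration file. A minimal sketch, assuming a `daemon.json`-based setup:

```
# Sketch only: select the overlay2 storage driver, then restart the daemon.
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "overlay2"
}
EOF
systemctl restart docker
```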
Some quick questions:
Also note that removing containers with
No. Fairly vanilla builds - mostly with Rocker but nothing special at run time (we haven't even switched to data volumes yet as we only just upgraded to a version of docker that has the
Generally we shouldn't actually have any containers to remove (as all containers are run with
Got a link to the relevant bugs?
Ok, just double checked what we're doing and the logic has been:
For the container bit, the "clean up everything" step has been doing (well, something slightly more complicated, as we were feature-detecting the
This doesn't look to ever find anything in nightly runs. We also do:
Which obviously does find stuff but never errors - maybe the problem is with our image cleanup, though?
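For readers, a minimal version of the cleanup logic described above (my sketch, not the exact script used on these CI hosts) would be:

```
# Remove exited containers, then remove dangling (untagged) images.
# Assumes containers are normally started with --rm, so the first step rarely matches anything.
docker ps -aq --filter status=exited | xargs -r docker rm
docker images -q --filter dangling=true | xargs -r docker rmi
```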
Could you please let me know of any actual bug numbers relating to this?
@thaJeztah Any update on this?
@Kenji-K Not exactly. I stop the docker service nightly and
Don't use
@cpuguy83 @thaJeztah So just to clarify - is there actually some known safe way to clean up container and image data in any version of docker at the moment? Because right now I'm stopping the service and just rm'ing stuff under the hood - but even with that I end up with dangling overlay mounts every now and then and have to actually reboot the box.
@neerolyte if you have mounts hanging around (and are on a kernel > 3.15), most likely you have run a container where the docker root has been mounted into a container, and it is holding onto the mount.
I assume "most likely you have run a container where the docker root has been mounted into a container and it is holding onto the mount" would require running Docker-in-Docker or something similar? I'm not doing anything complex with containers at all; all data is housed entirely inside the container, and we're not even using volumes. The kernel is a little older because that's all we can get on CentOS 7 - 3.10.0-514.21.1.el7.x86_64 - do you have any reference to the specific bug (sometimes Red Hat backports into EL)?
@neerolyte I don't have a reference to a specific bug, but it's fixed in the upstream kernel at around 3.15. It is supposed to be fixed in the upcoming RHEL 7.4 kernel. One potential way to fix the issue is to use
Also, using deferred device removal/deletion (devicemapper config) helps. This doesn't really fix it, as it will still get the busy error, but it will not return the error to the user and will instead keep trying to remove the device periodically in the background until it succeeds.
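The deferred device removal/deletion mentioned above is a devicemapper storage option; a hedged sketch of enabling it (assuming a `daemon.json`-based setup) looks like this:

```
# Sketch: enable deferred removal/deletion for the devicemapper driver.
# A busy device is then retried in the background instead of returning
# "device is busy" errors to the user.
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "devicemapper",
  "storage-opts": [
    "dm.use_deferred_removal=true",
    "dm.use_deferred_deletion=true"
  ]
}
EOF
systemctl restart docker
```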
Ok I'll recheck when that's available.
I'm using overlay2, not devicemapper.
I have the problem as well. It seems that when Docker encounters a "no space left on device" error, it is no longer able to reclaim space.
The only possible solution is to stop the service, then to delete the
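For anyone resorting to the same workaround, a hedged sketch of that sequence (the directory named above is truncated; `/var/lib/docker` is assumed, and this wipes all local Docker state):

```
# DESTRUCTIVE: removes all local images, containers, volumes, and networks.
systemctl stop docker
rm -rf /var/lib/docker
systemctl start docker
```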
@milipili did you try cpuguy83's comment?
@ripcurld0 We're running Ubuntu 16.04 w/ 4.9.0-040900-generic kernel, still seeing this issue.
@jostyee FYI Ubuntu 16.04.3 LTS with 4.4.0-1030-aws kernel with aufs instead of overlay/overlay2 seems to run stable.
@markine @ripcurld0 Sorry, my bad - it was an unrelated issue for us; overlay2 is fine here.
@ripcurld0 I don't know. We don't use CentOS or RHEL (Ubuntu 16.04, latest patches). It happens with both AUFS and devicemapper (overlay I don't know) every time the partition runs out of space. We never use
Same issue, but the directory is
I found a lot of unused images causing this issue; resolved by running `docker rmi $(docker images -q)`.
@wannymiarelli that's expected; images, containers, and volumes take up space. Have a look at |
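(The command referenced above isn't preserved in this extract; the built-ins generally meant here are `docker system df` to see what is using space and `docker system prune` to reclaim it, e.g.:)

```
docker system df                  # summary of space used by images, containers, volumes, build cache
docker system prune               # remove stopped containers, dangling images, unused networks
docker system prune -a --volumes  # also remove all unused images and unused volumes (destructive)
```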
@thaJeztah sure! Just saying that using docker rmi cleaned up the overlay2 folder correctly. @jostyee I'm sorry - actually I have no issue with the overlay folder.
At some point this situation has stabilised a lot for us, but I'm not entirely sure when. We're running
We only clean up containers and untagged images regularly; I suspect our only blowout now is when we're changing something in the underlying stack (which generates new image tags). I'd still appreciate docs somewhere on what the different parts under
TL;DR - happy for this to close.
Thanks @neerolyte, let me go ahead and close this one 👍
@chr0n1x The only mounts that should exist in
@cpuguy83 good to know, thanks for clearing that up for me
@cpuguy83 We just tried upgrading 17.03.2 to 17.12.1-ce and we're still seeing this issue, even after cleaning up layers leaked by prior versions.
@j-kaplan please explain "we are still seeing this issue"
@cpuguy83 It appears that docker is somehow holding onto data on the filesystem and the only way to clear the space is with a reboot.
I can't even seem to locate the open files with
Edit: I dragged a coworker into this and they discovered it seems to be related to loopback devices:
Running this has cleared up all the unaccounted for disk space:
Edit 2: To anybody else who might stumble upon this thread: if you're running Concourse in k8s and your k8s nodes are leaking loopback devices like I saw above, it is most likely Concourse. It appears that this leaking happens when a Concourse worker pod that was using a baggageclaim driver of
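(The commands run in the comment above aren't preserved. Purely as a generic illustration, leaked loopback devices can be listed and detached roughly like this:)

```
losetup -a                 # list loop devices and the files backing them
losetup -d /dev/loop0      # detach one specific device (assuming it is no longer in use)
losetup -D                 # detach all loop devices (use with care)
```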
@j-kaplan The space used in your
It also seems like you have
These loopback devices are not related to docker.
I wasn't sure what was creating the loopback devices, since these nodes are running docker for use as Kubernetes nodes. I will dig into that side of the world to see what's going on there. Thanks for your help.
We are running into this issue too:
I think this issue should be reopened. Let me know if there is any other info I can provide to debug this issue.
@arturopie Do you happen to be running Concourse in k8s? We're in the middle of tracking down a lead, but it appears that is what is leaking the loopback devices.
@j-kaplan we are not running k8s. I don't think we have any loopback device:
Just like @arturopie, I am running in the same issue:
I'm also not running k8s and I don't have any loopback devices.
What seems to be keeping this zombie disk space in use is that I have about 1500 mounts from elements in docker's overlay filesystem. Here's an excerpt:
Unfortunately, even after cleaning up the mounts, no disk space is being freed and I still have to do a manual cleanup. The reboot doesn't help.
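(Not the commands used above, just a hedged sketch of how stale overlay mounts can be listed and unmounted; as noted in the comment, in this particular case unmounting them did not free the space.)

```
# List overlay mounts under the Docker root.
grep ' overlay ' /proc/mounts | grep overlay2
# Unmount them (only safe if the corresponding containers are not running).
grep ' overlay ' /proc/mounts | grep overlay2 | awk '{print $2}' | xargs -r -n1 umount
```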
In my case, I don't have any overlay mount:
If you still have layers sitting in the graph driver directory (and no images), then most likely these are from older versions of docker. This would have happened on an older version of Docker when doing "docker rm -f", where the layer could not be removed due to a mount leak.
@cpuguy83 we never upgrade Docker on the same machine; we create a new machine with an empty disk when we upgrade Docker, so I'm sure that's not the issue in our case.
I don't know if everyone is gone, but here are some tips and tricks. Just make the docker system cleanup a cron job: https://nickjanetakis.com/blog/docker-tip-32-automatically-clean-up-after-docker-daily
Start by finding the culprit directory: `du -hx --max-depth=1 /` (for me the culprit was docker: /var/lib/docker/overlay2/).
Long term, insert this:
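(The snippet the comment ends with isn't preserved; a minimal sketch of such a cron job, along the lines of the linked tip, might be:)

```
# /etc/cron.d/docker-prune (assumed path): prune unused Docker data nightly at 03:00.
# Drop -a if removing all unused (not just dangling) images is too aggressive for you.
0 3 * * * root docker system prune -a -f > /var/log/docker-prune.log 2>&1
```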
We're seeing this with docker 18.09.6 on a fairly recently built server where docker definitely hasn't been upgraded. Why has this not been reopened? The OP only agreed for it to be closed because the issue "went away"; there are plenty of perfectly valid subsequent reports.
Just encountered this same issue on Ubuntu Disco Dingo. The issue was related to a running container not stopping until it was force-stopped. After force-stopping the container, I was able to delete the image, and then docker system prune -v -f ran and cleaned up all the overlay2 bloat.
@mjramtech Everyone thinks they have the same issue, but it's clear a lot of the noise is from completely unrelated issues - e.g. @bmulcahy, prune removes unused data; if you have a running container, the associated image is not unused. If someone can come up with a repro case with open data, it's worth opening a new issue IMO. I can't repro any more, and I could never repro with data I could share.
@Dmitry1987 I notice the same issue. FS size is 58 GB and overlay2 is 60 GB+.
@thaJeztah it feels like the issue should be reopened :D
Docker version 19.03.5 is fairly new but still has the problem reported in 2017...
I don't want to reopen this issue, because it became somewhat of a "kitchen sink" of "possibly related, but could be different" issues. Some issues were fixed, and other reports really need more details to investigate whether there are still issues to address. It's better to start a fresh ticket (but feel free to link to this issue). Looking at your earlier comment
First of all, I really don't recommend manually removing directories from under

I see you're mentioning that you're running node-exporter (https://github.com/prometheus/node_exporter), which can bind-mount the whole system's filesystem into the container. (Possibly depending on mount-propagation settings,) this can be problematic. If you bind-mount

As to differences between

I see some mention of snaps and lxc in your comment; this could be unrelated, but if you installed docker using the snap packages: those packages are maintained by Canonical, and I've seen many reports of them being problematic. I'd recommend (if possible) testing whether it also reproduces with the official packages (https://docs.docker.com/engine/install/ubuntu/).

(Per earlier comments above) it's possible that files are still in use (which could be by stopped containers or untagged images); in that case, the disk use may be legitimate. If possible, verify whether disk space does go down after removing all images and containers.

If you think there's a bug at hand, and have ways to reproduce the issue so that it can be looked into, feel free to open a new ticket, but when doing so:
Thank you for the suggestions @thaJeztah, it does make sense to report new specific cases with full details. The mount namespaces information is new to me; I will try to learn more about that. But what do you mean by
is it that all mounts defined by all containers will be locked by the one which mounts all
I'm a bit "hazy" on the exact details (I know @kolyshkin and @cpuguy83 dove more deeply into this when debugging some "nasty" situations), but my "layman explanation" is that "container A" has mounts in it's own namespace (and thus only visible within that namespace), now if "container B" mounts those paths, those paths can only be unmounted if both "container B" and "container A" unmount them. But things can become more tricky than that; if "container A" has mounts with mount-propagation set (slave? shared?), the mounts of "container B" will also propagate to "container A", and now there's an "infinite loop" (container B's mounts cannot be unmounted until container A's mounts are unmounted, which cannot be unmounted until container A's mounts are unmounted). |
I was able to recover 32 million inodes and 500GB of storage in
Similar to #21925 (but it didn't look like I should post there).
Description
I have some docker hosts running CI builds; nightly, all docker data is removed from them, but `/var/lib/docker/overlay2` keeps consuming more space.

If I remove all docker data, e.g. I just did:

There's still a few GB tied up at `/var/lib/docker/overlay2`:

These files are not left over from a prior upgrade, as I upgraded and `rm -rf /var/lib/docker/*` yesterday.

Steps to reproduce the issue:
Unfortunately I don't have a simple set of steps to reproduce this that are fast and shareable - fortunately I can reliably check our CI nodes each morning and they are in this state, so with some help we can probably get to a repro case.
Describe the results you received:
More space is consumed by `/var/lib/docker/overlay2` over time, despite all attempts to clean up docker using its inbuilt commands.

Describe the results you expected:
Some way to clean out image, container and volume data.
Additional information you deem important (e.g. issue happens only occasionally):
There's obviously some reference between `/var/lib/docker/image` and `/var/lib/docker/overlay2`, but I don't understand exactly what it is (a hedged sketch of that mapping is appended at the end of this report).

With docker reporting no images:
I can see an ID for one of the base images we built a lot of stuff on top of:
If I run something in that image, the output is weird:
Weird things about that output:
`89afeb2e357b` already exists

If I then delete all images again:
Disable the current overlay2 dir with docker stopped:
It does indeed error out looking for the overlay2 counterpart:
Output of `docker version`:

Output of `docker info`:

Additional environment details (AWS, VirtualBox, physical, etc.):
oVirt VM in a company cloud running stock CentOS 7 and SELinux. Docker installed from docker.com packages.
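Editor's note on the `image`/`overlay2` relationship asked about above (my understanding, not part of the original report): each layer's metadata under `image/overlay2/layerdb` has a `cache-id` file naming the matching directory in `overlay2`, which can be dumped roughly like this:

```
# Hedged sketch: map layer metadata entries to overlay2 layer directories.
for layer in /var/lib/docker/image/overlay2/layerdb/sha256/*; do
  printf '%s -> overlay2/%s\n' "$(basename "$layer")" "$(cat "$layer/cache-id")"
done
```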