Docker does not free up disk space after container, volume and image removal #21925

Open
stouf opened this issue Apr 11, 2016 · 148 comments

@stouf

stouf commented Apr 11, 2016

Versions & co

Docker

Docker version

$ docker version
Client:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64

Docker info:

$ docker info
Containers: XXX
Images: XXX
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: XXX
 Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-26-generic
Operating System: Ubuntu 14.04.3 LTS
CPUs: 1
Total Memory: XXX GiB
Name: XXX
ID: XXXX:XXXX:XXXX:XXXX

Operating system

Linux 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:09:17 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Issue

Here is how I currently deploy my application:

  1. Build a new image based on a new version of my application code
  2. Start a new container based on the image built in step 1
  3. Remove the previous container and its volume with the command docker rm -v xxxxx
  4. Remove all the unused images with docker rmi $(docker images -q)

However, little by little, I'm running out of disk space. I made sure I don't have any orphan volumes, unused containers and images, etc...

I found a forum post saying the following:

It's a kernel problem with devicemapper, which affects the RedHat family of OS (RedHat, Fedora, CentOS, and Amazon Linux). Deleted containers don't free up mapped disk space. This means that on the affected OSs you'll slowly run out of space as you start and restart containers.

The Docker project is aware of this, and the kernel is supposedly fixed in upstream (#3182).

My machine is a Linux instance hosted on AWS, so I wonder if the kernel I'm using could be related to the issue referenced above.
If not, does anyone have an idea about what could be the origin of this problem? I spent the whole day looking for a solution, but could not find any so far :(

@thaJeztah
Member

Did you previously run using a different storage driver? If you did, it's possible that /var/lib/docker still contains files (images/containers) from the old storage driver.

Note that the devicemapper issue should not be related to your situation, because according to your docker info, you're using aufs, not devicemapper.

@stouf
Author

stouf commented Apr 12, 2016

Did you previously run using a different storage driver?

Nope, it has always been AUFS.

Note that the devicemapper issue should not be related to your situation, because according to your docker info, you're using aufs, not devicemapper.

Yep, I realized after posting here that the issue I linked is only related to devicemapper, sorry ^^

@thaJeztah
Member

Might be worth checking if it's actually /var/lib/docker that's growing in size / taking up your disk space, or a different directory. Note; to remove unused ("dangling") images, you can docker rmi $(docker images -aq --filter dangling=true)

@stouf
Author

stouf commented Apr 12, 2016

Might be worth checking if it's actually /var/lib/docker that's growing in size / taking up your disk space, or a different directory.

Yep, I already confirmed that :( To be more accurate, the folders growing in size are /var/lib/docker/aufs/diff and /var/lib/docker/aufs/mnt. The size of any other folder under /var/lib/docker is not really significant.

Note; to remove unused ("dangling") images, you can docker rmi $(docker images -aq --filter dangling=true)

Thanks. I'm already doing that. On each deployment, I:

  1. remove any exited containers with the -v option to also remove the associated volumes
  2. remove all the unused images with that command.

Which is why I don't understand why my free disk space keeps decreasing over time :(

@thaJeztah
Member

Do the daemon logs show anything interesting (e.g. Docker failing to remove containers?). You've X-ed the amount of containers and images in your output, is that number going down after your cleanup scripts have run? Also note that you're running an outdated version of docker; if you want to stay on docker 1.8.x, you should at least update to docker 1.8.3 (which contains a security fix)

@stouf
Author

stouf commented Apr 12, 2016

Do the daemon logs show anything interesting (e.g. Docker failing to remove containers?)

No, everything seems to be normal. Plus, I keep losing disk space while containers are up and running, without even deploying new containers.

You've X-ed the amount of containers and images in your output, is that number going down after your cleanup scripts have run?

Ah yeah, sorry for X-ing those numbers. They don't change at all, as I always deploy the same containers and clean up the old ones each time I deploy. So the number of containers and the number of images remain the same, as expected.

Also note that you're running an outdated version of docker; if you want to stay on docker 1.8.x, you should at least update to docker 1.8.3 (which contains a security fix)

Yep, I'd better update, indeed. I was planning on updating to the latest version soon, but I will have to do it within the next 48 hours because my server is now running out of disk space :(
After the update, I'll keep monitoring the disk space every day and report my observations here. I really hope it's just a version problem.

@stouf
Author

stouf commented Apr 13, 2016

Hi guys,

Update to Docker 1.10 done. I used another instance to deploy my infra on top of Docker v1.10. I took that chance to investigate a little deeper into this disk space issue on the old server; the problem came from something within my infra, unrelated to Docker containers... Sorry for bothering :(

stouf closed this as completed Apr 13, 2016
@thaJeztah
Member

@stouf good to hear you resolved your issue

@stouf
Author

stouf commented Apr 13, 2016

Thanks a lot for the support :)

@awolfe-silversky

This issue and #3182 are marked as closed. However, just today another user reported that the problem remains. Please investigate.

@stouf
Author

stouf commented Jun 24, 2016

@awolfe-silversky Could you please describe the issue? As I said above, my problem wasn't related to containers or Device Mapper. It was a container in my infrastructure silently generating tons of logs that were never removed.

@groyee

groyee commented Oct 18, 2016

I have the same issue.

I stopped all Docker containers; however, when I run this command:

sudo lsof -nP | grep '(deleted)'

I get:

[screenshot: lsof output listing "(deleted)" files that are still held open]

Only when I do sudo service docker restart does it free the space.

Here is the best picture to describe it:

[screenshot: disk usage dropping only after the Docker service restart]
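
As a rough, hedged way to put a number on this (how much space is held by deleted-but-still-open files): field 7 of lsof's default output is SIZE/OFF, so treat the total as an approximation only:

sudo lsof -nP | grep '(deleted)' | awk '{sum += $7} END {printf "%.1f GiB held by deleted files\n", sum/1024/1024/1024}'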

@stouf
Author

stouf commented Oct 19, 2016

@groyee I gave it a try on my side and had the same results; I only got 500MB freed by restarting the Docker daemon, but I have less than 10 containers running on the server I was testing.
I think we should create a new dedicated issue, as it seems to be different from what this issue was originally about.

@gsccheng

gsccheng commented Mar 7, 2017

I have a similar problem where clearing out my volumes, images, and containers did not free up the disk space. I traced the culprit to this file, which is 96 GB: /Users/MyUserAccount/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2

However, it looks like this is a known issue for Macs:
docker/for-mac#371
#23437

@Zokormazo

I'm suffering from a similar problem on Debian Jessie. I freed ~400MB with a service restart, but have 2.1GB of old container garbage inside /var/lib/docker/aufs with just one container running.

@mbana

mbana commented Apr 9, 2017

Confirming this issue.
Could you folks at least add a warning when Docker starts taking up too much space?
I do something like this, and it becomes very noticeable fairly quickly what the issue is:

function usagesort {
  local dir_to_list="$1"
  cd "$dir_to_list"
  du -h -d 1 | sort -k 1,1 -g
}
...
$ usagesort "$HOME/Library/Containers" | grep -i docker
43G	./com.docker.docker
276K	./com.docker.helper

Is there an official workaround to this issue, or better yet, when are you planning to actually fix it?

@thaJeztah
Member

@mbana @gsccheng on OS X, that's unrelated to the issue reported here, and specific to Docker for Mac, see docker/for-mac#371

@caneraydinbey

What is the solution here?

root@vegan:/var/lib/docker# du -shc *|grep "G"|sort -n
29G	aufs
135G	containers
164G	total
root@vegan:/var/lib/docker# cd containers/
root@vegan:/var/lib/docker/containers# du -shc *|grep "G"|sort -n
134G	11a36e593a91c4677482ec49e7asfasfasf0e306732c16073d0c241a82acfa325bf03a1a
135G	total

@HWiese1980

Is there already a solution for this issue?

root@xxx:/var/lib/docker# du -shc *
84G	aufs
4,0K	containers
2,6M	image
80K	network
4,0K	nuke-graph-directory.sh
20K	plugins
4,0K	swarm
4,0K	tmp
4,0K	tmp-old
4,0K	trust
36K	volumes

/var/lib/docker/aufs takes up a damn lot of space on my disk. There are no images or containers left anymore:

root@xxx:/var/lib/docker# docker images -a
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
root@xxx:/var/lib/docker# 

root@xxx:/var/lib/docker# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
root@xxx:/var/lib/docker# 

I can't get rid of it... without manually deleting it, which I'm afraid of doing because I don't know which of that data is still needed.

@thaJeztah
Member

@HWiese1980 Docker (up until Docker 17.06) removed containers when docker rm --force was used, even if there was an issue with removing the actual layers (which could happen if the process running in a container was keeping the mount busy); as a result, those layers got "orphaned" (Docker no longer had a reference to them) and thus got left around.

Docker 17.06 and up will (in the same situation) keep the container registered (in a "dead" state), which allows you to remove the container (and its layers) at a later stage.

However, if you've been running older versions of Docker and have a cleanup script that uses docker rm -f, chances are those layers accumulated over time. You can choose to do a "full" cleanup (you'll lose all your local images, volumes, and containers, so only do this if there's no important information in them): stop the Docker service and rm -rf /var/lib/docker. Alternatively, you can stop the Docker service, move the directory aside (as a backup), and start the service again.

In your situation, it looks like there's no (or very little) data in the volumes directory, so if there are no images or containers on your host, it may be "safe" to just remove the /var/lib/docker directory.
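
For reference, a minimal sketch of the "full reset" described above (destructive: it removes all local images, containers, and volumes; paths assume the default data-root, and on non-systemd hosts use service docker stop/start instead):

sudo systemctl stop docker
sudo mv /var/lib/docker /var/lib/docker.bak   # or: sudo rm -rf /var/lib/docker
sudo systemctl start docker
# once everything works again: sudo rm -rf /var/lib/docker.bak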

@eoglethorpe

I can't add anything too intelligent to this, but after a good amount of build testing my local storage became full, so I tried to delete all images and containers; they were gone from Docker, but the space wasn't reclaimed.

/var/lib/docker/ was the main culprit consuming my disk space. I'm on 17.06.1-ce, build 874a737... not sure if I can provide anything else.

@tshirtman

tshirtman commented Aug 18, 2017

I think I got hit by the same thing. I installed Docker earlier today on this new laptop, so it was clean before, and built a few images to test. Getting low on space, I took care to call docker rm on any stopped container produced by my builds (never used -f to remove them), and then docker rmi on all untagged images; currently I have this:

gabriel@gryphon:~> sudo docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
buildozer           latest              fbcd2ca47e0b        3 hours ago         4.19GB
ubuntu              17.04               bde41be8de8c        4 weeks ago         93.1MB
19:22:44 18/08/17 red argv[1] 100% 59
gabriel@gryphon:~> sudo df -h /var/lib/docker/aufs
Filesystem       Size  Used  Avail Use% Mounted on
/dev/nvme0n1p5     114G    111G     0 100% /var/lib/docker/aufs
19:23:08 18/08/17 red argv[1] 100% 25
gabriel@gryphon:~> sudo du -sh /var/lib/docker/aufs/diff
59G	/var/lib/docker/aufs/diff
19:23:25 18/08/17 red argv[1] 100% 6115
gabriel@gryphon:~> sudo docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
buildozer           latest              fbcd2ca47e0b        3 hours ago         4.19GB
ubuntu              17.04               bde41be8de8c        4 weeks ago         93.1MB
19:23:30 18/08/17 red argv[1] 100% 46
gabriel@gryphon:~> sudo docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
19:23:33 18/08/17 red argv[1] 100% 43
gabriel@gryphon:~> sudo ls /var/lib/docker/aufs/diff|head
04fd10f50fe1d74a489268c9b2df95c579eb34c214f9a5d26c7077fbc3be0df4-init-removing
04fd10f50fe1d74a489268c9b2df95c579eb34c214f9a5d26c7077fbc3be0df4-removing
050edba704914b8317f0c09b9640c9e2995ffa403640a37ee77f5bf219069db3
059f9eee859b485926c3d60c3c0f690f45b295f0d499f188b7ad417ba8961083-init-removing
059f9eee859b485926c3d60c3c0f690f45b295f0d499f188b7ad417ba8961083-removing
09425940dd9d3e7201fb79f970d617c45435b41efdf331a5ad064be136d669b2-removing
0984c271bf1df9d3b16264590ab79bee1914b069b8959a9ade2fb93d8c3d1d9b-init-removing
0984c271bf1df9d3b16264590ab79bee1914b069b8959a9ade2fb93d8c3d1d9b-removing
0b082b302e8434d4743eb6e0ba04076c91fbd7295cc524653b2d313186d500fa-removing
0b11febcb2332657bd6bb3feedd404206c780e65bc40d580f9f4a77eb932d199-init-removing
19:23:57 18/08/17 red argv[1] 100% 35
gabriel@gryphon:~> sudo ls /var/lib/docker/aufs/diff|wc -l
256

I already restarted Docker and it didn't change anything. I think I'll remove everything ending with -removing in the diff/ directory; thankfully nothing important depends on the Docker images on this laptop, but still, I wouldn't like this to happen on a server.

@eoglethorpe

eoglethorpe commented Aug 18, 2017 via email

@stouf
Author

stouf commented Aug 19, 2017

Have you tried docker system prune?
Also, when you remove a container, do you use the -v option? It seems that volumes are not removed by default when containers are removed via docker rm.
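
For reference, hedged examples of the commands mentioned above:

docker system prune --volumes   # stopped containers, unused networks, dangling images, and unused anonymous volumes
docker rm -v <container>        # remove a container together with its anonymous volumes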

@MartinThoma

I have a similar issue: https://stackoverflow.com/q/45798076/562769

@alexanderadam

It happened again for us.
Docker filled the disk up until all processes stopped writing. If I sum the numbers that tools like ncdu show, everything also adds up to the "correct" (but terrible) numbers.

@mythz

mythz commented Dec 28, 2020

I've resolved this issue (when running WSL2) by manually recompacting the WSL2 ext4.vhdx.
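
The exact steps aren't included above; a minimal sketch of one common way to compact the disk image, assuming Docker Desktop's default ext4.vhdx location (adjust the path for your installation), run from an elevated PowerShell prompt:

wsl --shutdown
diskpart
# inside the diskpart prompt:
#   select vdisk file="C:\Users\<you>\AppData\Local\Docker\wsl\data\ext4.vhdx"
#   compact vdisk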

@1nfility

I'm also having this issue on macOS Catalina 10.15.7, using a CentOS image.

@fierman333

fierman333 commented Jul 15, 2021

Similar issue here with containers that were deleted, but whose overlay2 directories still exist on the node.
After running:

docker system prune

This deleted all unused Docker volumes and stopped containers, including their overlay2 data.
But for containers that were previously deleted, their overlay2 data was never removed and still exists on the node:

ls -1d /var/lib/docker/overlay2/*/diff/terraform/.terraform
/var/lib/docker/overlay2/002e10c118eb70be534df06fd28f25de24ca7df829f18628db14506bd05afba9/diff/terraform/.terraform
/var/lib/docker/overlay2/223f221234df763184ef68e94e3d8d06789d6daa69701aee2d3d4f31e6ffc8c0/diff/terraform/.terraform
/var/lib/docker/overlay2/269087abc4bb7be89b49e710294a2228b15b1b337944abebcac53c5e00d3e4e0/diff/terraform/.terraform
/var/lib/docker/overlay2/2c9e27b5c934b9cc751d83529b398c954bdb9f45cc2cfbb8e1285f43149d9dae/diff/terraform/.terraform
/var/lib/docker/overlay2/31b290ce67ab081c0880b4bd51432de18b5bed82aedeed91e2265299523f320d/diff/terraform/.terraform
/var/lib/docker/overlay2/3b95919d10d2183e53999f4ce5cc94cebaed75979c9b54be51f10d46628a23d3/diff/terraform/.terraform
/var/lib/docker/overlay2/40779ab5db32584371805b85d3708ed28450bbce564612c42b5e7fe48c328e98/diff/terraform/.terraform
/var/lib/docker/overlay2/60017bc36cae31db6f5e2012bbf5194eefce6f79ddecffaf91ad33c7732a05f1/diff/terraform/.terraform
/var/lib/docker/overlay2/6bfa24bf150b5b17df5615bff5b49e6c483dd9c5fcb9eb9ffb37abe5eaa127d3/diff/terraform/.terraform
/var/lib/docker/overlay2/70fe595aa1f2a885f8ba865132769eb3ccc24785a8ce57767866fbe5a72c3a58/diff/terraform/.terraform
/var/lib/docker/overlay2/76a371eb3c44f17d4fb57a35e6ad19c47e8fe2c72f1e3dc1d5fb2292f78c088c/diff/terraform/.terraform
/var/lib/docker/overlay2/85aadea8b3fc121e4850ea605c649989c7119be8094435c0742d360d991d1490/diff/terraform/.terraform
/var/lib/docker/overlay2/9b99f4532dceb2fee40aaf01e1388fc7c565021b1b65d57a8ab0f3dd4facd7ff/diff/terraform/.terraform
/var/lib/docker/overlay2/9d419697df5227d0ea0530475b5a18c39453c1d00caa78d6547bbc6e1045c4e8/diff/terraform/.terraform
/var/lib/docker/overlay2/b11e76d71a201d264dd67185b19a696443e5db922fe1762a2052dc0fa5704275/diff/terraform/.terraform
/var/lib/docker/overlay2/b3af26bd08e037dc1539062a6be66a6988844de4d7d06fe5e6fa5ea870c5bf85/diff/terraform/.terraform
/var/lib/docker/overlay2/c7791741aa4ec9c60ba24b1dcb4cd9bd8f212f9f94c80955fe2062df6643d114/diff/terraform/.terraform
/var/lib/docker/overlay2/dcece0e1b00810fda06188de172c19a35e7ab1366935166e19cdbc65abf7a406/diff/terraform/.terraform
/var/lib/docker/overlay2/f8fdbf37b4182d91b3e4b382917ec5e986bbf8f6d63a8615b1500c802f3eccb5/diff/terraform/.terraform
/var/lib/docker/overlay2/ffa458d125e2073351e851f205e8a2013bfcc1d48ab9bbeeb001aa263229498a/diff/terraform/.terraform

We are using Kubernetes, which manages the pods. In this case these are expired Kubernetes Jobs, which in general delete their pods (and hence their containers), so I do not think the problem is in Kubernetes; it seems related to the Docker container lifecycle instead.

@thaJeztah
Member

@fierman333 does restarting the daemon in such a situation clean up those files? My best guess would be that if files were in use (or if there are mounts shared between namespaces), the daemon wasn't able to remove them. Possibly they're garbage-collected when the daemon is restarted (not 100% sure though). If mounts leaked to other namespaces (I know of situations where cAdvisor was a culprit there), things sometimes get nasty and (IIRC) only a restart of the host can release such mounts.
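
A hedged, purely diagnostic way to check for such leaked mounts (<layer-id> is a placeholder for the overlay2 directory name that cannot be removed):

sudo grep -l "overlay2/<layer-id>" /proc/*/mountinfo   # lists the per-process mountinfo files (and hence PIDs) that still see the layer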

@fierman333

@thaJeztah I tried it; it didn't help. The overlay2 directories for deleted containers are still in place.

ls -ld /var/lib/docker/overlay2/3cebbb19dfaee0151ac7f4c7a4ac062c3b197fc25e95afdd945338dd39a3f625/diff/terraform/.terraform
drwxr-xr-x. 3 root root 23 Jul 13 12:05 /var/lib/docker/overlay2/3cebbb19dfaee0151ac7f4c7a4ac062c3b197fc25e95afdd945338dd39a3f625/diff/terraform/.terraform

volume="3cebbb19dfaee0151ac7f4c7a4ac062c3b197fc25e95afdd945338dd39a3f625"
# find which container (if any) still references the orphaned overlay2 directory
for c in $(docker ps -a --format '{{.ID}}'); do
  if docker inspect "$c" | jq -r '.[].GraphDriver.Data.WorkDir' | grep -q "$volume"; then
    container_meta=($(docker inspect "$c" | jq -r '.[].Config.Labels | ."io.kubernetes.pod.namespace", ."io.kubernetes.pod.name"'))
    pod_namespace=${container_meta[0]}
    pod_name=${container_meta[1]}
    echo "pod namespace is: ${pod_namespace}"
    echo "pod name is: ${pod_name}"
  fi
done

-> NULL

Those pods are Kubernetes Jobs, which are simple pods without any volume configuration: they just do some work and then the pod stops. We keep the last 1 job execution, so on the next job run the previous one is deleted by Kubernetes. We notice this on other nodes as well, not only this one:

docker version
Client:
 Version:           19.03.13-ce
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46
 Built:             Mon Oct 12 18:51:20 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.13-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46
  Built:            Mon Oct 12 18:51:50 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

@albertca

albertca commented Nov 19, 2021

I'm seeing the same problem on our systems, using Debian 11 with Docker version 20.10.10, build b485636.

I had to stop Docker, remove /var/lib/docker/* and start it again to clean up more than 800GB of garbage that none of the "docker * prune" commands managed to remove.

I also tried @thaJeztah's suggestion of restarting without deleting in between, and that didn't work either.

/var/lib/docker/overlay2 was taking most of the space.

This is a CI server which creates thousands of containers and tens of images every day in parallel.

@albertca

albertca commented Dec 7, 2021

Found a slightly less radical way of cleaning up space. It seems it does not require reinstalling docker-ce (maybe not even a restart), but it still requires removing all images and containers as well as manually removing directories and files:

docker rm $(docker ps -a -q)
docker system prune --all --force
docker volume prune --force
rm -r /var/lib/docker/overlay2/*
mkdir /var/lib/docker/overlay2/l
rm -r /var/lib/docker/image/overlay2/layerdb/sha256/*
rm -r /var/lib/docker/image/overlay2/distribution/diffid-by-digest/sha256/*
rm -r /var/lib/docker/image/overlay2/distribution/v2metadata-by-diffid/sha256/*

Not 100% sure whether the "l" directory needs to exist or whether the overlay2 storage backend will recreate it if it does not.

If the sha256 hashes are not removed, "docker pull" will fail with:

Error pulling index.docker.io/xxx - code: None message: failed to register layer: open /var/lib/docker/overlay2/ec0a92fdc4ceed583ec2c6c58cb8d6901248ccce1cfb3bd940b556cbdb8cc337/committed: no such file or directory

It looks like even if all images are removed, the hashes still exist. Maybe prune is failing to remove some image layers. I also tried with "docker image prune --force" and those hashes were not removed.

Now everything is cleaned up, but I will do some more testing in a few hours/days when the problem appears again.

@ahmafi

ahmafi commented Jun 4, 2022

I'm facing a very similar issue to @albertca's, but on Ubuntu 22.04 and a smaller-scale project: docker system prune -a -f doesn't delete everything, and there is still a lot of data in /var/lib/docker, mostly in the overlay2 (more than 1 GB) and buildkit (300 MiB) directories.

I also have another issue that might be related: docker system df throws an error:

Error response from daemon: error getting build cache usage: failed to get usage for wyflkzrds66xcz15asaar2oy2: snapshot m2pikxmra2nh862h3r8pwsshe not found

@imcom

imcom commented May 24, 2023

I'm facing a very similar issue to @albertca's, but on Ubuntu 22.04 and a smaller-scale project: docker system prune -a -f doesn't delete everything, and there is still a lot of data in /var/lib/docker, mostly in the overlay2 (more than 1 GB) and buildkit (300 MiB) directories.

I also have another issue that might be related: docker system df throws an error:

Error response from daemon: error getting build cache usage: failed to get usage for wyflkzrds66xcz15asaar2oy2: snapshot m2pikxmra2nh862h3r8pwsshe not found

I am having the same issue on Ubuntu 20.04.

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 23.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.4.0-147-generic
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64

I suspect it is some undocumented mechanism in buildx that causes this: for example, during the build I bind-mount or copy from a multi-stage build, and then there is leftover diff data that cannot be deleted.

@KES777

KES777 commented May 27, 2023

I noticed strange behavior ("data-root": "/mnt/docker-overlay"):

# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           776M  3.6M  773M   1% /run
/dev/sda1        38G  8.4G   28G  24% /
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      253M  5.3M  247M   3% /boot/efi
/dev/sdb         40G   39G     0 100% /mnt/docker-overlay
/dev/sdc         20G   18G  1.1G  95% /mnt/docker-volumes
...

You can see that there is 0 space available on /mnt/docker-overlay.

After this command:

# docker builder prune --filter type=exec.cachemount
WARNING! This will remove all dangling build cache. Are you sure you want to continue? [y/N] y
Deleted build cache objects:
1erw53zpoava37g22r9l9b3ag
lxfv2owqupjekbzwcvq62ksty
zp77r9amevqqlp9m7w8l8hmd0

Total reclaimed space: 929.5MB

It reports that only 929MB was reclaimed, but:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           776M  3.8M  773M   1% /run
/dev/sda1        38G  8.4G   28G  24% /
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      253M  5.3M  247M   3% /boot/efi
/dev/sdb         40G   30G  7.8G  80% /mnt/docker-overlay
/dev/sdc         20G   18G  1.1G  95% /mnt/docker-volumes

df shows that I now have 7GB of additional space.

Here is the du output for overlay2 before and after the command:
du.txt
du 2.txt

git diff -b -w --ignore-blank-lines /home/kes/r/du.txt
diff --git a/du.txt b/du.txt
index b753b3f04..5af3f0630 100644
--- a/du.txt
+++ b/du.txt
@@ -2,11 +2,7 @@
 36K ./afafa2df34f1e1ee3529efba45efca32fb3ddde21ba171f8de83666bb461c298
 274M    ./97bd1a33d95ad9faa4b8695f5751c6056f604976af451de51e254bb43656054d
 28K ./w89fq6lb51enzz44ooirhax3f
-18M ./1erw53zpoava37g22r9l9b3ag
 32K ./cf173d7b2715d6f88c2558532929b5aa5fcf2973050adaefa4e3dd57f8f4f839
 2.9G    ./1f26b6eca2af95e68624cf80efc3a0b48bc5a83f64347344158727fa4f70fbde
 128K    ./ta9t03096ng8mahyt21pdbn8o
 28K ./kiov5piz7hlqrq24lrvtb5ssw
@@ -169,7 +165,7 @@
 48K ./v2r9tep119wljbnbarc5bj6df
 32K ./2c476b69f8085c1c24c578219ed137a7465d4007b48f4efe18596e78782627fa
 62M ./1dfb0682550ca44f0884cfb5580c2af83e844c4296212e74d302b798ffb46448
-24K ./6b38cace61dd44e28747c234f8c3245cfba0fba9cc69878ec09b4c6d5b4bbc5c
+223M    ./6b38cace61dd44e28747c234f8c3245cfba0fba9cc69878ec09b4c6d5b4bbc5c
 224K    ./8bb868f1f3d94ce7d1641d060f748a5edce01dbdfa9ef29085a28548653fafc1
 64K ./a6b0daee090aefd11407fc966e54e9979fa933186c57f8c9f5f65faac671518b
 160M    ./1903d6fb405e1a7f64f48ee90f0e594e35a7784d5d33da8b78112aec4a61137d
@@ -281,7 +277,6 @@
 20K ./67fefb5ee5505d7175913134ae0cb73103185e02eefa5d7c5c6d3c3983693a5c
 16M ./5453a3aa27d82db717c4c7cf04b0c1b6f767563cdf5402d672aeaba80d03b164
 24K ./36690386f4a11ecab43a5bcc442dd89ecd738aed963384e77704afe0be55a2ad
-1.2G    ./zp77r9amevqqlp9m7w8l8hmd0
 24K ./7b8a08ce5b4922843340e26dcec4c4459a3b5ba0bf8348c2723b56f87c52fe53
 264M    ./8380d2297d2e68076c2d7f645059292171519f845a8124563e98099cb5193bdb
 32K ./a486836d2db6e8006a929d92c2ec09639b18dbc3f25829b3a0bfe72d947d4b75
@@ -507,7 +502,6 @@
 420K    ./xiiutzpxb6a3vfuzdu9qvsf3x
 28K ./lur1d5zutmqgnhf3lvnzq03ji
 16K ./lehrnp7y0uc0dmznyn896j07g
-20K ./lxfv2owqupjekbzwcvq62ksty
 337M    ./u6wrw63os2x42zzeeozsv6qjs
 44K ./8d8edbe58d74dd0836c77a27cd0ae7c418aa0d2368bcaa32105d40d62f4a4a5a
 160M    ./a7a39c59ffbd966b2b2bb35b91ef0fd6db85133546146570a134eefce62340d7
@@ -613,7 +607,4 @@
 1.1M    ./3xh2ylhfnx4ock8rin6wlvlzj
 56K ./p9qvw7zvuutnndwbqd9icbz9c
 1.7M    ./f4d374bc5324a832406d4919a3ca3d730b96612d04fc22527ca6f8fde21fa2b3
-33G .
+32G .

I also noticed that the Local Volumes size changed:

# docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          39        30        13.47GB   2.93GB (21%)
Containers      30        30        604.8MB   0B (0%)
Local Volumes   6         2         8.55GB    1.288MB (0%)
Build Cache     334       0         3.633GB   3.633GB

# docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          39        30        13.47GB   2.93GB (21%)
Containers      30        30        604.8MB   0B (0%)
Local Volumes   6         2         465.8MB   1.288MB (0%)
Build Cache     331       0         2.704GB   2.704GB

But /mnt/docker-overlay/volumes did not change in size:

/mnt/docker-overlay/volumes# du -h -d 1
3.2G	./pgdb_pgdb-data
20K	./nginx_html
8.0K	./runner-xiutsrjr-project-4-concurrent-0-cache-3c3f060a0374fc8bc39395164f415a70
445M	./be0c9786ab3db975be61f39b577d99f3b8af9efc126d0a16e1d1c6261064d3d5
1.8G	./office_gitlab-data
52K	./mail_mail-config
80K	./nginx_certs
8.0K	./runner-xiutsrjr-project-4-concurrent-1-cache-3c3f060a0374fc8bc39395164f415a70
768K	./mail_mail-data
1.5M	./runner-xiutsrjr-project-4-concurrent-1-cache-c33bcaa1fd2c77edfc3893b41966cea8
88K	./pdns_pdns-data
116K	./office_gitlab-config
24K	./office_gitlab-runner
1.6M	./runner-xiutsrjr-project-4-concurrent-0-cache-c33bcaa1fd2c77edfc3893b41966cea8
8.0K	./d25b66146a4d83102e0308c585555cded66dea7661694e68368f87003d5ab1cb
16K	./pdns_pdns-run
4.2M	./monaas_grafana-data
12M	./mail_mail-state
610M	./office_gitlab-logs
12K	./monaas_alertmanager-data
16K	./nginx_vhost.d
184K	./nginx_acme.sh
8.5M	./portainer_portainer-data
449M	./monaas_victoria-data
16K	./office_redmine-plugins
12G	./monaas_loki-data
2.7M	./office_redmine-files
328K	./office_openvpn-data
116K	./mail_mail-logs
18G	.

@rhoban13

Seeing the same. I have no running containers, and after running every variant of docker system prune -a -f and docker image prune -a, the overlay2 directory is not cleaned up:

$ sudo docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B
$ sudo ls /var/lib/docker/overlay2 | wc -l
30

The only workaround I've found is to stop the docker daemon, forcibly remove /var/lib/docker and restart the daemon.

@jackgray

jackgray commented Nov 28, 2023

Even after a complete reinstallation of Docker, I am continuously running into space issues. The drive filled up and services crashed 3 times this week. I am developing a rather large VM container and the build cache does not clear (removing images does not free space either). Docker v24 / Ubuntu 20 Focal Fossa (5.15.0-89-generic #99~20.04.1-Ubuntu, Docker version 24.0.7, build afdd53b).

TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          60        11        45.24GB   39.42GB (87%)
Containers      11        10        33.41GB   652.4MB (1%)
Local Volumes   49        1         10.92GB   10.92GB (100%)
Build Cache     2180      0         284.4GB   284.4GB

$ du -sh /var/lib/docker

440GB

Unfortunately, sudo service docker restart does not free the space either, even though I just ran docker image rm on about 60GB of images.

docker buildx prune -f seems to be helping in removing the build cache, but it isn't ideal that I have to remove the only 20GB I care about (and wait hours to rebuild images I'm actively developing) to remove the 98% of garbage I don't want.
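
For what it's worth, a hedged sketch of pruning the build cache selectively instead of wiping it entirely (check docker builder prune --help on your version for the exact flags):

docker builder prune --filter until=72h    # drop cache entries not used for ~3 days
docker builder prune --keep-storage 20GB   # or cap the total build cache at roughly 20GB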

@thaJeztah
Member

thaJeztah commented Nov 28, 2023

@jackgray your issue does not look directly related to this ticket. This ticket is about cases where storage remains in use after content is pruned / deleted (i.e. docker system df shows "no space taken", but checking the storage shows that there's still content present).

Discussing all options would be out of scope for this bug report, but with BuildKit as the builder, the build cache is separate from the image store itself, so removing images won't clean up the build cache (docker system prune, docker builder prune, or docker buildx prune will clean it up).

You may also be interested in the daemon configuration for garbage-collecting the build cache and the retention policies that can be configured: https://docs.docker.com/build/cache/garbage-collection/

(More in-depth discussion would probably be better suited to a GitHub Discussion, either in the BuildKit repository (https://github.com/moby/buildkit/discussions) or in this repository (https://github.com/moby/moby/discussions).)

@jackgray

@thaJeztah that was extremely thoughtful and helpful, thank you. I learned a lot about docker mechanics through this :)

@doy-materialize

I'm still also seeing this:

$ docker system prune -af --volumes
Total reclaimed space: 0B
$ docker image prune -af
Total reclaimed space: 0B
$ docker volume prune -af
Total reclaimed space: 0B
$ docker builder prune -af
Total:  0B
$ docker buildx prune -af
Total:  0B
$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          0         0         0B        0B
Containers      0         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B
$ sudo systemctl restart docker
$ sudo du -sh /var/lib/docker/overlay2
114G    /var/lib/docker/overlay2

Is it possible that regularly running out of disk space might cause this? I'm working with Dockerfiles that produce a huge amount of data and my machine runs out of disk space a lot, so I have to regularly prune the Docker cache. Maybe Docker is losing track of image data when it runs out of disk space during an operation?

@traliotube

Similar issue here. I have run docker system prune -a -f multiple times and have cleared maybe 1-2 GB at most.

root@xxxx:/var/lib/docker# docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          18        18        5.787GB   82.15MB (1%)
Containers      20        20        305.7MB   0B (0%)
Local Volumes   12        10        1.776GB   47.72MB (2%)
Build Cache     0         0         0B        0B

Adds up to 7.8GB

root@xxxx:/var/lib/docker# du -shc *
1.3M    buildkit
1.3M    containers
4.0K    engine-id
22M     image
284K    network
14G     overlay2
16K     plugins
4.0K    runtimes
4.0K    swarm
4.0K    tmp
1.8G    volumes
16G     total
root@fabdocker:/var/lib/docker# du -sh overlay2/
14G     overlay2/

The /diff paths add up to 6.8GB, but the total disk usage is still 9GB more than I would like it to be.
I have also gone through #33775 and cleared as much as possible, but there is still unnecessary disk usage.

root@xxxx:/var/lib/docker# docker version
Client:
 Version:           24.0.5
 API version:       1.43
 Go version:        go1.20.3
 Git commit:        24.0.5-0ubuntu1~22.04.1
 Built:             Mon Aug 21 19:50:14 2023
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.3
  Git commit:       24.0.5-0ubuntu1~22.04.1
  Built:            Mon Aug 21 19:50:14 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.2
  GitCommit:
 runc:
  Version:          1.1.7-0ubuntu1~22.04.2
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:

I am using Ubuntu 22.04 LTS, if that changes anything.

@thaJeztah
Member

Does that machine run many docker builds? If so, are you able to reproduce on v26.0.0 (assuming you start from a fresh state)? I recall #46136 was only partially fixed (in BuildKit) for older versions, but there was a follow-up fix in v26.0.0; #46136 (comment). From the output of your docker version I think you may be running the distro-built packages of Docker (which may differ from Docker's own packages), and I'm not sure they provide current versions, so if you have a test environment in which you can reproduce, you can try the installation instructions from https://docs.docker.com/engine/install/ubuntu/

@traliotube

Thanks for the quick reply. I ran a build just once on this machine, never after.

I installed using apt install docker.io, which I assume is the distro-built package; unfortunately I do not have a test environment in which to try a different installation.

@thaJeztah
Member

Oh, right, I just noticed that in your case there's still content in use (as reported by docker system df). Getting du (on Linux) to report the same size as Docker can be complicated once overlay / copy-on-write filesystems are taken into account: e.g. a layer may have files that are removed in another layer, but with a copy-on-write filesystem the removed files are still stored, just inaccessible from the layers on top.

@traliotube

So, do you mean I do not have any excess disk space being used? Sorry for any misunderstanding!

@thaJeztah
Member

Hard to tell. There have certainly been cases where content wasn't properly cleaned up in some situations (the ticket I linked in my previous comment being one of them). docker system df also doesn't list all content (container log files being one thing that should still be considered, as they can be relevant for containers with high-volume logs and no log rotation configured for their logging driver).

That said, du can easily show incorrect values in many situations; also see #38848 (comment) and #38848 (comment)
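
As a hedged example of one thing docker system df does not count, container log files can be checked directly (assumes the default json-file logging driver and the default data-root):

sudo du -sh /var/lib/docker/containers/*/*-json.log 2>/dev/null | sort -h | tail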

@fayak

fayak commented Apr 15, 2024

I've written and used this small bash script to try to detect layers in the overlay2 directory that survived docker system prune -af --volumes even though they are not used, while keeping the layers actually used by running containers and stored images.

I've also had the issue on many machines, even in some cases where docker volume ls, docker container ps -aq and docker image ls -a showed literally no output, but there were still dozens of GiB in the overlay2 dir.

I'm not 100% sure it doesn't cause any trouble though, so be careful while using it. I'd like feedback on this as well.
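
The script itself isn't included above, but as a rough sketch of the general idea (not @fayak's actual script): compare the directories under overlay2 with the layer paths referenced by existing containers and images, and treat the rest as candidate orphans. It assumes the default data-root and the overlay2 driver, must run as root, and build-cache layers are not accounted for, so they may show up as false positives:

#!/usr/bin/env bash
set -u

DOCKER_ROOT=/var/lib/docker
referenced=$(mktemp)

# Collect every overlay2 path referenced by current containers and images
{
  docker ps -aq | xargs -r docker inspect --format '{{ json .GraphDriver.Data }}'
  docker images -q | xargs -r docker image inspect --format '{{ json .GraphDriver.Data }}'
} | grep -oE "$DOCKER_ROOT/overlay2/[a-z0-9]+" | sort -u > "$referenced"

# Anything on disk that is not referenced is a candidate orphan (do not delete blindly)
for dir in "$DOCKER_ROOT"/overlay2/*/; do
  id=$(basename "$dir")
  id=${id%-init}                      # init layers share their container's id
  [ "$id" = "l" ] && continue         # skip the short-name symlink directory
  grep -q "$id" "$referenced" || echo "candidate orphan: $dir"
done

rm -f "$referenced"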

@fayak

fayak commented Apr 15, 2024

I'm also hitting a case on a server where docker system df reports reclaimable space, but docker image prune -af does nothing:

/!\ root:/srv/docker/lib# docker image prune -af
Total reclaimed space: 0B
/!\ root:/srv/docker/lib# docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          66        66        17.6GB    3.759GB (21%)
Containers      119       119       34.9MB    0B (0%)
Local Volumes   1         1         2.829MB   0B (0%)
Build Cache     0         0         0B        0B

@pjonsson

pjonsson commented Apr 16, 2024

@thaJeztah after 6 days with a freshly installed Docker 26.0.0 on Ubuntu 22.04 LTS, there's still a ~20G difference between what docker system df reports and the size of /var/lib/docker (almost all of those gigabytes in overlay2). See moby/buildkit#3635 (comment) for technical details.

Edit: the machines have nightly cron jobs that run docker image prune -af --filter "until=30h", but no images are usually built when that job runs. There is also a cache-retention policy in daemon.json that keeps the size of the things Docker is aware of in check:

  "builder": {
    "gc": {
      "enabled": true,
      "defaultKeepStorage": "40GB"
    }
  }
