
Devmapper snapshotter does not release disk space when a container exits #5691

Closed
Kern-- opened this issue Jul 6, 2021 · 5 comments · Fixed by #5756

@Kern--
Contributor

Kern-- commented Jul 6, 2021

Description
When using the devmapper snapshotter, any disk space used by a container during its execution remains allocated in the underlying device after the container exits. In development environments, where the devmapper snapshotter docs suggest loopback devices, users will see what appears to be a disk-space leak in their root filesystem, since the space is not freed after a container exits.

As an example, I ran a test where I launched a container, wrote 500 MB of random data, and saw that the 500 MB remained allocated after the container exited (details in steps to reproduce).

# Before the container is run, the host has 5 GB of space available
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme4n1p1   16G  9.9G  5.0G  67% /

# After the container exits, the host has only 4.6 GB available
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme4n1p1   16G   11G  4.6G  70% /

The underlying space does get reused by the thin-pool: if I repeatedly launch a container, write 500 MB, then exit the container, no additional space is allocated in the underlying device.

The space can be returned to the root filesystem by completely removing the thin-pool and loopback devices, but that means deleting all containerd data as well.

Docker had this same issue in moby/moby#3182, which was tracked down to dmsetup not issuing discards to the underlying device when a thin device is removed (https://bugzilla.redhat.com/show_bug.cgi?id=1043527). Docker's workaround was to issue BLKDISCARD ioctls to thin devices on removal. Would a similar approach be appropriate here?
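For concreteness, here is a minimal, hedged sketch of that Docker-style workaround: discard the entire device via a BLKDISCARD ioctl before it is removed. This is not containerd's actual code; the function names are illustrative, and the ioctl numbers are the Linux values from `<linux/fs.h>` (64-bit). It is Linux-only and needs root plus a real block device to do anything.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

const (
	// Linux ioctl numbers from <linux/fs.h> (64-bit userspace).
	blkDiscardIoctl   = 0x1277     // BLKDISCARD: discard a {offset, length} byte range
	blkGetSize64Ioctl = 0x80081272 // BLKGETSIZE64: device size in bytes
)

// discardRange builds the {offset, length} pair BLKDISCARD expects,
// covering the entire device.
func discardRange(sizeBytes uint64) [2]uint64 {
	return [2]uint64{0, sizeBytes}
}

// blkDiscardAll discards every block of the device at path, returning its
// space to the thin-pool's underlying storage.
func blkDiscardAll(path string) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return err
	}
	defer f.Close()

	// Ask the kernel for the device size in bytes.
	var size uint64
	if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, f.Fd(),
		blkGetSize64Ioctl, uintptr(unsafe.Pointer(&size))); errno != 0 {
		return errno
	}

	// Discard the whole device in one range.
	r := discardRange(size)
	if _, _, errno := syscall.Syscall(syscall.SYS_IOCTL, f.Fd(),
		blkDiscardIoctl, uintptr(unsafe.Pointer(&r[0]))); errno != 0 {
		return errno
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: blkdiscard <device>")
		os.Exit(2)
	}
	if err := blkDiscardAll(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

In a snapshotter this would run just before `dmsetup remove`, once the device is guaranteed to be unmounted.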

Steps to reproduce the issue:

  1. Set up a new thin-pool using the loopback-device script provided in the devmapper snapshotter README
  2. Start containerd with the following config
oom_score = -999

[debug]
        level = "debug"

[metrics]
        address = "127.0.0.1:1338"

[plugins.linux]
        runtime = "runc"
        shim_debug = true

[plugins.devmapper]
	pool_name = "devpool"
	root_path = "/var/lib/containerd/devmapper"
	base_image_size = "10GB"
	async_remove = false
  3. Pull an image
$ sudo ctr image pull --snapshotter devmapper docker.io/library/busybox:latest
  4. Check the available host disk space
$ df -h /
  5. Run a container
$ sudo ctr run --snapshotter devmapper --rm --tty --net-host docker.io/library/busybox:latest busybox-test
  6. Inside the container, write 500 MB of random data to a file
# head -c 500000000 </dev/urandom > random.txt
  7. Outside the container, verify that 500 MB of disk space has been used
$ df -h /
  8. Exit the container
  9. Verify that the 500 MB of used disk space was not released
$ df -h /

Describe the results you received:
The devmapper snapshotter did not release disk space when the container exited.

Describe the results you expected:
I expected the devmapper snapshotter to release disk space when the container exited.

What version of containerd are you using:

$ containerd --version
containerd containerd.io 1.4.6 d71fcd7d8303cbf684402823e425e9dd2e99285d

Any other relevant information (runC version, CRI configuration, OS/Kernel version, etc.):

runc --version
$ runc --version
runc version 1.0.0-rc95
commit: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
spec: 1.0.2-dev
go: go1.13.15
libseccomp: 2.3.3
uname -a
$ uname -a
Linux cloudinstance 4.19.0-17-cloud-amd64 #1 SMP Debian 4.19.194-1 (2021-06-10) x86_64 GNU/Linux
@kzys
Member

kzys commented Jul 6, 2021

Can you try that with non-loopback devices? I'd like to know whether the issue depends on that.

@Kern--
Contributor Author

Kern-- commented Jul 7, 2021

I'm not entirely sure how to perform the test without a loopback device. I have a setup with an LVM thin-pool backed by a separate SSD attached to my EC2 instance, but I'm not sure how to check whether exiting the container releases the physical disk sectors.

The LVM tooling (lvs, pvs, vgs) shows me thin-pool information, but not information about the physical device.

@mxpv
Member

mxpv commented Jul 7, 2021

@Kern-- does containerd remove the snapshot when you exit the container? (You should see a log message in debug mode about the removal - https://github.com/containerd/containerd/blob/main/snapshots/devmapper/snapshotter.go#L274.)

@mxpv
Member

mxpv commented Jul 7, 2021

Also, have you tried any workarounds by calling dmsetup manually (clear, pause/resume, etc.)?

@Kern--
Contributor Author

Kern-- commented Jul 8, 2021

Yes, I do see the snapshot being removed:

Jul 08 01:30:48 ip-172-31-26-39 containerd[54552]: time="2021-07-08T01:30:48.052784172Z" level=debug msg=remove key=default/3/busybox-test
Jul 08 01:30:48 ip-172-31-26-39 containerd[54552]: time="2021-07-08T01:30:48.197308324Z" level=debug msg="removed snapshot" key=default/3/busybox-test snapshotter=devmapper

I just tried suspend/resume and clear, with no change. I expect Avail to go back to 5.0G if the space is freed. Is this what you meant?

$ sudo dmsetup suspend /dev/mapper/devpool 
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme4n1p1   16G   11G  4.5G  71% /
$ sudo dmsetup resume /dev/mapper/devpool 
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme4n1p1   16G   11G  4.5G  71% /
$ sudo dmsetup clear /dev/mapper/devpool 
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme4n1p1   16G   11G  4.5G  71% /

What does consistently work is running blkdiscard on the thin device before exiting the container (of course this also breaks the running container, but in the use case I'm looking at, it would happen just before the thin device is removed anyway):

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme4n1p1   16G   11G  4.5G  71% /
$ sudo blkdiscard /dev/mapper/devpool-snap-4 
$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme4n1p1   16G   10G  5.0G  67% /

It does seem that the thin-pool reuses the physical space before allocating more. For example, the following sequence also recovers the disk space:

  1. Launch a container (call its thin device snap-1)
  2. Write 500 MB
  3. Exit the container
  4. Launch a container (call its thin device snap-2)
  5. Write 500 MB
  6. blkdiscard snap-2
  7. Exit the container
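An automated version of that manual workaround could shell out to the same `blkdiscard` and `dmsetup remove` commands used above. This is a hedged sketch, not containerd's actual removal path; the function names and the best-effort error handling are my own.

```go
package main

import (
	"log"
	"os/exec"
	"path/filepath"
)

// removalCommands returns the command lines to run, in order, when tearing
// down a thin device: a best-effort discard first, then the dmsetup removal.
func removalCommands(deviceName string) [][]string {
	devPath := filepath.Join("/dev/mapper", deviceName)
	return [][]string{
		{"blkdiscard", devPath},
		{"dmsetup", "remove", deviceName},
	}
}

// removeThinDevice runs the sequence. A discard failure is logged but does
// not block removal, since some backing devices do not support discard.
func removeThinDevice(deviceName string) error {
	cmds := removalCommands(deviceName)
	if out, err := exec.Command(cmds[0][0], cmds[0][1:]...).CombinedOutput(); err != nil {
		log.Printf("blkdiscard failed (continuing): %v: %s", err, out)
	}
	return exec.Command(cmds[1][0], cmds[1][1:]...).Run()
}

func main() {
	// Example (requires root and an existing thin device):
	//   removeThinDevice("devpool-snap-4")
}
```

Treating the discard as best-effort matters because not every backing store supports discard, and a failed discard should not leave the thin device behind.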

Kern-- added a commit to Kern--/containerd that referenced this issue Jul 21, 2021
dmsetup does not discard blocks when removing a thin device. The unused blocks
are reused by the thin-pool, but will remain allocated in the underlying
device indefinitely. For loop device backed thin-pools, this results in
"lost" disk space in the underlying file system as the blocks remain allocated
in the loop device's backing file.

This change adds an option, discard_blocks, to the devmapper snapshotter which
causes the snapshotter to issue blkdiscard ioctls on the thin device before
removal. With this option enabled, loop device setups will see disk space
return to the underlying filesystem immediately on exiting a container.

Fixes containerd#5691

Signed-off-by: Kern Walster <walster@amazon.com>
fahedouch pushed a commit to fahedouch/containerd that referenced this issue Oct 15, 2021
yylt pushed a commit to yylt/containerd that referenced this issue Jun 14, 2022