Docker's SizeRw does not get updated by runsc #10256

Open
LarsSven opened this issue Apr 8, 2024 · 6 comments
Labels
type: bug Something isn't working

Comments

@LarsSven

LarsSven commented Apr 8, 2024

Description

docker inspect has a field called SizeRw (https://docs.docker.com/reference/cli/docker/inspect/#size) that tracks the number of bytes changed relative to the base image. This field does not seem to be handled properly by runsc. Inserting files into the container still updates it correctly under runsc, but writes to disk made by programs running inside the container do not seem to be tracked.

Steps to reproduce

Use this Python script:

import sys

# Write 10MB of data to disk
def write_data_to_disk(filename):
    data = b'0' * 10 * 1024 * 1024  # 10MB of data
    try:
        with open(filename, 'wb') as file:
            file.write(data)
        print("Data written successfully.")
    except OSError as e:
        # Report the actual failure (e.g. ENOSPC) instead of assuming the cause
        print(f"Error: could not write the data to disk: {e}")
        sys.exit(1)

if __name__ == "__main__":
    write_data_to_disk("test_data.bin")

In this container:

FROM python:3.12.2-alpine3.19

COPY write_to_disk.py /write_to_disk.py

CMD ["python", "/write_to_disk.py"]

After running the container, check SizeRw with docker inspect --size <container_id>; the container ID is listed by docker ps -a. The SizeRw field in the response should be around 10MB, but under runsc it is not.
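For anyone scripting this check, here is a minimal sketch that reads SizeRw from the JSON printed by docker inspect --size. The helper names parse_size_rw and container_size_rw are made up for illustration, and treating a missing SizeRw as 0 is an assumption of this sketch, not documented Docker behavior.

```python
import json
import subprocess


def parse_size_rw(inspect_json: str) -> int:
    """Extract SizeRw (in bytes) from the JSON array printed by `docker inspect --size`."""
    containers = json.loads(inspect_json)
    # Assumption: if SizeRw is missing or null, report 0 rather than failing.
    return int(containers[0].get("SizeRw") or 0)


def container_size_rw(container_id: str) -> int:
    """Run `docker inspect --size <container_id>` and return SizeRw in bytes."""
    result = subprocess.run(
        ["docker", "inspect", "--size", container_id],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_size_rw(result.stdout)
```

Calling container_size_rw("<container_id>") after the container has exited should return roughly 10 * 1024 * 1024 with runc, going by the report above.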

runsc version

runsc version release-20240401.0
spec: 1.1.0-rc.1

docker version (if using docker)

Docker version 26.0.0, build 2ae903e

uname

Linux 6.5.0-26-generic #26-Ubuntu SMP PREEMPT_DYNAMIC x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

@LarsSven LarsSven added the type: bug Something isn't working label Apr 8, 2024
@kevinGC
Collaborator

kevinGC commented Apr 12, 2024

My guess: gVisor is using an internal overlay filesystem, so writing data modifies process memory instead of the host filesystem. And when docker inspect runs, it's getting SizeRw from the host (maybe the size of the container's mount namespace). You might be able to get an accurate size from within gVisor (not sure whether we track it), but docker inspect will unfortunately not know how to get that number anyways.

@LarsSven
Author

Might there be any way that we could write data in a way that docker inspect would see it? Like some overlay filesystem configuration?

@EtiennePerot
Contributor

I've walked through Docker's code that computes this field and it appears to eventually end up in this calcSize function which adds up the size of files on the host filesystem. Since gVisor's "top" part of the overlay lives in gVisor memory only, it's not on the host filesystem anywhere, so there's no way for this Docker code to count it, short of changing gVisor to actually write the overlay contents to the host filesystem (which would reduce I/O performance, and use extra disk space for no reason other than accounting).

If you do want the top overlay layer to live on the host filesystem, you can set up the overlay outside of gVisor or Docker, and then expose it to the sandbox as a bind mount. (Of course, Docker won't know about it either, but you can then track usage manually because you know where the top of the overlay is.)

@ayushr2
Collaborator

ayushr2 commented Apr 30, 2024

> Since gVisor's "top" part of the overlay lives in gVisor memory only, it's not on the host filesystem anywhere, so there's no way for this Docker code to count it

The overlay upper layer (gVisor-internal tmpfs) has a file backend (called "filestore") which lives on the host and hence is scannable by Docker. See "Self-Backed Overlay" section in https://gvisor.dev/blog/2023/05/08/rootfs-overlay/.

The overlay filestore is basically one very large file which holds all the pages used by the upper layer. It is a sparse file (it starts empty and is populated on demand). When the application creates a new file and writes to it, the size of the filestore file does not change, but its disk usage does. This is observable in stat.Blocks, while stat.Size remains the same. The filestore file is resized only when its allocated blocks (stat.Blocks) account for all of stat.Size and further allocations need more room.
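The gap between size and disk usage can be seen on any sparse file, not just the filestore. The sketch below is an illustration only: it creates a temporary sparse file, writes a small payload, and returns both numbers. The function name sparse_file_stats is made up, and the 512-byte unit for st_blocks is the Linux convention, assumed here.

```python
import os
import tempfile


def sparse_file_stats(apparent_size: int, payload: bytes) -> tuple[int, int]:
    """Create a sparse temp file, write `payload` at offset 0, and return
    (apparent size in bytes, allocated bytes)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        # Extends the file to apparent_size; most filesystems allocate no blocks for the hole.
        f.truncate(apparent_size)
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    try:
        st = os.stat(path)
        # Assumption: st_blocks counts 512-byte units, as on Linux.
        return st.st_size, st.st_blocks * 512
    finally:
        os.unlink(path)


if __name__ == "__main__":
    size, allocated = sparse_file_stats(64 * 1024 * 1024, b"x" * 4096)
    print(f"st_size: {size} bytes, allocated: {allocated} bytes")
```

On a filesystem that supports sparse files, the allocated figure should stay close to the payload size while st_size reports the full 64 MiB.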

I think the issue is that the calcSize function is using stat.Size to calculate disk usage. It should use stat.Blocks. This is the same issue that occurred in containerd and was fixed by using stat.Blocks: containerd/continuity@bc5e3ed. So this needs to be fixed in Moby.
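To make the two accounting styles concrete, here is a rough Python sketch that walks a directory and sums both values. It is not Moby's calcSize or the containerd code; tree_sizes is a hypothetical helper, it ignores hard-link deduplication, and it assumes 512-byte st_blocks units as on Linux.

```python
import os


def tree_sizes(root: str) -> tuple[int, int]:
    """Return (sum of st_size, sum of st_blocks * 512) over non-directory entries under root."""
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so that symlinks are counted themselves, not their targets.
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size            # what a size-based count sees
            allocated += st.st_blocks * 512   # what a disk-usage count sees
    return apparent, allocated
```

Pointed at a directory containing a sparse file, the first number includes the whole apparent size of that file, while the second only grows as blocks are actually allocated.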

@LarsSven
Author

While the above are definitely good points, for anyone who runs into similar issues: setting --overlay2=none on runsc makes Docker properly track SizeRw again. Of course this has some performance caveats, but for our use case this was actually quite a fitting resolution, as we want our container disk performance to mimic real disk performance as much as possible.

@ayushr2
Collaborator

ayushr2 commented Apr 30, 2024

Actually, https://docs.docker.com/reference/cli/docker/inspect/#size intends to track the size of files, not disk usage. So it is unclear whether a fix similar to the containerd one (containerd/continuity@bc5e3ed) should be applied in this case as well. In the containerd case, we wanted disk usage (as you can see, the functions that were updated were diskUsage() and diffUsage()), and the disk usage stats were being used to impose storage limits. But Docker does not document that it wants the disk usage.

Not sure how useful the size stats are in themselves (since, as described above, sparse files can make the container filesystem look really large). But assuming Docker wants that, the --overlay2= setting breaks those size stats. The default is --overlay2=root:self. As mentioned above, if docker inspect --size is important to you, --overlay2=none will turn off the overlay optimizations and restore correct behavior for size stats.
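One way to pass the flag is through the runtime entry in Docker's daemon.json, using its "runtimes", "path" and "runtimeArgs" keys. The sketch below only prints such a fragment; the runsc path /usr/local/bin/runsc is an assumption about the install location, and runsc_runtime_config is a made-up helper name.

```python
import json


def runsc_runtime_config(runsc_path: str, overlay2: str = "none") -> dict:
    """Build a daemon.json fragment that registers runsc with an --overlay2 setting."""
    return {
        "runtimes": {
            "runsc": {
                "path": runsc_path,
                "runtimeArgs": [f"--overlay2={overlay2}"],
            }
        }
    }


if __name__ == "__main__":
    # Assumption: runsc is installed at /usr/local/bin/runsc; adjust for your host.
    print(json.dumps(runsc_runtime_config("/usr/local/bin/runsc"), indent=4))
```

The printed fragment would need to be merged into the existing daemon.json by hand, and the Docker daemon restarted, before the setting takes effect.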
