Cache --mount=bind output smartly #2821

zen0wu · 2022-04-23T21:41:50Z

When I saw --mount=bind, I assumed it was a (much) better version of COPY/ADD, and that it smartly figures out which files my later command "touches" and only puts those files in the cache layer.

This is useful, for example, in a monorepo where I want to copy a small subset of files from all my modules into my docker context. I can just run a cp with a glob, and then only those files would be copied. There are some caveats here, but I assume cp won't actually read the file content (it only stats the files).

Currently, the entire mount source is treated as the cache input, so the layer cache is busted whenever any file there changes (even though it's not what's being copied into the image), as mentioned here: moby/moby#15858 (comment)

This would completely solve moby/moby#15858 in a much more elegant/automatic way, and allow arbitrary Linux commands.

The other way of solving this is to "not bust the cache" if the produced layer ends up exactly the same as before. I don't know enough about Docker internals to say whether this is possible at all.
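To make that second idea concrete, here is a minimal shell sketch of the comparison it implies: digest what the step produced, and compare it with the digest from the previous run. This is only an illustration, not how Docker or BuildKit works internally. The helper `output_digest` and the temporary directories are hypothetical, and it assumes GNU coreutils `sha256sum` is available.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: digest of a step's output tree, built from the
# relative path and content of every file (timestamps are ignored).
output_digest() {
  (cd "$1" && find . -type f | LC_ALL=C sort | while IFS= read -r f; do
    printf '%s  ' "$f"
    sha256sum < "$f"
  done) | sha256sum | cut -d' ' -f1
}

work=$(mktemp -d)
mkdir -p "$work/src" "$work/out1" "$work/out2"
touch "$work/src/a" "$work/src/x"

# First "run" of the step: only src/a ends up in the output.
cp "$work/src/a" "$work/out1/a"
before=$(output_digest "$work/out1")

# An unrelated input appears and the step runs again with the same result.
touch "$work/src/y"
cp "$work/src/a" "$work/out2/a"
after=$(output_digest "$work/out2")

if [ "$before" = "$after" ]; then
  verdict="output unchanged: later steps could stay cached"
else
  verdict="output changed: later steps must rebuild"
fi
echo "$verdict"
rm -rf "$work"
```

Two caveats with this sketch: the step itself still has to run before its output can be compared, so at best this would save rebuilding the steps after it, and a real layer also records metadata such as timestamps, which the sketch deliberately leaves out.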


zen0wu · 2022-04-24T04:44:31Z

Here's a quick repro demonstrating what I mean

Dockerfile

# syntax=docker/dockerfile:1.4

FROM alpine

RUN --mount=type=bind,target=/test,source=./test cp /test/a /a

ADD ./test/x /x

commands:

mkdir test
touch test/a test/x

# initial build
DOCKER_BUILDKIT=1 docker build .

# subsequent build without changing anything
# output says "CACHED [stage-0 2/3]" and "CACHED [stage-0 3/3]"
DOCKER_BUILDKIT=1 docker build . 

# add a file under test that's not being used by the build
touch test/y

# build again, cache is busted
DOCKER_BUILDKIT=1 docker build .
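Why the last build misses the cache can be illustrated outside Docker. The sketch below is only a stand-in for whatever checksum BuildKit actually computes over the mount source: `digest_tree` and `digest_one` are hypothetical helpers, and GNU coreutils `sha256sum` is assumed. It shows that a checksum over the whole source changes when test/y appears, while a checksum over test/a alone does not.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for a checksum over the entire mount source:
# relative path plus content of every file, in sorted order.
digest_tree() {
  (cd "$1" && find . -type f | LC_ALL=C sort | while IFS= read -r f; do
    printf '%s  ' "$f"
    sha256sum < "$f"
  done) | sha256sum | cut -d' ' -f1
}

# Checksum of a single file's content.
digest_one() {
  sha256sum < "$1" | cut -d' ' -f1
}

work=$(mktemp -d)
mkdir "$work/test"
touch "$work/test/a" "$work/test/x"

tree_before=$(digest_tree "$work/test")
a_before=$(digest_one "$work/test/a")

# Same unrelated change as in the repro.
touch "$work/test/y"

tree_after=$(digest_tree "$work/test")
a_after=$(digest_one "$work/test/a")

if [ "$tree_before" != "$tree_after" ]; then
  echo "whole-source checksum changed"
fi
if [ "$a_before" = "$a_after" ]; then
  echo "checksum of test/a unchanged"
fi
rm -rf "$work"
```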

tonistiigi · 2022-04-26T02:37:47Z

Something like this was indeed discussed early on as one possible smarter caching strategy. I guess it isn't completely clear how many cases would benefit from it to justify the additional maintenance/development complexities. It also has some overhead. Doing it only for the user-defined mounts is an interesting twist on the original idea that might help with the overhead issue.

There are multiple components needed to make this work that are currently missing. When running the process, we would need to capture which files the container accesses. Probably the only option here would be to add a bunch of seccomp notifiers.

The other part is that this requires different cache-key logic. After the process has completed, we would need to store both the cache key (a digest computed from the files' content) and the list of files that were accessed. This list could be quite big, which may become an issue for remote cache backends. When checking for cache matches before running the process a second time, we can't just compute the content digest again but need to ask "what are the possible sets of files that we have existing cache keys for". Then we need to compute the cache key for all of these sets to see if any of them matches (I guess. How to make it not grow out of control?). The current cache logic only computes the cache key before running the process and can directly check whether any records matching that key exist in any backend.
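As a rough illustration of the lookup described above, here is a shell sketch under simplifying assumptions. It is not BuildKit code: the record layout (a file whose first line is the digest and whose remaining lines are the accessed paths) and the helpers `digest_listed`, `store_record`, and `lookup` are all hypothetical, the access list is supplied by hand rather than captured from the process, and GNU coreutils `sha256sum` is assumed.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: digest over the files listed on stdin (one relative
# path per line), resolved against root "$1". A missing file is recorded as
# such, so its absence also changes the digest.
digest_listed() {
  local root=$1 f
  LC_ALL=C sort | while IFS= read -r f; do
    if [ -f "$root/$f" ]; then
      printf '%s  ' "$f"
      sha256sum < "$root/$f"
    else
      printf '%s  (missing)\n' "$f"
    fi
  done | sha256sum | cut -d' ' -f1
}

# After a run: store the digest together with the list of accessed files.
# Usage: store_record <records-dir> <root> <accessed paths...>
store_record() {
  local records=$1 root=$2 key
  shift 2
  key=$(printf '%s\n' "$@" | digest_listed "$root")
  { echo "$key"; printf '%s\n' "$@"; } > "$records/$key"
}

# Before a run: recompute the digest for every stored access list and see
# whether any of them still matches. Prints "hit" or "miss".
lookup() {
  local records=$1 root=$2 rec want got
  for rec in "$records"/*; do
    [ -f "$rec" ] || continue
    want=$(head -n 1 "$rec")
    got=$(tail -n +2 "$rec" | digest_listed "$root")
    if [ "$want" = "$got" ]; then
      echo hit
      return 0
    fi
  done
  echo miss
}

work=$(mktemp -d)
mkdir "$work/test" "$work/records"
touch "$work/test/a" "$work/test/x"

# Pretend the first run only accessed "a".
store_record "$work/records" "$work/test" a

touch "$work/test/y"                 # unrelated file appears
unrelated=$(lookup "$work/records" "$work/test")

echo changed > "$work/test/a"        # accessed file changes
relevant=$(lookup "$work/records" "$work/test")

echo "unrelated change: $unrelated, relevant change: $relevant"
rm -rf "$work"
```

Even in this toy form, `lookup` has to re-hash the files of every stored record, so its cost grows with the number of records and the size of each access list, which is the "grow out of control" concern raised above. A real cache key would also have to cover the command and its other inputs, which the sketch leaves out.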

zen0wu changed the title ~~Make --mount=bind only consider actually files that are opened~~ Cache --mount=bind output smartly Apr 23, 2022

tonistiigi added kind/enhancement exp/expert help wanted labels Apr 26, 2022

DYefimov mentioned this issue Jun 3, 2022

COPY vs RUN --mount type=bind #2893

Closed

vilicvane mentioned this issue May 19, 2023

Optionally cache --mount=type=cache #3880

Closed
