Cache --mount=bind output smartly #2821
Here's a quick repro demonstrating what I mean:
Dockerfile:
commands:
Something like this was indeed discussed early on as one possible smarter caching strategy. I guess it isn't completely clear how many cases would benefit from it enough to justify the additional maintenance and development complexity, and it also has some overhead. Doing it only for user-defined mounts is an interesting twist on the original idea that might help with the overhead issue.
Several components needed to make this work are currently missing. When running the process, we would need to capture which files the container accesses; probably the only option here would be to add a bunch of seccomp notifiers.
The other part is that this requires different cache-key logic. After the process has completed, we would need to store both the cache key (a digest computed from the files' content) and the list of files that were accessed. This list could be quite big, which may become an issue for remote cache backends. When checking for cache matches before running the process a second time, we can't just compute the content digest again; we need to ask "what are the possible sets of files that we have existing cache keys for?" Then we need to compute the cache key for each of these sets to see if any of them matches (I guess; how do we keep that from growing out of control?). The current cache logic only computes the cache key before running the process and can directly check whether any records matching that key exist in any backend.
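A minimal Python sketch of the access-set cache described in this comment, under loud assumptions: the set of accessed files is handed in by the caller (the seccomp-based capture is not modeled), the build context is a plain dict of path to bytes, and `AccessSetCache`, `digest_files` and the `max_sets` bound are hypothetical names, not BuildKit APIs. The LRU bound is one possible answer to the "grow out of control" question: only the most recently used access sets are re-digested on lookup.

```python
import hashlib
from collections import OrderedDict


def digest_files(files: dict[str, bytes], paths: frozenset[str]) -> str:
    """Digest the path and content of each file in `paths`; absent files hash distinctly."""
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(path.encode() + b"\0")
        content = files.get(path)
        if content is None:
            h.update(b"missing")
        else:
            h.update(b"present" + len(content).to_bytes(8, "big") + content)
    return h.hexdigest()


class AccessSetCache:
    """Hypothetical cache keyed on the set of files a step actually accessed."""

    def __init__(self, max_sets: int = 16):
        self.max_sets = max_sets
        # access set -> {digest of those files' contents -> cached result}
        self.entries: OrderedDict = OrderedDict()

    def store(self, accessed, files, result):
        """Record a result after the step ran and reported which files it accessed."""
        paths = frozenset(accessed)
        self.entries.setdefault(paths, {})[digest_files(files, paths)] = result
        self.entries.move_to_end(paths)
        while len(self.entries) > self.max_sets:
            self.entries.popitem(last=False)  # evict the least recently used access set

    def lookup(self, files):
        """Before running: re-digest the current files once per known access set."""
        for paths, results in list(self.entries.items()):
            hit = results.get(digest_files(files, paths))
            if hit is not None:
                self.entries.move_to_end(paths)
                return hit
        return None


ctx = {"a/package.json": b"{}", "a/src/main.ts": b"v1", "b/package.json": b"{}"}
cache = AccessSetCache()
cache.store({"a/package.json", "b/package.json"}, ctx, "layer-1")
ctx["a/src/main.ts"] = b"v2"       # outside the access set: still a hit
print(cache.lookup(ctx))            # layer-1
ctx["a/package.json"] = b'{"x":1}'  # inside the access set: miss
print(cache.lookup(ctx))            # None
```

Note that this model only tracks file reads: a newly added file that a glob would now match is not part of any stored access set, so real tracking would also have to record directory listings.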
When I saw --mount=bind, I assumed it was a (much) better version of COPY/ADD: that it smartly figures out which files my later command "touches" and only puts those files in the cache layer.
This is useful, for example, in a monorepo where I want to copy a small subset of files from all my modules into my Docker context. I can just run a cp with a glob, and only those files would be copied. There's a caveat here, but I assume cp won't actually read the file content (only stat them).
Currently, the entire mount source is treated as part of the cache key, so the layer cache is busted whenever any file in it changes (even though that file isn't what's being copied into the image), as mentioned here: moby/moby#15858 (comment)
This would completely solve moby/moby#15858 in a much more elegant, automatic way, and allow arbitrary Linux commands.
The other way of solving this is to not bust the cache if the produced layer ends up exactly the same as before. I don't know Docker's internals well enough to say whether this is possible at all.
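This second idea resembles what build systems call early cutoff: key each downstream step on the digest of its parent's output rather than on the parent's inputs, so a step that re-runs but produces an identical layer leaves later cache keys unchanged. A toy Python model, assuming each step is a function from its parent's digest to its output digest; `run_chain`, `copy_step` and `install_step` are hypothetical, and whether BuildKit could key steps this way is exactly the open question in the comment.

```python
import hashlib


def key(*parts: str) -> str:
    """Hash a step's identity together with its parent's output digest."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()


def run_chain(steps, input_digest, cache, log):
    """Run (name, fn) steps in order; fn maps the parent's output digest to this step's output digest.

    Each cache key chains on the parent's OUTPUT digest, so a step that re-runs but
    yields an identical output leaves every later key unchanged (early cutoff).
    """
    parent = input_digest
    for name, fn in steps:
        k = key(name, parent)
        if k in cache:
            log.append(f"{name}: cached")
        else:
            cache[k] = fn(parent)
            log.append(f"{name}: ran")
        parent = cache[k]
    return parent


def copy_step(context_digest):
    # Hypothetical cp with a glob: same layer whatever unrelated file changed.
    return "layer-with-package-json-files"


def install_step(parent_digest):
    # Hypothetical expensive step, e.g. installing dependencies.
    return key("installed", parent_digest)


cache, log = {}, []
steps = [("cp", copy_step), ("install", install_step)]
run_chain(steps, "context-v1", cache, log)
run_chain(steps, "context-v2", cache, log)  # mount contents changed
print(log)  # ['cp: ran', 'install: ran', 'cp: ran', 'install: cached']
```

The trade-off in this model is that the copy step itself still re-runs on every context change; only the steps after it are saved.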