New feature request: Selectively disable caching for specific RUN commands in Dockerfile #1996
Comments
I think the way to combat this is to take the point in the Dockerfile up to which you want caching and tag that as an image to use in your future Dockerfiles. |
But wouldn't this make it hard to interleave cached and non-cached commands? For example, say I want to update my repo and wget files from a server, and perform a bunch of steps in between: install software from the repo (which could have been updated), perform operations on the downloaded file (which could have changed on the server), and so on. Ideally there would be a way to tell Docker, in the Dockerfile, to run specific commands without the cache every time and to reuse the previous image only if nothing changed (for example, no update in the repo). Wouldn't this be useful to have? |
What about CACHE ON and CACHE OFF in the Dockerfile? Each instruction would affect subsequent commands. |
Yeah, I'm using |
Can a container ID be passed to 'docker build' as a "do not cache past this ID" instruction? Similar to the way in which 'docker build' will cache all steps up to a changed line in a Dockerfile? |
I agree we need more powerful and fine-grained control over the build cache. Currently I'm not sure exactly how to expose this to the user. I think this will become easier with the upcoming API extensions, specifically naming and introspection. |
Would be a great feature. Currently I'm using silly things like |
Getting better control over the cache would make using docker from CI a lot happier. |
What about changing
|
I agree and suggested this exact feature on IRC. Except I think to preserve backward compatibility we should create a new flag (say "--uncache") so we can keep --cached as a (deprecated) bool flag that resolves to "--uncache .*".
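A rough sketch of how a regex-valued flag like the proposed "--uncache" might select steps. This is not Docker's implementation: the function name `steps_to_rebuild` and the matching rule are assumptions for illustration. It also assumes that once one step misses the cache, every later step is rebuilt too.

```python
import re


def steps_to_rebuild(instructions, uncache_pattern):
    """Return one bool per instruction: True means 'run without the cache'.

    Hypothetical semantics: a step is rebuilt if the pattern matches its text,
    or if any earlier step was already rebuilt (cache misses cascade).
    """
    pattern = re.compile(uncache_pattern)
    rebuild = []
    invalidated = False
    for instruction in instructions:
        if pattern.search(instruction):
            invalidated = True
        rebuild.append(invalidated)
    return rebuild


dockerfile = [
    "FROM ubuntu",
    "RUN apt-get update",
    "RUN apt-get install -y git",
]

# Only the matching step and everything after it are rebuilt.
print(steps_to_rebuild(dockerfile, r"apt-get update"))
# ".*" matches every instruction, so nothing is taken from the cache.
print(steps_to_rebuild(dockerfile, r".*"))
```

Under these assumptions, ".*" behaves like disabling the cache for the whole build, which is how the comment proposes to keep the old boolean flag working.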
|
What does everyone else think about this? Anyone up for implementing the feature? |
I'm up for having a stab at implementing this today if nobody else has started? |
I've started work on it - wanted to validate the approach looks good.
One thing: as far as I can see, the flag/mflag package doesn't support string flags without a value, so I'll need to do some extra fiddling to support both |
I really think this ought to be a separate new flag. The behavior and syntax of Anyways, IANTM (I am not the maintainer) so these are just my personal thoughts. :) |
@tianon
|
Right, that's all fine. The problem is that
I also think we'd be doing ourselves a disservice by making "true" and "false" special case regex strings to solve this, since that will create potentially surprising behavior for our users in the future. ("When I use |
+1 for @wagerlabs approach |
@crosbymichael, @timruffles Wouldn't it be better if the author of the Dockerfile decided which build steps should be cached and which should not? The person who creates the Dockerfile is not necessarily the one who builds the image. Moving the decision to the docker build command demands detailed knowledge from a person who just wants to use a specific Dockerfile. Consider a corporate environment where someone just wants to rebuild an existing image hierarchy to update some dependencies. The existing Dockerfile tree may have been created years ago by someone else. |
+1 for @wagerlabs approach |
+1 for @wagerlabs approach although it would be even nicer if there was a way to cache bust on a time interval too, e.g.
I appreciate this might fly in the face of the idea of containers being deterministic; however, it's exactly the sort of thing you want to do in a continuous deployment scenario where your pipeline has good automated testing. As a workaround I'm currently generating cache busters in the script I use to run docker build and adding them to the Dockerfile to force a cache bust
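A minimal sketch of a time-interval cache buster, assuming the build script writes the generated line into the Dockerfile before calling docker build. The helper names `interval_cache_buster` and `cache_bust_line` are hypothetical. The value stays the same within an interval, so the cache is reused, and changes once the interval rolls over.

```python
import time


def interval_cache_buster(interval_seconds, now=None):
    """Return an integer that is constant within each interval of the given length."""
    if now is None:
        now = time.time()
    return int(now // interval_seconds)


def cache_bust_line(interval_seconds, now=None):
    # The instruction text changes once per interval, so the cache is hit
    # within an interval and missed after the boundary.
    return f"RUN echo cache-bust-{interval_cache_buster(interval_seconds, now)}"


# Example: a line that changes once a day.
print(cache_bust_line(24 * 60 * 60))
```

Everything below the generated line in the Dockerfile is rebuilt at most once per interval. The steps above it stay cached as usual.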
|
I'm looking to use containers for continuous integration and the ability to set timeouts on specific elements in the cache would be really valuable. Without this I cannot deploy. Forcing a full rebuild every time is much too slow. My current plan to work around this is to dynamically inject commands such as |
+1 for the feature. |
I also want to vote for this feature. The cache is annoying when building parts of a container from git repositories which update only on the master branch. |
@hiroprotagonist Having a |
@amarnus I've solved it in a way similar to the idea @tfoote had. I am running the build from a Jenkins job, and instead of running the docker build command directly, the job starts a build script which generates the Dockerfile from a template and adds the line 'RUN echo currentsMillies' above the git commands. Thanks to sed and pipes this was a matter of minutes. Anyway, I still favor this feature as part of the Dockerfile itself. |
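The template step described above could look roughly like this in Python. The comment uses sed, so this is an equivalent sketch, not the commenter's script. The function name `inject_cache_buster`, the "git " marker, and the example template are assumptions.

```python
import time


def inject_cache_buster(template_text, marker="git ", stamp=None):
    """Insert 'RUN echo <stamp>' above the first RUN line that contains the marker."""
    if stamp is None:
        stamp = int(time.time() * 1000)  # current time in milliseconds
    out = []
    injected = False
    for line in template_text.splitlines():
        if not injected and line.startswith("RUN") and marker in line:
            out.append(f"RUN echo {stamp}")
            injected = True
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical template; the repository URL is a placeholder.
template = """FROM ubuntu
RUN apt-get update
RUN git clone https://example.com/repo.git /src
RUN make -C /src
"""

print(inject_cache_buster(template), end="")
```

Because the stamp differs on every run, the echo line and every instruction after it miss the cache, while the steps above it are still reused.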
I agree that this feature would be very helpful. At the moment, I use the solution suggested above using the
This works fine. But the problem with this solution is that it requires you to remember to increment the |
Why was this issue closed? |
You can use the $RANDOM env variable. |
Would love this feature. |
For anyone that has the luxury to automate their builds, this is what I like to do: I put a placeholder in the Dockerfile template like: https://github.com/zkscpqm/Car-Zix/blob/master/Dockerfile_template#L9 which dictates where my cache ends. I then spawn a unique Dockerfile each time I do a build and I replace the placeholder with some hash: Hope this helps)) |
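A small sketch of the placeholder approach, assuming a marker string in the template that is replaced with a fresh token on every build. The marker `{{CACHE_BUSTER}}`, the function name `render_dockerfile`, and the sample template are hypothetical and are not taken from the linked repository.

```python
import uuid

PLACEHOLDER = "{{CACHE_BUSTER}}"  # hypothetical marker; use whatever the template defines


def render_dockerfile(template_text, token=None):
    """Replace the placeholder with a unique token so the cache ends at that line."""
    if token is None:
        token = uuid.uuid4().hex
    if PLACEHOLDER not in template_text:
        raise ValueError("template has no cache-buster placeholder")
    return template_text.replace(PLACEHOLDER, token)


# Hypothetical template; the repository URL is a placeholder.
template = """FROM ubuntu
RUN apt-get update
RUN echo {{CACHE_BUSTER}}
RUN git clone https://example.com/repo.git /src
"""

print(render_dockerfile(template), end="")
```

Raising an error when the marker is missing guards against silently building from a fully cached Dockerfile.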
Nine years on, have there been any changes? In 2013, the CACHE ON and CACHE OFF commands were proposed. How is it now? |
Solutions I came up with in the interim:
or
I use these sorts of tricks a lot in my large Dockerfiles repo containing lots of different apps and builds, including packaging my GitHub repos tools, scripts and dependencies: https://github.com/HariSekhon/Dockerfiles These and other tricks are most succinctly shown in my master Dockerfile template in my Templates repo which has templates for lots of the most popular DevOps technologies like Make, Jenkins, GitHub Actions, Docker, Kubernetes etc...: https://github.com/HariSekhon/Templates/blob/master/Dockerfile |
Neat! Does the Thank you so much! |
@MaxTranced I've used https://timeapi.io/swagger/index.html http://worldtimeapi.org/pages/examples The latter seems like it can do week of the year as http://worldtimeapi.org/pages/schema This one has an API to return just the week: eg.
but it unfortunately also returns the execution time which would bust the cache on every request because the millisecond timing would be different. You might want to contact them and see if there is an option to not do that and point them to this thread as the Dockerfile use case. Another solution is to wrap your |
Thank you so much for the suggestions! I did search for a while but did not find the |
Using the How can I prevent caching of certain |
I tried using the same trick as but got the following error when building the image for the second time. Any clue ? |
Is the |
Which version of Docker is that happening for you? I've definitely used that before... perhaps the behaviour has been changed to reference a cache key load, but I'm unsure how that could be interpreted that way given this is the current output of the sample URL I gave above:

$ curl https://api.github.com/repos/HariSekhon/DevOps-Python-tools/git/refs/heads/master
{
  "ref": "refs/heads/master",
  "node_id": "MDM6UmVmNDUwNDkwMjY6cmVmcy9oZWFkcy9tYXN0ZXI=",
  "url": "https://api.github.com/repos/HariSekhon/DevOps-Python-tools/git/refs/heads/master",
  "object": {
    "sha": "dc4b1ce2b2fbee3797b66501ba3918a900a79769",
    "type": "commit",
    "url": "https://api.github.com/repos/HariSekhon/DevOps-Python-tools/git/commits/dc4b1ce2b2fbee3797b66501ba3918a900a79769"
  }
}

Are you querying a different URL that is returning only a hashref that Docker is interpreting differently or are you targeting a |
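The refs response above changes only when the branch head moves, so its `object.sha` field can serve as a cache key. A minimal sketch of pulling that value out with Python's standard `json` module follows. The function name `ref_sha` is hypothetical, the sample is a trimmed copy of the output above, and how the value is then fed into a build is left open.

```python
import json


def ref_sha(refs_response_text):
    """Return the commit SHA from a Git refs API response body."""
    data = json.loads(refs_response_text)
    return data["object"]["sha"]


# Trimmed copy of the response shown in the thread.
sample = """{
  "ref": "refs/heads/master",
  "object": {
    "sha": "dc4b1ce2b2fbee3797b66501ba3918a900a79769",
    "type": "commit"
  }
}"""

print(ref_sha(sample))
```

Two builds that see the same SHA can share cached layers, and a new commit on the branch produces a different value.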
@HariSekhon I'm also using BuildKit. This is a private repo, so I pass my GitHub personal access token as well, but I don't think this explains the difference. |
Greetings from the future. I'm surprised they can't or don't want to implement it. |
Simply running git clone as a layer meant that the cached repository is always used, even when the bench_repo has been changed. To work around this, we use the GitHub refs API to see if the repo has changed, to decide whether to use the cached bench_repo or make a new clone of it. The trick here is from this GitHub [issue comment](moby/moby#1996 (comment)) Also, this commit adds support for specifying a specific branch of the bench_repo to use for running the benchmarks. The branch can be specified using the `/tree/<branch-name>` suffix in the bench_repo URL.
This is all nice in theory, but then you hit real-world use cases, for example in Kubernetes, where you want to run the same image as both a job and a service. Then nothing like this works well, since you are forced to keep a bunch of variables and arguments up to date in various configuration files (e.g., YAML). If you have multiple repositories and change things frequently (development in the cloud with containers with 100GB+ RAM), you realize that theory and practice are two different things. And the only thing you wanted was an up-to-date git repository clone. |
@Aiosa I agree that with git repo clones you want to get an up to date clone... did you see my solution for that above, I thought it was quite novel: |
Sad that there's no current solution :(. I love using dev containers, but I have a couple of build commands in the Dockerfile that I wish not to be cached whenever I change my code. It's annoying to find that Docker has cached a past build. If only there were a way to specify to Docker only to try caching up to a certain build stage. |
Setting no-cache to specific commands can be done with a
|
Branching off the discussion from #1384:
I understand that -no-cache disables caching for the entire Dockerfile. But it would be useful to be able to disable the cache for a specific RUN command, for example when updating repos or downloading a remote file. From my understanding, a cached RUN apt-get update wouldn't actually update the repo right now? That would make the results differ from those on a VM.
If disabling caching for specific commands in the Dockerfile were made possible, would the subsequent commands in the file then not use the cache? Or would they do something a bit more intelligent, e.g. use the cache if the previous command produced the same result (fs layer) as in a previous run?
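The "more intelligent" option in the question can be pictured with a toy model in which the cache key of a step depends on the content the previous step produced, not on whether that step was re-run. This only illustrates the idea and is not how Docker computes cache keys. The names `digest` and `next_step_key` and the sample strings are assumptions.

```python
import hashlib


def digest(*parts):
    """Hash a sequence of strings into one hex digest."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return h.hexdigest()


def next_step_key(parent_layer_content, instruction):
    """Key the next step on the content of the parent layer plus the instruction text."""
    return digest(digest(parent_layer_content), instruction)


# An uncached 'apt-get update' that yields the same layer content as before...
old = next_step_key("package index v1", "RUN apt-get install -y git")
same = next_step_key("package index v1", "RUN apt-get install -y git")
# ...versus one that yields new content.
changed = next_step_key("package index v2", "RUN apt-get install -y git")

print(old == same, old == changed)
```

In this model a forced re-run that yields an identical layer lets the following steps keep their cache, and any change in the layer invalidates them.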