Allow files and paths to be ignored when uploading context #2224

Closed
jellybob opened this issue Oct 15, 2013 · 78 comments · Fixed by #6579

Comments

@jellybob

I'm attempting to Dockerize a fairly large Rails application, with the intent that during development I can just run docker build locally, while for deployment the CI server will do so.

In either case though it currently picks up things like the log directory, which can grow to several gigabytes of large files which I'm never going to want included in an image.

Is it possible to tell Docker to ignore that directory when uploading the build context? If not, what would be involved in making it happen? I've never used Go, but given a pointer in the right direction I'm happy to give it a shot.

@crosbymichael
Contributor

@jellybob Why is the log dir in the same path as your source code? Is it a problem to move the dir out? How does the CI server get code?

I'm guessing the fix for this is some type of .gitignore-style file for Docker, but any way around this could get messy.

@tianon
Member

tianon commented Nov 28, 2013

This is fairly standard default behavior for web frameworks like this, so +1 for some method of ignoring specific files, even if it's just a "docker build" flag.

@jellybob
Author

As @tianon pointed out, this is the default setup for Rails. In production we symlink that directory to somewhere more sensible, but running locally it's nice to have everything self-contained.

The CI server obtains code by doing a git checkout, so wouldn't have all the logs sitting in the same directory, but I don't want to have to delete it every time I try building the image on my own machine to test changes.

@tianon
Member

tianon commented Nov 29, 2013

So, after thinking about it more, this can't be a "docker build" flag for the same reason we only have one such flag right now - it breaks repeatability. So if we add such functionality, it'll have to be a file, and I certainly can't think of a good name that's consistent with Dockerfile, except something like .dockerignore, which looks and feels kind of lame, IMO. :)

@tianon
Member

tianon commented Nov 29, 2013

(but I do still agree we need this feature somehow!)

@jpadvo

jpadvo commented Dec 9, 2013

I'd love to see this feature! I've been keeping a Dockerfile in the root of a project, because that is the sensible place for this project. It needs to load the project into the context.

It bites though, because I'm using Git. So my entire git history gets loaded into the context. Not a dealbreaker, but it doesn't feel very clean.

@vreon
Contributor

vreon commented Dec 9, 2013

If we want this to be repeatable across builds, I'd argue the place for it is in the Dockerfile -- though that means the client needs to parse it and cooperate.

I really like how this reads:

CONTEXT [<pattern>, ...]

but a blacklist approach, a la .gitignore, would be way better for maintainability:

EXCLUDE [<pattern>, ...]

@SvenDowideit
Contributor

@vreon @tianon don't we already have this feature - what happens when you put a VOLUME ["log dir"] before you run?

(I suspect this will need its own issue, but:


docker@boot2docker:~/src/dockerfiles/test-volume$ docker run -t -i -rm svendowideit/test-volume ls /var/lib/twiki
AUTHORS                      lib
COPYING                      locale
COPYRIGHT                    pub
INSTALL.html                 pub-htaccess.txt
LICENSE                      readme.txt
TWiki-4.2.4.tgz              robots.txt
TWikiHistory.html            root-htaccess.txt
TWikiReleaseNotes04x02.html  subdir-htaccess.txt
TWikiUpgradeGuide.html       templates
bin                          tools
data                         twiki_httpd_conf.txt
index.html                   working
docker@boot2docker:~/src/dockerfiles/test-volume$ more Dockerfile 
#
# test if an image contains the files that are in a VOLUME (after the VOLUME)
#
# docker build -t svendowideit/test-volume .

FROM            busybox
MAINTAINER      Sven Dowideit <SvenDowideit@home.org.au>

#VOLUME ["/var/lib/twiki"]
ADD     TWiki-4.2.4.tgz /var/lib/twiki/
WORKDIR /var/lib/twiki
RUN     tar zxvf TWiki-4.2.4.tgz

but if you add the VOLUME, you get something different?

docker@boot2docker:~/src/dockerfiles/test-volume$ docker run -t -i -rm svendowideit/test-volume ls /var/lib/twiki
TWiki-4.2.4.tgz
docker@boot2docker:~/src/dockerfiles/test-volume$ more Dockerfile 
#
# test if an image contains the files that are in a VOLUME (after the VOLUME)
#
# docker build -t svendowideit/test-volume .

FROM            busybox
MAINTAINER      Sven Dowideit <SvenDowideit@home.org.au>

VOLUME  ["/var/lib/twiki"]
ADD     TWiki-4.2.4.tgz /var/lib/twiki/
WORKDIR /var/lib/twiki
RUN     tar zxvf TWiki-4.2.4.tgz

it seems that the ADD statement gets added to the underlying image, whereas the output of RUN goes into the VOLUME, and thus is not contained in the pushed image - see https://index.docker.io/u/svendowideit/test-volume/

@tianon
Member

tianon commented Dec 11, 2013

Ok, let me paint a better picture. You've got a huge Ruby on Rails application, and you've been developing on it for a while, which involves running it and testing against it frequently. At this point, your application directory has a "log" folder in it that contains a very large development.log file. When you "docker build", you now get to wait while that "development.log" file gets uploaded to the backend, even though you're not going to want it included in the image.

Another great example would be the associated SQLite database for that same application. Since you've been doing development, it's bloated with a lot of very large records so that you can easily test against many different scenarios. It is just a test database however, so you don't include it in your Docker image and you just get to suffer while it's uploaded, or move it outside the context temporarily every time you want to build (which can be even more annoying than just waiting, IMO).

@SvenDowideit
Contributor

mmm, that means the client needs to parse the Dockerfile - perhaps we should collect reasons / use cases that would benefit from that change.

or could this be a docker build -exclude=logs parameter?

I'll move my VOLUME/ADD quandary to another issue.

@tianon
Member

tianon commented Dec 12, 2013

I'd be fine with an -exclude or -ignore parameter, but that's certainly more tedious than is ideal, especially since the ignore is something that would probably apply to anyone using the Dockerfile.

@interlock

Really new to Docker, but this issue caught my eye. We are rolling up a project managed in git with many submodules. Looking at 300 MB of source code and assets, but 1.1 GB of .git object graphs/data. Definitely don't need those in the Docker context.

My suggestion is a .dockerignore file, mirroring how git does it.

The workaround for us right now will probably be a clean checkout, fetch the submodules and then clean out all the .git directories.

@sudosurootdev
Contributor

I totally agree with a .dockerignore file like @interlock is talking about, which is exactly what I need. My reason is size and confidential files within my project, which I do not want to push to the public registry.

@interlock

Looks like archive/archive supports passing an exclude list. I'll write up a simple dirty line reader and pass those along to the initial context Tar file creation. dives in

@aldanor

aldanor commented Feb 14, 2014

+1, a .dockerignore would be the simplest way to ignore folders like .git, .vagrant, node_modules, bower_components etc., which would otherwise be uploaded as context. Putting a Dockerfile in a separate subfolder works as a workaround but is a bit ugly.

@tmc
Contributor

tmc commented Feb 17, 2014

Sorry about the noise, patch should be pretty close.

@danhixon

+1 for this solution.
My Rails app has a tmp folder that weighs 1.5 GB that I'd love Docker to ignore.

$ docker build -t danhixon/rails-app .
Uploading context 1.683 GB

YIKES!

tmc added a commit to tmc/docker that referenced this issue Jun 13, 2014
Fixes moby#2224

Docker-DCO-1.1-Signed-off-by: Travis Cline <travis.cline@gmail.com> (github: tmc)
@mascip

mascip commented Jun 14, 2014

+1.

In the meantime I'm going to try @disposable-ksa98 's context/ folder for a new project.

@mascip

mascip commented Jun 14, 2014

PS: in the end I have put the Dockerfile and all its context in ./docker, and I run

docker build -t some_name ./docker

Now "Uploading context" takes a fraction of a second, instead of more than a minute.

It seems to work alright. Let me know if there are reasons why it would be a bad thing to do.

@cressie176
Contributor

It's not bad, but when you copy files into the context as part of a build it will bust Docker's cache. To get around this I've switched to using rsync so I only copy files if they change. It's getting a bit like death by 1000 cuts.

@domachine

👍 for this feature!

@cscetbon

+1 for this too. With that feature we could exclude a Python virtual environment when uploading the context!

@benjamine

+1, this is really important. Uploading logs and .git folders on every build makes it terrible, unless you end up restructuring your project folders to work around this.

@blueyed

blueyed commented Jun 26, 2014

Being notified again because of a +1, here is a workaround (mentioned before).

It's rather trivial to use a Makefile or another build tool for this:

build_docker:
    mkdir -p build/dockercontext \
        && cp -a Dockerfile and what you need build/dockercontext \
        && cd build/dockercontext \
        && docker build ...
.PHONY: build_docker

Then make build_docker will build your Docker image.

This also allows you to use a template for the Dockerfile, e.g. just via sed s/FOO/bar/ Dockerfile.template > build/dockercontext/Dockerfile.
Apart from using cp -a or rsync to keep timestamps and therefore the Docker cache, you can use cmp to only copy a processed Dockerfile.template in case it differs.

You could then also use a stamp file (instead of the phony target) and define your input files as prerequisites for it, allowing you to only (re)build the docker image if files have changed.

@cscetbon

I understand the workaround. The +1 is here to get this feature included without having to modify our current directory structure or use the trick you provide just to build the image.

vieux pushed a commit to vieux/docker that referenced this issue Jun 26, 2014
Fixes moby#2224

Docker-DCO-1.1-Signed-off-by: Travis Cline <travis.cline@gmail.com> (github: tmc)
@cscetbon

Thanks !

@shuhaowu

shuhaowu commented Jul 6, 2014

Is it possible to have .dockerignore ignore something like all of the pyc files in all subdirs? I did *.pyc but it only ignores the files in the top-level directory.

@LK4D4
Contributor

LK4D4 commented Jul 7, 2014

@shuhaowu Yeah, you are right. Pretty annoying.

*/*.pyc
*/*/*.pyc

Only this way now :(

@jwaldrip

Does **/*.pyc work?

@pstiasny

pstiasny commented Sep 8, 2014

Does **/*.pyc work?

Nope.

@interlock

Unfortunately the simple functionality to ignore is a pure pass through to tar, so smart matchers from VCS’s won’t work.

For more reference, look at the tar man page here: http://www.gnu.org/software/tar/manual/html_node/exclude.html

Notably, tar supports matching VCS pattern files but you have to explicitly tell it which kind. 

@dweinstein

I always liked how mercurial included a syntax line that allowed either glob or regexp syntax. Perhaps there could be a line in the docker ignore that selected which tar VCS ignore format to expect.

@tianon
Member

tianon commented Sep 8, 2014

This is actually using Go's glob support (see http://golang.org/pkg/path/filepath/#Match).

@synotna

synotna commented May 20, 2015

Following @interlock's suggestion, my workaround is to tarball the code, making use of tar's much better exclude support, and ADD the tarball in the Dockerfile (which automatically extracts it), i.e. https://gist.github.com/synotna/6c2ea7e4350160b22748

@interlock

@synotna What version of Docker are you using? I believe 1.4.1 and up have tar-style ignores with .dockerignore. If you are stuck on a Docker version lower than that, the tar method works great. You can also stream the tar into docker build using - instead of . as the source.

tar --exclude-from=.gitignore -C code -cf - . | docker build -t my-image -

Or something along those lines.

@synotna

synotna commented May 20, 2015

antony@do:~/docker-testing$ docker --version
Docker version 1.6.2, build 7c8fca2

antony@do:~/docker-testing$ find | grep ".pyc"
./code/blah2.pyc
./blah.pyc

Dockerfile

FROM python:3.4.3
RUN mkdir -p /var/app/
WORKDIR /var/app/
ADD . .

.dockerignore

*.pyc

antony@do:~/docker-testing$ docker build -t testing .
antony@do:~/docker-testing$ docker run --rm -it testing find | grep ".pyc"
./code/blah2.pyc

Dockerfile-tar

FROM python:3.4.3
RUN mkdir -p /var/app/
WORKDIR /var/app/
ADD docker.tar .

.gitignore

*.pyc

tar --exclude-from=.gitignore -cvf docker.tar .
antony@do:~/docker-testing$ tar tvf docker.tar | grep "pyc"
antony@do:~/docker-testing$

@interlock

@synotna If you must use the .gitignore, then that's the path you must take. If you don't mind copying/symlinking that .gitignore to .dockerignore, you can save some commands.

@synotna

synotna commented May 20, 2015

My point is that .dockerignore is not as effective as .gitignore, because it does not recurse into subdirectories.

The .dockerignore file only ignored one level of .pyc files; .gitignore (which works with tar --exclude-from) ignored all levels.

@duglin
Contributor

duglin commented Feb 15, 2016

@synotna FYI you can now specify recursive patterns in .dockerignore
See: https://docs.docker.com/engine/reference/builder/#dockerignore-file

@synotna

synotna commented Feb 19, 2016

Awesome to hear, thanks!
