Multiple COPY statements in one layer is not possible #33551

Closed
loicgelle opened this issue Jun 6, 2017 · 11 comments
Labels
area/builder, kind/enhancement, status/more-info-needed

Comments

@loicgelle

loicgelle commented Jun 6, 2017

Hello all,

I am opening this issue because I am facing a problem that cannot be solved with the current features in Docker Edge. I am trying to use the new multi-stage build capabilities of Docker Edge to build complex software, and I want a fine-grained way to extract files from that build.

For tracing purposes, I need to know exactly what the output of my build is. So, supposing I want to extract a few header files produced by the build, this is not a satisfying solution:
COPY --from=foo /usr/include /usr/include

because I want to be able to do:

COPY --from=foo /usr/include/header_1.h /usr/include/header_1.h
...
COPY --from=foo /usr/include/header_n.h /usr/include/header_n.h

Of course, this solution has two problems: it creates unnecessary layers, and it makes the build fail when Docker considers that the Dockerfile has too many statements.

It would be nice to have a way to perform a copy that is semantically equivalent to:

COPY src1 dest1 \
        src2 dest2 \
        ...
        srcn destn

In my case, specifying the destination is even unnecessary, because I want to replicate the directory structure created by previous build stages. Maybe an instruction pair like EXPORT / IMPORT would be more convenient for these use cases.

@cpuguy83
Member

cpuguy83 commented Jun 7, 2017

Suppose I have a Dockerfile like so:

FROM busybox AS main
RUN mkdir /test
WORKDIR /test
RUN let i=0; while true; do touch test$i; let i=i+1; if [ $i -eq 5 ]; then exit 0; fi; done
RUN let i=0; while true; do touch other$i; let i=i+1; if [ $i -eq 5 ]; then exit 0; fi; done
RUN ls -lh /test

FROM busybox
RUN mkdir /test
COPY --from=main /test/test* /test/
RUN ls -lh /test

The last line of the main stage lists all the files in /test as:

-rw-r--r--    1 root     root           0 Jun  7 13:42 other0
-rw-r--r--    1 root     root           0 Jun  7 13:42 other1
-rw-r--r--    1 root     root           0 Jun  7 13:42 other2
-rw-r--r--    1 root     root           0 Jun  7 13:42 other3
-rw-r--r--    1 root     root           0 Jun  7 13:42 other4
-rw-r--r--    1 root     root           0 Jun  7 13:42 test0
-rw-r--r--    1 root     root           0 Jun  7 13:42 test1
-rw-r--r--    1 root     root           0 Jun  7 13:42 test2
-rw-r--r--    1 root     root           0 Jun  7 13:42 test3
-rw-r--r--    1 root     root           0 Jun  7 13:42 test4

The last line of the Dockerfile lists all the files in /test as:

-rw-r--r--    1 root     root           0 Jun  7 13:42 test0
-rw-r--r--    1 root     root           0 Jun  7 13:42 test1
-rw-r--r--    1 root     root           0 Jun  7 13:42 test2
-rw-r--r--    1 root     root           0 Jun  7 13:42 test3
-rw-r--r--    1 root     root           0 Jun  7 13:42 test4

Does this work for you?

@loicgelle
Author

Thanks for your proposal. Unfortunately it does not work for me, since I cannot assume anything about the files that I copy: they do not necessarily share the same prefix, the same extension, or the same directory.

It was probably unclear in the example that I gave you, sorry for that.

@cpuguy83
Member

cpuguy83 commented Jun 7, 2017

@loicgelle In this case, I'd recommend grouping the files in the foo build using normal cp operations, which can be chained together in a single RUN step to create the tree structure you want.
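
As a rough illustration of the grouping idea, here is a minimal shell sketch. It assumes a POSIX shell with mkdir, cp, and dirname available in the build stage (busybox provides these). The function name stage_files and the staging root passed as its first argument are hypothetical, not part of Docker. Each file is copied under the staging root at its original path, so files from unrelated directories do not collide and need no renaming.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker feature): copy every listed file below
# a staging root, recreating each file's original path underneath it.
# Usage: stage_files STAGING_ROOT FILE...
stage_files() {
    dest=$1
    shift
    for f in "$@"; do
        # Recreate the file's parent directory below the staging root.
        mkdir -p "$dest/$(dirname "$f")" || return 1
        # -p asks cp to keep mode and timestamps where possible.
        cp -p "$f" "$dest/$f" || return 1
    done
}
```

In the setting of this thread, such a function could be inlined into a single RUN step of the foo stage and called with a staging directory such as /release (an assumed name) and the list of headers. One COPY --from=foo /release/ / in the final stage would then bring the whole tree over in a single layer.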

@loicgelle
Author

@cpuguy83 I also considered this alternative option. But I think it is more like a workaround than a real long-term solution, for various reasons:

  • The main objective of the multi-stage COPY feature was precisely to limit orchestration overhead between the different build stages; it now appears to me that, in its current form, it is limited to certain use cases, and that it could be extended to other interesting applications.
  • Grouping the files would probably require renaming them to avoid collisions, and would thus add orchestration overhead.
  • Your solution assumes that both stages contain a cp command, although in my case I would like to start the last stage from scratch, precisely to make the separation clear between my build and how it could be used. I think that copying files should be doable without making any assumption about the tools that are actually present in the container.

@thaJeztah
Member

thaJeztah commented Jun 8, 2017

I think the proposal for RUN --mount may fit your use case (see #32507), something like:

FROM busybox AS builder
RUN (steps that are done during the build stage)

FROM busybox

# Mount the image from "builder" stage at `/artifacts` during this run
RUN --mount=src=/,dest=/artifacts,from=builder \
   cp /artifacts/usr/include/header_1.h /usr/include/header_1.h \
&& cp /artifacts/usr/include/header_n.h /usr/include/header_n.h

@thaJeztah thaJeztah added the kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. label Jun 8, 2017
@loicgelle
Author

@thaJeztah Thanks for your reply. The thing is, I have no cp command in the last stage of the build, since I start from scratch. Once again, I think that what can be done with primitive operations (like RUN, COPY, ENV...) should not depend on the contents of the container itself.

@loicgelle
Author

@cpuguy83 @thaJeztah I really believe that operations like COPY should not be limited by syntax or by the lack of binaries in the container (cp in this case). A syntax like the one proposed in my first post, which allows copying several files with fine-grained destination control in one layer, would be very helpful for multi-stage builds.

@thaJeztah
Member

Sorry for my late reply, I missed the part about from scratch in your earlier comment.

I see what you're asking, but I'm really not sure we should add too many options to the COPY command. Let me expand on that: the COPY command was originally implemented for a very simple operation, adding local files from the build context to the image.

Additional features were added (e.g. .dockerignore, wildcard support), but in many situations such features are not a "complete" match for users' expectations; .dockerignore can be slow, and matching isn't 100% equal to (e.g.) .gitignore.

Likewise, the COPY statement itself has its limitations, such as:

While having multiple copy statements may be a limited-scope change, it may be a slippery slope; people will demand more ("wildcards are broken", "cannot use .dockerignore").

Although the RUN --mount syntax is more complex/verbose, it also provides a lot more flexibility, and can address most of the above, given that it gives users full access to the tools they need (a regular shell, regular cp command, or any alternative they like).

For your situation, I think the build can be seen as three stages:

  • Stage 1: Add build tools, and compile/generate artifacts
  • Stage 2: Collect artifacts (could be combined with Stage 1)
  • Stage 3: Bundle / Ship

Which translates to something like:

FROM busybox AS builder
RUN (steps that are done during the build stage)

FROM busybox AS stage2

# Mount the image from "builder" stage at `/artifacts` during this run
RUN --mount=src=/,dest=/artifacts,from=builder \
   cp /artifacts/usr/include/header_1.h /release/usr/include/header_1.h \
&& cp /artifacts/usr/include/header_n.h /release/usr/include/header_n.h \
&& chown -r 123:456

# Multiple copy statements also should not be an issue in this stage
COPY --from=foo /usr/include/header_1.h /release/usr/include/header_1.h
COPY --from=foo /usr/include/header_n.h /release/usr/include/header_n.h

# Package
FROM scratch

COPY --mount=src=/release,dest=/release,from=stage2 \
   cp -r /release /

@loicgelle
Author

@thaJeztah Looks like a good compromise! Thank you for your help, that closes the issue for me.

I still think that there should be a more flexible way of "exporting/importing" build artifacts between images, but this is more related to #32868

@thaJeztah
Member

@loicgelle thanks! Yes, more improvements are coming, and don't hesitate to open issues if you have suggestions for improvement.

We can still reconsider changes in future, but with the changes mentioned in my comment, we expect many use cases will be covered.

@JanSurft

JanSurft commented Nov 15, 2021

There were some typos in the suggestion by @thaJeztah

This worked for me, so for further reference:

FROM busybox AS builder
RUN (steps that are done during the build stage)

FROM busybox AS stage2

# Mount the image from "builder" stage at `/artifacts` during this run
RUN --mount=src=/,dst=/artifacts,from=builder \
   mkdir -p /release/usr/include \
&& cp /artifacts/usr/include/header_1.h /release/usr/include/header_1.h \
&& cp /artifacts/usr/include/header_n.h /release/usr/include/header_n.h \
&& chown -R 123:456 /release

# Multiple copy statements also should not be an issue in this stage
COPY --from=builder /usr/include/header_1.h /release/usr/include/header_1.h
COPY --from=builder /usr/include/header_n.h /release/usr/include/header_n.h

# Package
FROM scratch

# scratch has no shell or cp, so a RUN step cannot execute here; use COPY --from instead
COPY --from=stage2 /release/ /


5 participants