
ARG before FROM in Dockerfile doesn't behave as expected #34129

Closed
Benjamin-Dobell opened this issue Jul 16, 2017 · 31 comments

Comments

@Benjamin-Dobell

Benjamin-Dobell commented Jul 16, 2017

Description

It's documented that ARG can appear before FROM, so that arguments may be substituted into image names etc.

Rather than having some ARG before and some ARG after FROM, for consistency I attempted to place all my ARG before FROM. However, to my surprise (after a lot of debugging) I determined that my arguments are always blank after FROM.

I believe the meta-arg functionality/refactoring may somehow be responsible:

239c53b

Steps to reproduce the issue:

  1. Produce a Dockerfile such as:
ARG environment
FROM alpine:3.5
ENV ENVIRONMENT=${environment:-development}
RUN echo "$ENVIRONMENT" > /value_of_environment
  2. Build and run the image, printing the value of the environment ARG (stored in /value_of_environment):
docker run $(docker build -q --build-arg environment=production .) cat /value_of_environment

Describe the results you received:

development

Describe the results you expected:

production

Additional information you deem important (e.g. issue happens only occasionally):

Altering the Dockerfile such that ARG comes after FROM i.e.

FROM alpine:3.5
ARG environment
ENV ENVIRONMENT=${environment:-development}
RUN echo "$ENVIRONMENT" > /value_of_environment

then running again:

docker run $(docker build -q --build-arg environment=production .) cat /value_of_environment

gives the expected output of production.

Output of docker version:

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:31:53 2017
 OS/Arch:      darwin/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:51:55 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 59
 Running: 0
 Paused: 0
 Stopped: 59
Images: 370
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 457
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.31-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 5.818GiB
Name: moby
ID: BCV5:MEMK:BYKI:I2IU:QY2V:5DRM:F2FP:JFAG:SM46:M2WJ:73YV:3KLP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 20
 Goroutines: 40
 System Time: 2017-07-16T19:58:09.054157098Z
 EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
@boaz0
Member

boaz0 commented Jul 17, 2017

@thaJeztah correct me if I'm wrong.

@Benjamin-Dobell after investigating this, 239c53b is not the origin of this behavior.

Basically, after the FROM instruction all the build arguments are reset, so they aren't available in the rest of the Dockerfile unless they are declared again.

From what I found, the purpose of ARG before FROM is to use it inside the FROM instruction (#31352).
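A minimal sketch of the original reproduction rewritten so the ARG can still sit at the top of the file. It assumes a Docker version in which an ARG redeclared without a value inside a stage inherits the pre-FROM default, which is the behaviour thaJeztah's build log further down in this thread shows:

```dockerfile
# Declared before the first FROM: in scope for FROM lines only.
ARG environment=development

FROM alpine:3.5
# Redeclared without a value: brings the pre-FROM ARG into this stage.
# It takes the --build-arg value if one was passed, else the default above.
ARG environment
ENV ENVIRONMENT=${environment}
RUN echo "$ENVIRONMENT" > /value_of_environment
```

Built with --build-arg environment=production, the image should contain production; without the flag it should fall back to development.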

@thaJeztah
Member

Yes, this doesn't look like a bug; see this pull request, which adds some more information docker/cli#333

@boaz0
Member

boaz0 commented Jul 17, 2017

@thaJeztah I guess we can close this

@Benjamin-Dobell
Author

Irrespective of whether this was implemented this way intentionally or it's a bug, I think it's a bit of a usability nightmare.

It's not clearly documented that this is the expected behaviour, and it makes for messy Dockerfiles. But more importantly, it opens a Pandora's box of confusing edge cases.

What if I intend to use an ARG in both my FROM statement and after it? Am I expected to have multiple ARG statements referring to the same build-arg?

What happens if I use the default-value syntax ARG argument=some_value before FROM and just ARG argument after FROM? What is the expected value of argument after FROM if no argument build-arg was passed?

@thaJeztah
Member

What is the expected value of argument after FROM if no argument build-arg was passed?

The same as it would be if you're not using multi-stage builds: empty (no value set).

@Benjamin-Dobell
Author

Benjamin-Dobell commented Jul 17, 2017

@thaJeztah I know that's true now, I've experimented with it. The issue is that it's hugely non-obvious.

If this is expected behaviour and no one is willing to change it, then at the very least ARG before FROM ought to be deprecated, and the syntax used prior to FROM should instead be FROMARG (which must come before FROM).

@thaJeztah
Member

ARG is reset after each FROM. If this is documented, why would ARG before FROM have to be deprecated?

/cc @tonistiigi @dnephin

@Benjamin-Dobell
Author

Improved documentation is always appreciated, and would have saved me some time. However, just because behaviour is documented doesn't exempt the behaviour itself from scrutiny.

ARG has too much complexity to it. I'd argue this functionality shouldn't have been added to the ARG keyword in the first place; it's effectively been repurposed, and its behaviour is now far too nuanced. A new keyword, FROMARG, from the outset would have made a lot more sense.

@Benjamin-Dobell
Author

Benjamin-Dobell commented Jul 17, 2017

I should note that I'm not actually an advocate of expanding the grammar when the usage of the existing grammar can be expanded.

However, in this particular instance ARG has had its existing semantics altered; the behaviour is not additive. Previously whenever you referenced an ARG defined argument you'd have access to the value as expected. Now argument interpolation is much more context aware.

It's extremely confusing in single-stage builds, and perhaps more so in multi-stage ones. If arguments really are tied to build stages (although I must confess I'm not sure why this is desirable), then you've suddenly a need to look at the previous "stage", beyond the FROM verb.

Realistically, you can't pass different arguments to different build stages (they're typically provided as CLI arguments). So there's no legitimate reason to scope arguments to build stages. Additionally:

a “cache miss” occurs upon its first usage, not its definition

So there is zero incentive to intersperse ARG definitions throughout a file. Therefore, the most logical behaviour would be to encourage all ARG definitions to be placed at the top of a file (where they can clearly be seen) and then update the behaviour to ensure there's no funny business with build stages.

@tonistiigi
Member

However, in this particular instance ARG has had its semantics altered. Previously whenever you referenced an ARG defined argument you'd have access to the value as expected. Now argument interpolation is much more context aware.

The new ARG features are 100% backward compatible. No previous Dockerfile needs any changes.

then you've suddenly a need to look at the previous "stage", beyond the FROM verb.

It's the opposite. Build args are defined by stage so you only need to look at the args for the current stage. Whatever you define in other stages has no effect on the current stage.

a “cache miss” occurs upon its first usage, not its definition

All args are used in every RUN command. If an argument changes, it breaks all cache from the very first time RUN is used.
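To make the cache rule concrete, here is a sketch based on the Dockerfile reference's note that every RUN after an ARG uses it implicitly as an environment variable. The apk package and the echo step are placeholders:

```dockerfile
FROM alpine:3.5

# Runs before the ARG is declared, so changing --build-arg environment=...
# does not invalidate the cache for this step.
RUN apk add --no-cache curl

ARG environment=development
# First RUN after the ARG: its cache key includes the value of environment,
# even though the command itself never references it.
RUN echo "configuring"
```

Rebuilding with a different --build-arg environment=... value should reuse the cached apk layer and rebuild only from the second RUN onward, which is why placing an ARG as late as possible can matter for cache reuse.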

@dnephin
Member

dnephin commented Jul 17, 2017

However, in this particular instance ARG has had its semantics altered.

The semantics changed with multi-stage builds. The change doesn't really have anything to do with ARG in FROM. It just happens they came out in the same release.

If arguments really are tied to build stages, then you've suddenly a need to look at the previous "stage", beyond the FROM verb.

I think you're misunderstanding the scope. They are only scoped to the stage where they are declared.

(although I must confess I'm not sure why this is desirable) ... you can't pass different arguments to different build stages (they're typically provided as CLI arguments). So there's no legitimate reason to scope arguments to build stages

The use cases supported by a Dockerfile expanded quite a bit with multi-stage builds. It's no longer the case that a single Dockerfile will produce a single image. You can use --target to run different stages. At this time the build is still sequential but in the future we should be able to build more optimally. Not every build stage will run on every build.

In this context the design should make more sense. Although the values might not change, which lines actually run will change depending on the --target, which means the args must be defined in each stage, not in the meta section before a FROM.
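A sketch of the per-stage scoping dnephin describes; the stage names build and runtime, and the GO_VERSION argument, are hypothetical:

```dockerfile
# Global ARG: visible to FROM lines only.
ARG GO_VERSION=1.8

FROM golang:${GO_VERSION} AS build
# Stage-scoped: only visible until the next FROM.
ARG environment=development
RUN echo "build stage: $environment"

FROM alpine:3.5 AS runtime
# Must be declared again; the declaration in the build stage does not carry over.
ARG environment=development
RUN echo "runtime stage: $environment"
```

Running docker build --target build --build-arg environment=production . stops after the build stage, so the runtime stage's RUN never executes. Both stages would pick up the same --build-arg value, but each has to declare environment itself.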

@Benjamin-Dobell
Author

All args are used in every RUN command. If an argument changes, it breaks all cache from the very first time RUN is used.

Yikes! That also needs documenting... and changing.

It's the opposite. Build args are defined by stage so you only need to look at the args for the current stage. Whatever you define in other stages has no effect on the current stage.

When looking at a Dockerfile, what syntax marks the beginning of a new build stage?

FROM does, and yet, somehow it accesses ARG defined prior to this line.

@tonistiigi
Member

I was just clarifying what "first use" means. You use an ARG by executing a RUN command. Nothing has changed since ARG was introduced.

FROM defines a stage. What do you mean by accessing ARG?
There is a specific syntax that can be used to avoid redefining a default value for ARG multiple times in the same file (something that you asked about in #34129 (comment), btw). That requires both places to declare that they want to share it. No ARG defined before FROM accidentally leaks into any build stage.
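The syntax tonistiigi refers to is an ARG instruction without a value inside a stage, which opts in to the pre-FROM declaration and reuses its default. A sketch with two hypothetical stages, first and second, sharing one default:

```dockerfile
# The only place the default is written down.
ARG my_arg=latest

FROM busybox:$my_arg AS first
# No value: opts in to the global ARG, inheriting its default
# (or the --build-arg value, if one was passed).
ARG my_arg
RUN echo "first: $my_arg"

FROM busybox:$my_arg AS second
ARG my_arg
RUN echo "second: $my_arg"
```

The default latest appears once; passing --build-arg my_arg=... should override it for both FROM lines and both RUN steps.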

@Benjamin-Dobell
Author

Benjamin-Dobell commented Jul 17, 2017

To be clear, I'm not saying I don't understand how the current implementation works; what has been written in this issue explains it clearly enough. I'm suggesting the implementation itself is non-ideal and confusing. After all, I read the existing docs and literally cloned Docker Compose, the Docker client and finally Docker before working out what was going on, at which point I opened this issue.

It's just too complicated. Adding so much complexity to the Dockerfile syntax and the corresponding documentation is simply not sustainable.

The semantics changed with multi-stage builds. The change doesn't really have anything to do with ARG in FROM. It just happens they came out in the same release.

I don't think it's 100% accurate that multi-stage builds and ARG in FROM are independent. They should have been independent, but I think the existence of multi-stage builds impacted the implementation of ARG in FROM.

The properties of ARG were:

  1. It may appear after FROM.

  2. The argument defined by ARG may be used on any line following the definition.

(2. is how Dockerfiles have always worked: sequentially, with state that is additive, never subtractive.)

A feature request comes along:

I'd like to use arguments in FROM.

Reasonable enough, the two previously defined properties still hold if implemented. We now have a third property:

  3. ARG may appear before FROM.

This can cleanly be implemented, without any backwards compatibility issues. Except, it wasn't; it could have been, but it wasn't.

Instead, property 2. was violated: suddenly ARG can't always be used after it's defined. If it appears before FROM, then it can only be used in FROM, not on all subsequent lines.

That's changing the semantics of ARG, hence why I'm suggesting it should have been FROMARG, a keyword that can only appear in the "meta section" prior to FROM.

Mind you, this constraint is artificial in nature, there's zero reason 3. shouldn't have been implemented cleanly. The only reason the current implementation was deemed acceptable is because multi-stage builds were also coming, and it was also violating 2., albeit in a (roughly) well-defined fashion.

Anyway, my issue is complexity; that's subjective and given I'm not a maintainer, not for me to decide. Documentation is certainly better than nothing, so this issue may be closed if you see fit.

@ferrouswheel

As a new user of ARG it was very unintuitive why my ARG was empty. I saw someone use an example of ARG in a Dockerfile, but they were using it in the FROM line. For me it makes sense to define any parameterisation of a Dockerfile at the very top, so I didn't question it. Only upon rereading the docs after reading this issue do I understand why.

I would suggest a warning that ARG gets reset after FROM in the documentation, as not everyone is up to speed on multistage builds.

@shaunc

shaunc commented Sep 20, 2017

@Benjamin-Dobell I wanted to use build-args in multi-stage builds to pass secure keys to intermediate build stages, which would then disappear. I haven't fully confirmed that this is secure, but I was actually happy to see your issue.

For the record, aside from implementation details which respondents seem to be burdening you with, clearing build args -- at least so they can't be read from the build history -- seems IMO to be a very important feature... well worth the complexity.

UPDATE -- sigh ... I guess I spoke prematurely. Multistage builds don't help with the fact that args are written to build history.

@tonistiigi
Member

@shaunc Are you saying that build-arg defined for an intermediate stage is visible in the history of the final stage? This should not happen if you use COPY --from.

@lucendio

lucendio commented Dec 7, 2017

I ran into the same issue, and to underline the impact of that behaviour I want to share my example here; its cause took a significant amount of time to figure out. It's still totally unexpected, and I wouldn't exactly call that a good user experience.
Please, if you don't see the necessity to change that behaviour, then at least document it as the creator of this issue suggested, so that people can stumble upon this.

docker image build \
        --build-arg NODE_VERSION="4.8.3" \
        --build-arg NPM_VERSION="4.5.0" \
        .

Does not work as expected: npm is not pinned to 4.5.0 but ends up at "latest", because NPM_VERSION is empty inside the stage.

ARG NODE_VERSION="latest"
ARG NPM_VERSION="latest"
FROM node:${NODE_VERSION}-alpine

RUN npm install -g npm@${NPM_VERSION}
...

Works as intended. NPM_VERSION holds "4.5.0".

ARG NODE_VERSION="latest"
FROM node:${NODE_VERSION}-alpine

ARG NPM_VERSION="latest"
RUN npm install -g npm@${NPM_VERSION}
...

@tonistiigi
Member

Please, if you don't see the necessity to change that behaviour, then at least document it so that people can stumble upon this.

https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
https://docs.docker.com/engine/reference/builder/#scope

If this is a common pattern a PR would probably be accepted that detects this case (at least for variable substitution) and shows a warning about possible misuse.

@nik-shornikov

As for how this keyword behaves with multiple FROM statements in "multi-stage" builds: ARG lets you specify different defaults for different stages, but there is no way (nor should there be) to pass different values explicitly to different stages. That's far more convoluted than having ARGs go into effect from the keyword down, across any number of stages/FROMs.
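A sketch of per-stage defaults; the log_level argument is hypothetical. A single --build-arg still applies to every ARG declaration with that name:

```dockerfile
FROM alpine:3.5 AS build
# Default used by this stage only.
ARG log_level=debug
RUN echo "build: $log_level"

FROM alpine:3.5
# A different default for the final stage.
ARG log_level=info
RUN echo "final: $log_level"
```

Without flags the two RUN steps should echo debug and info; with --build-arg log_level=warn both should echo warn, since there is no way to target one stage's declaration.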

@ClementWalter

If you want to use the same ARG before and after FROM, simply re-declare it after, e.g.:

ARG my_arg
FROM my_image:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg
# This should not be empty
RUN echo "my_arg is $my_arg"

@himslm01

himslm01 commented Dec 9, 2018

simply re-declare it

This is an oversimplification. You are not considering default values and the programming rule of one single source of truth.

ARG my_arg="default"
FROM my_image:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg="default"
# This should not be empty
RUN echo "my_arg is $my_arg"

We now have the arg's default value defined twice in one file - we have lost the single source of truth.

@thaJeztah
Member

This is an oversimplification. You are not considering default values

The example given actually takes care of default values:

docker build --no-cache -<<'EOF'
ARG my_arg=latest
FROM busybox:$my_arg
# This should be empty
RUN echo "my_arg is $my_arg"

# Re-declare
ARG my_arg
# This should not be empty
RUN echo "my_arg is $my_arg"
EOF

Sending build context to Docker daemon  2.048kB
Step 1/5 : ARG my_arg=latest
Step 2/5 : FROM busybox:$my_arg
 ---> 59788edf1f3e
Step 3/5 : RUN echo "my_arg is $my_arg"
 ---> Running in 029ff9c3cdc8
my_arg is 
Removing intermediate container 029ff9c3cdc8
 ---> f9135f511c84
Step 4/5 : ARG my_arg
 ---> Running in 7c9616537324
Removing intermediate container 7c9616537324
 ---> 35ccdf7ea0a9
Step 5/5 : RUN echo "my_arg is $my_arg"
 ---> Running in 1e712eef0399
my_arg is latest
Removing intermediate container 1e712eef0399
 ---> 56c25e303cb9
Successfully built 56c25e303cb9

I also posted some examples in #37622 (comment), #37345 (comment)

@varnav

varnav commented Jul 3, 2019

I lost a couple of hours to this. Intuitively, I was expecting that an ARG before FROM in a multi-stage build would be a global ARG (for all stages). It simply gets cleared instead.

@AntonioCS

This is horrible to work with, for something that seems to be a global value.
I have a Dockerfile with multiple FROM statements, and things are breaking because I can't pass the ARG values the way I originally thought. Sure, maybe I should read the documentation a bit more, but it seems I am not alone in expecting this behaviour (ARG being global), so maybe things should work the way the MAJORITY thinks they should?

@bitdancer

I have a reverse twist on this. I remembered from the docs that ARG had to appear before FROM in order to be used in FROM, so I put an ARG before the FROM of my second builder declaration. And got an invalid-format error on the FROM line, because that ARG appeared after the first FROM in the file, and so was ignored when processing the second FROM line. So ARG-before-the-first-FROM is global for all FROM lines and not used in any other lines, while ARG-after-FROM is used only between that FROM and the next FROM. It is consistent in a way, but completely non-intuitive, so really the ARG-before-FROM ought to be named FROMARG as suggested earlier in this thread, because otherwise it just breaks expectations left and right.
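A sketch of the rule bitdancer ran into, using hypothetical BUILDER_IMAGE and RUNTIME_IMAGE arguments: every ARG used by any FROM line, including a later stage's FROM, goes before the first FROM.

```dockerfile
# Every ARG referenced by any FROM line must be declared before the
# first FROM; an ARG declared inside a stage is not visible to later FROMs.
ARG BUILDER_IMAGE=golang:1.8
ARG RUNTIME_IMAGE=alpine:3.5

FROM ${BUILDER_IMAGE} AS builder
RUN echo "compiling"

FROM ${RUNTIME_IMAGE}
RUN echo "packaging"
```

Moving ARG RUNTIME_IMAGE down between the two stages would reproduce the invalid-format error described above, because the second FROM would then see an empty value.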

bchrobot added a commit to politics-rewired/Spoke that referenced this issue Apr 7, 2020
The ARG Dockerfile keyword has a long history of not behaving as expected. See this thread for more
detail: moby/moby#34129
@darrahts

Docker version 19.03.6, build 369ce74a3c
Linux 5.3.0-46-generic #38~18.04.1-Ubuntu

ARG VERSION="kinetic"

FROM ros:${VERSION}-ros-base

RUN apt-get update && apt-get install -y \
    ros-${VERSION}-ros-tutorials \
    ros-${VERSION}-common-tutorials \
    && rm -rf /var/lib/apt/lists/

results in:

E: Unable to locate package ros--ros-tutorials
E: Unable to locate package ros--common-tutorials

Why isn't this global? What is the fix? It seems to me from reading this and other threads that a global arg is the desired and expected...

@Tob1as

Tob1as commented Apr 27, 2020

@darrahts see https://docs.docker.com/engine/reference/builder/#understand-how-arg-and-from-interact
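Applied to darrahts's Dockerfile, the fix described in that section is to redeclare VERSION inside the stage. A sketch, assuming the same ros:kinetic-ros-base image and package names as the original comment:

```dockerfile
ARG VERSION="kinetic"

FROM ros:${VERSION}-ros-base
# Redeclare (no value) so RUN in this stage sees VERSION; it inherits
# "kinetic" unless --build-arg VERSION=... overrides it.
ARG VERSION

RUN apt-get update && apt-get install -y \
    ros-${VERSION}-ros-tutorials \
    ros-${VERSION}-common-tutorials \
    && rm -rf /var/lib/apt/lists/*
```

With the redeclaration in place, apt-get should be asked for ros-kinetic-ros-tutorials rather than ros--ros-tutorials.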

mseemann added a commit to solosTec/segw-build that referenced this issue Aug 8, 2020
mseemann added a commit to solosTec/segw-build that referenced this issue Aug 8, 2020
pmatos pushed a commit to pmatos/jsc32-fuzz-setup that referenced this issue May 10, 2021
See moby/moby#34129
So many hours wasted... :(
@baishuotong

Sorry to bother you! @darrahts @Tob1as @bitdancer @AntonioCS @varnav @thaJeztah @himslm01 @ClementWalter @nik-shornikov @tonistiigi @Benjamin-Dobell @boaz0 @ferrouswheel @lucendio @shaunc @darrahts

We are a software engineering team from Jilin University, China, and we are currently carrying out research on issue discussions in GitHub. In the GitHub issue system, developers can leave comments for others to discuss and solve specific technical problems, which can strengthen the development of projects and cooperation among developers. However, compared with other traditional social platforms, the discussion structure of a GitHub issue is linear, and the comments are sorted by time. Besides, many important contents of GitHub issue discussions are buried under new incoming comments, making it difficult for newcomers to find the information they need. As a result, it is difficult for GitHub developers to quickly grasp the main content of a discussion, search for useful information, and locate responders.

Thus, to overcome those challenges, we aim to propose an automatic approach to rebuild the dialogue structure of GitHub discussions and extract the key information under different discussion topics, as shown in the following figure. The root node represents the developer who opened the issue, and each other node represents a comment with its number and commenter. The edges reflect reply relationships; that is, each child responds to its parent node. The key information of each subtree is displayed in the lower corner of the figure.

[Figure: screenshot of the reconstructed discussion tree, taken 2022-06-13 15:08:26]

At your convenience, could you please browse the Figure and take 5 minutes to answer 3 questions?

1: Do you think this tree-structured view reflects the reply relationships of GitHub issues well, and would it help you understand the overall issue content?

2: For your own comments, did our approach identify the correct user you were replying to?

3: Does the key information we extracted from each subtree reflect the important topics of that subtree?
(Please use a score from 0 to 5 to evaluate this question; 0 means the worst and 5 means the best.)

Greatly appreciate any assistance you can provide and we are looking forward to your reply! (We guarantee that your responses will not be changed. The survey data is only utilized for this GitHub recommendation research and not for other purposes. Please feel free to fill it out.)

Have a nice day!

@bitdancer

It would take too long to figure out what those bubbles are referring to, so if you are asking if this by itself would help me understand the issue, the answer is no. If you want real feedback I suspect you'll need something that actually displays the content.

What I'd really like in long github issues is a way to see the "current status" at a glance: which PR closed the issue, which PRs are related to the issue if it is open, what the last message is from the development team regarding the issue.

Another side observation: you have a list of keywords associated with each subtree, but posters will often @-mention the GitHub name of the person to whom they are replying. Have you considered using that as part of your algorithm?

@baishuotong

baishuotong commented Jun 14, 2022 via email

Projects
None yet
Development

No branches or pull requests