Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an UNVOLUME or UNSET VOLUME statement to cancaled mount the default parent volume #4802

Open
alexeyp0708 opened this issue Mar 27, 2024 · 10 comments

Comments

@alexeyp0708
Copy link

alexeyp0708 commented Mar 27, 2024

This topic has been before discussed since 2014:

Created a new topic based on recommendations moby/moby#8177 (comment)

Target: Add UNVOLUME /target/directory (or UNSET VOLUME) statement in Dockerfile.
Behavior: Removes volume metadata that is added by the parent image. If there is no metadata to delete, a warning is issued. Mounting and unmounting volumes must be applied sequentially according to the image inheritance order.

The problem exists and people suffer from it and ask for something to be done.

Reasons for change:

  1. The child image does not always need the volumes of the parent image.
  2. When changing daemon settings of parent image, developers sometimes prefer to specify their own volumes for worker processes (For example, specify a directory in a shared volume for containers.) .
    As a result, parent volumes hang like extra garbage and take up disk space (often data from the container is copied to volumes by default).
  3. The developer would prefer to have access to the folder in the container that the volume overlaps.
    But the default volume always overlaps this folder in the container and the user does not have access to it.

Please study the topic moby/moby#3465 and you will understand that there are problems. It’s just that not everyone can or wants to describe the situation that the developers encountered.

I think there are enough legal grounds to consider this problem seriously and include any option for canceling the parent default volume.

Some links:
moby/moby#33842 (comment) - VOLUME behavior in buildkit - VOLUME is now metadata
moby/moby#3465 (comment) - unmount volume via ENTRYPOINT

moby/moby#3465 (comment) - Rebuilding the image with new settings

moby/moby#3465 (comment) - Changing metadata in a built image

(UPDATED)
I have a request to the participants - reproduce your problems in detail.
If possible with examples.
It is better not to discuss unless necessary, so as not to clutter the topic.

@alexeyp0708
Copy link
Author

alexeyp0708 commented Mar 27, 2024

I can suggest a design option
VOLUME ["!/volume/directory"] where the exclamation mark (" ! ") disables mounting of the specified parent volume.
moby/moby#3465 (comment)

@tonistiigi
Copy link
Member

tonistiigi commented Mar 27, 2024

Maybe smth like

FROM --reset=cmd,entrypoint,volume,env,onbuild,workdir,expose,all img

@felipecrs
Copy link

Maybe smth like

FROM --reset=cmd,entrypoint,volume,env,onbuild,workdir,all img

I think it looks great! Maybe it can also support an extended syntax like to allow filtering which specific volume and env to reset:

FROM --reset=volume=/var/lib/docker,env=MYVAR,env=MYOTHERVAR img

On a note, CMD and ENTRYPOINT can already be reset with CMD [] and ENTRYPOINT [].

@tonistiigi
Copy link
Member

like to allow filtering which specific

I'm more thinking it from the perspective of "I don't want let changes in the base image control this part of my build". If I already know a specific volume or other parameter by name that I want to keep then I can reset all and re-add missing manually.

Otoh. the extended syntax does not look too awful, but I think individually it makes more sense to for example define env that I want to keep, rather than the ones I want to unset.

@alexeyp0708
Copy link
Author

alexeyp0708 commented Mar 27, 2024

Maybe smth like

FROM --reset=cmd,entrypoint,volume,env,onbuild,workdir,all img

I think this will require a lot of refactoring.
Moreover, a large-scale reset will affect the details. Let's imagine that you have 5 levels of inheritance. You will simply break everything with such a reset.

If you reset Environment Variables then you can influence the parent entry point through a script file.
You can reset variables via.
RUN unset ENV_VAR

(updated) or ENV ENV_VAR=
or ENTRYPOINT "unset ENV_VAR"

ENTRYPOINT or CMD you always can to override
WORKDIR can also be overridden . And resetting WORKDIR is redundant.
https://docs.docker.com/reference/dockerfile/#workdir

If the WORKDIR doesn't exist, it will be created even if it's not used in any subsequent Dockerfile instruction.

We are now discussing VOLUME here because parent volumes create difficulties.
All you offer is syntactic sugar.

@felipecrs
Copy link

You can reset variables via.
RUN unset ENV_VAR

Really? I don't think so.

@tonistiigi
Copy link
Member

You can reset variables via.
RUN unset ENV_VAR

No, that is not how it works. That only resets the variable for your current running process inside a container.

(updated) or ENV ENV_VAR=

That isn't exactly equivalent. The environment variable would still be in the list of defined variables but the value would be empty.

We are now discussing VOLUME here because parent volumes create difficulties.

The problem with the older attempts with this seems to be more that there wasn't a solution that fixed the general problem (or that the general problem was not well-defined) and nobody wanted to add extra hacky one-off command, that patched one case but left many open questions to be figured out later. Adding this kind of reset functionality to one command or to all applicable commands doesn't necessarily mean a big difference in development complexity.

Two things have changed since that old discussion in earlier issues:

  • We now have many commands that take additional flags for configuring their functionality (run, copy, add, from, healthcheck)
  • We have a way to add and test new functionality in labs channel https://docs.docker.com/build/dockerfile/frontend/#labs-channel without worrying that every possible mistake would need to be supported forever.

So from my side, I would be open to adding FROM --reset from #4802 (comment) that I think provides the solution for controlling metadata set by the base image, but as we already have many examples of build flags I don't mind defining a way for resetting them individually by name as well.

In that case I think the syntax that best aligns with existing commands would be.

ENV --unset FOO
ARG --unset FOO
VOLUME --unset /foo/bar
LABEL --unset my.label

These are the commands that allow defining named keys. Other commands should not need --unset. Compared to FROM --reset we could make a case that sometimes you may want to define a ENV/ARG only for a specific RUN command and then remove it. This does not apply to VOLUME/LABEL, though, as they only have an effect on the resulting image - these are added just for consistency.

My take is that both of these could be added to the labs channel https://docs.docker.com/build/dockerfile/frontend/#labs-channel

@alexeyp0708
Copy link
Author

alexeyp0708 commented Apr 9, 2024

In that case I think the syntax that best aligns with existing commands would be.
ENV --unset FOO
ARG --unset FOO
VOLUME --unset /foo/bar
LABEL --unset my.label

This syntax would be a good idea

So from my side, I would be open to adding FROM --reset from #4802 (comment)

A global reset will have consequences and will therefore be used extremely rarely.

At build time, the parent image needs the environment variables that it declares, since the entrypoint startup script depends on this.

unset all env
I assume that developers only need to unset variables after executing the entrypoint script.

You can reset variables via.
RUN unset ENV_VAR

No, that is not how it works. That only resets the variable for your current running process inside a container.

(updated) or ENV ENV_VAR=

That isn't exactly equivalent. The environment variable would still be in the list of defined variables but the value would be empty.

I explained this earlier, but deleted some of my comments so as not to clutter up the topic.

How I do it
ENV ENV_VAR= In case the value of the variable is not important
and scripts can process empty values

RUN unset ENV_VAR ; my_scripts - If during the build this variable bothers me

ENTRYPOINT 'entrypoint_parent.sh&; unset var; jobs;' - I f I need to remove a variable after the entry point. This is shown as an example only, This solution is only in theory. In my case, I usually write my entry point script. But all this will not globally unset the variables because I will have access to them via docker exec

If people care about the need to reset variables after the entry point, then I think you can implement this:

entrypoint --after_clear_env

or

entrypoint --reset  env, volume ....

In this case, when resetting the environment variables
It is recommended to reset them after installing them docker run -e MY_VAR and execution of the entrypoint script,
To reset volumes, it is recommended to only reset the default volumes (those installed previously by the Dockerfile)

But unfortunately in this case you will have to redefine the parent ENTRYPOINT.
The solution is a separate directive, for example
CLEAR or RESET which will be affter applied when ENTRYPOINT is executed - CLEAR env, value
or shift the responsibility to CMD CMD --reset 'env,volume ' optinons_entrypoint . But in this case, the CMD override should not reset the parent ENTRYPOINT.
(As far as I know, if you try to override CMD, the parent’s ENTRYPOINT will be reset)

@alexeyp0708
Copy link
Author

alexeyp0708 commented Apr 9, 2024

By the way . Sometimes there is a lack of a directive that would be executed after the entrypoint is runed. But these are my wishes.

Example :
Running the script every time after the script entrypoint is runned
CMD --after_entrypoint 'myscripts'
CMD --after_entrypoint --security=insecure 'myscripts'
The equivalent of the command is docker exec id my_script

As an addition, you can implement this
Running the script every time before the script entry point is running
CMD --before_entrypoint 'myscripts'
CMD --before_entrypoint --security=insecure 'myscripts'

Today I am forced to write my entry point and override the parent entry point

Why is the command CMD --before_entrypoint and not RUN ? - because when running RUN, the volumes are not mounted (docker run --mount ), the network is not connected(docker run --network my_net), command variables are not declared (docker run -e my_env)

Rewriting the entrypoint that is declared by the parent is a crutch that interferes with life.
It’s easier to declare/extension scripts that will be executed before the entrypoint and after the entrypoint. Then in most cases there is no need to override the parent entrypoint. (Open/Closed concept - SOLID)
Extension through inheritance implementing the stack structure (Last to come, first to runned) for --before_entrypoint and implementation of the queue structure (Last to come, last to runned) for --after_entrypoint
Such scripts need a separate variable environment through which they can communicate (A script before an entry point can affect entry point variables, and a script after an entry point can change entry point variables.).

Previously I expressed my thoughts about privileges moby/moby#47632
As part of the development concept, I can duplicate it in mody/buildkit

Such ideas would allow developers to remove restrictions and inconveniences.

But I don’t want to once again flood with new topics, so I expressed my ideas here.

@alexeyp0708
Copy link
Author

alexeyp0708 commented Apr 9, 2024

As a future strategy, I would recommend idea defining the principles of Covariance and contravariance in inheritance.
Covariance - This is when image A can be replaced by image B (extends A), where the behavior of image A is preserved.
Contravariance is when image B (extends A), preserved the behavior of image A . It can be overridden by image A.

For example,
when implementing this behavior, the parent image has the right to frozen some directives (ENTRYPOINT --frozen ENV --frozen VAR .... COPY --frozen ) which guarantees that everything that is frozen cannot be overridden or unset by the child image.
In this case, the developer will spend less effort replacing containers in the future, without additional compatibility checks.
In the same concept, you can implement the concept of abstraction (ENV --required VAR COPY --required /path/file ENTRYPOINT --required ... , RUN --required script_check or RUN --defer_after_event event_name='description child run' script_this_run_for_event For child RUN(or COPY) --event=event_name script_child). Where the parent image (or running a container ) will require a certain child implementation. Abstraction will allow you to write your own interfaces.
And to determine compatibility, then declare a additional directive INTERFACE imageA, subImageB, abstractImage - Which declares that this image is compatible for replacing images imageA and subImageB and implements the interface abstractImage.
I think for this will have to introduce a strict mode (USE STRICT or USE STRICT INHERIT MODE) in which there will be strict expansion rules.

I understand that there are many factors against this. But this is my idea that you can take as perspective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants