
Volumes are created in container with root ownership and strict permissions #2630

Closed
carlossg opened this issue Nov 26, 2014 · 237 comments
Labels: area/security, area/usability, kind/cleanup, kind/feature, priority/important-soon, sig/node, sig/storage

Comments

@carlossg

The emptyDir volumeMount is created owned by root:root with permissions set to 750; hostDir is the same but with 755 permissions.

Containers running with a non-root USER can't access these volumes.

Related discussion at https://groups.google.com/forum/#!topic/google-containers/D5NdjKFs6Cc
and Docker issue moby/moby#9360
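
A minimal manifest that reproduces the problem; the image name is a placeholder for any image whose Dockerfile sets a non-root USER:

```yaml
# Hypothetical reproduction: the image runs as a non-root USER, but the
# emptyDir directory is created root:root with mode 750, so writes fail.
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-emptydir-demo
spec:
  containers:
  - name: app
    image: example.com/nonroot-app:latest   # assumed image with USER 1001
    command: ["sh", "-c", "touch /data/probe"]   # fails: Permission denied
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {}
```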

@thockin added priority/backlog and area/docker and removed area/docker labels Nov 26, 2014
@thockin (Member) commented Nov 26, 2014

hostDir should get the same permissions as the existing host entry, though I am not sure we ensure a host directory exists before using hostDir.

Part of the problem here is that different containers can run as different users in the same pod - which user do we create the volume with? What we really need is a way to tell Docker to add supplemental group IDs when launching a container, so we can assign all containers in a pod to a common group.

I filed moby/moby#9360
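
(For reference, supplemental groups are the mechanism Docker later exposed; a sketch of what this looks like from the CLI, with an arbitrary example GID and a placeholder image:)

```sh
# Sketch: the container process keeps its image-defined non-root UID but is
# also placed in GID 2000, so a volume that is group-owned by 2000 with mode
# 770 becomes writable without running as root.
docker run --group-add 2000 -v /srv/shared:/data example.com/nonroot-app:latest
```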

@carlossg (Author) commented Nov 26, 2014

Would it be reasonable to add a user and/or permissions option to volumeMounts or emptyDir to explicitly force it?

@thockin (Member) commented Nov 26, 2014

I don't think that we want that in the API long-term, so I'd rather apply a hidden heuristic like "chown to the USER of the first container that mounts the volume" or even "ensure that all VolumeMounts for an emptyDir Volume have the same USER, else error". Do you think such heuristics would hold?


@carlossg (Author)

That sounds good to me

@thockin (Member) commented Dec 1, 2014

This is a good starter project

@saad-ali self-assigned this and unassigned @thockin Dec 12, 2014
@saad-ali (Member)

Background
Inside a Docker container, the primary process is launched as root by default. And, currently, Docker containers cannot be run without root privileges (once Docker supports user namespaces, a process inside a container will be able to run as root while the container's root user is actually mapped to a normal, non-privileged user outside the container). However, even today, a process inside a Docker container can run as a non-privileged user: the Docker image can create new users and then force Docker to launch the entry point process as that user instead of root (as long as that user exists within the container image).

When an external volume is mounted, it is owned by root (UID 0), so unless the process inside the container is launched as root, it won't have permission to access the mounted directory.

Proposed Workarounds on the Kubernetes side

  1. While creating a pod that requires an EmptyDir volume, before starting containers, retrieve the USER from each container image (introspect the JSON for each container image); if any of the containers launch their main process as non-root, fail pod creation.
  2. While creating a pod that requires an EmptyDir volume, before creating the shared volume, chown it to the USER of the first container that mounts the volume.
  • Problems with this approach:
    1. With Kubernetes, a pod can contain multiple containers that share a volume, but each container could potentially run its processes as a different user inside, meaning that even if the owner of a volume was changed, the problem would still exist unless the owner was changed to a group that all containers were aware of (and all relevant users were part of).
    2. Another interesting dimension to the problem is that running chown on a shared volume from outside the containers could fail if the host machine does not have the same user as inside the container (container images can create a new user that the host is unaware of and have the entry process run as that user; since that user does not exist on the host, chown to that user from the host will fail).
      One workaround for this is to share the /etc/passwd file between the host and the container, but that is very limiting.
      Another potential workaround would be for the host to somehow reach inside the container during initialization (before the shared volume is mounted), read the USER that the main process will start with, use the image's /etc/passwd file to map the USER to a UID, and chown the shared volume on the host to that UID (chown on the host only fails when it can't find a user string, because it uses /etc/passwd for the mapping; it always succeeds with a UID because it sets the uint value directly without any lookup).

Both approaches feel to me like they are breaking a layer of abstraction by having Kubernetes reach into the container to figure out what user the main process would start as, and doing something outside the container with that information. I feel like the right approach would be for the containers themselves to CHOWN any "mounted volumes" during setup (after creating and setting user).
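
A minimal sketch of that in-container approach, assuming the image created a user appuser, the shared volume is mounted at /data, and a privilege-dropping helper such as gosu is installed; none of these names come from this thread:

```sh
#!/bin/sh
# Hypothetical entrypoint: start as root, fix ownership of the mounted
# volume, then drop privileges before exec'ing the real process.
chown -R appuser:appuser /data
exec gosu appuser "$@"
```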

Thoughts?

@saad-ali (Member) commented Dec 18, 2014

@thockin, after talking to some folks, I think @carlossg's approach of explicitly specifying the user in the API would be the cleanest workaround. I don't think we can apply "hidden heuristics" without an icky violation of abstractions (like reaching into a container to figure out what username to use and then mounting the container's /etc/passwd file to figure out the associated UID).

Proposal to modify the API:

  • Extend the API for EmptyDir, GitRepo, and GCEPersistentDisk volumes to optionally specify an unsigned integer UID (sketched as hypothetical YAML below).
    • If the UID is specified, the host will change the owner of the directory to that UID and set the permissions to 750 (User: rwx, Group: r-x, World: ---) when the volume directory is created.
    • If the UID is not specified, the host will not change the owner, but will set the permissions to 757 (User: rwx, Group: r-x, World: rwx), i.e. world writable, when the volume directory is created.
    • HostDir volumes would be left untouched, since those directories are not created by Kubernetes.
    • Require a UID instead of a username string so there are no problems if the user does not exist on the host machine (issue 2.ii above).
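
The proposed field, rendered as hypothetical YAML; this `uid` knob was never added to the Kubernetes API and is shown only to illustrate the idea:

```yaml
# NOT a real field - hypothetical rendering of the proposal above. The
# volume directory would be chowned to `uid` and given mode 750 at creation.
volumes:
- name: scratch
  emptyDir:
    uid: 1001   # proposed optional owner UID for the volume directory
```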

Thoughts?

CC: @bgrant0607, @dchen1107, @lavalamp

@thockin (Member) commented Dec 22, 2014

I think adding UID to volumes is a hack and redundant. I'd rather we do the right thing and get Docker to support supplemental group IDs.

moby/moby#9360


@LuqmanSahaf commented Jan 12, 2015

@saad-ali I think HostDir should not be left untouched. Let's consider this: on restart, Hadoop restores its blocks from the directory it stores data in. If we use emptyDir, a restarted container will get another directory and the previous data will be lost. And Hadoop requires the permissions and ownership of the data directory to be set to the user starting Hadoop (hdfs). If HostDir is not allowed to change permissions per user, then use cases like this cannot be achieved. Please comment.

@thockin (Member) commented Jan 14, 2015

Define restart? Do you mean the container crashed and came back, or do you mean the machine rebooted and a new pod was scheduled and expects to be able to reclaim the disk space used by the previous pod? Or something else?


@LuqmanSahaf

@thockin Restart could be anything. It could be after a pod failure or a container failure. Or the container could be restarted after changing some configuration (Hadoop needs to be restarted after config changes). Does that answer?

@LuqmanSahaf

This document (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/volumes.md#emptydir) mentions that when a pod is unbound, the emptyDir is deleted. In the Hadoop use case, the data might be essential and might be required when another Hadoop pod comes back (or the container restarts). So HostDir must be used to persist data even after the pod is unbound. But Hadoop requires permissions on the data directory to be set for its user. Hope this explains.

@saad-ali (Member)

With docker-archive/libcontainer/pull/322, Docker containers now allow specifying AdditionalGroups (supplementary group GIDs). So here is an updated proposal to handle volumes shared amongst different containers in a pod (see the sketch after this list):

  • When creating EmptyDir, GitRepo, or GCEPersistentDisk volumes for a new pod, Kubelet will:
    1. Create a new Linux group for the pod on the host machine.
      • The group is created with the next available group ID number (GID).
    2. Change the group of the new directory (on the host machine) to the newly created group.
    3. Set the permissions of the new directory (on the host machine) to 770 (User: rwx, Group: rwx, World: ---).
    4. For each Docker container, pass in the GID of the new group as AdditionalGroups via the Docker container configs.
  • When creating HostDir volumes for a new pod, Kubelet will:
    • Leave the volume untouched, since those directories are not created by Kubernetes.
    • @LuqmanSahaf: this is up for debate, but my thinking is that since Kubernetes does not create the HostDir, and since it may contain existing data, Kubernetes should not get into the business of modifying it. We should leave it up to the creator and maintainer of the HostDir to modify its ownership or permissions to allow containers to access it.
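
A sketch of those four steps as shell, assuming $VOLDIR is the host path of the newly created volume directory and GID 5555 stands in for the per-pod group the kubelet would allocate:

```sh
# Illustrative only - this is what the proposal amounts to on the host.
groupadd --gid 5555 pod-volume-group          # 1. create a per-pod group
chgrp 5555 "$VOLDIR"                          # 2. group-own the volume directory
chmod 770 "$VOLDIR"                           # 3. user+group rwx, world none
docker run --group-add 5555 example.com/app   # 4. join each container to the group
```

This is essentially the mechanism that later shipped in the pod securityContext as fsGroup.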

@thockin (Member) commented Jan 15, 2015

There's an important distinction between a container restarting and a pod being removed. When a container restarts, the data in a normal emptyDir volume is safe. When a pod is removed, it should be GONE. Leaving host data around and expecting it to be there at some later point in time is awkward at best.

All of this is more complicated as soon as user namespaces land.


@bgrant0607 added the sig/api-machinery label Feb 5, 2015
@bgrant0607 modified the milestone: v1.0 Feb 6, 2015
@bgrant0607 added the sig/node label and removed the sig/api-machinery label Feb 28, 2015
@brendandburns removed this from the v1.0 milestone Apr 28, 2015
@k8s-ci-robot added the lifecycle/stale label Mar 11, 2022
@rhzs commented Mar 11, 2022

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Mar 11, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jun 9, 2022
@rhzs commented Jun 9, 2022

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jun 9, 2022
@mryellow commented Jun 9, 2022

Bump. God I hate bots which come and hide information from developers. No, if it's not solved, it's still an issue, stale or not.

@jingxu97 (Contributor) commented Jun 9, 2022

Thanks @mryellow for bringing this up. We will try to discuss this issue in the storage meeting and see whether we can come up with a solution. Please share any details about the issue you encounter and any thoughts on how to solve it. Thanks!

@djschny commented Jun 9, 2022

@jingxu97 Are the last 7.5 years of details shared not good enough? lol

@xhejtman commented Jun 9, 2022

> Thanks, @mryellow for bringing it up. We will try to talk about this issue in storage meeting and see whether we can come up with some solution. Please share any details about the issue you encounter and any thoughts about how to solve it. Thanks!

I think the proper solution is to finish the implementation of an fsUser attribute, similar to fsGroup. It seems to be partly implemented in the kubelet (but not complete), and it is not in the API at all (you currently cannot specify fsUser).
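
For context, here is the existing fsGroup field (a real API field) with the requested fsUser shown as a hypothetical sibling; the image name is a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo
spec:
  securityContext:
    fsGroup: 2000     # real field: volumes are group-owned by GID 2000,
                      # which is also added as a supplemental group
    # fsUser: 1001    # hypothetical field being asked for - does NOT exist
  containers:
  - name: app
    image: example.com/nonroot-app:latest
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {}
```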

@mryellow commented Jun 9, 2022

> @jingxu97 Are the last 7.5 years of details shared not good enough? lol

Is the issue no longer an issue?

@rhzs commented Jun 10, 2022

> @jingxu97 Are the last 7.5 years of details shared not good enough? lol
>
> Is the issue no longer an issue?

I think it's becoming a must have feature when we use k8s.

@ebuildy commented Jul 27, 2022

A broken use case, "Using a volume to mount a git repository", will give an error:

-> cd /opt/zeppelin/notebook

-> ls -alh

drwxrwsrwx    4 root     zeppelin    1.0K Jul 27 14:09 .
drwxr-xr-x    1 zeppelin root        4.0K Jul 27 14:03 ..
drwxrwsr-x    7 zeppelin zeppelin    1.0K Jul 12 12:10 .git

-> git status

fatal: unsafe repository ('/opt/zeppelin/notebook' is owned by someone else)
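
(For this particular symptom: git 2.35.2 and later refuse to operate on a repository owned by a different UID. A common workaround, which sidesteps the symptom but not the underlying volume ownership, is:)

```sh
# Mark the mount as trusted for the container user (git >= 2.35.2).
git config --global --add safe.directory /opt/zeppelin/notebook
```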

@ttj4 commented Oct 12, 2022

will it be addressed anytime soon?

@mlehotsky

any update please?

@Roboman341

will it be addressed anytime soon?

@sftim (Contributor) commented Dec 21, 2022

It would be good to capture why an init container approach isn't good enough here.

(I don't think it is, but we haven't yet captured why not - at least, not in enough detail).

@xhejtman

In case you are using restricted security policies, e.g. ones that forbid runAsRoot, an init container is not something that can help here.
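
For reference, the init-container workaround under discussion looks like the sketch below (image names and paths are assumptions); the chown step itself must run as root, which is exactly what restricted policies forbid:

```yaml
# Sketch of the init-container workaround. The init container chowns the
# shared volume before the app starts - but it needs runAsUser: 0, which
# the restricted Pod Security profile disallows.
apiVersion: v1
kind: Pod
metadata:
  name: init-chown-demo
spec:
  initContainers:
  - name: fix-perms
    image: busybox:1.36
    command: ["sh", "-c", "chown -R 1001:1001 /data"]
    securityContext:
      runAsUser: 0          # root required for chown
    volumeMounts:
    - name: scratch
      mountPath: /data
  containers:
  - name: app
    image: example.com/nonroot-app:latest   # placeholder
    securityContext:
      runAsUser: 1001
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {}
```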

@sftim (Contributor) commented Dec 21, 2022

/remove-area docker

@thockin (Member) commented Dec 21, 2022

This is a VERY old issue which has largely been resolved. If I am understanding correctly, what the discussion has become is a request for a NEW feature - is that correct?

Can we maybe open a new issue with details about that new feature and an appropriate title?

@sftim (Contributor) commented Dec 22, 2022

@xhejtman perhaps you'd be willing to file that new issue? (anyone else is also welcome to)

@xhejtman

So, to confirm: file a new issue about adding an fsUser option to securityContext?

@thockin (Member) commented Dec 22, 2022 via email

@mryellow

Awwww but it was JUST getting interesting!!!
