Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: way to set up a "install root" within a buildah task #1014

Open
owtaylor opened this issue May 15, 2024 · 7 comments
Open

RFE: way to set up a "install root" within a buildah task #1014

owtaylor opened this issue May 15, 2024 · 7 comments

Comments

@owtaylor
Copy link
Contributor

In various contexts, we want to install packages into a sub directory of the root directory, and then use them as the root directory of a subsequent stage. Examples include:

In all of these cases, we really don't want to install into a bare directory - this will result in weirdness like redirections to /dev/null going into a actual file. The directory at least needs a skeleton /dev populated and would ideally have /proc and /sys as well.

There are two basic ways that this could be enabled:

  • Allow running buildah build with --cap-add=all --security-opt=label=disable ; this is sufficient to allow the Containerfile to create its own mount namespace and populate the root directory itself.

  • Instead specify the install directory as a task parameter, and have the buildah task add something like:

           --device=/dev/null:"$INSTALL_ROOT"/dev/null
           --device=/dev/random:"$INSTALL_ROOT"/dev/random
           --device=/dev/urandom:"$INSTALL_ROOT"/dev/urandom
           --device=/dev/zero:"$INSTALL_ROOT"/dev/zero
    

The advantage of this is that it is more flexible and satisfies some other needs for nested sandboxing1. It may the only way that actually works for the bootc case [@cgwalters - is that right?]. On the other hand, the second approach is more obviously secure. 2

Footnotes

  1. To delve some into this, if we were using buildah with user namespaces, then --cap-add=all would not be necessary since the Containerfile could create a nested user namespace where it had the SYS_ADMIN capability, and then create a mount namespace and set things up there. But with --isolation=chroot the run commands are being run in an active chroot, and the Linux kernel disables user namespace creation when a chroot is active, so we have to actually give the SYS_ADMIN capability to the subprocess so it can create a mount namespace

  2. --cap-add=all will not compromise overall system security, since we're counting on the pod running the task for that, and buildah --isolation=chroot is already not that strong. And of course, an actual malicious component could specify their own pipeline with its own build steps already. The main weakness it introduces is that it could make it easier for malicious content pulled into the Containerfile during the build process to mess with the build artifacts and build metadata.

owtaylor added a commit to owtaylor/build-definitions that referenced this issue May 15, 2024
In certain cases, having a context set could change the meaning of a
Containerfile. See:

konflux-ci#1014

For details of a situation affecting Flatpaks and bootc images. This
applies to both to any context from the CONTEXT parameter, and the
implicit context from SOURCE_CODE_DIR. To avoid any possible problems,
just change to the context directory and provide use a context
argument of .
owtaylor added a commit to owtaylor/build-definitions that referenced this issue May 15, 2024
In certain cases, having a context set could change the meaning of a
Containerfile. See:

konflux-ci#1014

For details of a situation affecting Flatpaks and bootc images. This
applies to both to any context from the CONTEXT parameter, and the
implicit context from SOURCE_CODE_DIR. To avoid any possible problems,
just change to the context directory and use a context argument of .
owtaylor added a commit to owtaylor/build-definitions that referenced this issue May 16, 2024
This corresponds to 'buildah build --cap-add=<value>'

One use case is when we want to do nested sandboxing within
the build, and need to set up a new mount namespace - in this case
we need ADD_CAPABILITIES=SYS_ADMIN. (Since we are running
buildah using --isolation=chroot.)

Related: konflux-ci#1014
@chmeliik
Copy link
Contributor

Allow running buildah build with --cap-add=all --security-opt=label=disable ;

--cap-add=all will not compromise overall system security, since we're counting on the pod running the task for that, and buildah --isolation=chroot is already not that strong.

Apologies in advance, I don't know enough about namespaces and container engines. Does --cap-add=all allow the buildah build to gain capabilities that the pod itself - and the unshare namespace in that pod - didn't have? If yes, how does that stay secure? If not, how does it help the bootc and flatpak use cases?

In https://issues.redhat.com/browse/STONEBLD-660, there was strong pushback against adding capabilities, but that was on the Pod level, so IIUC much worse than this. Still, I would like someone more knowledgeable than me to check this - maybe @ralphbean @adambkaplan @nalind

@owtaylor
Copy link
Contributor Author

In a quick experiment, a basic chroot escape like:

FROM fedora:40

COPY <<EOF /tmp/foo.py
#!/usr/bin/python3

import os

os.mkdir("chroot")
os.chroot("chroot")
for i in range(0, 1000):
    os.chdir("..")
os.chroot(".")
with open("/I_WAS_HERE", "w") as f:
     pass
EOF
RUN python3 /tmp/foo.py

escapes from buildah --isolation=chroot not just with --cap-add=SYS_ADMIN, but also without. This is testing locally with:

podman run  --rm -it --device=/dev/fuse --cap-add=SETFCAP --rm quay.io/buildah/stable

it's possible that the particular escape above is blocked within the Konflux buildah pods because the outer pod is set up slightly differently. But certainly, the message has always been that --isolation=chroot is not very good sandboxing, and we're counting on the outer container to achieve security.

I think there's consensus on the eventual right direction - to use usern namespaces to make buildah's other isolation modes work rather than --isolation=chroot. And the process of getting there seems to be making its way along - see https://kubernetes.io/blog/2024/04/22/userns-beta/. That's one reason why I lean toward this branch of the solution rather than the --device=/dev/null approach - it seems more aligned with future development.

That being said: going with the --device=/dev/null approach would be OK for the Flatpak use case - that's how its handled it in OSBS2, and no bugs from the lack of /proc and /sys during package installation were noticed.

@chmeliik
Copy link
Contributor

Ack, for me that's sufficient evidence that allowing --cap-add=all doesn't make things worse.

From the opposite point of view - have you tested that it does in fact allow you to make flatpak builds work?

@owtaylor
Copy link
Contributor Author

From the opposite point of view - have you tested that it does in fact allow you to make flatpak builds work?

Yes, I've tested building a Flatpak runtime with this exact change.

owtaylor added a commit to owtaylor/build-definitions that referenced this issue May 21, 2024
This corresponds to 'buildah build --cap-add=<value>'

One use case is when we want to do nested sandboxing within
the build, and need to set up a new mount namespace - in this case
we need ADD_CAPABILITIES=SYS_ADMIN. (Since we are running
buildah using --isolation=chroot.)

Related: konflux-ci#1014
@chmeliik
Copy link
Contributor

@owtaylor on the Konflux arch call last week, we couldn't reach a conclusion about the security of this proposal.

If you have some more details about how this works, it would be very helpful. Otherwise, we'll need to do more investigation on our side before accepting it.

In particular:

  • Why does --cap-add=all allow flatpak builds to work? Is it because it allows the buildah container to preserve all the capabilities that the pod had (and nothing more)?
  • Do you know which specific capabilities are needed?

@owtaylor
Copy link
Contributor Author

SYS_ADMIN is the privilege that is needed in particular here. It's needed to create a new mount namespace within the container running the build step. (I use SYS_ADMIN in my pipeline, all is referenced because the bootc base image creation uses that.

A full security analysis would require an expert. Summarizing what I know:

  • The primary security boundary is the boundary between the pod running the build step and the system. This has been advertised since addition of buildah build --isolation=chroot.
  • As long custom tasks are allowed, the pipeline owner can equally easily set a parameter or use an entirely custom task, so we're not introducing any vulnerability to pipeline owners. The only thing that adding the ability to set this as a parameter does is to make it easier for pipeline owners to create a pipeline that is "vulnerable" to malicious content installed within the container.
  • If my conclusion about chroot escapes above is accurate, then we have zero protection against malicious content doing arbitrary things within the build step anyways, since it can simply replace a binary that the step uses later (e.g. /usr/bin/cp). I can test this in an actual build pipeline. The eventual improvement is not to use --isolation=chroot
  • There may be additional defensive steps (make registry credentials unavailable to the build step) that would be useful in any case.

@ralphbean
Copy link
Member

We can add it as a parameter, and then disallow the using of that parameter for all policies, except for those shipping flatpaks. That will let us control the blast radius of improper usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants