RFE: way to set up a "install root" within a buildah task #1014

owtaylor · 2024-05-15T15:27:25Z

In various contexts, we want to install packages into a sub directory of the root directory, and then use them as the root directory of a subsequent stage. Examples include:

Creating a standard base image from a Containerfile
Creating a bootc base image using rpm-ostree
Creating a Flatpak runtime

In all of these cases, we really don't want to install into a bare directory - this will result in weirdness like redirections to /dev/null going into a actual file. The directory at least needs a skeleton /dev populated and would ideally have /proc and /sys as well.

There are two basic ways that this could be enabled:

Allow running buildah build with --cap-add=all --security-opt=label=disable ; this is sufficient to allow the Containerfile to create its own mount namespace and populate the root directory itself.

Instead specify the install directory as a task parameter, and have the buildah task add something like:

       --device=/dev/null:"$INSTALL_ROOT"/dev/null
       --device=/dev/random:"$INSTALL_ROOT"/dev/random
       --device=/dev/urandom:"$INSTALL_ROOT"/dev/urandom
       --device=/dev/zero:"$INSTALL_ROOT"/dev/zero

The advantage of this is that it is more flexible and satisfies some other needs for nested sandboxing¹. It may the only way that actually works for the bootc case [@cgwalters - is that right?]. On the other hand, the second approach is more obviously secure. ²

To delve some into this, if we were using buildah with user namespaces, then --cap-add=all would not be necessary since the Containerfile could create a nested user namespace where it had the SYS_ADMIN capability, and then create a mount namespace and set things up there. But with --isolation=chroot the run commands are being run in an active chroot, and the Linux kernel disables user namespace creation when a chroot is active, so we have to actually give the SYS_ADMIN capability to the subprocess so it can create a mount namespace ↩
--cap-add=all will not compromise overall system security, since we're counting on the pod running the task for that, and buildah --isolation=chroot is already not that strong. And of course, an actual malicious component could specify their own pipeline with its own build steps already. The main weakness it introduces is that it could make it easier for malicious content pulled into the Containerfile during the build process to mess with the build artifacts and build metadata. ↩

The text was updated successfully, but these errors were encountered:

In certain cases, having a context set could change the meaning of a Containerfile. See: konflux-ci#1014 For details of a situation affecting Flatpaks and bootc images. This applies to both to any context from the CONTEXT parameter, and the implicit context from SOURCE_CODE_DIR. To avoid any possible problems, just change to the context directory and provide use a context argument of .

In certain cases, having a context set could change the meaning of a Containerfile. See: konflux-ci#1014 For details of a situation affecting Flatpaks and bootc images. This applies to both to any context from the CONTEXT parameter, and the implicit context from SOURCE_CODE_DIR. To avoid any possible problems, just change to the context directory and use a context argument of .

This corresponds to 'buildah build --cap-add=<value>' One use case is when we want to do nested sandboxing within the build, and need to set up a new mount namespace - in this case we need ADD_CAPABILITIES=SYS_ADMIN. (Since we are running buildah using --isolation=chroot.) Related: konflux-ci#1014

chmeliik · 2024-05-16T08:12:22Z

Allow running buildah build with --cap-add=all --security-opt=label=disable ;

--cap-add=all will not compromise overall system security, since we're counting on the pod running the task for that, and buildah --isolation=chroot is already not that strong.

Apologies in advance, I don't know enough about namespaces and container engines. Does --cap-add=all allow the buildah build to gain capabilities that the pod itself - and the unshare namespace in that pod - didn't have? If yes, how does that stay secure? If not, how does it help the bootc and flatpak use cases?

In https://issues.redhat.com/browse/STONEBLD-660, there was strong pushback against adding capabilities, but that was on the Pod level, so IIUC much worse than this. Still, I would like someone more knowledgeable than me to check this - maybe @ralphbean @adambkaplan @nalind

owtaylor · 2024-05-16T11:53:10Z

In a quick experiment, a basic chroot escape like:

FROM fedora:40

COPY <<EOF /tmp/foo.py
#!/usr/bin/python3

import os

os.mkdir("chroot")
os.chroot("chroot")
for i in range(0, 1000):
    os.chdir("..")
os.chroot(".")
with open("/I_WAS_HERE", "w") as f:
     pass
EOF
RUN python3 /tmp/foo.py

escapes from buildah --isolation=chroot not just with --cap-add=SYS_ADMIN, but also without. This is testing locally with:

podman run  --rm -it --device=/dev/fuse --cap-add=SETFCAP --rm quay.io/buildah/stable

it's possible that the particular escape above is blocked within the Konflux buildah pods because the outer pod is set up slightly differently. But certainly, the message has always been that --isolation=chroot is not very good sandboxing, and we're counting on the outer container to achieve security.

I think there's consensus on the eventual right direction - to use usern namespaces to make buildah's other isolation modes work rather than --isolation=chroot. And the process of getting there seems to be making its way along - see https://kubernetes.io/blog/2024/04/22/userns-beta/. That's one reason why I lean toward this branch of the solution rather than the --device=/dev/null approach - it seems more aligned with future development.

That being said: going with the --device=/dev/null approach would be OK for the Flatpak use case - that's how its handled it in OSBS2, and no bugs from the lack of /proc and /sys during package installation were noticed.

chmeliik · 2024-05-20T09:18:28Z

Ack, for me that's sufficient evidence that allowing --cap-add=all doesn't make things worse.

From the opposite point of view - have you tested that it does in fact allow you to make flatpak builds work?

owtaylor · 2024-05-20T15:03:19Z

From the opposite point of view - have you tested that it does in fact allow you to make flatpak builds work?

Yes, I've tested building a Flatpak runtime with this exact change.

This corresponds to 'buildah build --cap-add=<value>' One use case is when we want to do nested sandboxing within the build, and need to set up a new mount namespace - in this case we need ADD_CAPABILITIES=SYS_ADMIN. (Since we are running buildah using --isolation=chroot.) Related: konflux-ci#1014

chmeliik · 2024-05-27T12:39:42Z

@owtaylor on the Konflux arch call last week, we couldn't reach a conclusion about the security of this proposal.

If you have some more details about how this works, it would be very helpful. Otherwise, we'll need to do more investigation on our side before accepting it.

In particular:

Why does --cap-add=all allow flatpak builds to work? Is it because it allows the buildah container to preserve all the capabilities that the pod had (and nothing more)?
Do you know which specific capabilities are needed?

owtaylor · 2024-05-27T13:26:41Z

SYS_ADMIN is the privilege that is needed in particular here. It's needed to create a new mount namespace within the container running the build step. (I use SYS_ADMIN in my pipeline, all is referenced because the bootc base image creation uses that.

A full security analysis would require an expert. Summarizing what I know:

The primary security boundary is the boundary between the pod running the build step and the system. This has been advertised since addition of buildah build --isolation=chroot.
As long custom tasks are allowed, the pipeline owner can equally easily set a parameter or use an entirely custom task, so we're not introducing any vulnerability to pipeline owners. The only thing that adding the ability to set this as a parameter does is to make it easier for pipeline owners to create a pipeline that is "vulnerable" to malicious content installed within the container.
If my conclusion about chroot escapes above is accurate, then we have zero protection against malicious content doing arbitrary things within the build step anyways, since it can simply replace a binary that the step uses later (e.g. /usr/bin/cp). I can test this in an actual build pipeline. The eventual improvement is not to use --isolation=chroot
There may be additional defensive steps (make registry credentials unavailable to the build step) that would be useful in any case.

ralphbean · 2024-05-31T19:24:19Z

We can add it as a parameter, and then disallow the using of that parameter for all policies, except for those shipping flatpaks. That will let us control the blast radius of improper usage.

owtaylor mentioned this issue May 15, 2024

buildah: always use a context of . #1015

Merged

owtaylor mentioned this issue May 16, 2024

buildah: Add an ADD_CAPABILITIES parameter #1017

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFE: way to set up a "install root" within a buildah task #1014

RFE: way to set up a "install root" within a buildah task #1014

owtaylor commented May 15, 2024

chmeliik commented May 16, 2024

owtaylor commented May 16, 2024

chmeliik commented May 20, 2024

owtaylor commented May 20, 2024

chmeliik commented May 27, 2024

owtaylor commented May 27, 2024

ralphbean commented May 31, 2024

RFE: way to set up a "install root" within a buildah task #1014

RFE: way to set up a "install root" within a buildah task #1014

Comments

owtaylor commented May 15, 2024

Footnotes

chmeliik commented May 16, 2024

owtaylor commented May 16, 2024

chmeliik commented May 20, 2024

owtaylor commented May 20, 2024

chmeliik commented May 27, 2024

owtaylor commented May 27, 2024

ralphbean commented May 31, 2024