Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gVisor unprivileged user can't access file in rootless mode #9918

Open
terenceli opened this issue Jan 24, 2024 · 11 comments
Open

gVisor unprivileged user can't access file in rootless mode #9918

terenceli opened this issue Jan 24, 2024 · 11 comments
Labels
type: bug Something isn't working

Comments

@terenceli
Copy link

Description

While in rootless mode, the container UID is the same as the UID of running runsc, in the container we can't access the file belongs to the host UID.

I want to know how to achieve this. So that in rootless mode, the unprivileged user can access the host file with the same UID.

Steps to reproduce

using following OCI spec

`test@test-VirtualBox:~/test$ id

uid=1000(test) gid=1000(test) groups=1000(test),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),122(lpadmin),135(lxd),136(sambashare)

test@test-VirtualBox:~/test$ ./runsc -rootless -root /home/test -ignore-cgroups --network host run abc

touch /home/test/bb

touch: cannot touch '/home/test/bb': Permission denied

`

`{

"ociVersion": "1.0.0",

"process": {

    "user": {

        "uid": 1000,

        "gid": 1000

    },

    "args": [

        "sh"

    ],

    "env": [

        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",

        "TERM=xterm"

    ],

    "cwd": "/",

    "capabilities": {

        "bounding": [

            "CAP_AUDIT_WRITE",

            "CAP_KILL",

            "CAP_NET_BIND_SERVICE"

        ],

        "effective": [

            "CAP_AUDIT_WRITE",

            "CAP_KILL",

            "CAP_NET_BIND_SERVICE"

        ],

        "inheritable": [

            "CAP_AUDIT_WRITE",

            "CAP_KILL",

            "CAP_NET_BIND_SERVICE"

        ],

        "permitted": [

            "CAP_AUDIT_WRITE",

            "CAP_KILL",

            "CAP_NET_BIND_SERVICE"

        ]

    },

    "rlimits": [

        {

            "type": "RLIMIT_NOFILE",

            "hard": 1024,

            "soft": 1024

        }

    ]

},

"root": {

    "path": "/",

    "readonly": true

},

"hostname": "runsc",

"mounts": [

    {

        "destination": "/proc",

        "type": "proc",

        "source": "proc"

    },

    {

        "destination": "/dev",

        "type": "tmpfs",

        "source": "tmpfs"

    },

    {

        "destination": "/sys",

        "type": "sysfs",

        "source": "sysfs",

        "options": [

            "nosuid",

            "noexec",

            "nodev",

            "ro"

        ]

    },

    {

            "destination":"/home/test",

            "type":"bind",

            "source":"/home/test",

            "options":[

                    "rbind",

                    "rw"

            ]

    }



],

"linux": {

    "namespaces": [

        {

            "type": "pid"

        },

        {

            "type": "network"

        },

        {

            "type": "ipc"

        },

        {

            "type": "uts"

        },

        {

            "type": "mount"

        }

    ]

}

}

`

runsc version

No response

docker version (if using docker)

No response

uname

No response

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

@terenceli terenceli added the type: bug Something isn't working label Jan 24, 2024
@avagin
Copy link
Collaborator

avagin commented Jan 25, 2024

./runsc --rootless maps the current user to the root user inside the sandbox.

@ayushr2
Copy link
Collaborator

ayushr2 commented Jan 25, 2024

Adding to what @avagin said: so if you remove this part from your config.json:

    "user": {
        "uid": 1000,
        "gid": 1000
    },

OR set the user to 0 then it should work fine.

In your usecase, the sandbox is running in a user namespace with UID/GID mappings like 1000:0:1 (host uid/gid 1000 is mapped to container root user). So all files owned by 1000 on host will appear to be owned by 0 in the sandbox. Similarly if you create a file from inside the sandbox, it will appear to be owned by 1000 on the host, but inside the container owner will be 0.

@terenceli
Copy link
Author

@avagin @ayushr2 So I see podman has a config --userns=keep-id.
We may also need this config I think.

We want the application in Sentry has the same view in host from filesystem perspective.

@terenceli terenceli reopened this Jan 26, 2024
@ayushr2
Copy link
Collaborator

ayushr2 commented Feb 1, 2024

runsc and runc are low level container runtimes. podman is a tool on top of these runtimes. So if you want podman's --userns=keep-id behavior, you can directly use podman with runsc to run rootless containers and enable that configuration. To see an example of how to configure podman with runsc, you can look at our Podman tests: test/podman/run.sh.

From podman docs, --userns=keep-id means:

keep-id: creates a user namespace where the current rootless user’s UID:GID are mapped to the same values in the container. This option is not allowed for containers created by the root user.

You can not use --rootless flag if you want --userns=keep-id behavior. Because, when --rootless is used, runsc re-executes itself as root with the caller UID mapped to 0. So the caller becomes root. We can not map the caller UID to itself again.

If you do not want to use podman, you have the following optoins:

  • Install newuidmap, newgidmap tools and run runsc without the --rootless flag and specify your desired user namespace and mappings in oci runtime config. Internally, we use the newuidmap, newgidmap binaries to configure the user namespace in this case. These binaries have the setuid bit set, which allows them to set UID/GID mappings beyond just the current user.
  • Hand craft the userns before calling runsc using unshare -U --uid-map --gid-map runsc .... So when runsc is called, all the necessary users are mapped. IIUC, this is what podman does.

@terenceli
Copy link
Author

terenceli commented Feb 5, 2024

@ayushr2 Thanks for your explanation. I try to use the first method, as step as follows:

install newuidmap, newgidmap
runsc OCI spec:

{
    "ociVersion": "1.0.0",
    "process": {
        "user": {
            "uid": 1000,
            "gid": 1000
        },
        "args": [
            "sh"
        ],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "TERM=xterm"
        ],
        "cwd": "/",
        "capabilities": {
            "bounding": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "effective": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "inheritable": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ],
            "permitted": [
                "CAP_AUDIT_WRITE",
                "CAP_KILL",
                "CAP_NET_BIND_SERVICE"
            ]
        },
        "rlimits": [
            {
                "type": "RLIMIT_NOFILE",
                "hard": 1024,
                "soft": 1024
            }
        ]
    },
    "root": {
        "path": "/",
        "readonly": true
    },
    "hostname": "runsc",
    "mounts": [
        {
            "destination": "/proc",
            "type": "proc",
            "source": "proc"
        },
        {
            "destination": "/dev",
            "type": "tmpfs",
            "source": "tmpfs"
        },
        {
            "destination": "/sys",
            "type": "sysfs",
            "source": "sysfs",
            "options": [
                "nosuid",
                "noexec",
                "nodev",
                "ro"
            ]
        }
    ],
    "linux": {
        "uidMappings": [
            {
                "containerID":1000,
                "hostID": 1000,
                "size": 100
            }
        ],
        "gidMappings": [
            {
                "containerID": 1000,
                "hostID": 1000,
                "size": 100
            }
        ],
        "namespaces": [
	    {
		"type": "user"
	    },
            {
                "type": "pid"
            },
            {
                "type": "network"
            },
            {
                "type": "ipc"
            },
            {
                "type": "uts"
            },
            {
                "type": "mount"
            }
        ]
    }
}

I get error at this point:
https://github.com/google/gvisor/blob/master/runsc/cmd/gofer.go#L775

The gofer is try to call setuid(0,0,0) it will failed.

Any ideas?

@terenceli
Copy link
Author

@ayushr2 I just do some tests.

Just as I said in the previous, the gofer process calls

'unix.RawSyscall(unix.SYS_SETUID, 0, 0, 0)'

But as we setup a 1000:1000 userns map, this will cause a invalid argument error. This is reasonable as we are 1000 user in the current user ns so we can't call setuid to 0.

I then found a commit

595d424

I realize this is what I want, 'runsc as a non-root user works'. But I can't make it work.

@ayushr2
Copy link
Collaborator

ayushr2 commented Feb 20, 2024

But as we setup a 1000:1000 userns map, this will cause a invalid argument error. This is reasonable as we are 1000 user in the current user ns so we can't call setuid to 0.

Yes, to set up the sandbox, the root user needs to be mapped inside the new user namespace.

In 595d424, I had tested it with:

In both cases, the container's root user is mapped, so it works.

If you want --userns=keep-id behavior, then maybe you need to use at least 2 host users. Map host 1000 to container 1000 and map the other user to root. The gofer needs to be root to setup the filesystem and all that (which require capabilities).

@avagin Am I missing something?

@ayushr2
Copy link
Collaborator

ayushr2 commented Feb 20, 2024

Let me look into this. It may be an issue with the function you pointed out (syncUsernsForRootless()). The gofer and sandbox processes are started in a new userns with complete set of capabilities. So to support --userns=keep-id, we may need to setuid/setgid to the user that has been mapped.

It should not be necessary to map 2 host users for this. Because Podman etc support this use case.

@ayushr2
Copy link
Collaborator

ayushr2 commented Feb 22, 2024

Yeah this seems like an issue in runsc. For rootless containers, we currently only support running the non-root user as root inside the container's user namespace. This covers the common usecase. But as a result, Podman features like --userns=keep-id are not supported yet.

I tried looking into this a bit. I don't have the bandwidth to pursue this right now. But in case someone else is looking at this, here are some pointers:

  • We should check if the root user in mapped in the container's user namespace mapping. If it is mapped, then we can setuid() to root like we do right now. If it is not, then we need to check
    which UID is the runsc caller user mapped to. We should attempt to setuid() into that UID instead.
  • Both the gofer process and the sandbox process re-execute themselves after setuid(). We need to ensure that the required capabilities are present in their ambient capability set before re-execing. For non-root users, only the ambient set of capabilities are preserved across an execve(2). As of right now, it looks like their ambient capability set is empty, so after re-exec, they have no capabilities. The gofer still needs these capabilities to function after re-exec. For example, it requires CAP_SYS_CHROOT to do the final chroot after re-execing.

@terenceli
Copy link
Author

terenceli commented Feb 23, 2024

@ayushr2 your second pointer just remind of that whether the non-root process's child which in a new userns will have all full capability. After reading the doc and do some experiment I found the answer is 'yes'.

After test, we also need 'Inheritable' caps.
I just try following patch:


diff --git a/runsc/cmd/gofer.go b/runsc/cmd/gofer.go
index 3228587..624e971 100644
--- a/runsc/cmd/gofer.go
+++ b/runsc/cmd/gofer.go
@@ -56,6 +56,8 @@ var goferCaps = &specs.LinuxCapabilities{
        Bounding:  caps,
        Effective: caps,
        Permitted: caps,
+       Inheritable: caps,
+       Ambient: caps,
 }
 
 // goferSyncFDs contains file descriptors that are used for synchronization
@@ -782,10 +784,11 @@ func syncUsernsForRootless(fd int) {
        // SETUID changes UID on the current system thread, so we have
        // to re-execute current binary.
        runtime.LockOSThread()
-       if _, _, errno := unix.RawSyscall(unix.SYS_SETUID, 0, 0, 0); errno != 0 {
+       var uid = os.Getuid()
+       if _, _, errno := unix.RawSyscall(unix.SYS_SETUID, uintptr(uid), 0, 0); errno != 0 {
                util.Fatalf("failed to set UID: %v", errno)
        }
-       if _, _, errno := unix.RawSyscall(unix.SYS_SETGID, 0, 0, 0); errno != 0 {
+       if _, _, errno := unix.RawSyscall(unix.SYS_SETGID, uintptr(uid), 0, 0); errno != 0 {
                util.Fatalf("failed to set GID: %v", errno)
        }
 }


And it just works!


test@test-VirtualBox:~/runsc$ ./runsc --ignore-cgroups --network none  run abc
id
uid=1000(test) gid=1000(test) groups=1000(test)
touch /tmp/abc
exit
test@test-VirtualBox:~/runsc$ ls -lh /tmp/abc
-rw-r--r-- 1 test test 0 2月  23 16:45 /tmp/abc


So could you review this and make a more formal(such as tests) patch. Thanks!

@ayushr2
Copy link
Collaborator

ayushr2 commented Mar 1, 2024

Awesome! I think I had missed the Inheritable: caps part. Uh I unfortunately don't have cycles right now, but have put this on my TODO list. If someone else wants to pick this, feel free to!

I think the missing piece is, we need to communicate with syncUsernsForRootless() the host UID of the runsc caller and then lookup that UID in the UID mappings from container spec. If found, we want to setuid to the mapped container UID.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants