Always enable cgroup namespace for containers #3735

Open · wants to merge 1 commit into master

Conversation

dqminh

@dqminh dqminh commented Nov 16, 2021

Fix #3734

In the cgroup v2 hierarchy, the cgroup setup for nested containers (i.e. docker)
is incorrect without enabling a cgroup namespace. This enables the cgroup
namespace for all containers to fix the incorrect cgroup setup.

See linuxkit#3734

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
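
For background (this aside is not part of the commit message), the effect of a cgroup namespace can be seen by comparing what /proc/self/cgroup reports with and without one. A minimal sketch, assuming a cgroup v2 host with util-linux unshare available; the paths shown are example output:

# Without a cgroup namespace the process reports its full host-side cgroup path:
cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-1.scope

# With a new cgroup namespace the current cgroup becomes the namespace root, so
# a nested runtime such as docker/runc sees "/" instead of the host path:
sudo unshare --cgroup cat /proc/self/cgroup
0::/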
@djs55
Contributor

djs55 commented Dec 14, 2021

Thanks for the PR! I've enabled the CI, although it might not test the change. I'll investigate.

@djs55
Contributor

djs55 commented Dec 29, 2021

A progress update: when I enabled this locally I wasn't able to get dockerd + containerd to start in a container. It might be a problem in my setup. I'll investigate more and propose a test case if the problem persists.

@the-maldridge
Contributor

@djs55 I think this might be related to something I'm looking at as well involving docker. Ignoring for the moment that the example on master doesn't build (something is wrong with the container image that gets stored, so the resultant image doesn't appear to actually even have docker in it...), docker fails in my testing with the error below, which I believe is related to cgroups:

ctr -n services.linuxkit task exec -exec-id debug docker docker run crccheck/hello-world
Unable to find image 'crccheck/hello-world:latest' locally
latest: Pulling from crccheck/hello-world
e685c5c858e3: Pulling fs layer
7bf3c383dbcd: Pulling fs layer
7bf3c383dbcd: Verifying Checksum
7bf3c383dbcd: Download complete
e685c5c858e3: Pull complete
7bf3c383dbcd: Pull complete
Digest: sha256:0404ca69b522f8629d7d4e9034a7afe0300b713354e8bf12ec9657581cf59400
Status: Downloaded newer image for crccheck/hello-world:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "c 5:1 rwm": write /sys/fs/cgroup/devices/docker/18ee78ce6bc9e313c760dc1a1428ed719a980dbb96eadd4ba55b527458e66aa2/devices.allow: operation not permitted: unknown.
time="2022-01-13T07:58:34Z" level=error msg="error waiting for container: context canceled"

@djs55
Contributor

djs55 commented Jan 13, 2022

@the-maldridge interesting! I'll take a look at the example on master when I get a moment. It would be good to fix it and make sure the end-to-end tests are properly testing it.

I've seen a very similar error with devices.allow, but only with cgroup v1 and it seemed to be transient (!). My understanding is that this controller is used to prevent unauthorised containers running mknod to grant themselves access to the physical hardware. In cgroup v1 the controller (documented here) is configured by writing to the devices.allow file. I think in cgroup v2 it was removed and replaced with eBPF programs, so on a working cgroup v2 system I see:

dave@m1 ~ % docker run -it --privileged -v /sys/fs/bpf:/sys/fs/bpf -v /sys/fs/cgroup:/sys/fs/cgroup djs55/bpftool cgroup tree
CgroupPath
ID       AttachType      AttachFlags     Name           
/sys/fs/cgroup/006-metadata
    21       device          multi                          
/sys/fs/cgroup/011-bridge
    36       device          multi                          
/sys/fs/cgroup/dhcpcd
    53       device          multi     
...

So I guess we should make sure the example works in both cgroup v1 and (default) cgroup v2 mode, if possible.
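
As a hedged aside (not from the thread), a quick way to tell which of the two modes a booted image is in is the filesystem type mounted at /sys/fs/cgroup:

# cgroup2fs means the unified (cgroup v2) hierarchy; tmpfs means the legacy
# cgroup v1 layout with per-controller directories such as devices/.
stat -fc %T /sys/fs/cgroup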

@the-maldridge
Contributor

Well, this is far from transient; it happens without fail on every machine and instance I try this image on. I'm happy to pull logs or whatever else might be helpful to get this figured out, because this is preventing docker from working, which in turn is preventing me from updating things.

@the-maldridge
Contributor

I have done some more checking. I'm not really sure what's going on, but I figured I'd add more information. FWIW all this information is obtained from a system built with a patched linuxkit that includes this PR.

The docker error is the same as above: it cannot set up the container because the controller is in the "wrong" spot. I really think this may be a case where docker wants to see the host cgroup paths, since, as far as it is concerned, it is talking to a host containerd.

node1:/# ctr -n services.linuxkit task exec -exec-id debug docker ls /sys/fs/cgroup/devices
cgroup.clone_children
cgroup.procs
devices.allow
devices.deny
devices.list
notify_on_release
tasks

node1:/# ctr -n services.linuxkit task exec -exec-id debug docker ls /sys/fs/cgroup/devices/docker
ls: /sys/fs/cgroup/devices/docker: No such file or directory
node1:/# ls /sys/fs/cgroup/devices/
000-sysctl             cgroup.clone_children  devices.deny           logwrite               sshd
001-sysfs              cgroup.procs           devices.list           nomad                  tasks
002-rngd_boot          cgroup.sane_behavior   dhcpcd                 notify_on_release      vault
003-dhcpcd_boot        consul                 docker                 openntpd
004-metadata           coredns                emissary               release_agent
acpid                  devices.allow          getty                  rngd

node1:/# ls /sys/fs/cgroup/devices/docker/
cgroup.clone_children  devices.allow          devices.list           tasks
cgroup.procs           devices.deny           notify_on_release

node1:/# ctr -n services.linuxkit task exec -exec-id debug docker docker run --rm -i crccheck/hello-world
Unable to find image 'crccheck/hello-world:latest' locally
latest: Pulling from crccheck/hello-world
e685c5c858e3: Pulling fs layer
7bf3c383dbcd: Pulling fs layer
7bf3c383dbcd: Verifying Checksum
7bf3c383dbcd: Download complete
e685c5c858e3: Verifying Checksum
e685c5c858e3: Download complete
e685c5c858e3: Pull complete
7bf3c383dbcd: Pull complete
Digest: sha256:0404ca69b522f8629d7d4e9034a7afe0300b713354e8bf12ec9657581cf59400
Status: Downloaded newer image for crccheck/hello-world:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: process_linux.go:508: setting cgroup config for procHooks process caused: failed to write "c 5:1 rwm": write /sys/fs/cgroup/devices/docker/37de90b430b4e77cb718c7105e1a62a22a0c47ee33cd259c0f2cea1722f454ba/devices.allow: operation not permitted: unknown.

This really does look like something is wrong with the way docker interacts with the host cgroups, and it seems to have changed between v0.8 and here (though so has a lot of other stuff).
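
One hedged way to check whether the host cgroup path is leaking into the docker service container (an assumption worth verifying here, not a command from the thread; the exec id "cgcheck" is arbitrary) is to look at /proc/self/cgroup from inside an exec session:

# Without a cgroup namespace the exec'd process reports its full host-side
# cgroup path (e.g. ".../docker"); with one it reports a path relative to the
# container's own cgroup root, which is what a nested runc would then act on.
ctr -n services.linuxkit task exec --exec-id cgcheck docker cat /proc/self/cgroup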

@the-maldridge
Contributor

Further research shows that the breaking component between v0.8 and here is runc. This is where I have to temporarily admit defeat as I don't understand enough of how runc and containerd slot together to fully understand what's going on here. All I know is that older versions of runc work and newer versions do not. I'll take a look at the runc changelog when I have time to see if I can find a specific issue.

@the-maldridge
Contributor

Further progress!

Any version of runc past v1.0.0-rc90 breaks linuxkit. Not sure if it's worth running a bisect through runc or not, as runc appears to have introduced Go modules at some point during that time, which makes it tricky to build a nice one-line bisect.

@the-maldridge
Contributor

A complete bisect finds that opencontainers/runc 60e21ec is the first bad commit. I don't see what to do from here, but a lot of the work after that point in the runc tree has to do with cgroup v2 and with changes to the way it handles cgroups in general.

@mpoindexter

@the-maldridge I suspect the problem causing the failed to write "c 5:1 rwm": write /sys/fs/cgroup/devices/docker/.../devices.allow: operation not permitted error above is that the docker service does not have access to the console device. I added this to my docker service definition and the problem went away:

    devices:
      - path: "/dev/console"
        type: "c"
        major: 5
        minor: 1
        mode: "0666"
      - path: all
        type: b
