ISSUE: Can't get systemd to run with 1.11 #22285

Closed
beetree opened this issue Apr 25, 2016 · 26 comments

@beetree

beetree commented Apr 25, 2016

I've been running a few hundred containers with systemd in them since 1.7. The required flags have changed a little over time. In 1.10 I was adding --cap-add=SYS_ADMIN, --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro, and --security-opt=seccomp:unconfined.

With the same flags, it doesn't work in 1.11.

With --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro --privileged it works.

With --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro --security-opt=seccomp:unconfined it does not work.
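
For anyone who wants to compare these three combinations side by side, here is a rough sketch. The flags are the ones listed above. The `flags_for` helper, the `RUN_DOCKER` switch and the `IMAGE` placeholder are assumptions made for illustration and are not part of the original report.

```shell
# Print the docker run flags for one of the combinations described above.
flags_for() {
    cgroup_mount="--volume=/sys/fs/cgroup:/sys/fs/cgroup:ro"
    case "$1" in
        caps)       echo "--cap-add=SYS_ADMIN $cgroup_mount --security-opt=seccomp:unconfined" ;;
        privileged) echo "$cgroup_mount --privileged" ;;
        seccomp)    echo "$cgroup_mount --security-opt=seccomp:unconfined" ;;
        *)          echo "unknown combination: $1" >&2; return 1 ;;
    esac
}

# Only touch Docker when explicitly asked to. IMAGE is a placeholder for an
# image whose entrypoint is systemd.
if [ "${RUN_DOCKER:-0}" = "1" ]; then
    for combo in caps privileged seccomp; do
        # Word splitting of the flag string is intentional here.
        id=$(docker run -d $(flags_for "$combo") "${IMAGE:-test}") || continue
        sleep 5
        echo "== $combo =="
        docker exec "$id" ps -e -o pid,cmd
        docker rm -f "$id" >/dev/null
    done
fi
```

Run it with `RUN_DOCKER=1 IMAGE=<your image> sh compare.sh` (the script name is arbitrary). A combination where only PID 1 and `ps` show up is one where systemd did not come up.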

Here's a dump of the system:

root@Ubuntu-1510-wily-64-minimal ~ # docker info
Containers: 102
 Running: 75
 Paused: 0
 Stopped: 27
Images: 59
Server Version: 1.11.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 352
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 4.2.0-35-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 125.9 GiB
Name: Ubuntu-1510-wily-64-minimal
ID: L6PF:6LTG:FHIZ:NBPC:CJSO:XXQ3:7KIV:ZVQF:C7LA:3NNG:XU3C:O6OT
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 243
 Goroutines: 431
 System Time: 2016-04-25T04:35:08.881928707+02:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
root@Ubuntu-1510-wily-64-minimal ~ # docker version
Client:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:38:59 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:38:59 2016
 OS/Arch:      linux/amd64
root@Ubuntu-1510-wily-64-minimal ~ # uname -a
Linux Ubuntu-1510-wily-64-minimal 4.2.0-35-generic #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Thanks for the help!

FYI: This is the only thing I could find about the issue while Googling, and it suggests something indeed did change in 1.11: https://trello.com/c/RFUcI1eV/158-3-make-docker-systemd-cgroups-driver-work-in-1-11

@HackToday
Contributor

Perhaps it would be better to paste the error messages you see when the containers are run with those options.

@justincormack
Contributor

Do you have an easy way to reproduce this? I don't have any systemd containers to hand. Any error messages would be useful, although there may be none. I would expect you to need at least --cap-add=SYS_ADMIN --security-opt=seccomp:unconfined. It is possible that the AppArmor config is also an issue, as that is also changed by --privileged; if you could check on a machine without AppArmor enabled, that might rule that part out. Otherwise it could need another capability; you could also test with --cap-add all to rule that out (seems fairly unlikely).
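
A quick way to see whether AppArmor is in play on a given host is to read the kernel module parameter. This is only a sketch: the `apparmor_enabled` helper is a made-up name, and the optional path argument exists only so the check can be tried against a test file.

```shell
# Report whether AppArmor is enabled, based on the kernel module parameter file.
# The file contains "Y" when AppArmor is enabled.
apparmor_enabled() {
    param_file="${1:-/sys/module/apparmor/parameters/enabled}"
    [ -r "$param_file" ] && [ "$(cat "$param_file")" = "Y" ]
}

if apparmor_enabled; then
    echo "AppArmor is enabled on this host"
else
    echo "AppArmor is not enabled (or not present) on this host"
fi

# To rule out a missing capability, as suggested above, one could then try:
#   docker run -d --cap-add=ALL --security-opt=seccomp:unconfined \
#     -v /sys/fs/cgroup:/sys/fs/cgroup:ro <image>
```

If AppArmor is enabled on the failing host and disabled on the working one, that would point at the profile rather than at capabilities or seccomp.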

@rbjorklin

I think I'm seeing this, or at least something similar, running Fedora 23. I've been trying to run the official centos:7 image and it works like a charm with the Fedora-provided docker package version 1.9.1, but if I upgrade to 1.11 from the Docker repos it breaks. The errors I'm seeing are either Failed to get D-Bus connection: Connection refused or Failed to get D-Bus connection: Operation not permitted, depending on whether I create a volume for /sys/fs/cgroup, /tmp and /run or not. Something potentially related has also been reported in #7459. One obvious difference I've found is the contents of the systemd unit file in the Docker-provided package, which holds some comments regarding systemd and cgroups.

$ sudo docker info
Containers: 1
Images: 14
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-253:0-2758710-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file: /dev/loop2
 Metadata file: /dev/loop3
 Data Space Used: 592.1 MB
 Data Space Total: 107.4 GB
 Data Space Available: 40.91 GB
 Metadata Space Used: 1.303 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.146 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.109 (2015-09-22)
Execution Driver: native-0.2
Logging Driver: journald
Kernel Version: 4.4.6-301.fc23.x86_64
Operating System: Fedora 23 (Workstation Edition)
CPUs: 8
Total Memory: 31.31 GiB
Name: lxwsrbj
ID: CZWW:IYHJ:ZUCV:CBEO:2MIS:TPMT:YXF5:RZ6J:IACZ:XSFD:AT33:XOGR

docker.service file

$ cat docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network.target docker.socket
Requires=docker.socket

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/docker daemon -H fd://
MountFlags=slave
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes

[Install]
WantedBy=multi-user.target

@rbjorklin

--cap-add=SYS_ADMIN --security-opt=seccomp:unconfined seems to have solved it for me! Cheers!

@justincormack
Contributor

Ok great, currently we would expect systemd to need both of those.

@thaJeztah
Member

I'll close this issue, but if you think there's something that needs to be improved, or have a suggestion to document it somewhere (although, I don't think we describe how to run systemd in a container currently), feel free to open a pull request

@beetree
Author

beetree commented May 5, 2016

Alright, finally got the time to recreate.

Here's 1.10:

root@ubuntu:~/tmp# docker version
Client:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:40:35 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.2
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   c3959b1
 Built:        Mon Feb 22 21:40:35 2016
 OS/Arch:      linux/amd64
root@ubuntu:~/tmp# docker info
Containers: 40
 Running: 1
 Paused: 0
 Stopped: 39
Images: 125
Server Version: 1.10.2
Storage Driver: devicemapper
 Pool Name: docker-8:1-1053029-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 2.147 GB
 Backing Filesystem: ext4
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 3.53 GB
 Data Space Total: 214.7 GB
 Data Space Available: 34.13 GB
 Metadata Space Used: 6.992 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.14 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.99 (2015-06-20)
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: host bridge null
Kernel Version: 4.4.0-040400rc8-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 7.794 GiB
Name: ubuntu
ID: O6JD:MGK4:2SWN:D2TC:SIRV:53OE:IKCM:2C37:YZDF:2XOO:HQBF:UXZY
Username: eleet
Registry: https://index.docker.io/v1/
WARNING: No swap limit support


root@ubuntu:~/tmp# cat Dockerfile
FROM ubuntu:16.04

RUN apt-get update

RUN apt-get install openssh-server -y
RUN systemctl enable ssh

ENTRYPOINT ["/lib/systemd/systemd"]


root@ubuntu:~/tmp# docker build --tag=test .; docker run -d --security-opt=seccomp:unconfined --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro test;
Sending build context to Docker daemon 4.096 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> 864e1e0980e7
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> 938776fbc490
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> e01c1d84af28
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> 62d512414edf
Successfully built 62d512414edf
425bdc24dc0aa66b7df6758bbe965536dc0a8e439018a974856311eae6a2463b


root@ubuntu:~/tmp# docker exec -it 425bdc24dc0aa66b7df6758bbe965536dc0a8e439018a974856311eae6a2463b /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.3  0.0  37016  4952 ?        Ss   03:34   0:00 /lib/systemd/systemd
root        30  0.0  0.0  35276  3956 ?        Ss   03:34   0:00 /lib/systemd/systemd-journald
root        37  0.0  0.0  65612  6056 ?        Ss   03:34   0:00 /usr/sbin/sshd -D
root        45  0.0  0.0   4508  1864 ?        S    03:34   0:00 /bin/sh /etc/init.d/ondemand background
root        49  0.0  0.0   4380   652 ?        S    03:34   0:00 sleep 60
root        86  0.0  0.0  34424  2896 ?        Rs+  03:35   0:00 ps aux

Now, let's do the same on 1.11:

root@Ubuntu-1510-wily-64-minimal ~/tmp # docker version
Client:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:38:59 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.0
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   4dc5990
 Built:        Wed Apr 13 18:38:59 2016
 OS/Arch:      linux/amd64
root@Ubuntu-1510-wily-64-minimal ~/tmp # docker info
Containers: 153
 Running: 120
 Paused: 0
 Stopped: 33
Images: 119
Server Version: 1.11.0
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 501
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge null
Kernel Version: 4.2.0-35-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 125.9 GiB
Name: Ubuntu-1510-wily-64-minimal
ID: XLOE:KPVX:3CAI:AZU6:3GPX:5MFN:EN5Z:XWLM:HSWO:RJSE:JYKL:ANO7
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 374
 Goroutines: 649
 System Time: 2016-05-05T05:36:29.843362464+02:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support


root@Ubuntu-1510-wily-64-minimal ~/tmp # cat Dockerfile
FROM ubuntu:16.04

RUN apt-get update

RUN apt-get install openssh-server -y
RUN systemctl enable ssh

ENTRYPOINT ["/lib/systemd/systemd"]


root@Ubuntu-1510-wily-64-minimal ~/tmp # docker build --tag=test .; docker run -d --security-opt=seccomp:unconfined --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro test;
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> f075d0538829
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> fa3f72915303
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> beefd710320c
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> 08f551680378
Successfully built 08f551680378
ae6fef40693c821836f43b801fedf2769002211ad0de487aaefa17251321d424



root@Ubuntu-1510-wily-64-minimal ~/tmp # docker exec -it ae6fef40693c821836f43b801fedf2769002211ad0de487aaefa17251321d424 /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  1.4  0.0  36536  2148 ?        Ss   03:36   0:00 /lib/systemd/sy
root        12  0.0  0.0  34424  2924 ?        Rs+  03:37   0:00 ps aux

As you can see, in the 1.11 case, systemd doesn't start properly.
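
The difference between the two process listings can also be checked mechanically rather than by eye. The sketch below assumes that a started systemd will have spawned systemd-journald, as in the 1.10 listing. The `journald_running` helper and the `RUN_DOCKER` and `CONTAINER` variables are hypothetical names used only for this example.

```shell
# Succeed if a process list (one command name per line on stdin) contains
# systemd-journald. `ps -o comm=` truncates that name to "systemd-journal".
journald_running() {
    grep -q '^systemd-journal'
}

# Only touch Docker when explicitly asked to.
if [ "${RUN_DOCKER:-0}" = "1" ]; then
    if docker exec "${CONTAINER:?set CONTAINER to a container ID}" ps -e -o comm= | journald_running; then
        echo "systemd appears to have started its services"
    else
        echo "journald is not running; systemd did not finish booting"
    fi
fi
```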

However, if I do the following in 1.11:

root@Ubuntu-1510-wily-64-minimal ~/tmp # docker build --tag=test .; docker run -d --privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro test;
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> f075d0538829
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> fa3f72915303
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> beefd710320c
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> 08f551680378
Successfully built 08f551680378
69fb2f91cb07cb6ce74d19109e1cc5271abefdac44245c52bc0517b75837da1a

root@Ubuntu-1510-wily-64-minimal ~/tmp # docker exec -it 69fb2f91cb07cb6ce74d19109e1cc5271abefdac44245c52bc0517b75837da1a /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.1  0.0  37036  5180 ?        Ss   03:38   0:00 /lib/systemd/sy
root        19  0.0  0.0  35276  7764 ?        Ss   03:38   0:00 /lib/systemd/sy
systemd+    31  0.0  0.0 100324  2484 ?        Ssl  03:38   0:00 /lib/systemd/sy
root        37  0.0  0.0  65612  6212 ?        Ss   03:38   0:00 /usr/sbin/sshd
root        41  0.0  0.0  13028  1768 tty1     Ss+  03:38   0:00 /sbin/agetty --
root        43  0.0  0.0  13028  1816 tty2     Ss+  03:38   0:00 /sbin/agetty --
root        46  0.1  0.0   4508  1788 ?        S    03:38   0:00 /bin/sh /etc/in
root        47  0.0  0.0  13028  1864 tty3     Ss+  03:38   0:00 /sbin/agetty --
root        48  0.0  0.0  13028  1956 tty4     Ss+  03:38   0:00 /sbin/agetty --
root        49  0.0  0.0  13028  1840 tty5     Ss+  03:38   0:00 /sbin/agetty --
root        50  0.0  0.0  13028  1832 tty6     Ss+  03:38   0:00 /sbin/agetty --
root        60  0.0  0.0   4380   812 ?        S    03:38   0:00 sleep 60
root        62  0.0  0.0  34424  2912 ?        Rs+  03:39   0:00 ps aux

So, if I use --privileged in 1.11 it works. But with --security-opt=seccomp:unconfined --cap-add SYS_ADMIN in 1.11 it does not work. However, --security-opt=seccomp:unconfined --cap-add SYS_ADMIN works in 1.10.

So, something changed when going from 1.10 to 1.11 that broke this. What happened?

Thanks!

/b3

@beetree
Author

beetree commented May 5, 2016

@thaJeztah Can you please reopen? The suggested parameters were sufficient in 1.10 but are not sufficient in 1.11, and I provided details above on how to recreate the issue.

@thaJeztah
Member

I just tried, and it looks like something changed indeed, only --privileged works, so I'll reopen.

Wondering why you need systemd in your container here, a very simple Dockerfile would probably give you the same;

FROM ubuntu:16.04

RUN apt-get update && apt-get install openssh-server -y
RUN mkdir -p /var/run/sshd && chmod 0755 /var/run/sshd \
 && echo 'root:screencast' | chpasswd \
 && sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config \
 && sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd

EXPOSE 22
CMD ["-D"]
ENTRYPOINT ["/usr/sbin/sshd"]

thaJeztah reopened this May 5, 2016
thaJeztah added the kind/bug label and removed the status/more-info-needed label May 5, 2016
thaJeztah added this to the 1.11.2 milestone May 5, 2016
@beetree
Author

beetree commented May 5, 2016

@thaJeztah Long story about systemd. It essentially boils down to users not being familiar with containers, but only knowing how to deal with systemd. And while your suggestion seems simple to you and me, most users would find it highly complicated compared to "apt-get install openssh-server" (sample of how users think).

@thaJeztah
Member

@beetree yes, unfortunately there's still a lot of educating needed; users still think of containers as "virtual machines", whereas they should be thought of more as "bundled executables"

@beetree
Author

beetree commented May 5, 2016

@thaJeztah Yes, the user is certainly wrong :P

@thaJeztah
Member

@beetree not in all cases, but in most cases, I don't see a reason to do it (just my 0.02c)

@justincormack
Contributor

Ok, as far as I can see it fails if /sys/fs/cgroup/systemd does not exist. If you run it once with -v /sys/fs/cgroup:/sys/fs/cgroup:rw, which lets it create this directory (or probably if you boot it on a systemd-based system), then it works with docker run -d --security-opt seccomp:unconfined --cap-add sys_admin -v /sys/fs/cgroup:/sys/fs/cgroup:ro ... but if that directory is not there, it fails. I am guessing that may well be the difference between your 1.10 and 1.11 setups...
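
To check that guess on a host, something like the following should do. It is a minimal sketch: `has_systemd_cgroup` is a made-up helper name, and the optional root argument is there only so the check can be pointed at a scratch directory.

```shell
# Succeed if a "systemd" cgroup hierarchy directory exists under the given root.
has_systemd_cgroup() {
    cgroup_root="${1:-/sys/fs/cgroup}"
    [ -d "$cgroup_root/systemd" ]
}

if has_systemd_cgroup; then
    echo "/sys/fs/cgroup/systemd exists; the read-only mount may be enough"
else
    echo "/sys/fs/cgroup/systemd is missing; one run with :rw may be needed to create it"
fi
```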

@nathwill
Contributor

nathwill commented May 6, 2016

we're seeing this as well on 1.11: --security-opt=seccomp:unconfined --cap-add=SYS_ADMIN works under 1.10, but not on 1.11. i suspect this might indeed be AppArmor related, as it seems to work on Fedora 23, but not in the Docker-on-Mac beta, which I think uses an Ubuntu bhyve guest?

@thaJeztah it's been made abundantly clear what Docker's perspective is on multi-process containers, but I'd like to share our use case, in the interest of providing at least a single data-point on the types of reasons users may be interested in running systemd in Docker containers. Hopefully it helps get past this "you're doing it wrong" attitude i see so often in docker bug-reports 😄

We (Treehouse) offer a feature to our students called "Workspaces", which is an online code editor and terminal that our students use to work on projects associated with their courses. Each Workspace is spun up as an on-demand docker container running the backing services that the frontend code-editor talks to, with persistence handled by bind-mounting gluster volumes into the container. The services that make up an active Workspace include things like:

  • posix file api web-service
  • web based terminal for interacting with the CLI (compiling css from sass, bundling gems, compiling C# & Java, etc)
  • apache server for previewing static files and PHP
  • postfix for sending mail to local user accounts (outbound mail's restricted of course...)

we use docker's dynamic port mapping to expose these services (and other common dev-preview ports for e.g. flask, etc) on the host, and inject the routes into Redis for our load-balancer.

because these are docker containers, we're able to run anywhere from 100-200 Workspaces on a given host, which is awesome. Having to do this in actual VMs would be cost-prohibitive, so Docker's worked really well for us in that regard.

With experience, we've found that treating each active Workspace as a single container is optimal for several reasons:

  • far fewer API calls, which increases the load we can put on a given docker daemon instance, since we have seen issues with lock contention making the API unresponsive under high-frequency API calls. this is far less of a problem with 100-200 containers than it is with 1000+, which would be necessary for multi-container (container-per-backend process) Workspaces
  • conceptual and operational simplicity: 1 container == 1 workspace, and if there's a problem, you just nuke the one container. it also avoids the need for complicated linking strategies.
  • related to the above, there's no "orphan" containers; running each backing service in its own container means that some services may stay running long after the rest (and it) has been shut down. due to docker api latency, some shutdown requests may not complete, so "container leakage" is a real problem. with a workspace == container model, docker ps tells us definitively whether a workspace is 'active' or not, so shutdowns can just be retried for expired workspaces if they fail at first.
  • zombie procs are still a real problem, so you need a pid 1 in the container to handle them anyways; we initially used start.sh entrypoints, and found systemd to be much more effective. it also handles service restarts as needed, so we can delegate to systemd to ensure all of a Workspace's backing services are restarted if they crash
  • we occasionally get bit by #17691 ("Creating fail with Could not find container for entity id <id> after upgrading to 1.9.0"), and the failure case in the single-container model is a lot better, since it means the entire workspace won't launch, as opposed to launching in a degraded state due to a conflict with just one of the backing services

there's probably some other things i'm forgetting, but those are the big ones for us at present. ultimately, we hope Docker can see that there are some legitimate use cases for multi-process containers, though it's clear that best-practice for most use cases is still the single-process container model.

thanks for reading, and thanks for a great product! we love Docker, and hope we can keep using it well into the future!

@justincormack
Contributor

@nathwill Docker-on-Mac does not use AppArmor or SELinux. It is based on Alpine not Ubuntu. Can you try running with -v /sys/fs/cgroup:/sys/fs/cgroup:rw the first time as per my comment above - that then worked for me.

In general though even with multiprocess containers I might go for something simpler than systemd.

@nathwill
Contributor

nathwill commented May 6, 2016

> Docker-on-Mac does not use AppArmor or SELinux. It is based on Alpine not Ubuntu.

ah, ok. i had seen aufs in the docker info output and assumed it was Ubuntu 👍

> In general though even with multiprocess containers I might go for something simpler than systemd.

shrug... systemd's pretty dang simple if you're just using pid 1; we haven't really had many problems with it aside from the recent security-related stuff. do you have any recommendation for one that supports restarts, environment pass-through and zombie-proc handling?

Edit: re: "Can you try running with -v /sys/fs/cgroup:/sys/fs/cgroup:rw", i've forwarded a link to your comment to one of our developers who's in the beta, hope to hear back soon.

@beetree
Author

beetree commented May 7, 2016

@justincormack

# ls -la /sys/fs/cgroup/systemd/
total 0
dr-xr-xr-x   5 root root   0 May  7 17:53 .
drwxr-xr-x  12 root root 320 Apr 23 20:57 ..
-rw-r--r--   1 root root   0 May  2 11:37 cgroup.clone_children
-rw-r--r--   1 root root   0 May  2 11:37 cgroup.procs
-r--r--r--   1 root root   0 May  2 11:37 cgroup.sane_behavior
drwxr-xr-x 111 root root   0 Apr 23 20:59 docker
-rw-r--r--   1 root root   0 May  2 11:37 notify_on_release
-rw-r--r--   1 root root   0 May  2 11:37 release_agent
drwxr-xr-x  54 root root   0 Apr 23 20:57 system.slice
-rw-r--r--   1 root root   0 May  2 11:37 tasks
drwxr-xr-x   3 root root   0 Apr 23 20:57 user.slice

So, it exists. Still:

root@Ubuntu-1510-wily-64-minimal ~/tmp # cat Dockerfile
FROM ubuntu:16.04

RUN apt-get update

RUN apt-get install openssh-server -y
RUN systemctl enable ssh

ENTRYPOINT ["/lib/systemd/systemd"]
root@Ubuntu-1510-wily-64-minimal ~/tmp # docker build --tag=test .; docker run -d --security-opt=seccomp:unconfined --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:ro test;
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> 8cc8fa33e927
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> ccff4008d4bd
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> c0c577808e65
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> ae50e3d5066a
Successfully built ae50e3d5066a
e18080496140313189463d96e5c6bd3ba32c42e4cd5bc30eb415f51cb9c99774
root@Ubuntu-1510-wily-64-minimal ~/tmp # docker exec -it e18080496140313189463d96e5c6bd3ba32c42e4cd5bc30eb415f51cb9c99774 /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  5.5  0.0  36536  2088 ?        Ss   15:54   0:00 /lib/systemd/systemd
root         7  0.0  0.0  34424  2944 ?        Rs+  15:54   0:00 ps aux

Testing your suggested approach (note the rw):

root@Ubuntu-1510-wily-64-minimal ~/tmp # docker build --tag=test .; docker run -d --security-opt=seccomp:unconfined --cap-add SYS_ADMIN -v /sys/fs/cgroup:/sys/fs/cgroup:rw test;
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> 8cc8fa33e927
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> ccff4008d4bd
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> c0c577808e65
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> ae50e3d5066a
Successfully built ae50e3d5066a
237502db4d53863551b10ea7c6676940e47ec958a522c566b18ec0977099cf05
root@Ubuntu-1510-wily-64-minimal ~/tmp # docker exec -it 237502db4d53863551b10ea7c6676940e47ec958a522c566b18ec0977099cf05 /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  5.2  0.0  36536  2124 ?        Ss   15:55   0:00 /lib/systemd/sy
root         7  0.0  0.0  34424  2936 ?        Rs+  15:55   0:00 ps aux

That said, I think there is something funky with the host system. On one of my machines it actually works:

root@m3182:~/tmp# docker build --tag=test .; docker run -d --privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro test;
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu:16.04
 ---> c5f1cf30c96b
Step 2 : RUN apt-get update
 ---> Using cache
 ---> dbe2316d1c8b
Step 3 : RUN apt-get install openssh-server -y
 ---> Using cache
 ---> 3a36e4f78434
Step 4 : RUN systemctl enable ssh
 ---> Using cache
 ---> 139635dffc97
Step 5 : ENTRYPOINT /lib/systemd/systemd
 ---> Using cache
 ---> a15fb9bfe596
Successfully built a15fb9bfe596
50ee3ce8fef0cb3078c5c4229ee67718714e61baaba126036d74bf5508465d6d
root@m3182:~/tmp# docker exec -it 50ee3ce8fef0cb3078c5c4229ee67718714e61baaba126036d74bf5508465d6d /bin/bash -c "ps aux"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  1.3  0.0  36992  4964 ?        Ss   15:57   0:00 /lib/systemd/sy
root        19  0.6  0.0  35276  7632 ?        Ss   15:57   0:00 /lib/systemd/sy
systemd+    30  0.0  0.0 100324  2564 ?        Ssl  15:57   0:00 /lib/systemd/sy
root        36  0.1  0.0  65612  6312 ?        Ss   15:57   0:00 /usr/sbin/sshd
root        43  0.0  0.0  13028  1840 tty2     Ss+  15:57   0:00 /sbin/agetty --
root        46  0.8  0.0   4508  1728 ?        S    15:57   0:00 /bin/sh /etc/in
root        47  0.0  0.0  13028  1840 tty3     Ss+  15:57   0:00 /sbin/agetty --
root        48  0.0  0.0  13028  1788 tty4     Ss+  15:57   0:00 /sbin/agetty --
root        49  0.0  0.0  13028  1816 tty5     Ss+  15:57   0:00 /sbin/agetty --
root        50  0.0  0.0  13028  1840 tty6     Ss+  15:57   0:00 /sbin/agetty --
root        62  0.0  0.0   4380   800 ?        S    15:57   0:00 sleep 60
root        70  0.0  0.0  34424  2796 ?        Rs+  15:57   0:00 ps aux
root@m3182:~/tmp# docker info
Containers: 57
 Running: 42
 Paused: 0
 Stopped: 15
Images: 23
Server Version: 1.11.1
Storage Driver: devicemapper
 Pool Name: docker-8:2-9699477-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 32.21 GB
 Backing Filesystem: ext4
 Data file: /dev/loop2
 Metadata file: /dev/loop3
 Data Space Used: 49.62 GB
 Data Space Total: 429.5 GB
 Data Space Available: 379.9 GB
 Metadata Space Used: 34.17 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.113 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Either use `--storage-opt dm.thinpooldev` or use `--storage-opt dm.no_warn_on_loop_devices=true` to suppress this warning.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.99 (2015-06-20)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge null
Kernel Version: 3.19.8-031908-generic
Operating System: Ubuntu 15.10
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 125.8 GiB
Name: m3182.contabo.host
ID: PZ4G:F6LY:5J7E:7QRJ:CJ7A:5U6D:W6P2:NSDC:INKM:DMJI:UEUL:PF44
Docker Root Dir: /var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 146
 Goroutines: 270
 System Time: 2016-05-07T17:57:50.872828164+02:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
root@m3182:~/tmp# docker version
Client:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:38:55 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:38:55 2016
 OS/Arch:      linux/amd64

The issue is easy for me to reproduce. Just let me know what info you need about the two different host environments. Here are some basics:

Broken host environment:

root@Ubuntu-1510-wily-64-minimal ~/tmp # lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
Stepping:              4
CPU MHz:               3599.941
CPU max MHz:           3900.0000
CPU min MHz:           1200.0000
BogoMIPS:              7000.57
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              12288K
NUMA node0 CPU(s):     0-11
root@Ubuntu-1510-wily-64-minimal ~/tmp # uname -a
Linux Ubuntu-1510-wily-64-minimal 4.2.0-35-generic #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@Ubuntu-1510-wily-64-minimal ~/tmp # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.10
Release:        15.10
Codename:       wily

Working host environment:

root@m3182:~/tmp# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               1213.781
CPU max MHz:           3200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4801.52
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23
root@m3182:~/tmp# uname -a
Linux m3182.contabo.host 3.19.8-031908-generic #201505110938 SMP Mon May 11 13:39:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
root@m3182:~/tmp# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 15.10
Release:        15.10
Codename:       wily

Kernel issue?

/beetree

@joesteele

Thanks for the ping @nathwill. What ended up working on the Docker for Mac Beta is running with just the --privileged flag, without mounting the cgroup volume.

docker run --privileged -d <image> /sbin/init

@thaJeztah
Member

@joesteele running with --privileged should really be regarded as a workaround; a "privileged" container offers basically no protection, so processes inside the container can break out.

@joesteele

@thaJeztah gotcha. This isn't something we are/were considering for production. This is just, as you said, a workaround to facilitate setting things up for local development (and it's temporary at that).

With the Docker for Mac Beta, I've been going around setting up our various services with Docker for ease in local development and I ran into this particular issue when setting up the service @nathwill was describing above.

We'll probably end up moving away from our systemd approach anyhow.

@beetree
Author

beetree commented May 10, 2016

Anyone looking into this? @justincormack any more info you need from me in order to recreate this?

@beetree
Author

beetree commented May 12, 2016

;(

Clearly reproduced issue, but no solution :(

Is your response here that using systemd inside containers is not in line with the intended usage of Docker and I should look for other solutions (either replacing systemd or replacing Docker)?

/beetree

@thaJeztah
Member

thaJeztah commented May 12, 2016

Ok I did some more digging, and discovered that on Debian Jessie, this just works, but on Ubuntu 15.10, it doesn't work, even if I mount /sys/fs/cgroup with :rw first.

I used these steps:

cat << EOF | docker build -t tester -
FROM ubuntu:16.04

RUN apt-get update

RUN apt-get install openssh-server -y
RUN systemctl enable ssh

ENTRYPOINT ["/lib/systemd/systemd"]
EOF
docker run -d \
  --cap-add SYS_ADMIN \
  --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
  --security-opt seccomp:unconfined \
  --name tester \
  tester

docker exec -it tester /bin/bash -c "ps -e -o uid,pid,cmd"

  UID   PID CMD
    0     1 /lib/systemd/systemd
    0     5 ps -e -o uid,pid,cmd

At first, I thought the systemd version on the host might be related (Jessie uses systemd 215, Ubuntu 15.10 uses 225). I also wondered whether the systemd version inside the container should match the one on the host (in case the cgroup structure is different), but changing the image to use FROM ubuntu:15.10 made no difference.

I downgraded Docker to version 1.10.3 on Ubuntu, but that made no difference, so it's not a regression in 1.11, just a difference between these hosts.

Comparing the cgroup structure between Jessie and Ubuntu 15.10 while the container is running (tree -d /sys/fs/cgroup/), I noticed this difference:

On Jessie:

/sys/fs/cgroup/
│
(snip)
└── systemd
    ├── docker
    │   └── 901fa72f2cc9e507bf1a13467917e076e4a7d9d78b92a949e514639779bb8722
    │       ├── init.scope
    │       └── system.slice
    │           ├── dev-hugepages.mount
    │           ├── dev-mqueue.mount
    │           ├── etc-hostname.mount
    │           (etc....)
    │           
    ├── system.slice
    │   ├── acpid.service
    │   ├── atd.service
    │   ├── cgroupfs-mount.service
    │   ├── cloud-config.service

On Ubuntu:

/sys/fs/cgroup/
│
(snip)
└── systemd
    ├── docker
    │   └── 9d5f265f4e12d8772f454fa24906855bcc5ca4e0135701711288fb9658bf3777
    ├── system.slice
    │   ├── accounts-daemon.service
    │   ├── apparmor.service
    │   ├── cgroupfs-mount.service
    ...

Notice that on Ubuntu there are no cgroups beneath the container ID.
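
The same comparison can be made without reading the whole tree, by testing whether the container's directory has any child cgroups. This is only a sketch: has_child_cgroups is a hypothetical helper, and the path in the usage comment assumes the cgroupfs layout shown in the trees above (/sys/fs/cgroup/systemd/docker/<container ID>).

```shell
# Hypothetical helper (not part of Docker or systemd).
# Returns 0 if the given cgroup directory contains at least one child
# directory, 1 if it has none, and 2 if the directory does not exist.
has_child_cgroups() {
  dir="$1"
  [ -d "$dir" ] || return 2
  for child in "$dir"/*/; do
    # An unmatched glob stays literal, so confirm it is a real directory.
    [ -d "$child" ] && return 0
  done
  return 1
}

# Usage on the host (assumes the container is named "tester" as above):
#   cid=$(docker inspect --format '{{.Id}}' tester)
#   if has_child_cgroups "/sys/fs/cgroup/systemd/docker/$cid"; then
#     echo "systemd created its cgroups"
#   else
#     echo "no cgroups beneath the container ID"
#   fi
```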

Wondering what could influence this, I decided to start the container with AppArmor disabled:

docker run -d \
  --cap-add SYS_ADMIN \
  --volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
  --security-opt seccomp:unconfined \
  --security-opt apparmor:unconfined \
  --name tester \
  tester

And success:

docker exec -it tester /bin/bash -c "ps -e -o uid,pid,cmd"

  UID   PID CMD
    0     1 /lib/systemd/systemd
    0    19 /lib/systemd/systemd-journald
    0    24 /usr/sbin/sshd -D
    0   131 ps -e -o uid,pid,cmd

Cgroups are now also created beneath the container ID:

└── systemd
    ├── docker
    │   ├── 225b13dd9849b91387a1c0667c9c3f12d9811891c7f0497a4f231e7f71242b3b
    │   │   ├── init.scope
    │   │   └── system.slice
    │   │       ├── dev-hugepages.mount

So it looks like there's no regression; AppArmor is probably disabled on the host where it works and enabled on the one where it doesn't. Disabling AppArmor on the container (using --security-opt apparmor=unconfined) allows systemd inside the container to do its work.
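
To check that guess on a given host, something like the following could be used. It is a sketch under assumptions: apparmor_enabled is a hypothetical helper, and it relies on the kernel exposing the AppArmor state as Y or N in /sys/module/apparmor/parameters/enabled. The optional argument only exists so the helper can be pointed at another file.

```shell
# Hypothetical helper (not part of Docker).
# Returns 0 if the AppArmor parameter file reads "Y", non-zero if it
# reads anything else or is missing (for example, AppArmor not built in).
apparmor_enabled() {
  param_file="${1:-/sys/module/apparmor/parameters/enabled}"
  [ -r "$param_file" ] && [ "$(cat "$param_file")" = "Y" ]
}

# Usage on the host:
#   if apparmor_enabled; then
#     echo "AppArmor is enabled on this host"
#   else
#     echo "AppArmor is disabled or unavailable on this host"
#   fi
#
# To see which profile a running container was started with
# (container name "tester" as above):
#   docker inspect --format '{{.AppArmorProfile}}' tester
```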

@justincormack
Contributor

I just found a build of systemd in a container that does not require any privileges or changes on the host, which seems a lot more useful:

docker run -d -v /sys/fs/cgroup:/sys/fs/cgroup:ro maci0/systemd /usr/lib/systemd/systemd

The source is at https://github.com/maci0/docker-systemd-unpriv

@thaJeztah thaJeztah removed this from the 1.11.2 milestone May 12, 2016
@thaJeztah thaJeztah removed the kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. label May 12, 2016
@thaJeztah
Member

I'll close this issue, because it doesn't appear to be a bug, but perhaps we should add an example to the documentation (or if someone wants to contribute that 👍 )
