Cannot start new distributions with systemd 255 #4402

Open
gitzdnex opened this issue Feb 19, 2024 · 3 comments
Labels
Incomplete Waiting on more information from reporter

Comments

@gitzdnex

gitzdnex commented Feb 19, 2024

Required information

  • Distribution: ubuntu
  • Systemd: 249
  • Distribution version: 22.04
$lxc-start --version
5.0.0
$uname -a
Kernel: 6.5.0-15-generic #15~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan 12 18:54:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Issue description

  • Starting: ubuntu:jammy works
  • Starting: ubuntu:noble - does not work

The new systemd 255 inside a container is not able to boot or start services on a host with kernel 6.5 and systemd 249. From what I have read, it is a problem with AppArmor. I can start the container if I set the AppArmor profile to allow nesting, but then the services still do not start. This seems to happen with Debian, ubuntu:noble, and also fedora:39. None of them start systemd-networkd and so on.

Steps to reproduce

  1. lxc-create testsystemd -t download (select ubuntu, then noble)
  2. lxc-start testsystemd
  3. lxc-attach testsystemd - fails (it is needed to activate)
  4. Edit the container config to add lxc.include = /usr/share/lxc/config/nesting.conf
  5. lxc-start testsystemd - the container starts, but the services inside do not; they also fail on AppArmor denials
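
Step 4 above can be scripted. This is only a minimal sketch, assuming the container config is plain text with one `key = value` entry per line; the helper name `ensure_nesting_include` is hypothetical and not part of LXC.

```python
NESTING_INCLUDE = "lxc.include = /usr/share/lxc/config/nesting.conf"


def ensure_nesting_include(config_text: str) -> str:
    """Return config_text with the nesting include appended if it is missing."""
    wanted = NESTING_INCLUDE.replace(" ", "")
    for line in config_text.splitlines():
        # Compare without spaces so "lxc.include=/usr/..." also counts as present.
        if line.strip().replace(" ", "") == wanted:
            return config_text
    # Make sure the appended entry starts on its own line.
    separator = "" if config_text.endswith("\n") or not config_text else "\n"
    return f"{config_text}{separator}{NESTING_INCLUDE}\n"
```

The function only transforms text; reading and writing the actual config file (commonly /var/lib/lxc/testsystemd/config for a privileged container, which is an assumption about the setup here) is left to the caller.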

Information to attach

dmesg (AppArmor denials):
[343225.725270] audit: type=1400 audit(1708328073.308:848): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-container-default-with-nesting" name="/run/systemd/mount-rootfs/" pid=176115 comm="(networkd)" srcname="/" flags="rw, rbind"
[343225.731762] audit: type=1400 audit(1708328073.316:849): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-container-default-with-nesting" name="/run/systemd/mount-rootfs/" pid=176117 comm="(networkd)" srcname="/" flags="rw, rbind"
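
To compare denials like the two above across profiles, it can help to split each audit line into its fields. A rough sketch, assuming the lines follow the `key="value"` / `key=value` layout seen in this dmesg output; the function name `parse_apparmor_denial` is made up for illustration.

```python
import re

# key="quoted value" or key=bare_value pairs as they appear in audit records.
_FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_apparmor_denial(line: str) -> dict:
    """Return the key/value fields of an AppArmor denial line, or {} otherwise."""
    if 'apparmor="DENIED"' not in line:
        return {}
    fields = {}
    for key, quoted, bare in _FIELD.findall(line):
        # Exactly one of the two capture groups is non-empty for a given pair.
        fields[key] = quoted if quoted else bare
    return fields
```

For the first line above this yields, among others, `profile`, `name`, `comm`, `srcname` and `flags`, which are the fields that matter when deciding which mount rule the profile is missing.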

I have tried to edit the services, but it looks like they need much more access, so I was not able to start any of them. From what I have seen, systemd in Ubuntu previously carried a patch that allowed startup to continue when this failed, like this one:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1959047
I am not sure whether something can be done, or whether we will not be able to run it anymore.
I have also seen that there was some problem with AppArmor on older kernels like 6.1, but currently I have 6.5.

@mihalicyn
Member

Hi @gitzdnex!

Could you try `strace -o strace.log -f lxc-start -F testsystemd` and then post strace.log here?

I guess that it's because of torvalds/linux@157a353 (read also https://lore.kernel.org/all/CA+enf=u0UmgjKrd98EYkxFu7FYV8dR1SBYJn_1b0Naq=3twbbQ@mail.gmail.com/#t).
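
Once strace.log exists, the failing mount-related calls can be pulled out before posting. A small sketch, assuming the usual `strace -f -o` layout where each line starts with the PID followed by the syscall; the function name `failed_mount_calls` is hypothetical, and no real strace output from this issue is used here.

```python
import re

# Matches mount-related syscalls that returned -1, for example (illustrative only):
#   1234 mount(NULL, "/", NULL, MS_REC|MS_SLAVE, NULL) = -1 EACCES (Permission denied)
_FAILED_MOUNT = re.compile(
    r'^(\d+)\s+(mount|umount2|pivot_root)\((.*)\)\s+=\s+-1\s+(E[A-Z0-9]+)'
)


def failed_mount_calls(lines):
    """Yield (pid, syscall, args, errno_name) for each failed mount-related call."""
    for line in lines:
        match = _FAILED_MOUNT.match(line)
        if match:
            yield match.groups()
```

It accepts any iterable of lines, so an open file object for strace.log can be passed in directly.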

@stgraber stgraber added the Incomplete Waiting on more information from reporter label Feb 20, 2024
@gitzdnex
Author

Hi, so when I run it with strace -f it behaves differently. Failing on uidmap? Anyway, here is more information that I was able to get. From what I see, the container itself starts normally, but some mount that systemd's (sd-gens) process does inside it makes it crash. When container nesting is active, the container starts normally, but systemd inside it fails again on some sd-* helper. Meanwhile I will try to get normal strace output, but it seems that strace uses a different user and somehow messes up the whole start. At the end is the TRACE log from lxc-start, where it can be seen that the container actually starts correctly.

Normal start

lxc-start -F testsystemd
lxc-start: testsystemd: cgroups/cgfsng.c: __cgfsng_delegate_controllers: 2953 Device or resource busy - Could not enable "+memory +pids" controllers in the unified cgroup 8
lxc-start: testsystemd: cgroups/cgfsng.c: __cgfsng_delegate_controllers: 2953 Device or resource busy - Could not enable "+memory +pids" controllers in the unified cgroup 8
systemd 255.2-3ubuntu2 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Ubuntu Noble Numbat (development branch)!

Failed to fork off sandboxing environment for executing generators: Protocol error
[!!!!!!] Failed to start up manager.
Exiting PID 1...

Here there is just a protocol error?

Debug start (with lxc.init.cmd = /sbin/init systemd.log_level=debug)

lxc-start -F testsystemd
lxc-start: testsystemd: cgroups/cgfsng.c: __cgfsng_delegate_controllers: 2953 Device or resource busy - Could not enable "+memory +pids" controllers in the unified cgroup 8
lxc-start: testsystemd: cgroups/cgfsng.c: __cgfsng_delegate_controllers: 2953 Device or resource busy - Could not enable "+memory +pids" controllers in the unified cgroup 8
systemd 255.2-3ubuntu2 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Detected virtualization lxc.
CPUID func 1 0
CPUID result 806c1 100800 7ffafbff bfebfbff
CPUID is hypervisor: no
Detected architecture x86-64.
Detected initialized system, this is not the first boot.
Kernel version 6.5.0-15-generic, our baseline is 4.15
No credentials passed from initrd.
Acquired 0 regular credentials, 0 untrusted credentials.

Welcome to Ubuntu Noble Numbat (development branch)!

Hostname was already set to <testsystemd>.
127.0.0.1 has already been added to loopback interface
::1 has already been added to loopback interface
Successfully brought loopback interface up
Setting '/proc/sys/net/unix/max_dgram_qlen' to '512'
Setting '/proc/sys/fs/file-max' to '9223372036854775807'
RLIMIT_MEMLOCK is already as high or higher than we need it, not bumping.
Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
Unified cgroup hierarchy is located at /sys/fs/cgroup.
bpf-firewall: Can't load kernel CGROUP SKB BPF program, BPF firewalling is not supported: Operation not permitted
Can't load kernel CGROUP DEVICE BPF program, BPF device control is not supported: Operation not permitted
Controller 'cpu' supported: no
Controller 'cpuacct' supported: no
Controller 'cpuset' supported: no
Controller 'io' supported: no
Controller 'blkio' supported: no
Controller 'memory' supported: no
Controller 'devices' supported: no
Controller 'pids' supported: no
Controller 'bpf-firewall' supported: no
Controller 'bpf-devices' supported: no
Controller 'bpf-foreign' supported: yes
Controller 'bpf-socket-bind' supported: no
Controller 'bpf-restrict-network-interfaces' supported: no
Set up TFD_TIMER_CANCEL_ON_SET timerfd.
Failed to establish memory pressure event source, ignoring: Operation not permitted
Enabling (yes) showing of status (command line).
Successfully forked off '(sd-gens)' as PID 22.
PR_SET_MM_ARG_START failed: Operation not permitted
Failed to remount root directory as MS_SLAVE: Permission denied
(sd-gens) failed with exit status 1.
Failed to fork off sandboxing environment for executing generators: Protocol error
[!!!!!!] Failed to start up manager.
Exiting PID 1...

This is linked with this AppArmor error. I guess that the remount is the main problem. Btw I think that remount

Systemd
PR_SET_MM_ARG_START failed: Operation not permitted
Failed to remount root directory as MS_SLAVE: Permission denied
(sd-gens) failed with exit status 1.

Dmesg
[707350.704852] audit: type=1400 audit(1708692188.597:1724): apparmor="DENIED" operation="mount" class="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=361794 comm="(sd-gens)" flags="rw, rslave"
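
The `flags="rw, rslave"` in the denial corresponds to the MS_SLAVE remount that systemd reports. A small sketch of that mapping, assuming the keyword meanings described for mount rules in apparmor.d(5); the table is deliberately partial and the name `describe_flags` is made up.

```python
# AppArmor mount keywords and the mount(2) flags they stand for.
# The "r"-prefixed variants are the recursive forms (flag plus MS_REC).
APPARMOR_TO_MS = {
    "bind": ("MS_BIND",),
    "rbind": ("MS_BIND", "MS_REC"),
    "slave": ("MS_SLAVE",),
    "rslave": ("MS_SLAVE", "MS_REC"),
    "private": ("MS_PRIVATE",),
    "rprivate": ("MS_PRIVATE", "MS_REC"),
    "shared": ("MS_SHARED",),
    "rshared": ("MS_SHARED", "MS_REC"),
}


def describe_flags(apparmor_flags: str) -> str:
    """Translate an audit flags string such as "rw, rslave" into MS_* names."""
    names = []
    for keyword in (part.strip() for part in apparmor_flags.split(",")):
        # "rw" only means MS_RDONLY is not set, so it adds no flag name;
        # keywords missing from the partial table are skipped as well.
        for name in APPARMOR_TO_MS.get(keyword, ()):
            if name not in names:
                names.append(name)
    return "|".join(names)
```

With this, `describe_flags("rw, rslave")` gives `MS_SLAVE|MS_REC`, which lines up with the "Failed to remount root directory as MS_SLAVE" message from (sd-gens).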
lxc Trace log - `lxc-start -l TRACE --logfile=/tmp/out -F testsystemd` 

lxc_trace_log.log

@gitzdnex
Author

gitzdnex commented Feb 23, 2024

And here is more information about the error when the AppArmor profile with nesting is active. The container starts, but the services inside it cannot.

For example:

#systemctl log-level debug
#systemctl start systemd-networkd
#journalctl -eu systemd-networkd.service
Feb 23 13:00:42 test3 systemd[1]: Starting systemd-networkd.service - Network Configuration...
Feb 23 13:00:42 test3 (networkd)[1123]: PR_SET_MM_ARG_START failed: Operation not permitted
Feb 23 13:00:42 test3 (networkd)[1123]: Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
Feb 23 13:00:42 test3 systemd[1]: systemd-networkd.service: User lookup succeeded: uid=998 gid=998
Feb 23 13:00:42 test3 (networkd)[1123]: Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy
Feb 23 13:00:42 test3 (networkd)[1123]: /run/systemd/mount-rootfs/dev (read-write-implicit) is duplicate.
Feb 23 13:00:42 test3 (networkd)[1123]: /run/systemd/mount-rootfs/home (read-write-implicit) is duplicate.
Feb 23 13:00:42 test3 (networkd)[1123]: /run/systemd/mount-rootfs/proc (read-write-implicit) is duplicate.
Feb 23 13:00:42 test3 (networkd)[1123]: /run/systemd/mount-rootfs/root (read-write-implicit) is duplicate.
Feb 23 13:00:42 test3 (networkd)[1123]: /run/systemd/mount-rootfs/run/user (read-write-implicit) is duplicate.
Feb 23 13:00:42 test3 (networkd)[1123]: /run/systemd/mount-rootfs/sys (read-write-implicit) is duplicate.
Feb 23 13:00:42 test3 (networkd)[1123]: Bind-mounting / on /run/systemd/mount-rootfs (MS_BIND|MS_REC "")...
Feb 23 13:00:42 test3 (networkd)[1123]: Failed to mount / (type n/a) on /run/systemd/mount-rootfs (MS_BIND|MS_REC ""): Permission denied
Feb 23 13:00:42 test3 (networkd)[1123]: systemd-networkd.service: Failed to set up mount namespacing: Permission denied
Feb 23 13:00:42 test3 systemd[1]: systemd-networkd.service: Child 1123 belongs to systemd-networkd.service.
Feb 23 13:00:42 test3 systemd[1]: systemd-networkd.service: Main process exited, code=exited, status=226/NAMESPACE
Feb 23 13:00:42 test3 systemd[1]: systemd-networkd.service: Failed with result 'exit-code'.
Feb 23 13:00:42 test3 systemd[1]: systemd-networkd.service: Service will restart (restart setting)
Feb 23 13:00:42 test3 systemd[1]: systemd-networkd.service: Changed start -> failed-before-auto-restart
Feb 23 13:00:42 test3 systemd[1]: systemd-networkd.service: Job 2521 systemd-networkd.service/start finished, result=failed
Feb 23 13:00:42 test3 systemd[1]: Failed to start systemd-networkd.service - Network Configuration.
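
The `status=226/NAMESPACE` in the journal is the part that names the failure class. A short sketch that extracts it from journal lines, assuming the "Main process exited" wording shown above; the lookup table holds only a few of the exit statuses listed in systemd.exec(5), and the name `explain_exit` is hypothetical.

```python
import re

# A few of the exit statuses systemd assigns when it cannot set up a
# service's execution environment (partial list, see systemd.exec(5)).
SYSTEMD_EXIT = {
    "203": "EXIT_EXEC: the command could not be executed",
    "217": "EXIT_USER: failed to set up user credentials",
    "226": "EXIT_NAMESPACE: failed to set up mount namespacing",
}

_STATUS = re.compile(
    r'(\S+\.service): Main process exited, code=(\w+), status=(\d+)/(\S+)'
)


def explain_exit(line: str):
    """Return a one-line summary for a 'Main process exited' line, else None."""
    match = _STATUS.search(line)
    if match is None:
        return None
    unit, code, status, name = match.groups()
    meaning = SYSTEMD_EXIT.get(status, "not in this partial table")
    return f"{unit}: {code} with {status}/{name} ({meaning})"
```

Run over the log above, only the "Main process exited" line produces a summary, and it points at mount namespacing, consistent with the failed bind mount on /run/systemd/mount-rootfs.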

Here it looks again like the mount is the problem:

Feb 23 13:00:42 test3 (networkd)[1123]: Bind-mounting / on /run/systemd/mount-rootfs (MS_BIND|MS_REC "")...
Feb 23 13:00:42 test3 (networkd)[1123]: Failed to mount / (type n/a) on /run/systemd/mount-rootfs (MS_BIND|MS_REC ""): 

This happens even when the LXC AppArmor profile with nesting is active. I guess it could work if the container were unconfined?

Dmesg

[708405.022257] audit: type=1400 audit(1708693242.866:1760): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-container-default-with-nesting" name="/run/systemd/mount-rootfs/" pid=362358 comm="(networkd)" srcname="/" flags="rw, rbind"
[708405.031923] audit: type=1400 audit(1708693242.878:1761): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="lxc-container-default-with-nesting" name="/run/systemd/mount-rootfs/" pid=362360 comm="(networkd)" srcname="/" flags="rw, rbind"
