instance start failed with --fakeroot #2189

Open
mulroony opened this issue Apr 25, 2024 · 7 comments

@mulroony

Version of Apptainer

What version of Apptainer (or Singularity) are you using? Run

$ apptainer --version
apptainer version 1.3.0-1.el8

Expected behavior

What did you expect to see when you do...?

  • instance should start with --fakeroot

Actual behavior

What actually happened? Why was it incorrect?

  • Instance failed to start with error below.

Steps to reproduce this behavior

How can others reproduce this issue/problem?

$ apptainer instance start --fakeroot docker://alpine a1
INFO:    Using cached SIF image
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    Using cached SIF image
INFO:    Using fakeroot command combined with root-mapped namespace
INFO:    Cleanup error: while stopping driver for /var/lib/apptainer/mnt/session/rootfs: squashfuse_ll exited: fuse: failed to unmount /var/lib/apptainer/mnt/session/rootfs: Invalid argument
ERROR:   container cleanup failed: no instance found with name a1
FATAL:   container creation failed: while applying cgroups config: Interactive authentication required.

FATAL:   while executing starter: failed to start instance: while running /usr/libexec/apptainer/bin/starter: exit status 255

What OS/distro are you running

$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux"
VERSION="8.9 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.9 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.9"

How did you install Apptainer

Write here how you installed Apptainer (or Singularity).

RPM

$ yum info apptainer
Installed Packages
Name         : apptainer
Version      : 1.3.0
Release      : 1.el8
Architecture : x86_64
Size         : 118 M
Source       : apptainer-1.3.0-1.el8.src.rpm
Repository   : @System
From repo    : epel
Summary      : Application and environment virtualization formerly known as Singularity
URL          : https://apptainer.org
License      : BSD and LBNL BSD and ASL 2.0
Description  : Apptainer provides functionality to make portable
             : containers that can be used across host environments.

Notes

This seems to be the same issue as #1749, but that one was resolved. Possible regression? Thanks!

DrDaveD added this to the 1.3.2 milestone Apr 25, 2024
@DrDaveD
Contributor

DrDaveD commented Apr 25, 2024

Yes it does seem to be a regression. #1749 was supposed to have disabled the use of cgroups with instance start --fakeroot.

I do not get an error in my el8 test VM with apptainer 1.3.0 or 1.3.1; I'm not sure what is different between our environments. However, I do get a different cgroups error on el9 with both 1.3.0 and 1.3.1:

$ apptainer instance start --fakeroot docker://alpine a1
INFO:    Using cached SIF image
INFO:    User not listed in /etc/subuid, trying root-mapped namespace
INFO:    Using cached SIF image
INFO:    Using fakeroot command combined with root-mapped namespace
INFO:    Terminating squashfuse_ll after timeout
INFO:    Timeouts can be caused by a running background process
INFO:    Cleanup error: while stopping driver for /var/lib/apptainer/mnt/session/rootfs: squashfuse_ll exited: fuse: failed to unmount /var/lib/apptainer/mnt/session/rootfs: Invalid argument
ERROR:   container cleanup failed: no instance found with name a1
FATAL:   container creation failed: while applying cgroups config: unable to start unit "apptainer-1830278.scope" (properties [{Name:Description Value:"libcontainer container 1830278"} {Name:Slice Value:"system.slice"} {Name:Delegate Value:true} {Name:PIDs Value:@au [1830278]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Interactive authentication required.

FATAL:   while executing starter: failed to start instance: while running /usr/libexec/apptainer/bin/starter: exit status 255

@JasonYangShadow fixed the problem last time and hopefully can fix it again.

@JasonYangShadow
Member

Hmm, I tested apptainer 1.3.1 on Rocky Linux 8 and 9 and cannot reproduce this issue. Maybe it's related to system configuration?

[vagrant@localhost ~]$ apptainer instance start --fakeroot docker://alpine a1
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Copying blob 4abcf2066143 done   |
Copying config bc4e4f7999 done   |
Writing manifest to image destination
2024/05/10 02:24:37  info unpack layer: sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8
INFO:    Creating SIF file...
INFO:    instance started successfully
[vagrant@localhost ~]$ cat /etc/os-release
NAME="Rocky Linux"
VERSION="8.9 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.9 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.9"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.9"
[vagrant@localhost ~]$ apptainer version
1.3.1-1.el8
[vagrant@localhost ~]$ apptainer instance start --fakeroot docker://alpine a1
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Copying blob 4abcf2066143 done   |
Copying config bc4e4f7999 done   |
Writing manifest to image destination
2024/05/10 02:52:03  info unpack layer: sha256:4abcf20661432fb2d719aaf90656f55c287f8ca915dc1c92ec14ff61e67fbaf8
INFO:    Creating SIF file...
INFO:    instance started successfully
[vagrant@localhost ~]$ cat /etc/os-release
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.4 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.4"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
[vagrant@localhost ~]$ apptainer version
1.3.1-1.el9

@JasonYangShadow
Member

I installed apptainer through dnf with epel-release enabled; I did nothing else on the system.

@mulroony
Author

mulroony commented May 10, 2024

Just tested again on a new RHEL9 system and still seeing it. Also tested on a CentOS 8 Stream system and it worked.

Maybe this is a RHEL thing and the other distros don't have the same default config? It's also possible this is something unique to our setup, as all these systems were configured by the same group except for the CentOS 8 box. Is anyone else with RHEL able to test? I can see about getting a test VM spun up without any of the config we apply and see if it still happens.

Works...

# cat /etc/os-release 
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

Does not work (new install, minimal configuration)...

$ cat /etc/os-release 
NAME="Red Hat Enterprise Linux"
VERSION="9.4 (Plow)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="9.4"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Red Hat Enterprise Linux 9.4 (Plow)"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:redhat:enterprise_linux:9::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_BUGZILLA_PRODUCT_VERSION=9.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.4"
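
For comparing hosts like the ones above without pasting the whole file, a small helper can pull out just the fields that differ. This is only a sketch: the function name `os_release_field` is hypothetical, and it assumes nothing beyond the KEY=value layout shown in the listings above.

```sh
# os_release_field KEY [FILE]
# Print the value of KEY from an os-release style file (default: /etc/os-release),
# with surrounding double quotes stripped. Hypothetical helper for illustration.
os_release_field() {
    key=$1
    file=${2:-/etc/os-release}
    # Keep only lines starting with "KEY=", drop the key, then drop optional quotes.
    sed -n "s/^${key}=//p" "$file" | sed 's/^"//; s/"$//' | head -n 1
}

# Example: one-line host summary built from three fields.
# echo "$(os_release_field ID) $(os_release_field VERSION_ID) $(os_release_field PLATFORM_ID)"
```

Run on each machine, the one-line summary makes it easier to line up which ID and PLATFORM_ID combinations fail and which work.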

@DrDaveD
Contributor

DrDaveD commented May 10, 2024

Oh, my el9 error only happens when /etc/subuid is not set up. Jason, try it with --ignore-subuid.

On el8 with --ignore-subuid I now get the original error reported in the description. I see that my el9 error was actually quite similar, it just had more details in the error message.

I also get the same error on el7 with --ignore-subuid.
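
Since the failure seems tied to the "User not listed in /etc/subuid" path, it may help to check up front whether the current user has a subuid range. The sketch below assumes the usual `name:start:count` entry format of /etc/subuid; the helper name `has_subuid_entry` is hypothetical, and it matches login names only, not entries keyed by numeric UID.

```sh
# has_subuid_entry USER [FILE]
# Succeed if USER has a non-empty range in FILE (default: /etc/subuid).
# Hypothetical helper; entries are assumed to look like "name:start:count".
has_subuid_entry() {
    user=$1
    file=${2:-/etc/subuid}
    [ -r "$file" ] || return 1
    awk -F: -v u="$user" '$1 == u && $3 > 0 { found = 1 } END { exit !found }' "$file"
}

if has_subuid_entry "$(id -un)"; then
    echo "subuid range found: add --ignore-subuid to reproduce"
else
    echo "no subuid range: plain --fakeroot already takes the root-mapped path"
fi

# Reproduction with the flags used in this thread:
# apptainer instance start --fakeroot --ignore-subuid docker://alpine a1
```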

@JasonYangShadow
Member

> Oh, my el9 error only happens when /etc/subuid is not set up. Jason, try it with --ignore-subuid.
>
> On el8 with --ignore-subuid I now get the original error reported in the description. I see that my el9 error was actually quite similar, it just had more details in the error message.
>
> I also get the same error on el7 with --ignore-subuid.

Hmm, I have some questions about this mode:

if !useCG && lccgroups.IsCgroup2UnifiedMode() && l.engineConfig.File.SystemdCgroups && !l.cfg.Fakeroot && !hidePid {

In pure fakeroot mode, l.uid != 0, so we won't use cgroups, which looks right to me.
But in --fakeroot --ignore-subuid mode, root-mapped mode is used along with the fakeroot command, so l.uid == 0, and it looks like we cannot successfully initialize the cgroup in this case (the error is the one reported in this ticket).

I tried skipping cgroup initialization in the --fakeroot --ignore-subuid case, and it then reports:

INFO:    Using fakeroot command combined with root-mapped namespace
INFO:    Instance stats will not be available - requires cgroups v2 with systemd as manager.
/.singularity.d/libs/fakeroot: eval: line 140: /.singularity.d/libs/faked: not found
fakeroot: error while starting the `faked' daemon.
sh: you need to specify whom to kill

I guess that's because in root-mapped mode the faked binary is not mapped into the container.
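
The condition quoted above only applies on hosts running cgroups v2 in unified mode, so it is worth confirming which mode each test machine is in. The check below is a shell-level approximation, not what Apptainer itself does: it looks for `cgroup.controllers` at the cgroup mount root, a file that exists only at the root of a cgroup v2 hierarchy. The helper name `is_cgroup2_unified` and its optional root argument are hypothetical.

```sh
# is_cgroup2_unified [ROOT]
# Succeed if ROOT (default: /sys/fs/cgroup) looks like a cgroup v2 unified mount.
# Approximation for illustration; hybrid setups mount v2 elsewhere and report false.
is_cgroup2_unified() {
    root=${1:-/sys/fs/cgroup}
    [ -f "$root/cgroup.controllers" ]
}

if is_cgroup2_unified; then
    echo "cgroups v2 (unified)"
else
    echo "cgroups v1 or hybrid"
fi
```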

@DrDaveD
Contributor

DrDaveD commented May 14, 2024

The faked command ought to be bind mounted in whenever the fakeroot command is being used. I would next check with debug mode if it is attempting to do that and what went wrong with it.
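
One way to follow this suggestion is to rerun the failing command with Apptainer's global `--debug` flag and keep only the lines that mention faked. The thread does not show what the debug output looks like, so this sketch just does a case-insensitive match on the word; the helper name `faked_lines` is hypothetical.

```sh
# faked_lines: read log text on stdin and print only lines that mention "faked".
# Hypothetical helper; it matches the bare word, not a specific log format.
faked_lines() {
    grep -i 'faked' || true   # do not fail the pipeline when nothing matches
}

# Usage (assumes apptainer is installed; other flags as used in this thread):
# apptainer --debug instance start --fakeroot --ignore-subuid docker://alpine a1 2>&1 | faked_lines
```

If no bind mount of faked shows up in the filtered output, that would be consistent with the "faked: not found" error above, but the full debug log is still needed to see why.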
