Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to start with "Bad file descriptor" for AMD EPYC 7452 and 7502 CPUs running EL9 #2166

Open
chrisburr opened this issue Apr 20, 2024 · 7 comments
Assignees
Milestone

Comments

@chrisburr
Copy link
Contributor

Version of Apptainer

$ apptainer --version
apptainer version 1.2.2

Expected behavior

Containers start reliably.

Actual behavior

When using AMD EPYC 7452 32-Core Processor and AMD EPYC 7502 32-Core Processors I see this about half of the time:

ERROR  : Failed to get file information for file descriptor 3: Bad file descriptor
ERROR  : Could not write info to setgroups: Permission denied

The only happens for machines running EL9-like kernels, machines running EL7-like kernels aren't affected. I'm not exactly sure exactly which distribution as my tests are running inside containers so I only have access to the host's kernel info via uname -a.

If I use strace on a affecred mahine and compare the successful and failing output I see:

2284981 16:53:02.026479 unshare(CLONE_NEWUSER) = 0 <0.000053>
2284981 16:53:02.026634 rt_sigprocmask(SIG_BLOCK, NULL, [CHLD], 8) = 0 <0.000023>
2284981 16:53:02.026756 clone(child_stack=0x55a89dc3e3b0, flags=SIGCHLD) = 2285108 <0.000442>
2284981 16:53:02.027327 close(5<UNIX-STREAM:[82532702->82532701]> <unfinished ...>
2285108 16:53:02.027434 rt_sigprocmask(SIG_SETMASK, [CHLD],  <unfinished ...>
2284981 16:53:02.027520 <... close resumed>) = 0 <0.000092>
2285108 16:53:02.027601 <... rt_sigprocmask resumed>NULL, 8) = 0 <0.000085>
2284981 16:53:02.027677 openat(AT_FDCWD</cvmfs/cernvm-prod.cern.ch/cvm4>, ".", O_RDONLY|O_DIRECTORY <unfinished ...>
2285108 16:53:02.027768 close(4<UNIX-STREAM:[82532701->82532702]>) = 0 <0.000029>
2285108 16:53:02.028002 fstat(3</cvmfs>,  <unfinished ...>
...
2284481 16:52:48.097896 unshare(CLONE_NEWUSER) = 0 <0.000056>
2284481 16:52:48.098054 rt_sigprocmask(SIG_BLOCK, NULL, [CHLD], 8) = 0 <0.000024>
2284481 16:52:48.098189 clone(child_stack=0x55f2bd9c43b0, flags=SIGCHLD) = 2284557 <0.000400>
2284481 16:52:48.098703 close(4<UNIX-STREAM:[82615440->82615439]> <unfinished ...>
2284557 16:52:48.098799 rt_sigprocmask(SIG_SETMASK, [CHLD],  <unfinished ...>
2284481 16:52:48.098888 <... close resumed>) = 0 <0.000093>
2284557 16:52:48.098953 <... rt_sigprocmask resumed>NULL, 8) = 0 <0.000069>
2284481 16:52:48.099028 openat(AT_FDCWD</cvmfs/cernvm-prod.cern.ch/cvm4>, ".", O_RDONLY|O_DIRECTORY <unfinished ...>
2284557 16:52:48.099135 close(3<UNIX-STREAM:[82615439->82615440]>) = 0 <0.000028>
2284557 16:52:48.099306 fstat(3, 0x7ffd65abd8d0) = -1 EBADF (Bad file descriptor) <0.000025>
2284481 16:52:48.099474 <... openat resumed>) = 4</cvmfs/cernvm-prod.cern.ch/cvm4> <0.000343>
2284557 16:52:48.099602 write(2</dev/pts/1<char 136:1>>, "\33[91mERROR  : Failed to get file information for file descriptor 3: Bad file descriptor\n\33[0m", 92 <unfinished ...>

i.e. in the bad case it closes fd(3) immediately before calling fstat on that file handle.

The source of the UNIX-STREAM:[ appears to be:

debugf("Create socketpair for master communication channel\n");
if ( socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, master_socket) < 0 ) {
fatalf("Failed to create communication socket: %s\n", strerror(errno));
}
/* create RPC sockets only if the container is created */
if ( !sconfig->container.namespace.joinOnly ) {
debugf("Create RPC socketpair for communication between stage 2 and RPC server\n");
if ( socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, rpc_socket) < 0 ) {
fatalf("Failed to create communication socket: %s\n", strerror(errno));
}
}

Looking back earlier in the log I see in the working case:

2284981 16:53:02.004792 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2285046, si_uid=171801, si_status=0, si_utime=1, si_stime=6} ---
2284981 16:53:02.004926 fchdir(7</cvmfs/cernvm-prod.cern.ch/cvm4>) = 0 <0.000032>

2284981 16:53:02.005121 openat(AT_FDCWD</cvmfs/cernvm-prod.cern.ch/cvm4>, "/proc/self/fd", O_RDONLY) = 8</proc/2284981/fd> <0.000048>
2284981 16:53:02.005342 newfstatat(8</proc/2284981/fd>, "", {st_mode=S_IFDIR|0500, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.001117>
2284981 16:53:02.007308 fcntl(8</proc/2284981/fd>, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) <0.000039>
2284981 16:53:02.007552 fcntl(8</proc/2284981/fd>, F_SETFD, FD_CLOEXEC) = 0 <0.000544>
2284981 16:53:02.008772 getdents64(8</proc/2284981/fd>, 0x55a89fc1f2f0 /* 11 entries */, 32768) = 264 <0.000506>

2284981 16:53:02.009489 fcntl(3</cvmfs>, F_SETFD, FD_CLOEXEC) = 0 <0.000020>

2284981 16:53:02.009628 close(4<anon_inode:[eventpoll]>) = 0 <0.000027>
2284981 16:53:02.009761 close(5<pipe:[82596754]>) = 0 <0.000021>
2284981 16:53:02.009887 close(6<pipe:[82596754]>) = 0 <0.000023>
2284981 16:53:02.010009 fcntl(7</cvmfs/cernvm-prod.cern.ch/cvm4>, F_SETFD, FD_CLOEXEC) = 0 <0.000020>
2284981 16:53:02.010178 getdents64(8</proc/2284981/fd>, 0x55a89fc1f2f0 /* 0 entries */, 32768) = 0 <0.000017>
2284981 16:53:02.010313 close(8</proc/2284981/fd>) = 0 <0.000018>
2284981 16:53:02.010426 close(8)        = -1 EBADF (Bad file descriptor) <0.000018>
2284981 16:53:02.010549 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 <0.000020>

2284981 16:53:02.010692 socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [4<UNIX-STREAM:[82532701->82532702]>, 5<UNIX-STREAM:[82532702->82532701]>]) = 0 <0.000039>
2284981 16:53:02.011841 socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [6<UNIX-STREAM:[82532703->82532704]>, 8<UNIX-STREAM:[82532704->82532703]>]) = 0 <0.000033>

and in the failing case:

2284481 16:52:48.078734 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2284530, si_uid=171801, si_status=0, si_utime=2, si_stime=7} ---
2284481 16:52:48.078862 fchdir(7</cvmfs/cernvm-prod.cern.ch/cvm4>) = 0 <0.000036>

2284481 16:52:48.079037 openat(AT_FDCWD</cvmfs/cernvm-prod.cern.ch/cvm4>, "/proc/self/fd", O_RDONLY) = 3</proc/2284481/fd> <0.000064>
2284481 16:52:48.079296 newfstatat(3</proc/2284481/fd>, "", {st_mode=S_IFDIR|0500, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000029>
2284481 16:52:48.079469 fcntl(3</proc/2284481/fd>, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) <0.000025>
2284481 16:52:48.079604 fcntl(3</proc/2284481/fd>, F_SETFD, FD_CLOEXEC) = 0 <0.000024>
2284481 16:52:48.079748 getdents64(3</proc/2284481/fd>, 0x55f2be3bc2f0 /* 10 entries */, 32768) = 240 <0.000042>

2284481 16:52:48.079915 close(4<anon_inode:[eventpoll]>) = 0 <0.000039>
2284481 16:52:48.080072 close(5<pipe:[82619750]>) = 0 <0.000027>
2284481 16:52:48.080220 close(6<pipe:[82619750]>) = 0 <0.000034>
2284481 16:52:48.080361 fcntl(7</cvmfs/cernvm-prod.cern.ch/cvm4>, F_SETFD, FD_CLOEXEC) = 0 <0.000025>
2284481 16:52:48.080502 getdents64(3</proc/2284481/fd>, 0x55f2be3bc2f0 /* 0 entries */, 32768) = 0 <0.000026>
2284481 16:52:48.080670 close(3</proc/2284481/fd>) = 0 <0.000026>
2284481 16:52:48.080815 close(3)        = -1 EBADF (Bad file descriptor) <0.000025>
2284481 16:52:48.080976 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 <0.000024>

2284481 16:52:48.081133 socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [3<UNIX-STREAM:[82615439->82615440]>, 4<UNIX-STREAM:[82615440->82615439]>]) = 0 <0.000054>
2284481 16:52:48.082284 socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0, [5<UNIX-STREAM:[82615441->82615442]>, 6<UNIX-STREAM:[82615442->82615441]>]) = 0 <0.000036>

i.e. in the failing case there is a missing call to fcntl(3</cvmfs>, F_SETFD, FD_CLOEXEC) = 0 which I presume corrosponds to:

/* check if the file descriptor need to remain opened */
for ( i = 0; i < starter->numfds; i++ ) {
if ( starter->fds[i] == fd ) {
found = true;
/* set force close on exec */
if ( fcntl(starter->fds[i], F_SETFD, FD_CLOEXEC) < 0 ) {
debugf("Can't set FD_CLOEXEC on file descriptor %d: %s\n", starter->fds[i], strerror(errno));
}
break;
}
}

It's still not clear to me why this is is happening and I'm now struggling to get access to a node to continue debugging on. I'll update if I manage to make any more progress.

Steps to reproduce this behavior

I've only been able to reproduce with quite a complex command:

$ seq 50 | xargs -n1 -P10 /cvmfs/lhcb.cern.ch/lib/var/lib/LbEnv/stable/linux-64/bin/lb-dirac lb-run -c x86_64_v2-centos7-gcc12+detdesc-opt Moore/v54r22 true

How did you install Apptainer

From conda-forge.

@chrisburr
Copy link
Contributor Author

I've also now reproduced this with /cvmfs/oasis.opensciencegrid.org/mis/apptainer/1.3.0/bin on path, though oddly it happens way less often if I do that.

@chrisburr
Copy link
Contributor Author

This might be the same issue described in sylabs/singularity#7

@chrisburr
Copy link
Contributor Author

I built 1.3.0 both with and without sylabs/singularity#8 and it seems to fix the issue.

@chrisburr
Copy link
Contributor Author

I suspect this might be the same thing as what was seen in #1947.

@DrDaveD
Copy link
Contributor

DrDaveD commented Apr 22, 2024

We found another fix for the same problem as sylabs/singularity#8 that was less disruptive in apptainer/singularity#5946. I don't know if @cclerget would be in favor of disabling the starter gc.

@chrisburr
Copy link
Contributor Author

Applying the fix from sylabs/singularity#8 had a dramatic effect on reducing the rate at which we see this failure with apptainer 1.3.0 so I think something more than apptainer/singularity#5946 is needed:

Screenshot 2024-04-22 at 15 00 25

The remaining failures are appear to be CVMFS propogation delays and errors which I misidentify when trying to isolate this in the monitoring.

@DrDaveD DrDaveD added this to the 1.3.2 milestone Apr 22, 2024
@cclerget
Copy link
Contributor

@chrisburr thanks for the detailed report, I dont think that disabling the golang garbage collector is a good fix as it could potentially lead to other issues. Will try to identify the root cause for the 1.3.2 release based on your report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants