Timeout seems changed in stopping mounts from 10 secs to 1 sec. #2216

Closed
tomgreen66 opened this issue May 8, 2024 · 2 comments

Comments

@tomgreen66

Version of Apptainer

What version of Apptainer (or Singularity) are you using? Run apptainer --version (or singularity --version).

apptainer version 1.3.0

Expected behavior

When exiting from a container I expect to not see an error message.

Actual behavior

What actually happened? Why was it incorrect?

exit
DEBUG   [U=1170613,P=145401]CleanupContainer()            Cleanup container
DEBUG   [U=1170613,P=145401]umount()                      Umount /var/apptainer/mnt/session/final
DEBUG   [U=1170613,P=145401]umount()                      Umount /var/apptainer/mnt/session/rootfs
DEBUG   [U=1170613,P=145401]stop()                        Waiting for squashfuse_ll pid 145432 to exit
DEBUG   [U=1170613,P=145401]stop()                        Terminating pid 145432 after wait timeout
DEBUG   [U=1170613,P=145401]waitInstance()                squashfuse_ll pid 145432 has exited with status 241
DEBUG   [U=1170613,P=145401]filterMsg()                   fuse: failed to clone device fd: Inappropriate ioctl for device
DEBUG   [U=1170613,P=145401]filterMsg()                   fuse: trying to continue without -o clone_fd.
INFO    [U=1170613,P=145401]CleanupContainer()            Cleanup error: while stopping driver for /var/apptainer/mnt/session/rootfs: squashfuse_ll exited
DEBUG   [U=1170613,P=145401]Master()                      Child exited with exit status 137

Looks like in:

waitTimeout := 1 * time.Second

Comparing this with the previous code, the timeout appears to have changed from 10 seconds to 1 second. I may be misreading the change, though, so the unmount might simply be taking a little longer than before.

Steps to reproduce this behavior

How can others reproduce this issue/problem?

In my case, I run a miniconda container, load an environment, start Python, load a module, and then exit Python and the container.

What OS/distro are you running

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

How did you install Apptainer

Write here how you installed Apptainer (or Singularity). Eg. RPM, source.

From source.

@DrDaveD
Contributor

DrDaveD commented May 9, 2024

Please try it with 1.3.1. This could be a duplicate of #2104. There's still a message when it times out now, but it's more helpful and doesn't indicate that it was an error.

If you don't actually have any background processes holding a reference to the container, I'd like to know more details of exactly how to reproduce the problem.

@tomgreen66
Author

Thanks for that. Having run ps aux after exiting Python, it seems the Python module (in this case the ArcGIS Python module arcpy) can leave an Xvfb process running. So I assume it is the same as #2104. Apologies for the duplicate. I will try 1.3.1.

DrDaveD closed this as not planned on May 13, 2024