Slow first execution inside the container #2050
Comments
It's not very surprising to me that initializing things takes extra time, especially when using overlays. What could be helpful is if you could find a combination that was considerably faster, so the slowest subsystem could be identified. I suggest trying it with apptainer-suid installed to see if that makes a difference, and trying it with sandbox containers and/or overlays.
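One way to do that comparison is to time the same command under each configuration. A minimal sketch of such a harness follows; the apptainer invocations are commented-out examples whose image names are placeholders, not paths from this report:

```shell
#!/bin/sh
# Minimal timing harness to compare Apptainer startup modes.
# The apptainer lines are examples only; image paths are placeholders.
time_cmd() {
    start=$(date +%s.%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
}
# time_cmd apptainer exec image.sif ls -la /            # SIF via squashfuse_ll
# time_cmd apptainer exec --unsquash image.sif ls -la / # temporary sandbox copy
# time_cmd apptainer exec ./sandbox_dir ls -la /        # sandbox directory
time_cmd sleep 0.2   # sanity check of the harness itself
```

Running each mode a few times and comparing the first run against later runs should show which subsystem pays the one-time cost.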
Thank you for your response!
It's unfortunate that you can't try suid mode; it would be good to have that comparison. Maybe you could ask a system admin to help you out temporarily for a test? Make sure that it is running squashfuse_ll out of /usr/libexec/apptainer/bin; it should be, if it was installed via apt by the system administrator. The benchmarks I ran in #665 didn't see much startup cost compared to a sandbox, but maybe your application is that much more punishing on squashfuse_ll. Can you tell if there's any slowdown compared to a sandbox once the application is up and running?
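A quick way to check both things at once is a small sketch like the one below: it verifies the bundled binary exists at the path mentioned above, and looks for a running instance while a SIF container is active (the second check only finds something while a container is running):

```shell
# Check for the bundled squashfuse_ll and for a running instance.
bin=/usr/libexec/apptainer/bin/squashfuse_ll
if [ -x "$bin" ]; then
    echo "found: $bin"
else
    echo "missing: $bin"
fi
# While a SIF container is mounted, its helper process should appear here:
pgrep -af squashfuse_ll || echo "no squashfuse_ll process currently running"
```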
I take it back: the final (lhcb-gen-sim-bmk) version of that benchmark also saw 15 seconds of additional time for squashfuse_ll vs a local disk sandbox, and 11 seconds additional for squashfs vs a local disk sandbox. It's hard to say how much of that was within the margin of error, however, since I didn't break down startup time vs execute time. The primary benchmark I was using (atlas-gen-bmk) showed squashfuse_ll 8 seconds slower than a sandbox, and kernel squashfs 4 seconds slower than that. That was non-intuitive and I chalked it up to margin of error.
Hi DrDaveD, I'm unsure if it's running squashfuse_ll; the binary is in that path, but there is high CPU usage related to fuse2fs. Even a pip install with an overlay takes ages, and it's the only bottleneck we have in our system. How can I detect the bottleneck?
fuse2fs is slow. Avoid using overlay images if you need performance. Have you tried an overlay sandbox?
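To confirm which FUSE helper is the bottleneck, one option is to sample per-process CPU for the usual Apptainer helper processes while the slow step (e.g. the pip install) is running. A sketch, assuming the standard helper process names:

```shell
# Show CPU usage for the FUSE helpers Apptainer may spawn; the one with high
# %CPU during the slow step is the likely bottleneck.
for name in fuse2fs squashfuse_ll fuse-overlayfs; do
    pid=$(pgrep -o "$name" 2>/dev/null || true)   # oldest matching process, if any
    if [ -n "$pid" ]; then
        ps -o pid=,pcpu=,comm= -p "$pid"
    else
        echo "$name: not running"
    fi
done
```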
Hi, if I try to run a sandbox container (and fakeroot) with an overlay (sparse and normal + fakeroot), the
I doubt you'll be able to give me instructions on how to reproduce that, so you're going to have to dive in deeper to see where it is hanging. If you can arrange to run it without using overlay at all, that would probably speed things up considerably.
Hi, for this current test I'm using the sandboxed

Notes:
|
That is indeed a bad bug. I can reproduce it with a base ubuntu20.04 image. Fortunately it does not happen with the 1.3.0-rc.2 release. Please upgrade to that version for this test. I will create a separate issue just to document this. |
Okay, we will update Apptainer and try to replace it with the SUID installation in the following weeks.
Hi, I have had the opportunity to run apptainer-suid + sandbox images + sparse overlay on a different machine, and the results are pretty bad. Installing Tensorflow with this configuration inside the Cuda 11 Ubuntu 22.04 container takes up to 5 minutes, whereas it only takes around 30s to execute the very same command with Docker Rootless. I know that using overlays is far from ideal, but our workflow is designed to avoid creating new images as
I was suggesting using a directory overlay, not a sparse overlay image. An overlay image will use fuse2fs. Docker does not use overlay images, they use plain directories. Whether the underlying image is a sandbox or SIF doesn't make a lot of difference to the performance in my experience, not when you're running only one. |
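For reference, a directory overlay needs no image file at all, so fuse2fs is never involved. A sketch, where the image name is a placeholder:

```shell
# Create a plain directory and use it as the writable overlay; no ext3 image
# means no fuse2fs in the mount path.
overlay_dir=$(mktemp -d /tmp/overlay.XXXXXX)
echo "created overlay directory: $overlay_dir"
# apptainer shell --nv --overlay "$overlay_dir" ubuntu-22.04-cuda11.sif  # placeholder image
```

Changes written through the overlay land as plain files inside that directory, which also makes them easy to inspect or clean up afterwards.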
Version of Apptainer

What version of Apptainer (or Singularity) are you using? Run apptainer --version (or singularity --version).

We are running apptainer version 1.2.5
Expected behavior
It is expected that each command takes the same amount of time to execute inside the container.
Example Command:
for i in {1..3}; do time (ls -la / > /dev/null); done
Host output
Actual behavior
What actually happened? Why was it incorrect?
The same command running inside a container takes noticeably longer the first time.

This issue is much more noticeable while running the Nvidia Isaac Sim container, where the first execution takes 30s whereas subsequent executions only take about 3.4s. The latter behaviour is the expected one, even when running the container with Docker for the very first time (without caching or anything, in a fresh docker + docker nvidia toolkit install).

In this case we are not initializing the full container; we only measure the time between the start and the moment the logger shows the currently installed GPUs (with an nvidia-smi), as initializing the full container requires more time and also compiles the simulator's shaders.

Steps to reproduce this behavior
For the ls example, run an ubuntu:20.04 container with the following command:

apptainer shell -c ubuntu-20.04.sif

For the Isaac Sim execution based on the isaac-sim.headless.native.sh script, the same results happened with the following configurations.

CFG 1
--nv
-c
--binds
--writable-tmpfs
--fakeroot
CFG 2
--nv
-c
--binds
--writable-tmpfs
CFG 3
--nv
--binds
--overlay
using a 10GB sparse disk.

CFG 4
--nv
-c
--overlay
using a 10GB sparse disk (should store the new cache).

CFG 5
--nv
-c
Cache binds mount the following directories to store the Shader Cache compiled by the program.
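As an illustration only (the report does not list the actual directories, and both the host and container-side cache paths below are hypothetical), such cache binds can be expressed like this:

```shell
# Hypothetical shader-cache bind mounts; all paths here are examples, not the
# directories actually used in this report.
mkdir -p /tmp/isaac-cache/ov /tmp/isaac-cache/glcache
# apptainer shell --nv -c --writable-tmpfs \
#   --bind /tmp/isaac-cache/ov:/root/.cache/ov \
#   --bind /tmp/isaac-cache/glcache:/root/.cache/nvidia/GLCache \
#   isaac-sim.sif
echo "cache directories prepared under /tmp/isaac-cache"
```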
What OS/distro are you running
How did you install Apptainer
Via the APT repository.
Important Information

The $HOME of the user is located on an NFS mount; that's why we avoid mounting $HOME and other directories, using the -c argument. All the files required for the experiment, such as the .sif file, the .img overlay and the folder mounts, are stored in the local /tmp directory.