Checkpointing of containers run with apptainer run #2186

Open
Pigrenok opened this issue Apr 25, 2024 · 0 comments
@Pigrenok

This is more of a feature request/clarification than a bug report.

So, I need to checkpoint a container that was started with the apptainer run ... command rather than with apptainer instance start/run. The container runs a very long analysis, and it would be helpful to checkpoint it periodically so that, if something goes wrong with the container runtime or the host system, the analysis can resume from the last checkpoint instead of starting over from the beginning.

The problem is that the apptainer instance run/start command works as a service (as expected), so:

  1. It does not stop when the runscript finishes.
  2. The output of the runscript is not easily accessible unless the runscript itself writes it to a file.

I can see the problem with using apptainer run here: it runs in the foreground, so a checkpoint-saving loop cannot be set up after launching the container unless the container is sent to the background. Even so, running it in the background would still solve both issues mentioned above.
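The background-plus-loop workaround described above can be sketched in Python. This is a minimal illustration under stated assumptions, not an Apptainer feature: run_with_periodic_checkpoints is a hypothetical helper, and the checkpoint callback is a placeholder for whatever checkpoint mechanism is actually available (as noted here, apptainer run itself exposes none). The helper starts the command without blocking, captures its stdout, and calls the callback each time interval_s elapses while the command is still running.

```python
import subprocess
from typing import Callable, Sequence


def run_with_periodic_checkpoints(
    cmd: Sequence[str],
    checkpoint: Callable[[], None],
    interval_s: float,
) -> tuple[int, str, int]:
    """Hypothetical helper: run cmd in the background and call checkpoint()
    every interval_s seconds until it exits.

    Returns (exit_code, captured_stdout, number_of_checkpoints_taken).
    """
    # Start the long-running command without blocking; capture stdout as text.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    taken = 0
    while True:
        try:
            # Wait up to interval_s for the command to finish while draining
            # its output. Retrying communicate() after a timeout loses no output.
            out, _ = proc.communicate(timeout=interval_s)
            break
        except subprocess.TimeoutExpired:
            # Still running: invoke the placeholder checkpoint hook.
            checkpoint()
            taken += 1
    return proc.returncode, out, taken
```

For example, cmd might be ["apptainer", "run", "analysis.sif"] (the image name is hypothetical). Using communicate() with a timeout, rather than reading the pipe only after the process exits, avoids a deadlock when the command writes more output than the pipe buffer can hold.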

Unfortunately, I do not see how a container started with apptainer run can be checkpointed, as this command does not have the --dmtcp_... options to launch or restart it and thus provides no way to associate it with a checkpoint location.

Is it even possible to do it this way?

Thank you very much in advance.
