
env var OMP_NUM_THREADS is upper bound on mksquashfs processors #1819

Open · wants to merge 1 commit into base: main
Conversation

GregThain

If environment variable OMP_NUM_THREADS is set to some positive integral value, it caps the number of processors used by mksquashfs. This is useful when running concurrent builds in jobs in a batch environment, and the batch system provisions a certain number of cores for each job to use via the OMP_NUM_THREADS environment variable. With the default of using all the cores, we can end up with an n-squared problem, where n is the number of cores.



Signed-off-by: Greg Thain <gthain@cs.wisc.edu>
@DrDaveD
Contributor

DrDaveD commented Nov 21, 2023

Thanks for your first Apptainer PR, Greg.

This seems to be setting a new precedent, however, with an environment variable that doesn't begin with APPTAINER. Is OMP_NUM_THREADS some kind of standard? Oh google says it's associated with OpenMP. I don't see any current references to OpenMP or OMP in the Apptainer source code so I don't know if it makes sense to apply it or not. Try to make the case and I can see what other interested parties think.

From a higher-level perspective, under what circumstances are container builds being done in a batch environment? That sounds like an abuse of batch resources. It is expected that containers are pre-built and made available to batch jobs. Maybe we shouldn't make it easier for people to get away with that.

From a lower-level perspective, if we proceed with this PR I'll need entries in CONTRIBUTORS.md and CHANGELOG.md, and I'll need a separate PR to the documentation to describe the new environment variable.

@GregThain
Author

Hi Dave -- thanks for the quick comments.

An ongoing frustration for batch systems is that Linux gives us ways to limit the amount of cpu that the processes/threads in a job use, but no standard way for the batch system to hint to the job how many threads it should spawn for best throughput. And there are a number of programs that, by default, spawn as many threads/processes as the cpus they detect, which can lead to problems. There are many ad-hoc solutions to this, and as a result HTCondor sets about a dozen environment variables for jobs, including OMP_NUM_THREADS, and similar variables for Matlab, Julia, Xrootd, TensorFlow, and, perhaps most interestingly, GOMAXPROCS, which the Go runtime looks at.

I'm not wedded to OMP_NUM_THREADS as the variable name, we could create an APPTAINER_NUM_THREADS, or just use GOMAXPROCS to pass the -processors argument to mksquashfs, even though mksquashfs isn't written in go.
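For what it's worth, mksquashfs already exposes this knob itself via its `-processors` flag, so whichever variable is chosen only needs to be validated and forwarded. A rough shell sketch of that translation (the validation rules here are my assumptions, not from the PR):

```shell
#!/bin/sh
# Honor OMP_NUM_THREADS when it is a positive integer; otherwise
# fall back to every CPU, which is mksquashfs's own default.
procs="${OMP_NUM_THREADS:-$(nproc)}"
case "$procs" in
    ''|*[!0-9]*|0) procs=$(nproc) ;;  # ignore non-numeric or zero values
esac
echo "$procs"
# A build wrapper would then invoke, e.g.:
#   mksquashfs rootfs/ image.sqfs -processors "$procs"
```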

As far as use cases go, I believe the site where this was reported to us was one where apptainer was running images pulled from a docker-style repo at runtime. Even if we were to ban that (and I'm not sure we can), I think that running apptainer build in batch jobs as part of a CI/CD system is a valid use case. Perhaps, though, we could force the number of processors used to be some constant set in the config file.

@DrDaveD
Contributor

DrDaveD commented Nov 22, 2023

Since this is going to require some discussion and we're not sure if this PR is the solution, please create a more general github issue describing the problem and referring to this PR as a potential solution. I will mark it with the 1.3.0 milestone so it will get discussed at the next community meeting.
