Slurm does not implement --maxCores, --maxMemory, and --maxDisk correctly #4863

Open

adamnovak opened this issue Apr 8, 2024 · 0 comments
adamnovak commented Apr 8, 2024

Just like #2864 for Kubernetes, the Slurm batch system ends up using the base AbstractBatchSystem version of check_resource_request(), and so it applies --maxCores, --maxMemory, and --maxDisk as per-job limits rather than as overall limits on everything handed to the backing scheduler at once.

Since Slurm doesn't actually schedule based on disk space, and in particular not based on space on a shared filesystem you might be using for the workdir, a workflow with really big files may need Toil itself to cap the total disk used by in-flight jobs in order to run at all.

We should maybe move the buffering system from #4356 out of the Kubernetes batch system and into the base AbstractBatchSystem, along the lines of the sketch below.
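To make the intended semantics concrete, here is a minimal sketch of an aggregate limit enforced at issue time, with buffering of jobs that don't fit. This is not Toil's actual API; all names here (`JobRequest`, `AggregateLimitBuffer`, `issue_to_scheduler`) are hypothetical stand-ins for whatever the base AbstractBatchSystem would really use.

```python
# Hypothetical sketch: treat --maxCores/--maxMemory/--maxDisk as limits on the
# *sum* of resources of all jobs currently handed to the backing scheduler,
# buffering any job that would push the totals over a limit.
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class JobRequest:
    job_id: int
    cores: float
    memory: int  # bytes
    disk: int    # bytes


class AggregateLimitBuffer:
    """Queue jobs and only release them while aggregate usage stays under the maxes."""

    def __init__(self, max_cores: float, max_memory: int, max_disk: int):
        self.max_cores = max_cores
        self.max_memory = max_memory
        self.max_disk = max_disk
        self.in_flight: dict[int, JobRequest] = {}
        self.waiting: deque[JobRequest] = deque()

    def _fits(self, job: JobRequest) -> bool:
        # Overall limit: current totals plus this job must stay under each max.
        jobs = self.in_flight.values()
        return (sum(j.cores for j in jobs) + job.cores <= self.max_cores
                and sum(j.memory for j in jobs) + job.memory <= self.max_memory
                and sum(j.disk for j in jobs) + job.disk <= self.max_disk)

    def submit(self, job: JobRequest, issue_to_scheduler) -> None:
        """Issue the job now if it fits, otherwise buffer it."""
        if self._fits(job):
            self.in_flight[job.job_id] = job
            issue_to_scheduler(job)
        else:
            self.waiting.append(job)

    def job_finished(self, job_id: int, issue_to_scheduler) -> None:
        """Release capacity and drain the buffer as far as the limits allow."""
        self.in_flight.pop(job_id, None)
        while self.waiting and self._fits(self.waiting[0]):
            job = self.waiting.popleft()
            self.in_flight[job.job_id] = job
            issue_to_scheduler(job)
```

The sketch drains the queue in FIFO order and stops at the first job that doesn't fit, which keeps large jobs from being starved by smaller ones that slip past them; per-job validation against the maxes (so a single oversized job fails fast instead of waiting forever) would still belong in check_resource_request().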

┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-1537
