Slurm: A Highly Scalable Workload Manager
Updated May 17, 2024
A DSL for data-driven computational pipelines
Machine Learning Engineering Open Book
A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
A Slurm cluster using docker-compose
Prometheus exporter for performance metrics from Slurm.
Simplify HPC and Batch workloads on Azure
An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads.
TorchX is a universal job launcher for PyTorch applications, designed for fast iteration during training and research, with support for end-to-end production ML pipelines when you're ready.
Tools for computation on batch systems
Slurm-Mail is a drop-in replacement for Slurm's e-mail notifications, giving users much more information about their jobs than the standard Slurm e-mails provide.
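All of the projects above launch, monitor, or wrap Slurm jobs, so a minimal example of the batch script they operate on may help orient readers. This is a generic sketch: the job name, resource values, and command are placeholders, not taken from any repository listed here.

```shell
#!/bin/bash
# Minimal Slurm batch script (hypothetical job name and resource values).
#SBATCH --job-name=demo        # name shown in squeue
#SBATCH --nodes=1              # number of nodes to allocate
#SBATCH --ntasks=4             # total tasks (processes) to launch
#SBATCH --time=00:10:00        # wall-clock limit, HH:MM:SS
#SBATCH --output=%x-%j.out     # stdout file: <jobname>-<jobid>.out

# srun launches the command once per task under Slurm's control
srun hostname
```

Saved as e.g. `demo.sh`, the script is submitted with `sbatch demo.sh`, and `squeue -u $USER` shows its state while it is pending or running.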