**⚠️ This repo is still under ACTIVE development.**

## Python Version Compatibility

Python `>= 3.9` is required.
## Huh?

**Slurm** is a robust open-source workload manager designed for high-performance computing clusters. It efficiently allocates resources, manages job submissions, and optimizes task execution. With commands like `sbatch` and `squeue`, Slurm provides a flexible and scalable solution for seamless task control and monitoring, making it a preferred choice in academic and research settings. Various research centers and universities have unique names for their Slurm clusters. At the University of Queensland, our clusters go by the distinctive name "Bunya."
## SlurmWatch

Introducing **SlurmWatch** - a tool meticulously crafted for effortless monitoring of sbatch jobs. Say goodbye to uncertainties; experience prompt notifications, ensuring you stay informed and in control.
### Current Capabilities

- monitor the signed-in user's own Slurm job(s) -> `src/my_jobs.py`
- monitor multiple users' Slurm GPU job(s) -> `src/gpu_jobs.py`
- monitor resource (GPU) usage of multiple FileSet(s) -> `src/quota.py`
- monitor resource (node) availability -> `src/available_nodes.py`

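Monitors like these typically shell out to a Slurm command and parse its tabular output. As a hedged illustration only (not the repo's actual implementation - see `src/my_jobs.py` for that), a minimal sketch of querying one user's jobs via `squeue` might look like:

```python
import subprocess

def parse_squeue(output: str) -> list:
    """Turn whitespace-separated squeue output into a list of dicts,
    using the header row as the keys."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

def my_jobs(user: str) -> list:
    """Ask squeue for one user's jobs (only works on a Slurm cluster).
    %i/%P/%j/%u/%T are jobid, partition, name, user, and state."""
    out = subprocess.run(
        ["squeue", "-u", user, "-o", "%i %P %j %u %T"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_squeue(out)

# Example with canned output, so no cluster is needed:
sample = "JOBID PARTITION NAME USER STATE\n123 gpu train alice RUNNING"
print(parse_squeue(sample))  # one dict per job row
```

The parsing is kept separate from the `subprocess` call so it can be exercised offline with captured output.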
### Scheduling

- For the moment, you can fork it, or just clone it and use crontab to run `monitor.py`
- Follow the `dot_env_template` to create your own `.env` file
- then run `crontab -e`
- and add a schedule of your preference
  - for example, `* * * * * ~/anaconda3/bin/python /scratch/user/your-username/SlurmWatch/src/quota.py`
- to pick an interval, this [crontab expression page](https://www.atatus.com/tools/cron) is helpful
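Before wiring a script into cron, it can help to sanity-check that your `.env` parses the way you expect. A minimal stdlib-only loader is sketched below; the variable name `SLACK_WEBHOOK_URL` is an assumption for illustration, not necessarily what `dot_env_template` defines:

```python
def load_env(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.
    Surrounding single or double quotes around values are stripped."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip().strip('"').strip("'")
    return env

# Hypothetical usage, assuming your .env defines SLACK_WEBHOOK_URL:
# cfg = load_env()
# assert "SLACK_WEBHOOK_URL" in cfg, "webhook missing from .env"
```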

### Integration

#### Slack

- follow the [Slack webhook tutorial](https://api.slack.com/messaging/webhooks) to create a Slack app for your workspace and add it to the appropriate channels
- remember to replace the `.env` webhook with your own
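Once the webhook URL is in place, a notification is just an HTTP POST with a small JSON body (`{"text": ...}`, per Slack's incoming-webhook docs). A stdlib-only sketch, where reading the URL from a `SLACK_WEBHOOK_URL` environment variable is an assumption of this example:

```python
import json
import os
import urllib.request

def build_payload(text: str) -> bytes:
    """Slack incoming webhooks accept a JSON body with a `text` field."""
    return json.dumps({"text": text}).encode("utf-8")

def notify(text: str) -> None:
    """POST a message to the webhook URL (env var name is assumed here)."""
    url = os.environ["SLACK_WEBHOOK_URL"]
    req = urllib.request.Request(
        url,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()  # Slack answers the request body "ok" on success

# notify("Job 12345 finished")  # requires a real webhook URL in the env
```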

### Future Features & Integrations

Currently, the future integrations considered are:

- email