Commit d100df3

update readme (#8)
1 parent d8356ce


README.md

Lines changed: 19 additions & 14 deletions

**⚠️ This repo is still under ACTIVE development.**

## Python Version Compatibility

Requires Python `>= 3.9`.

## Huh?

**Slurm** is a robust open-source workload manager designed for high-performance computing clusters. It efficiently allocates resources, manages job submissions, and optimizes task execution. With commands like `sbatch` and `squeue`, Slurm provides a flexible and scalable solution for seamless job control and monitoring, making it a preferred choice in academic and research settings. Research centers and universities often give their Slurm clusters unique names; at the University of Queensland, ours go by the distinctive name "Bunya."

## SlurmWatch

Introducing **SlurmWatch**: a tool meticulously crafted for effortless monitoring of `sbatch` jobs. Say goodbye to uncertainty; prompt notifications keep you informed and in control.

### Current Capabilities

- monitor the signed-in user's Slurm job(s) -> `src/my_jobs.py` (see the sketch below)
- monitor multiple users' Slurm GPU job(s) -> `src/gpu_jobs.py`
- monitor resource (GPU) usage of multiple FileSet(s) -> `src/quota.py`
- monitor resource (node) availability -> `src/available_nodes.py`
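
For a sense of what these scripts do, here is a minimal, hypothetical sketch of polling `squeue` for the signed-in user's jobs. The flags and output format are assumptions, not a copy of `src/my_jobs.py`:

```python
# Hypothetical sketch only -- not the actual src/my_jobs.py.
import getpass
import subprocess

def my_jobs():
    """Return (job_id, state, elapsed, name) tuples for the current user."""
    user = getpass.getuser()
    # -h drops the header row; -o picks job id, state, elapsed time, name.
    # Name goes last so spaces inside it survive the split below.
    out = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i %T %M %j"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(line.split(None, 3)) for line in out.splitlines() if line]

if __name__ == "__main__":
    for job_id, state, elapsed, name in my_jobs():
        print(f"{job_id}: {name} is {state} ({elapsed})")
```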
### Scheduling

- For the moment, you can fork it, or just clone it, and use crontab to run `monitor.py`
- Follow the `dot_env_template` to create your own `.env` file (a hypothetical sketch follows this list)
- then do `crontab -e`
- and add an entry with a schedule of your preference
- for example, `* * * * * ~/anaconda3/bin/python /scratch/user/your-username/SlurmWatch/src/quota.py`
- to explore other schedule expressions, check this helpful [crontab expression page](https://www.atatus.com/tools/cron).
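
For reference, a hypothetical `.env` along the lines of `dot_env_template`; the variable name below is an assumption, so copy the real keys from the template:

```
# Hypothetical example only; the real key names live in dot_env_template.
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/XXX/YYY/ZZZ
```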

### Integration

#### Slack

- follow the [Slack webhook tutorial](https://api.slack.com/messaging/webhooks) to create a Slack app for your workspace and add it to the appropriate channels
- remember to replace the webhook in `.env` with your own (a minimal posting sketch follows)
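
A minimal sketch of posting a notification to a Slack incoming webhook, using only the standard library; `SLACK_WEBHOOK_URL` is a hypothetical stand-in for whatever key your `.env` actually defines:

```python
# Hypothetical sketch; Slack incoming webhooks accept a JSON {"text": ...} POST.
import json
import os
import urllib.request

def notify(text: str) -> None:
    url = os.environ["SLACK_WEBHOOK_URL"]  # hypothetical env var name
    req = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # Slack replies "ok" on success
        resp.read()

notify("SlurmWatch: job state changed")
```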

### Future Features & Integrations

Currently, the future integrations being considered are:

- email
