Implement cluster sidecar #53

Open
leoisl opened this issue Apr 19, 2022 · 3 comments
Comments

@leoisl
Collaborator

leoisl commented Apr 19, 2022

See snakemake new feature: snakemake/snakemake#1397

This is a more efficient way to query job statuses than running bjobs <jobid> once per job, but it is rather complex to implement. An implementation for Slurm clusters can be found here: https://github.com/holtgrewe/snakemake-profiles-slurm/blob/slurm-sidecar/%7B%7Bcookiecutter.profile_name%7D%7D/slurm-sidecar.py and can be used as a base.

@mbhall88
Member

mbhall88 commented May 24, 2022

I'm guessing the best way to implement this would be something like bjobs -o 'jobid stat' -noheader -a which outputs the status of all jobs for the user

Example

2510890 RUN
2509904 RUN
2637332 RUN
2637541 EXIT
2637554 EXIT
2637542 DONE
2637537 DONE
2637527 DONE
2637539 DONE

and then this just gets kept in a dict.
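To make the "kept in a dict" idea concrete, here is a minimal sketch of parsing that two-column output. The function names `parse_bjobs_output` and `query_all_jobs` are hypothetical, the bjobs invocation is copied from the comment above, and whether bjobs truncates long listings is still the open question raised below.

```python
from __future__ import annotations

import subprocess


def parse_bjobs_output(text: str) -> dict[str, str]:
    """Turn lines of the form '<jobid> <stat>' into a {jobid: stat} dict."""
    statuses: dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Skip blank lines and anything that is not exactly two columns.
            continue
        jobid, stat = fields
        statuses[jobid] = stat
    return statuses


def query_all_jobs() -> dict[str, str]:
    """Run the bjobs command from the comment above and parse its stdout.

    Assumes LSF's bjobs is on PATH. A non-zero exit code is not treated
    as fatal here (an assumption); only stdout is parsed.
    """
    result = subprocess.run(
        ["bjobs", "-o", "jobid stat", "-noheader", "-a"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_bjobs_output(result.stdout)
```

Job IDs are kept as strings so they can be compared directly with the ID Snakemake passes on the command line.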

The Slurm profile polls this every 60 seconds.

The only thing we have to test out is whether there is a line limit to this. For example, if I have 1500 jobs running, do I get all of them listed? I assume so, but will need to test this.

@leoisl can you think of a better way of doing this?

One thing that might slow this down is if the job has disappeared from the bjobs output and we have to go searching in the log... I wonder if speaking with systems could be useful here?
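A rough sketch of how the lookup could handle a job that is missing from the cached listing: try the dict first, then an optional slower fallback (for example a log search). Snakemake's status script is expected to report `success`, `running`, or `failed`; the LSF-to-Snakemake mapping, the `snakemake_status` name, and the choice to report unknown jobs as `running` are all assumptions for illustration.

```python
from __future__ import annotations

from typing import Callable, Mapping, Optional

# Assumed mapping: only DONE and EXIT are terminal; every other LSF state
# (RUN, PEND, suspended states, ...) is reported as still running.
LSF_TO_SNAKEMAKE = {"DONE": "success", "EXIT": "failed"}


def snakemake_status(
    jobid: str,
    cache: Mapping[str, str],
    fallback: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Return 'success', 'failed' or 'running' for a job ID.

    `cache` is the {jobid: LSF stat} dict; `fallback` is a hypothetical
    slower lookup that is only called when the job is not in the cache.
    """
    stat = cache.get(jobid)
    if stat is None and fallback is not None:
        stat = fallback(jobid)
    if stat is None:
        # Assumption: with no information at all, keep waiting rather than fail.
        return "running"
    return LSF_TO_SNAKEMAKE.get(stat, "running")
```

Keeping the fallback as a plain callable means the expensive path only runs for the rare job that has dropped out of the listing.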

@mbhall88
Member

Another option here is to run the bjobs command under watch and just change the interval from the default 2 seconds to something more reasonable like 30-60 seconds.
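An in-process alternative to watch, sketched under assumptions: a background thread that refreshes a snapshot at a fixed interval. The `StatusPoller` class and its `fetch` parameter are hypothetical; `fetch` stands for any callable returning a {jobid: status} dict, such as a wrapper around the bjobs call.

```python
from __future__ import annotations

import threading
from typing import Callable, Dict, Optional


class StatusPoller:
    """Refresh a {jobid: status} snapshot every `interval` seconds."""

    def __init__(self, fetch: Callable[[], Dict[str, str]], interval: float = 60.0):
        self._fetch = fetch
        self._interval = interval
        self._lock = threading.Lock()
        self._statuses: Dict[str, str] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                fresh = self._fetch()
            except Exception:
                fresh = None  # keep the previous snapshot if the query fails
            if fresh is not None:
                with self._lock:
                    self._statuses = fresh
            # Sleeps for the interval but wakes immediately when stop() is called.
            self._stop.wait(self._interval)

    def get(self, jobid: str) -> Optional[str]:
        with self._lock:
            return self._statuses.get(jobid)
```

Waiting on the stop event instead of calling `time.sleep` lets the poller shut down promptly even with a 60-second interval.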

@mbhall88
Member

Looking at the Slurm profile status-checker and the Snakemake docs, it looks like the sidecar needs to start some kind of server. The sidecar should output a single line (in the case of a REST server, this line could be the port the server is listening on and any credentials). This line is subsequently provided to the --cluster-status and --cluster-cancel commands. The key is checking whether the environment variable SNAKEMAKE_CLUSTER_SIDECAR_VARS is set, and if so, checking the status by polling the server. See https://github.com/Snakemake-Profiles/slurm/blob/8ee65d648e502beba406059e2a2d026110d38b9a/%7B%7Bcookiecutter.profile_name%7D%7D/slurm-status.py#L56-L71
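A minimal sketch of that handshake, not the Slurm profile's actual code: the sidecar starts a local HTTP server, prints one JSON line with its port, and the status script reads that line back from SNAKEMAKE_CLUSTER_SIDECAR_VARS. The JSON shape (`{"port": ...}`), the URL layout, and the names `make_server`, `query_sidecar`, and `main` are assumptions; credentials are left out for brevity.

```python
from __future__ import annotations

import json
import os
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Optional


def make_server(statuses: Dict[str, str], port: int = 0) -> HTTPServer:
    """Serve GET /<jobid> from an in-memory dict; port 0 picks a free port."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            jobid = self.path.strip("/")
            body = json.dumps({"status": statuses.get(jobid)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args) -> None:
            pass  # suppress the default per-request logging

    return HTTPServer(("127.0.0.1", port), Handler)


def query_sidecar(jobid: str, sidecar_vars: Optional[str] = None) -> Optional[str]:
    """Status-script side: return the job's status, or None if no sidecar is set."""
    raw = sidecar_vars
    if raw is None:
        raw = os.environ.get("SNAKEMAKE_CLUSTER_SIDECAR_VARS")
    if not raw:
        return None  # no sidecar: the caller would fall back to bjobs <jobid>
    port = json.loads(raw)["port"]
    url = f"http://127.0.0.1:{port}/{jobid}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read())["status"]


def main() -> None:
    """Sidecar entry point: print the single line first, then serve."""
    statuses: Dict[str, str] = {}  # would be refreshed by a poller thread
    server = make_server(statuses)
    print(json.dumps({"port": server.server_address[1]}), flush=True)
    server.serve_forever()
```

The sidecar script would call `main()`, and the status script would call `query_sidecar(jobid)` before falling back to a direct bjobs query.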


2 participants