Service rules on different nodes than their consumers #2446

Open
zmbc opened this issue Sep 15, 2023 · 0 comments
Labels
enhancement New feature or request

Is your feature request related to a problem? Please describe.

Snakemake service rules could be used for a lot of different things, as described in the docs:

> This can for example be the socket of a database, a shared memory device, a ramdisk, and so on. It can even just be a dummy file, and access to the service might happen via a different channel (e.g. a local http port).

I can imagine running a database this way, with multiple, mainly compute-constrained rules running in parallel and each reading data from that database. Or even a Spark cluster: the main node is one service, the worker nodes are services that depend on the main node service, and rules can then depend on all of those services and use Spark. However:

> Snakemake will schedule the service with all consumers to the same physical node...

which entirely defeats the purpose of both examples I just listed. I haven't been able to find anything in the docs, or in the PR that added this feature, that explains why this behavior is desirable.

Snakemake models "services" as files, and also (usually) assumes that files are shared between nodes. So isn't it natural to assume that services would be available across nodes as well?

Describe the solution you'd like

Service rules are no longer scheduled in the same group as their consumers. Of course, they could still manually be added to groups, like any other rule.

To facilitate inter-node communication that isn't file-based, a service could somehow expose its IP address/port/URL to its consumers. One way to do this would be to store them in the service "file" instead of using an empty dummy file.
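To make the "store them in the service file" idea concrete, here is a minimal Python sketch. It is not an existing Snakemake API: `publish_endpoint` and `read_endpoint` are hypothetical helper names, the JSON layout is just one possible choice, and the sketch assumes the service file lives on a filesystem that both the service node and the consumer nodes can see.

```python
import json
import os
import tempfile


def publish_endpoint(service_file, host, port):
    """Write the service's address into its service file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(service_file))
    # Write to a temporary file in the same directory, then rename it into
    # place, so a consumer never reads a half-written service file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"host": host, "port": port}, handle)
    os.replace(tmp_path, service_file)


def read_endpoint(service_file):
    """Read the address a service published (hypothetical helper)."""
    with open(service_file) as handle:
        info = json.load(handle)
    return info["host"], info["port"]
```

The service rule would call something like `publish_endpoint("db.service", socket.gethostname(), 5432)` once the service is up, and each consumer would call `read_endpoint("db.service")` on its input to find out where to connect. The file name and port here are placeholders.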

Describe alternatives you've considered

Though I haven't tried it, I think it's possible to hack around the Snakemake service rules feature completely to achieve this. You could define normal, non-service rules that watch for the creation of special dummy files, signaling that services are ready or that consumers are done using the service, and react when those are created. For example, see this similar exercise in Nextflow: https://github.com/JaneliaSciComp/nextflow-spark.

This feels like a lot of work, essentially re-creating the Snakemake feature from scratch in order to address one small limitation.
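For reference, the core of that dummy-file workaround might look roughly like the Python sketch below. It is untested against a real cluster, the helper names (`wait_for_file`, `touch`, `run_consumer`) are made up for illustration, and it assumes the sentinel files sit on a filesystem shared between nodes.

```python
import os
import time


def wait_for_file(path, timeout=60.0, poll_interval=0.5):
    """Block until `path` exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def touch(path):
    """Create an empty sentinel file, or update its timestamp if it exists."""
    with open(path, "a"):
        os.utime(path, None)


def run_consumer(ready_file, done_file, work, timeout=60.0):
    """Wait for the service's 'ready' sentinel, do the work, then signal 'done'."""
    if not wait_for_file(ready_file, timeout=timeout):
        raise TimeoutError(f"service never signalled readiness via {ready_file}")
    result = work()
    # The service side would watch for this file to know the consumer finished.
    touch(done_file)
    return result
```

The service side would mirror this: start the service, `touch` the ready file, then call `wait_for_file` on each consumer's done file before shutting down. Even in this small form it duplicates the lifecycle handling that service rules already provide.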
