Service rules on different nodes than their consumers #2446

zmbc · 2023-09-15T23:01:58Z

Is your feature request related to a problem? Please describe.

Snakemake service rules could be used for many different things, as described in the docs:

This can for example be the socket of a database, a shared memory device, a ramdisk, and so on. It can even just be a dummy file, and access to the service might happen via a different channel (e.g. a local http port).

I can imagine running a database this way, with multiple mainly compute-bound rules running in parallel, each reading data from that database. Or even a Spark cluster: the main node is one service, the worker nodes are services that depend on the main node service, and the rules that use Spark depend on all of those services. However:

Snakemake will schedule the service with all consumers to the same physical node...

which defeats the purpose of both examples I just listed. I haven't been able to find an explanation, either in the docs or in the PR that added this feature, of why this is desirable.

Snakemake models "services" as files, and also (usually) assumes that files are shared between nodes. So isn't it natural to assume that services would be available across nodes as well?

Describe the solution you'd like

Service rules would no longer be scheduled in the same group as their consumers. Of course, they could still be added to groups manually, like any other rule.

To facilitate inter-node communication that isn't file-based, a service could somehow expose its IP address/port/URL to its consumers. One way to do this would be to write that information into the service "file" instead of leaving it as an empty dummy file.
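As a rough illustration of that idea, a service job could write its connection details as JSON into the service file once it is listening, and each consumer job could read them back. This is only a sketch: the helper names `write_service_info`/`read_service_info` and the JSON fields (`host`, `port`, `url`) are hypothetical and not part of any Snakemake API.

```python
import json
import os
import socket
import tempfile


def write_service_info(path: str, port: int, protocol: str = "http") -> None:
    """Hypothetical helper: store connection details in the service "file".

    The JSON layout is an assumption, not a Snakemake convention. Writing to a
    temporary file and renaming it means a consumer never sees a half-written file.
    """
    host = socket.getfqdn()  # name other nodes would use to reach this node
    info = {"host": host, "port": port, "url": f"{protocol}://{host}:{port}"}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(info, tmp)
    os.replace(tmp_path, path)  # atomic rename when both paths are on one filesystem


def read_service_info(path: str) -> dict:
    """Hypothetical helper: read connection details in a consumer job."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    service_file = os.path.join(tempfile.mkdtemp(), "db.service")
    write_service_info(service_file, 5432, protocol="postgresql")
    print(read_service_info(service_file)["url"])
```

Because the service file would live on the shared filesystem that Snakemake already assumes, no extra discovery mechanism would be needed; consumers on any node just read the file.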

Describe alternatives you've considered

Though I haven't tried it, I think it's possible to bypass the Snakemake service rules feature entirely to achieve this. You could define normal, non-service rules that watch for the creation of special dummy files signaling that a service is ready or that its consumers are done with it, and react when those files appear. For example, see this similar exercise in Nextflow: https://github.com/JaneliaSciComp/nextflow-spark.
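The marker-file workaround could look roughly like the following minimal sketch. The function names, marker-file names, and the choice of polling are assumptions for illustration; they are not taken from Snakemake or from the linked Nextflow project. Polling is used instead of inotify because inotify typically does not report changes made by other nodes on network filesystems such as NFS.

```python
import os
import time
from typing import Iterable, Sequence


def wait_for_files(paths: Iterable[str], timeout: float, poll_interval: float = 1.0) -> bool:
    """Poll until every path exists; return True on success, False on timeout."""
    pending = set(paths)
    deadline = time.monotonic() + timeout
    while True:
        pending = {p for p in pending if not os.path.exists(p)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def serve_until_consumers_done(
    ready_marker: str,
    done_markers: Sequence[str],
    timeout: float,
    poll_interval: float = 1.0,
) -> bool:
    """Hypothetical body of a normal (non-service) rule acting as a service."""
    # ... start the database or Spark main node here (omitted) ...
    open(ready_marker, "w").close()  # signal consumers that the service is up
    all_done = wait_for_files(done_markers, timeout, poll_interval)
    # ... shut the service down here (omitted) ...
    return all_done
```

Each consumer rule would call `wait_for_files([ready_marker], ...)` before starting and create its own `.done` marker when finished. Even this sketch leaves out failure handling (a crashed consumer never writes its marker), which illustrates why re-creating the feature this way is a lot of work.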

This feels like a lot of work, essentially re-creating the Snakemake feature from scratch in order to address one small limitation.

zmbc added the enhancement New feature or request label Sep 15, 2023
