Is your feature request related to a problem? Please describe.
Snakemake service rules could be used for a lot of different things, as described in the docs:
This can for example be the socket of a database, a shared memory device, a ramdisk, and so on. It can even just be a dummy file, and access to the service might happen via a different channel (e.g. a local http port).
I can imagine running a database with this approach, and having multiple rules running in parallel (that are mainly compute-constrained) while each reading data from that database. Or even a Spark cluster: the main node is one service, the worker nodes are services that depend on the main node service, and then rules can depend on all those services and use Spark. However:
Snakemake will schedule the service with all consumers to the same physical node...
which entirely defeats the point of both examples I just listed. I haven't been able to find anything in the docs, or in the PR that added this feature, that explains why this is desirable.
Snakemake models "services" as files, and also (usually) assumes that files are shared between nodes. So isn't it natural to assume that services would be available across nodes as well?
Describe the solution you'd like
Service rules are no longer scheduled in the same group as their consumers. Of course, they could still manually be added to groups, like any other rule.
To facilitate inter-node communication that isn't file-based, a service could somehow expose its IP address/port/URL to its consumers. One way to do this would be to store them in the service "file" instead of using an empty dummy file.
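As a rough illustration of the idea above, here is a minimal Python sketch of a service writing its address into the service file and a consumer reading it back. The JSON layout and the helper names `publish_endpoint` and `read_endpoint` are assumptions made for this example; they are not part of Snakemake, and the sketch assumes the service file lives on a filesystem that all nodes can see.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Tuple


def publish_endpoint(service_file: Path, host: str, port: int) -> None:
    """Hypothetical helper: store the service address in the service file."""
    payload = json.dumps({"host": host, "port": port})
    # Write to a temporary file in the same directory, then rename it into
    # place, so a consumer never reads a half-written service file.
    fd, tmp_name = tempfile.mkstemp(dir=service_file.parent)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(payload)
    os.replace(tmp_name, service_file)


def read_endpoint(service_file: Path) -> Tuple[str, int]:
    """Hypothetical helper: a consumer recovers the address from the file."""
    data = json.loads(service_file.read_text())
    return data["host"], int(data["port"])
```

The service job would call something like `publish_endpoint(Path("db.service"), socket.gethostname(), 5432)` once it is ready to accept connections, and each consumer would call `read_endpoint` on its input file before connecting.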
Describe alternatives you've considered
Though I haven't tried it, I think it's possible to hack around the Snakemake service rules feature completely to achieve this. You could define normal, non-service rules that watch for the creation of special dummy files, signaling that services are ready or that consumers are done using the service, and react when those are created. For example, see this similar exercise in Nextflow: https://github.com/JaneliaSciComp/nextflow-spark.
This feels like a lot of work, essentially re-creating the Snakemake feature from scratch in order to address one small limitation.
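For reference, the core of that workaround is a polling loop along these lines. This is only a sketch under assumptions: `wait_for_file` is a hypothetical helper, the timeout and poll interval are arbitrary, and on a shared network filesystem a newly created file may take a while to become visible on other nodes.

```python
import time
from pathlib import Path


def wait_for_file(path: Path, timeout: float = 60.0, poll_interval: float = 0.5) -> bool:
    """Hypothetical helper: block until `path` exists or `timeout` seconds pass.

    Returns True if the file appeared, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A consumer rule would wait on the service's "ready" file before connecting, and the service rule would wait on one "done" file per consumer before shutting down. All of that bookkeeping is what the built-in service feature would otherwise handle.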