fix: merging of pipe groups when multiple rules are chained together via pipes #1173
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Related to: #975 (and #349)
The problem
In versions of Snakemake up to (and including) v5.26.1, workflows where chained pipe rules get aggregated across their wildcards could be executed successfully. Here's an example.
However, in v5.27.0 (ie the next version), a change was introduced that led to the following failure.
The change in question?
As a result, the code that handles the grouping of piped jobs in
dag.py
no longer iterates over jobs in the same order:snakemake/snakemake/dag.py
Lines 1164 to 1167 in 2e2c63a
It turns out that the order really matters. If the jobs are not sorted topologically, the groups that are created for the jobs from each of rules
a
andb
will not get properly merged. Instead, the jobs will be in separate groups, resulting in a WorkflowError about conflicting groups.Unfortunately, the error happens somewhat non-deterministically. Since
self._needrun
is an unorderedset()
, the jobs it contains might appear in the right order occasionally just by chance.The proposed solution
This PR revises the
handle_pipes()
code to force jobs to be processed in BFS order, and thus, resolves #975.Feel free to let me know if there's a better way to accomplish this. For example, I also considered changing the
self._needrun
set into some sort of ordered set. But I don't want to break anything, and this seemed like the most straightforward solution.QC
test_pipes_multiple