fix: added missing input files in reason.updated_input in dag.py
Consider a rule 'A' that takes 'N' inputs from a rule 'B' with
different wildcards. The input function of 'A' requests output
files from 'B'.

If a first run has already generated the outputs of 'A' and 'B', and
the input function of 'A' then requests new input files from 'B' that
have not yet been generated, snakemake neither generates the missing
'B' output files nor regenerates the 'A' output. However,
--list-input-changes does list the missing files correctly.

This commit makes snakemake generate the missing 'B' outputs and
regenerate the 'A' output.
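
The effect of the changed condition can be illustrated with a small standalone sketch. It does not use Snakemake itself: StubFile is a hypothetical stand-in that only mimics the `exists` and `is_newer` members used in the diff, and the timestamps are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubFile:
    """Hypothetical stand-in for an input file object (not Snakemake's class)."""

    name: str
    mtime: Optional[float]  # None means the file does not exist on disk

    @property
    def exists(self) -> bool:
        return self.mtime is not None

    def is_newer(self, time: float) -> bool:
        # Only an existing file can be newer than a given timestamp.
        return self.mtime is not None and self.mtime > time


def updated_input_before(inputs, output_mintime):
    # Condition before the commit: missing files are never reported.
    return [f for f in inputs if f.exists and f.is_newer(output_mintime)]


def updated_input_after(inputs, output_mintime):
    # Condition after the commit: missing files count as updated input.
    return [
        f
        for f in inputs
        if (f.exists and f.is_newer(output_mintime)) or (not f.exists)
    ]


# B-john.txt was produced by the first run; B-doe.txt is newly requested.
inputs = [StubFile("B-john.txt", mtime=100.0), StubFile("B-doe.txt", mtime=None)]
output_mintime = 200.0  # A.txt was written after B-john.txt

before = [f.name for f in updated_input_before(inputs, output_mintime)]
after = [f.name for f in updated_input_after(inputs, output_mintime)]
print(before)
print(after)
```

With the old condition the list is empty, so 'A' has no reason to re-run; with the new condition the missing file appears in the list.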
Christophe Clienti committed Mar 8, 2022
1 parent ef0475c commit 6771da4
Showing 7 changed files with 27 additions and 3 deletions.
6 changes: 4 additions & 2 deletions docs/project_info/faq.rst
@@ -48,7 +48,7 @@ For debugging such cases, Snakemake provides the command line flag ``--debug-dag

In addition, it is advisable to check whether certain intermediate files would be created by targeting them individually via the command line.

Finally, it is possible to constrain the rules that are considered for DAG creating via ``--allowed-rules``.
This way, you can easily check rule by rule if it does what you expect.
However, note that ``--allowed-rules`` is only meant for debugging.
A workflow should always work fine without it.
@@ -285,7 +285,7 @@ This will cause Snakemake to re-run all jobs of that rule and everything downstr
How should Snakefiles be formatted?
--------------------------------------

To ensure readability and consistency, you can format Snakefiles with our tool `snakefmt <https://github.com/snakemake/snakefmt>`_.

Python code gets formatted with `black <https://github.com/psf/black>`_ and Snakemake-specific blocks are formatted using similar principles (such as `PEP8 <https://www.python.org/dev/peps/pep-0008/>`_).

@@ -484,6 +484,8 @@ Snakemake has a kind of "lazy" policy about added input files if their modificat
Here, ``snakemake --list-input-changes`` returns the list of output files with changed input files, which is fed into ``-R`` to trigger a re-run.

It is worth mentioning that if the additional input files do not yet exist and can be found in the outputs of other rules, Snakemake will correctly generate the missing dependencies and re-run the rule.


How do I trigger re-runs for rules with updated code or parameters?
-------------------------------------------------------------------
4 changes: 3 additions & 1 deletion snakemake/dag.py
@@ -996,7 +996,9 @@ def update_needrun(job):
             output_mintime_ = output_mintime.get(job)
             if output_mintime_:
                 updated_input = [
-                    f for f in job.input if f.exists and f.is_newer(output_mintime_)
+                    f
+                    for f in job.input
+                    if (f.exists and f.is_newer(output_mintime_)) or (not f.exists)
                 ]
                 reason.updated_input.update(updated_input)
         if noinitreason and reason:
14 changes: 14 additions & 0 deletions tests/test_update_input/Snakefile
@@ -0,0 +1,14 @@
rule all:
    input:
        lambda wildcards: [rules.B.output[0].format(name=name)
                           for name in config.get("names", "john").split(",")]
    output:
        "A.txt"

    run:
        f = open(output[0], "w")
        f.write(' '.join(input) + "\n")

rule B:
    output:
        touch("B-{name}.txt")
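
As a rough illustration of what the input function in this Snakefile requests, the list comprehension can be reproduced in plain Python. The helper name requested_b_outputs is hypothetical, and the pattern string is copied from rule B's output rather than read from `rules.B.output[0]`.

```python
def requested_b_outputs(config):
    # Same expression as the lambda in the Snakefile, with rule B's
    # output pattern written out literally (an assumption of this sketch).
    pattern = "B-{name}.txt"
    return [
        pattern.format(name=name)
        for name in config.get("names", "john").split(",")
    ]


first_run = requested_b_outputs({})  # no config: default "john"
second_run = requested_b_outputs({"names": "john,doe"})
print(first_run)
print(second_run)
print(" ".join(second_run))  # the line the run block would write to A.txt
```

The second call requests one file more than the first, which is the situation the commit addresses: B-doe.txt does not exist after the first run.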
1 change: 1 addition & 0 deletions tests/test_update_input/expected-results/A.txt
@@ -0,0 +1 @@
B-john.txt B-doe.txt
Empty file.
Empty file.
5 changes: 5 additions & 0 deletions tests/tests.py
@@ -1523,3 +1523,8 @@ def test_groupid_expand_cluster():
@skip_on_windows
def test_service_jobs():
    run(dpath("test_service_jobs"), check_md5=False)


def test_update_input():
    run(dpath("test_update_input"), cleanup=False, check_results=False)
    run(dpath("test_update_input"), config={"names": "john,doe"})
