Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rules that are based on scripts ignore metadata for code-change trigger #2192

Open
nigiord opened this issue Mar 24, 2023 · 4 comments
Open
Labels
bug Something isn't working

Comments

@nigiord
Copy link
Contributor

nigiord commented Mar 24, 2023

Snakemake version

7.25.0

Describe the bug

When updating a script that is used in a rule, we get the following message from Snakemake :

Some jobs were triggered by provenance information, see 'reason' section in the rule displays above.
If you prefer that only modification time is used to determine whether a job shall be executed, use the command line option '--rerun-triggers mtime' (also see --help).
If you are sure that a change for a certain output file (say, <outfile>) won't change the result (e.g. because you just changed the formatting of a script or environment definition), you can also wipe its metadata to skip such a trigger via 'snakemake --cleanup-metadata <outfile>'.

But apparently, there is actually no way to really skip this trigger because cleaning up metadata is not sufficient (even removing the .snakemake directory entirely still triggers a "code changed"). This is because a script being older than output has priority over any metadata snakemake stores:

snakemake/snakemake/dag.py

Lines 1166 to 1169 in 5e5cb61

if "code" in self.workflow.rerun_triggers:
reason.code_changed = any(
job.outputs_older_than_script_or_notebook()
) or any(self.workflow.persistence.code_changed(job))

The message is thus rather confusing as there is nothing the user can do except to use the --rerun-triggers mtime option. This is really annoying when we changed real code in some parts of the pipeline, but only formatting in other parts, as we cannot select which rules will have to be re-run (since --rerun-triggers apply globally, contrary to --cleanup-metadata).

Logs

Minimal example

Snakefile

rule testrule:
    output:
        "testoutput.txt"
    script:
        "testscript.py"  # any script that produce testoutput.txt
snakemake --jobs 1  # create testoutput.txt
snakemake  # nothing to do
touch testscript.py
snakemake -n  # code has changed
snakemake --cleanup-metadata testoutput.txt
snakemake -n # still code has changed
rm -rf .snakemake
snakemake -n # still code has changed...

Additional context

@nigiord nigiord added the bug Something isn't working label Mar 24, 2023
@Hocnonsense
Copy link
Contributor

It seems that testrule regard the setscript.py as an input, and even with --rerun-triggers mtime, the output should be updated because now it may be out-of-dated.
So --rerun-triggers mtime should also ignore when script edited, or we should touch the output manually?

@nigiord
Copy link
Contributor Author

nigiord commented Mar 29, 2023

So --rerun-triggers mtime should also ignore when script edited, or we should touch the output manually?

When snakemake runs code directly written in the Snakefile (either through run: or shell:), the options --cleanup-metadata or --rerun-triggers mtime seems to work properly (the Snakefile that contains the actual code is not regarded as an input). I think it would be logical to expect the same behavior from script: rules (which is, without touching the output manually).

A basic fix would be to remove the job.outputs_older_than_script_or_notebook() test, but I’m not sure why it exists in the first place. Is it a relic of the pre-metadata period or is there a reason for which snakemake regards the scriptfile as input for such rules?

@Hocnonsense
Copy link
Contributor

A basic fix would be to remove the job.outputs_older_than_script_or_notebook() test, but I’m not sure why it exists in the first place.

This funciton get_outputs_with_changes were added 13 monthes ago by #1419.

For me, I always declare Rmd code file as input when I'm going to knit it to html, and I've never use script to use external code, instead, I'll just import a python package and never detect whether it is changed.
So I will agree to disable job.outputs_older_than_script_or_notebook when --rerun-triggers mtime, for clearly distinguish script from input.

@nigiord
Copy link
Contributor Author

nigiord commented Mar 29, 2023

After some tests, there’s no record of code changes for scripts and notebooks via the Persistence class in snakemake/persistence.py , hence no trace of script updates in the metadata. This is why they are handled by comparing script and output mtimes, and why cleaning-up metadata has no effect.

I tried to understand if there’s a technical limitation that impedes keeping track of script changes in metadata but it’s not trivial. I think we’ll need help from contributors who actually understand where and how metadata are generated.

If this cannot be fixed, it would be wise to edit the warning displayed after code changes, especially this part:

If you are sure that a change for a certain output file (say, <outfile>) won't change the result (e.g. because you just changed the formatting of a script or environment definition), you can also wipe its metadata to skip such a trigger via snakemake --cleanup-metadata <outfile>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants