New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change in rerun behavior in 7.8.0 #1677
Comments
I suspect this is to do with the change of behaviour caused by #1663. Specifically, the change to rerun triggers, which are now This is frustrating! It isn’t a “backwards incompatible” change, so doesn’t technically warrant a major version bump, but it is a major change in default behaviour! |
I also do not like this change, and snakemake does unnecessary reruns because of apparent changes of the software stack. I tried it out yesterday and could not understand why it determined that the software stack changed, because I think it didn't- it must have been the conda environment that snakemake automatically builds but it is totally unclear to me why it determined everything needs to rerun. I agree this should not be the new default. |
My 2p- I can see why this change can be annoying and it's going to cause some headaches. However, I would say this is the "correct" behavior and it should be the default since changes in code and params are just as important as changes in input. |
I haven't tried it that much, but the "code changes" seem heavy handed where non-functional changes (renaming variable, deleting whitespace, etc) still would trigger a rerun. Input changing would logically affect all the way down the DAG, but some of the other changes may have more exceptions than cases. Probably from a "best practices for reproducibility" perspective this is correct, but who is perfect... |
@ASLeonard I think it's impossible for snakemake (or any workflow manager) to establish which changes are functional and which can be ignored so they went for the heavy-handed approach. I believe that If you are sure that changes are irrelevant you can use the
|
I hear you folks, indeed, this is quite a disruptive change. I tried to make it self-explanatory, in the sense that Snakemake should print a message at the end of a dryrun, saying this:
I also agree about cosmetic nature such changes often might have, in particular during development. On the other hand, I thought the conservative approach would be a reasonable default from a reproducibility perspective. Believe me, I have heard as many requests on having the current behavior than complaints I hear now. I personally would use it as follows: during heavy development of a workflow, I might stick to |
7.8.0
(and 7.6.2
) re-runs rules whose output files exist (unlike v 5.8.1
)
I hear you Johannes, this is indeed a tricky case, and good arguments for "both" sides. I agree with your judgement and recommendation, and you summarize it well - dev vs non-dev. The metadata cleanup is also a nice trick. Maybe your last reply could be a new FAQ related to the the new rerun-triggers as a general guideline. The snakemake message is actually clear, I think I simply missed it the first time I run 7.8. Thanks for the effort and this change which - in reproducibility terms - is totally useful but tougher in practise - As mentioned before, if someone does some small changes to the syntax like renaming variables, which doesnt hange the results but only the readability of the code, would that trigger a rerun? |
Also, I think when changing from snakemake < 7.8 to 7.8 and above, there shouldnt be any rerun because of the software stack and the other new triggers, only mtime should be checked as this was the only check before 7.8. Once this is done "once", the other triggers can then be checked and integrated. |
Actually, I have seen that the message I mentioned above was accidentally not shown in 7.8.0. I will fix this now in PR #1686 along with other things. Sorry for that, I guess this made it much more disruptive than intended, together with the other bugs fixed in that PR. We urgently need a test coverage checker for Snakemake, but this is not so trivial because of the multi process nature of Snakemake. I will make that a priority though. |
Yes, that could help with the transition, however, I currently don't have the capacity to work on such a corner case feature which will become obsolete after a few days or weeks. |
Is it really a corner case, when thousands of users switch to >=7.8 and experience the proposed reruns of their pipelines? It affects every use, doesnt it, when upgrading eventually? Many did not upgrade yet and are not even aware of this yet. I dont feel this is a corner case, quite the opposite actually :) |
Sorry for the wording, I did not mean to put your request down. I meant corner case in terms of the period of time where this applies, compared to the amount of work to implement something like that. We would need to somehow track whether this is the first 7.8 run or not, and change the logic based on that. And then, the question is what would really be the benefit, because after the first run you would still suddenly see the changed behavior. And this might then be quite unexpected for people because they will start complain that snakemake suddenly behaves different. Hence, one would need to document this change in behavior very boldly. But still my experience is that there will always be cases where such documentation (or also messages at runtime) is/are missed by people. So overall, I think it is not worth the effort. |
All good, I didnt interpret your reply in any negative way. I am just putting on purely the "user glasses" here, abstracting from anything else, effort in particular because I am not sure how difficult such a feature would actually be. So in short: How snakemake reacts when v7.8 is run for an existing (pre-7.8) .snakemake folder, the metadata for the software stack etc is not yet available, and the pipeline is restarted? Would it trigger anything because of that? |
missing metadata means no trigger at all. And yes, the behavior you have seen is likely just the bug. I am really angry on me that I did not detect that bug beforehand. I thought that I had tested it really carefully, but apparently I missed a pretty obvious case. I don't know how that happened, and it led to exactly the situation I wanted to avoid. |
Oh I see, then disregard all my comment and suggestions :). Dont worry, these things happen, particularly for a complex software just as snakemake. Once fixed, it will be ok. I was just confused because I didnt understand it was a bug when I first tested it and reported it. |
There is no way to keep both parties happy with an option like this. As much as I don't like it, the current default probably is best - i.e., the more conservative approach. |
Thanks for your understanding :-) |
There is now 7.8.1, which fixes many small glitches that occurred with the change. In particular, the software env trigger should now not lead to completely wrong false positives anymore. |
Closing this now, in favor of the summarizing issue #1694. |
I hit this new behaviour and opened a question on StackOverflow without knowing there was a discussion here: https://stackoverflow.com/questions/72508225/how-to-ignore-snakemakes-params-have-changed-since-last-execution?noredirect=1#comment128119555_72508225 Maybe someone could write a good answer over there explaining common workarounds? I didn't see the message mentioned by Johannes, maybe I never made it to the end of the dryrun section. It may be useful to add it to the top, too. I pipe dryrun into |
How to solve it? I find it is so hard to use when I update the snakemake |
Snakemake version
5.8.1
vs7.8.0
Describe the bug
Snakemake v
7.8.0
(also7.6.2
) tries to re-run rules whose output files already exist. The same (expected) behaviour is not observed on snakemake version 5.8.1`Logs
The commands below describe what I am observing
Minimal example
Just try to re-run a finished workflow using the two aforementioned runs. What I am observing is:
5.8.1
: expected behaviour i.e. rules whose output files already exist are not re-run7.8.0
: snakemake tries to re-run the whole workflow, even though output files already existAdditional context
The text was updated successfully, but these errors were encountered: