Regression after 7.7.0 until 7.15.X: rules downstream of a checkpoint aggregation rule keep being re-executed even when all output files exist and nothing upstream has changed #1818
Hi @johanneskoester and Snakemake team, sorry to bug you, but do you have any word on this? It's a high-impact bug: Snakemake keeps re-executing (sometimes time-consuming) rules downstream of a …
#1694 I believe this is connected to the comment I made just now. I suspect the new re-run behavior is being inappropriately triggered, perhaps by rules whose input/output/params are functions. I am not sure how the behavior is implemented, but I wonder: if one wrote a rule that used the current time of day (or a random number, etc.) as an input, would reruns always occur? …
I was able to dig into this a bit more: when I run `--list-params-changes` I always get the same result. Snakemake always detects params changes in all of these rules, every time. I think it has to do with setting the Java options, which I am doing dynamically.
And then rules that call this function are always "changing": rule gatk_split_n_cigar_reads: …
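To make the suspicion above concrete, here is a minimal, purely illustrative Python sketch (the `java_opts` function and its values are hypothetical, not the actual code from this workflow or Snakemake's internals) of why a dynamically computed params value would always register as changed:

```python
import time

# Hypothetical illustration: a params function whose return value embeds
# something non-deterministic, such as a per-run temp directory in the
# Java options. The function name and values are made up for this sketch.
def java_opts(wildcards=None):
    return f"-Xmx4g -Djava.io.tmpdir=/tmp/run-{time.time_ns()}"

# Snakemake records the params each job was run with and compares them
# against freshly computed values on the next invocation; a value like this
# never matches its recorded counterpart, so the rule is flagged as changed
# on every run.
recorded = java_opts()
current = java_opts()
print(recorded != current)
```

Making such a function deterministic (e.g. deriving the tmpdir from the wildcards instead of the clock) would make the recorded and recomputed params match.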
Looking at the simpler, complete example in the OP might help troubleshoot this better; I'm not sure why it keeps saying …
Using the provided reprex, I confirmed that this was introduced in 7.8.0:
I purposely tested the boundary between 7.8.1 and 7.8.2 because, according to the changelog, the fix c634b78 from PR #1704 was intended to address this issue of re-running rules downstream of a checkpoint.
Hi. On top of that, I found that Snakemake does not correctly list the files that should be updated, despite indicating that the reason for the rerun is a change in input files.
EDIT: Using …
Sorry to complain... I hope this gets fixed soon, because it's a pretty major regression. It's currently causing me problems: I have a very large Snakemake workflow where rules downstream of a checkpoint aggregation are computationally expensive to run, and they keep getting re-executed even though they shouldn't.
I list …
I've had a nasty surprise from this bug, too, unfortunately. I have a very large workflow which I've been running piecemeal with …
I don't have a workaround for this yet other than commenting things out, which of course doesn't work in every case when you are developing a workflow, and I have a large workflow with three checkpoints! So this bug is really causing me trouble.
@hermidalc The workaround is to avoid using checkpoints for now. For example, checkpoints can be converted into rules that have an additional output containing a list of the files they produced, and downstream rules can use that list of files as input. It is hacky, and not the best solution, but the alternative is to fix the bug ourselves (I've tried and failed; the source code is a little too complex for the amount of free time I have).
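A rough sketch of that workaround (all rule names, paths, and shell commands below are hypothetical, not taken from this thread): the former checkpoint also writes a manifest of the files it produced, and the downstream rule's input function reads that manifest instead of calling `checkpoints.<name>.get()`:

```python
# Hypothetical Snakefile sketch of the checkpoint-free workaround.
rule split:  # formerly a checkpoint
    input:
        "samples/{sample}.txt"
    output:
        parts=directory("parts/{sample}"),
        manifest="parts/{sample}.manifest.txt",
    shell:
        # record every produced file, one path per line, into the manifest
        "mkdir -p {output.parts} && "
        "split -l 2 {input} {output.parts}/part_ && "
        "ls {output.parts}/part_* > {output.manifest}"

def parts_of(wildcards):
    # Read the manifest written by the rule above. This is the hacky part:
    # the manifest must already exist when the DAG is built, so the workflow
    # may need to be run in two passes while developing.
    with open(f"parts/{wildcards.sample}.manifest.txt") as fh:
        return [line.strip() for line in fh]

rule gather:
    input:
        parts_of
    output:
        "aggregated/{sample}.txt"
    shell:
        "cat {input} > {output}"
```

The trade-off is exactly what makes it hacky: without a checkpoint, Snakemake cannot re-evaluate the DAG after `split` runs, so the manifest has to exist before `gather` can be scheduled.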
@hermidalc What about pinning Snakemake to version 7.7.0? Is there some important feature in a more recent release that you need/want to use? If you've got 3 checkpoint rules in a pipeline, updating doesn't seem like it would be worth it to me (I've only got 1 checkpoint rule, but I'm not planning on updating any time soon).
Thanks @jdblischak for providing the last working version; I was wondering which one it was and hadn't browsed the CHANGELOG yet. I will downgrade instead of following @ning-y's recommendation, since downgrading requires no code changes, though @ning-y it was a good one as a workaround.
Actually, I realized this isn't such an easy decision. Version 7.8.0 also introduced another feature that I love: freezing environments to exactly pinned packages. So far I've been lucky in that the pipelines in which I've pinned the conda envs haven't relied on checkpoint rules. It's unfortunate that this new feature was introduced in the same release as this bug. I don't want to have to choose!
I just read through this; I didn't know of the feature. Though I can't immediately tell: does this feature provide architecture-specific pinning? Can't one already specify package version/build requirements and pinning in the regular conda env yaml files that rules use? I am already doing that. I can understand the feature as a way to provide arch-specific envs that are seamless to the user at deployment. For me this isn't an immediate problem, as I'm not generally writing workflows for automated deployment yet, only for publishing easy-to-install software. So for every project I've done with conda, I test it on both Linux and Mac to make sure everything is OK with a single env yaml spec, or, if that is not possible, I provide two yamls, one for Linux and one for Mac, and tell people in the README to install the right one for their arch.
It saves the environment using conda's explicit specification file format. This has 2 big implications:
The explicit specification is architecture specific. Snakemake only uses these pinned envs if they match the current architecture; otherwise it falls back to the standard conda environment files. In other words, if you pin the conda envs for linux-64 and someone executes the pipeline on macOS, there won't be a problem. Here's a minimal example I created that you can explore: https://github.com/jdblischak/smk-simple-slurm/tree/main/examples/pinned-conda-envs
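If I understand the feature correctly, the pin files are generated with snakedeploy; the exact commands and file names below are from my reading of the docs (not from this thread), so verify them against your installed versions:

```shell
# Generate an architecture-specific pin file next to each env yaml,
# e.g. envs/mapping.yaml -> envs/mapping.linux-64.pin.txt
# (the env file names here are hypothetical).
snakedeploy pin-conda-envs envs/*.yaml

# Snakemake then prefers a pin file matching the current platform when
# creating the env; on other platforms it falls back to the plain yaml.
snakemake --use-conda --cores 4
```

The pin file itself is conda's explicit specification format (exact URLs and builds), which is what makes it both fully reproducible and architecture specific.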
@jdblischak @ning-y I've been using 7.7.0, and thank you for the suggestion. But it still has issues with re-running rules downstream of a checkpoint aggregation rule that just ran successfully and whose files exist, although with 7.7.0 it does handle the OP example correctly and doesn't re-run things. I've checked my checkpoint and downstream rule code, and to the best of my knowledge it's designed correctly, so I'm at a loss. The DAG it produces and the files it generates are correct, so I can't figure out why it's trying to re-run these rules when the files are there and were just created. In the large workflow I'm currently working on, it still wants to re-run rules downstream of a checkpoint. It also gives me zero explanation as to why; it doesn't even show a message that there are any input/params/code changes, and again the … I'm really against getting rid of my checkpoints and using the output-file-list rule approach. I know this sounds harsh, but Snakemake is at major version 7, so it really shouldn't have such an important and fundamental issue with how it works; it's very frustrating. I am a Python developer, but honestly, right now it would take me a long time to get familiar with the codebase and fix it myself. Something this important should be fixed by the core devs, and it should be made a priority.
@cclienti and @timtroendle, since you developed the post-7.7.0 functionality, which, I'm sorry to say, has a major bug (it doesn't seem to work properly with checkpoints, or maybe a later commit broke it), do you have any idea what's broken?
Sorry, my bad @jdblischak @ning-y: with 7.7.0 I figured out what was going on. Timestamps... timestamps... timestamps! Since I've been developing this large workflow and only re-running parts of it along the way, the timestamps for steps in the DAG weren't in the order they would be if I had run the entire workflow from scratch. But with post-7.7.0 the OP issue still exists, regardless of timestamps or anything else, so it's still a major issue. At least when I publish my software and codebase for others to use with 7.7.0, things will be OK.
I just now saw the very useful …
@hermidalc I'm glad you were able to find at least a partial solution. While I agree that this bug is particularly frustrating, it may not be reasonable to expect a fix so soon. As far as I know, Snakemake is an open-source project maintained by scientists, researchers, and teachers; they have many formal responsibilities keeping them busy, very few of which (if any) depend on maintaining this software. As much as you or I lack the time to learn the codebase, I suspect they too lack the time to tackle this bug. So it goes, I guess.
All open-source projects have a group of core devs who maintain the core and control which features get developed and included, and all pull requests; i.e. they basically drive things. It doesn't truly work the way you are stating, where the entire project is maintained and developed only by the wider community; otherwise the design, vision, and features would become a hot mess. And this is an issue with core Snakemake, not with some additional, more or less self-contained feature.
And @ning-y, sorry for you to be in the firing line, but look at the changelog (https://snakemake.readthedocs.io/en/stable/project_info/history.html): there are version updates quite often, and I would argue the core devs (not random people in the wider community with pull requests) are spending cycles and making commits on things that aren't regressions as fundamentally broken as this. Regressions in core functionality are a big deal in software development. The need to use checkpoints is actually quite common in our work (I'm a computational biologist/bioinformatician too), and I would argue they are as much a part of core Snakemake as a rule is.
@hermidalc My view is that most scientific software operates in a system that does not reward maintenance, especially for purely academic projects like Snakemake (cf. RStudio). Furthermore, the developers of such software are often swamped with larger responsibilities, such as research or teaching, which carry formal and harsher penalties if ignored. Faced with these pressures, I think it is understandable that many bugs are left unfixed, or that easier issues are repeatedly addressed over complex ones. Anyway, I think we are getting off-topic. We can agree to disagree, but again, I feel your frustration; all my projects depend on Snakemake. To steer us back: have you tried using 7.7.0 but specifying all the available rerun triggers? I think you wouldn't have to give up checkpoints then, and you would still get the benefit of the updated-code rerun triggers, etc.
Yes, thank you. 7.7.0 does work properly with checkpoints; it was the touch/timestamp issue, and my earlier comment on it can be disregarded. It was my bad: I've been developing a larger workflow, so when working on and testing parts of it, timestamps easily get messed up.
Hey folks. Sorry for not seeing this issue before, and thanks a lot for your understanding. I get so many GitHub notifications that I cannot possibly get through all of them. Instead, what you can always do is ping me via Discord if an issue is particularly pressing. I'll have a look at the minimal example ASAP. In general, if you want to speed things up or get more attention, one option is always to create a PR with the minimal example; that way I already have a skeleton for debugging.
But in this case, this is not needed as I am looking into this already. |
OK, problem identified. I have to catch a train now, but I'll likely be able to fix it tomorrow.
Thank you @johanneskoester! I wasn't in a big rush because I could downgrade to 7.7.0. With the latest Snakemake I noticed that checkpoint and DAG resolution is much faster compared to 7.7.0, likely due to some performance-improvement commits, and in my current project I'm processing almost 100k files from a checkpoint, so I'll be happy to upgrade once the new version is out on Bioconda.
Thanks for the minimal example and the exploration work all of you already did. This is now fixed in PR #1907! |
fixes issue #1818

### Description

### QC

* [x] The PR contains a test case for the changes or the changes are already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).
This seems to be an issue on 7.8.5 as well, and it remained when I tried to use 6.x (which I do not prefer due to other options that 7.x offers).
@Aiswarya-prasad This Issue describes a bug that was introduced in Snakemake 7.8.0 and was fixed in 7.16.0. Thus it is expected that the bug exists in 7.8.5. Your options are to either upgrade to 7.16.0+ (recommended) or downgrade to 7.7.0. |
**Snakemake version**

7.15.1

**Describe the bug**

Rules downstream of a `checkpoint` aggregation rule are always re-executed when both the aggregation and downstream rules have already been run and output files exist.

**Minimal example**

Using the docs example and adding two rules downstream of `aggregate`: it keeps re-running the `process` and `process2` rules even though the output files exist and nothing upstream has changed. It doesn't matter if you have one rule after `aggregate` or multiple; it keeps re-running all of them even when the outputs exist and nothing upstream has changed. If you remove the downstream rules and go back to the exact docs code, things work as expected.
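For reference, a sketch of what the minimal example likely looks like, based on the clustering checkpoint example from the Snakemake docs plus two trivial downstream rules; the exact reprex is not shown in this thread, and the shell commands for `process`/`process2` are placeholders I added:

```python
import os

checkpoint clustering:
    input:
        "samples/{sample}.txt"
    output:
        clusters=directory("clustering/{sample}")
    shell:
        "cluster {input} {output.clusters}"

def aggregate_input(wildcards):
    # Re-evaluated after the checkpoint runs, so the DAG can depend on the
    # set of files the checkpoint actually produced.
    checkpoint_output = checkpoints.clustering.get(**wildcards).output[0]
    return expand(
        "clustering/{sample}/{i}.txt",
        sample=wildcards.sample,
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.txt")).i,
    )

rule aggregate:
    input:
        aggregate_input
    output:
        "aggregated/{sample}.txt"
    shell:
        "cat {input} > {output}"

# Two trivial downstream rules; with the affected versions these keep being
# re-executed even though their outputs exist and nothing upstream changed.
rule process:
    input:
        "aggregated/{sample}.txt"
    output:
        "processed/{sample}.txt"
    shell:
        "cp {input} {output}"

rule process2:
    input:
        "processed/{sample}.txt"
    output:
        "processed2/{sample}.txt"
    shell:
        "cp {input} {output}"
```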