New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: issue with aggregated checkpoint output on GLS #2108
Conversation
Please format your code with black: |
Kudos, SonarCloud Quality Gate passed! |
hmm looks like the tests passed So at first glance, it seems like we might have failed to reproduce the issue. But then I looked through the logs and couldn't find any sign that our tests were executed. @johanneskoester, why is this line commented out on main? That seems to be the reason it didn't run. snakemake/.github/workflows/main.yml Line 231 in 2a4b9b6
|
Looking at the comments in #574, I believe @vsoch's proposed solution to check for the remote prefix in |
It has been a while since I started digging into this, and I think I've come up with a candidate solution. To use a checkpoint's output we use this function: snakemake/snakemake/checkpoints.py Line 25 in bdde680
Which returns the checkpoint's output for a set of wildcards if it has been produced. However, the function does not check if the checkpoint output exists remotely, and this is why when the DAG is built on the cloud node, it includes the checkpoint before the target job. My proposed solution is to check if the checkpoint output exists remotely and if so download it, thus allowing the workflow to be run only using the target rule. Proposed solution: def get(self, **wildcards):
missing = self.rule.wildcard_names.difference(wildcards.keys())
if missing:
raise WorkflowError(
"Missing wildcard values for {}".format(", ".join(missing))
)
output, _ = self.rule.expand_output(wildcards)
if self.checkpoints.future_output is not None:
for iofile in output:
if iofile.exists_remote and not iofile.exists_local:
iofile.download_from_remote()
return CheckpointJob(self.rule, output)
if iofile in self.checkpoints.future_output:
break
if not iofile.exists and not iofile.is_temp:
break
else:
return CheckpointJob(self.rule, output)
raise IncompleteCheckpointException(self.rule, checkpoint_target(output[0])) I have tested this locally by keeping the checkpoint output files remotely and it seems to work well. Would appreciate any feedback @aryarm @johanneskoester. Let me know if I can provide any more info or clarity! I will commit this change to this PR soon. |
Sure thing! Thanks for taking all of this on, @cademirch . Looking forward to seeing the fix! |
resolves #2021
Description
PR #1294 (from a year ago) added a simple test for checkpoints on the cloud.
This PR adds tests for a more complicated example where the output of the checkpoint is now aggregated over wildcards. In this case, we are trying to reproduce the behavior in #2021, in which Snakemake might be
Note that this PR does not yet contain a fix for the strange behavior. Our hope is just to reproduce things with the test suite at the moment. As noted in #574 (comment), it can be difficult to debug issues on the cloud because Snakemake redeploys a different version of itself in each cloud instance. But labeling this PR with
update-container-image
allows us to deploy a temporary version of the code from this PR.QC
docs/
) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).The docs do not need updating in this case, since this PR simply adds tests and fixes a bug.