
fix: Don't handle the same job output error twice when --keep-going is set #1311

Closed
cjops wants to merge 1 commit

@cjops (Contributor) commented Dec 20, 2021

Description

Hoping I understand Snakemake's scheduler loop logic correctly. This should fix #1244.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

@sonarcloud bot commented Dec 20, 2021

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)

No coverage information
0.0% duplication

@iamh2o (Contributor) commented Jan 14, 2022

++ for adopting this fix ASAP. I'm going to pull it and give it a whirl right now, because the impact on our operations has become serious enough that our workarounds are failing. I was exploring next options when I happily noticed this in my inbox. For more details, see here. Will report back; I can test big jobs running via qsub and DRMAA on UGE.

@cstenkamp

I seem to have the same issue as @iamh2o; it would be great if this PR could find its way into a release ASAP!

@johanneskoester (Contributor)

Thanks a lot! I recently applied this modification as well, along with some other fixes, in PR #1332.

@iamh2o (Contributor) commented Feb 28, 2022

So far so good with Snakemake v7.0.1: 5K jobs and no crashes. I'll force the failure condition when I have some downtime; that will be the best test I have (it just takes a fair amount of prep, since a ton of jobs need to be in flight).

Development

Successfully merging this pull request may close these issues:

  • KeyError: checkpoint
4 participants