Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve job selection performance in case of potential ambiguity that is resolved by comprehensive ruleorder statements. #1147

Merged
merged 3 commits into from Sep 6, 2021

Conversation

johanneskoester
Copy link
Contributor

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

…y that is resolved by comprehensive ruleorder statements.
@johanneskoester
Copy link
Contributor Author

@dkuzminov you probably want to have a look at this?

@github-actions
Copy link
Contributor

Please format your code with black: black snakemake tests/*.py.

Copy link
Contributor

@dkuzminov dkuzminov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this improvement is incorrect. Assume there is a job primary that dominates all other jobs. Next, imagine that this rule is disregarded later because of missing input. We would get an incorrect is_strictly_ordered value because we know nothing about the ordering of the rest of the jobs. We would select the first producer (other than primary) even if it is ambiguous.

snakemake/dag.py Outdated
# check if all potential producers are strictly ordered
jobs = sorted(jobs, reverse=True)
primary_job = jobs[0]
is_strictly_ordered = all(primary_job > job for job in jobs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, as the first element shall be compared to the rest of the elements:

is_strictly_ordered = all(primary_job > job for job in jobs[1:])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@sonarcloud
Copy link

sonarcloud bot commented Aug 23, 2021

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@johanneskoester
Copy link
Contributor Author

johanneskoester commented Aug 23, 2021

Overall this improvement is incorrect. Assume there is a job primary that dominates all other jobs. Next, imagine that this rule is disregarded later because of missing input. We would get an incorrect is_strictly_ordered value because we know nothing about the ordering of the rest of the jobs. We would select the first producer (other than primary) even if it is ambiguous.

Right, I missed that case indeed. I have worked a bit on it. Still not 100% validated in my mind, but I think this is more correct (although it is 10pm here now, so I might miss something again ;-)).

@johanneskoester
Copy link
Contributor Author

I think it is fine.

@johanneskoester johanneskoester merged commit 921f4f7 into main Sep 6, 2021
@johanneskoester johanneskoester deleted the optimize-job-selection branch September 6, 2021 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants