DRA: scheduler event handlers via assume cache #124595
base: master
This is a basic implementation of a first-in-first-out queue with unbounded size. It's useful for cases where a channel with fixed size might deadlock. The caller is responsible for locking.
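A minimal sketch of such a queue, for illustration only: the type name `FIFO` and its methods are hypothetical and are not taken from the PR's code. It assumes a slice-backed design and, as the commit message says, leaves locking to the caller.

```go
package main

import "fmt"

// FIFO is a hypothetical unbounded first-in-first-out queue.
// It is NOT safe for concurrent use: the caller must hold a lock
// around every call.
type FIFO[T any] struct {
	items []T
}

// Push appends an item at the end. It never blocks, because the
// backing slice grows as needed.
func (q *FIFO[T]) Push(item T) {
	q.items = append(q.items, item)
}

// Pop removes and returns the oldest item. The boolean is false
// when the queue is empty.
func (q *FIFO[T]) Pop() (T, bool) {
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	item := q.items[0]
	q.items[0] = zero // drop the reference so it can be garbage collected
	q.items = q.items[1:]
	return item, true
}

// Len returns the number of queued items.
func (q *FIFO[T]) Len() int {
	return len(q.items)
}

func main() {
	var q FIFO[string]
	q.Push("first")
	q.Push("second")
	for {
		item, ok := q.Pop()
		if !ok {
			break
		}
		fmt.Println(item)
	}
}
```

Unlike a buffered channel, `Push` cannot block when a consumer falls behind, which is what avoids the deadlock; the trade-off is that memory use is unbounded.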
Step simplifies the use of WithStep: it creates a local scope in which the tCtx variable is the one that carries the step name.
3cc6fe2 to 7d9abd5
This enables using the assume cache for cluster events.
7d9abd5 to 2d66ba2
/retest
This enables connecting the event handler for ResourceClaim to the assume cache, which addresses a theoretical race condition. It may also be useful for implementing autoscaler support, because the autoscaler can now modify the content of the cache.
2d66ba2 to 0b0e8e3
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Events that make pods schedulable were triggered by the informer cache, not the assume cache. For "claim was deallocated", this led to a small, unlikely race if a pod got scheduled and stopped so quickly that the informer cache never saw the "claim is allocated" state. The event handler now reacts to changes in the assume cache instead. That cache is guaranteed to have received the "claim is allocated" state that causes some pod to not get scheduled because, by definition, the cache must have listed some other claim as using the resources needed for that pod.
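The race can be illustrated with a much-simplified model. The sketch below is not the scheduler's actual assume cache: `Claim`, `AssumeCache`, `Assume`, `InformerUpdate`, and `AddHandler` are hypothetical names, and details such as resource version comparison and queued event delivery are left out. It only shows why a handler attached to the cache that records assumed state sees the allocated-to-deallocated transition even when the informer never delivered the allocated state.

```go
package main

import (
	"fmt"
	"sync"
)

// Claim is a hypothetical stand-in for a ResourceClaim.
type Claim struct {
	Name      string
	Allocated bool
}

// Handler receives the previous state (nil if unknown) and the new state.
type Handler func(oldClaim, newClaim *Claim)

// AssumeCache is a simplified model: it stores the latest state from
// either source and notifies handlers about every change.
type AssumeCache struct {
	mu       sync.Mutex
	claims   map[string]Claim
	handlers []Handler
}

func NewAssumeCache() *AssumeCache {
	return &AssumeCache{claims: map[string]Claim{}}
}

func (c *AssumeCache) AddHandler(h Handler) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.handlers = append(c.handlers, h)
}

// Assume records state that the scheduler itself produced, before any
// informer has confirmed it.
func (c *AssumeCache) Assume(claim Claim) { c.set(claim) }

// InformerUpdate records state received from the API server.
func (c *AssumeCache) InformerUpdate(claim Claim) { c.set(claim) }

func (c *AssumeCache) set(claim Claim) {
	c.mu.Lock()
	old, found := c.claims[claim.Name]
	c.claims[claim.Name] = claim
	handlers := append([]Handler(nil), c.handlers...)
	c.mu.Unlock()

	// Call handlers without holding the lock so they may use the cache.
	var oldPtr *Claim
	if found {
		oldPtr = &old
	}
	for _, h := range handlers {
		h(oldPtr, &claim)
	}
}

func main() {
	cache := NewAssumeCache()
	cache.AddHandler(func(oldClaim, newClaim *Claim) {
		if oldClaim != nil && oldClaim.Allocated && !newClaim.Allocated {
			fmt.Printf("claim %s was deallocated, retry waiting pods\n", newClaim.Name)
		}
	})

	cache.InformerUpdate(Claim{Name: "gpu"})                  // informer: unallocated
	cache.Assume(Claim{Name: "gpu", Allocated: true})         // scheduler allocates
	cache.InformerUpdate(Claim{Name: "gpu", Allocated: false}) // informer skipped "allocated"
}
```

In this model, a handler fed only by the two informer updates would see unallocated followed by unallocated and would never fire, whereas the handler on the cache observes the assumed allocation in between.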
Which issue(s) this PR fixes:
Fixes #123698
Does this PR introduce a user-facing change?
/assign @kerthcet
Do you have time to review?
/cc @towca
This is related to the work that you are doing for the cluster autoscaler.