New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make requeuing Strategy mutable #1934
Make requeuing Strategy mutable #1934
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
c8aa824
to
168a14d
Compare
/assign @yaroslava-serdiuk |
168a14d
to
a64ce8f
Compare
/assign @alculquicondor |
pkg/queue/cluster_queue_interface.go
Outdated
// to sort workloads. The function sorts workloads based on their priority. | ||
// When priorities are equal, it uses the workload's creation or eviction | ||
// time. | ||
func queueOrderingFunc(wo workload.Ordering) func(a, b *workload.Info) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
I guess this function could be moved to the _impl file, and not be passed as a parameter.
Also maybe the interface is no longer necessary (in favor of a plain pointer to the implementation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this function could be moved to the _impl file, and not be passed as a parameter.
Makes sense, done.
Also maybe the interface is no longer necessary (in favor of a plain pointer to the implementation).
No strong view here, let me try how this would look like, I will put in a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 493cc34
@@ -83,6 +85,7 @@ func newClusterQueueImpl( | |||
func (c *clusterQueueBase) Update(apiCQ *kueue.ClusterQueue) error { | |||
c.rwm.Lock() | |||
defer c.rwm.Unlock() | |||
c.queueingStrategy = apiCQ.Spec.QueueingStrategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we move all the inadmissible pods back to the heap when changing to StrictFIFO?
Maybe add a unit test? Or integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we move all the inadmissible pods back to the heap when changing to StrictFIFO?
This is already happening, because when the spec.queueingStrategy
is updated, then specUpdated=true
here
This in turn calls queueAllInadmissibleWorkloadsInCohort
within cohort here.
Maybe add a unit test? Or integration.
This is covered by the integration test. You can see this in this section. First, we assert ExpectPendingWorkloadsMetric(0, 1)
, meaning one inadmissible pending. After the change to StrictFIFO we assert ExpectPendingWorkloadsMetric(1, 0)
meaning it was moved to the heap.
As for the unit test I can try to add but the logic is scattered across clusterqueue_controller
: Update
and Reconcile
, but also manager and cache, so seems involving, and the integration test is needed anyway,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking
a64ce8f
to
f8d3b1d
Compare
f8d3b1d
to
88dfe6a
Compare
88dfe6a
to
0ab2501
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
Just a nit
@@ -83,6 +85,7 @@ func newClusterQueueImpl( | |||
func (c *clusterQueueBase) Update(apiCQ *kueue.ClusterQueue) error { | |||
c.rwm.Lock() | |||
defer c.rwm.Unlock() | |||
c.queueingStrategy = apiCQ.Spec.QueueingStrategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM
Could you update the API document as well?
// across the queues in this ClusterQueue. This field is immutable. |
/release-note-edit
|
1af7fbb
to
1cc4e3c
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, mimowo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
LGTM label has been added. Git tree hash: 36d2136e77d46e4e66ef27a86f9a04b6f3509a47
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, we need to re-generate API documentation as well.
Oh, it has already been regenerated. NVM |
* Make requeuingStrategy mutable * Rebase and comments * Drop the ClusterQueue interface * cleanup
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #1873
Special notes for your reviewer:
There seems to be that StrictFIFO misreports the number of pending workloads in the status and in metrics (the count flakes), when there is a blocking workload. I'm looking into this, I repeated the issue on main branch, independently of changes here. See repro test: #1936.
Does this PR introduce a user-facing change?