Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make requeuing Strategy mutable #1934

Merged

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Apr 2, 2024

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #1873

Special notes for your reviewer:

There seems to be that StrictFIFO misreports the number of pending workloads in the status and in metrics (the count flakes), when there is a blocking workload. I'm looking into this, I repeated the issue on main branch, independently of changes here. See repro test: #1936.

Does this PR introduce a user-facing change?

Make ClusterQueue queueingStrategy field mutable. The field can be mutated while there are pending workloads.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 2, 2024
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 2, 2024
Copy link

netlify bot commented Apr 2, 2024

Deploy Preview for kubernetes-sigs-kueue ready!

Name Link
🔨 Latest commit 1cc4e3c
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/660edba7dc1bc300082e497e
😎 Deploy Preview https://deploy-preview-1934--kubernetes-sigs-kueue.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@mimowo mimowo changed the title Make requeuing Strategy mutable Make requeuing Strategy mutable [WIP] Apr 2, 2024
@mimowo mimowo force-pushed the mutable-requeuing-strategy branch 2 times, most recently from c8aa824 to 168a14d Compare April 2, 2024 09:01
@mimowo mimowo changed the title Make requeuing Strategy mutable [WIP] Make requeuing Strategy mutable Apr 2, 2024
@mimowo
Copy link
Contributor Author

mimowo commented Apr 2, 2024

/assign @yaroslava-serdiuk

@mimowo
Copy link
Contributor Author

mimowo commented Apr 3, 2024

/assign @alculquicondor

// to sort workloads. The function sorts workloads based on their priority.
// When priorities are equal, it uses the workload's creation or eviction
// time.
func queueOrderingFunc(wo workload.Ordering) func(a, b *workload.Info) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:
I guess this function could be moved to the _impl file, and not be passed as a parameter.

Also maybe the interface is no longer necessary (in favor of a plain pointer to the implementation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this function could be moved to the _impl file, and not be passed as a parameter.

Makes sense, done.

Also maybe the interface is no longer necessary (in favor of a plain pointer to the implementation).

No strong view here, let me try how this would look like, I will put in a separate commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 493cc34

@@ -83,6 +85,7 @@ func newClusterQueueImpl(
func (c *clusterQueueBase) Update(apiCQ *kueue.ClusterQueue) error {
c.rwm.Lock()
defer c.rwm.Unlock()
c.queueingStrategy = apiCQ.Spec.QueueingStrategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we move all the inadmissible pods back to the heap when changing to StrictFIFO?

Maybe add a unit test? Or integration.

Copy link
Contributor Author

@mimowo mimowo Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we move all the inadmissible pods back to the heap when changing to StrictFIFO?

This is already happening, because when the spec.queueingStrategy is updated, then specUpdated=true here

This in turn calls queueAllInadmissibleWorkloadsInCohort within cohort here.

Maybe add a unit test? Or integration.

This is covered by the integration test. You can see this in this section. First, we assert ExpectPendingWorkloadsMetric(0, 1), meaning one inadmissible pending. After the change to StrictFIFO we assert ExpectPendingWorkloadsMetric(1, 0) meaning it was moved to the heap.

As for the unit test I can try to add but the logic is scattered across clusterqueue_controller: Update and Reconcile, but also manager and cache, so seems involving, and the integration test is needed anyway,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2024
@mimowo mimowo force-pushed the mutable-requeuing-strategy branch from a64ce8f to f8d3b1d Compare April 4, 2024 06:58
@mimowo mimowo force-pushed the mutable-requeuing-strategy branch from f8d3b1d to 88dfe6a Compare April 4, 2024 07:02
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 4, 2024
@mimowo mimowo force-pushed the mutable-requeuing-strategy branch from 88dfe6a to 0ab2501 Compare April 4, 2024 07:06
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

Just a nit

pkg/queue/cluster_queue.go Outdated Show resolved Hide resolved
@@ -83,6 +85,7 @@ func newClusterQueueImpl(
func (c *clusterQueueBase) Update(apiCQ *kueue.ClusterQueue) error {
c.rwm.Lock()
defer c.rwm.Unlock()
c.queueingStrategy = apiCQ.Spec.QueueingStrategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM
Could you update the API document as well?

// across the queues in this ClusterQueue. This field is immutable.

@tenzen-y
Copy link
Member

tenzen-y commented Apr 4, 2024

/release-note-edit

Make ClusterQueue queueingStrategy field mutable. The field can be mutated while there are pending workloads.

@mimowo mimowo force-pushed the mutable-requeuing-strategy branch from 1af7fbb to 1cc4e3c Compare April 4, 2024 16:56
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alculquicondor
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 4, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 36d2136e77d46e4e66ef27a86f9a04b6f3509a47

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, we need to re-generate API documentation as well.

@tenzen-y
Copy link
Member

tenzen-y commented Apr 4, 2024

Probably, we need to re-generate API documentation as well.

Oh, it has already been regenerated. NVM

@k8s-ci-robot k8s-ci-robot merged commit 3228c68 into kubernetes-sigs:main Apr 4, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone Apr 4, 2024
vsoch pushed a commit to researchapps/kueue that referenced this pull request Apr 18, 2024
* Make requeuingStrategy mutable

* Rebase and comments

* Drop the ClusterQueue interface

* cleanup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make queueingStrategy mutable
5 participants