
KEP-3063: DRA: 1.30 update #4181

Merged
merged 13 commits into kubernetes:master Feb 9, 2024

Conversation

@pohly (Contributor) commented Sep 5, 2023

  • One-line PR description: This updates the README to reflect what has been done and fills in sections that were left out earlier. The next milestone is 1.30 where DRA will remain in alpha.

  • Issue link: dynamic resource allocation #3063

@k8s-ci-robot added the cncf-cla: yes, kind/kep, and sig/node labels Sep 5, 2023
@k8s-ci-robot added the size/L label Sep 5, 2023
@pohly (Contributor, Author) commented Sep 5, 2023

Some work remains for the 1.29 development cycle:

The KEP needs to be merged first. Then that work needs to be done or finished, and before the actual beta graduation we need another review round to determine whether the beta graduation criteria have been met.

@pohly mentioned this pull request Sep 5, 2023
@alculquicondor (Member) commented:

/sig scheduling
/sig autoscaling

@k8s-ci-robot added the sig/scheduling and sig/autoscaling labels Sep 5, 2023
@alculquicondor (Member) commented:

/cc @alculquicondor @mwielgus

@bart0sh (Contributor) commented Sep 6, 2023

/assign @bart0sh @klueska
/lgtm

@k8s-ci-robot added the lgtm label Sep 6, 2023
@bart0sh moved this from Triage to Needs Approver in SIG Node PR Triage Sep 6, 2023
@bart0sh (Contributor) commented Sep 6, 2023

@alculquicondor (Member) commented:

/hold
for all involved SIGs to review and approve.

@k8s-ci-robot added the do-not-merge/hold label Sep 6, 2023
@k8s-ci-robot removed the lgtm label Sep 7, 2023
As discussed on Slack, scale down must determine whether some currently running pods could get moved. This simulation depends on simulating deallocation; otherwise the allocated claim prevents moving pods.
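To make this concrete, here is a minimal sketch of the idea (hypothetical code, not the actual Cluster Autoscaler implementation; `ClusterSnapshot`, `UpdateClaim`, and `CanScheduleElsewhere` are assumed names):

```go
import (
	v1 "k8s.io/api/core/v1"
	resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
)

// canMovePod simulates moving a running pod during scale down. The pod's
// claims are "deallocated" in the in-memory snapshot first; otherwise their
// existing allocation pins the pod to its current node and the simulation
// wrongly concludes that the pod cannot move.
func canMovePod(pod *v1.Pod, claims []*resourcev1alpha2.ResourceClaim, snapshot *ClusterSnapshot) bool {
	for _, claim := range claims {
		c := claim.DeepCopy()
		c.Status.Allocation = nil  // simulate deallocation
		c.Status.ReservedFor = nil // drop the pod's reservation
		snapshot.UpdateClaim(c)    // only the in-memory copy changes
	}
	// Re-run the scheduling simulation against the modified snapshot.
	return snapshot.CanScheduleElsewhere(pod)
}
```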
The discussion around autoscaling needs more time.

Technically the support for cluster autoscaling can be defined and implemented as an extension of the core DRA, without changing the core feature. By separating out the specification of "numeric parameters" into a separate KEP it might be easier to make progress on the different aspects because they are better separated.

kubernetes/kubernetes#121876 changed where the cluster gets updated with blocking API calls.
@alculquicondor (Member) left a comment:

I'll give another pass after @johnbelamaric gives LGTM

schedulable, like creating a claim that the pod needs or finishing the
allocation of a claim.

[Queuing hints](https://github.com/kubernetes/enhancements/issues/4247) are
Member:
FYI, queueing hints are disabled by default and are unlikely to reach a stable state in 1.30 that would allow re-enabling them. Mentioning this in case it is crucial for the usability of DRA.

Member:
+1. For this KEP update, we will have to state the status of DRA without the use of queueing hints.

@pohly (Contributor, Author):
It's not crucial. Performance is different, but DRA is also usable without that optimization. I know that you disagree, but all the use cases in https://docs.google.com/document/d/1XNkTobkyz-MyXhidhTp5RfbMsM-uRCWDoflUMqNcYTk/edit#heading=h.ljj9kaa144nr confirmed that scheduling performance is not a priority because of long-running pods.

I'll update the text to explain this.
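For background on what such a hint looks like: plugins return a hint function together with the cluster events they care about, so the scheduler only re-queues a pod when a relevant object actually changed. A minimal sketch, assuming the scheduler framework API of that era (exact signatures changed between releases; `podUsesClaim` is an assumed helper):

```go
import (
	v1 "k8s.io/api/core/v1"
	resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
	"k8s.io/klog/v2"
	framework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// EventsToRegister wires a queueing hint to ResourceClaim events.
func (pl *dynamicResources) EventsToRegister() []framework.ClusterEventWithHint {
	return []framework.ClusterEventWithHint{{
		Event:          framework.ClusterEvent{Resource: framework.ResourceClaim, ActionType: framework.Add | framework.Update},
		QueueingHintFn: pl.isSchedulableAfterClaimChange,
	}}
}

// isSchedulableAfterClaimChange re-queues a pod only when the changed claim
// is one that the pod actually references; unrelated claim events are skipped.
func (pl *dynamicResources) isSchedulableAfterClaimChange(logger klog.Logger, pod *v1.Pod, oldObj, newObj interface{}) (framework.QueueingHint, error) {
	claim, ok := newObj.(*resourcev1alpha2.ResourceClaim)
	if !ok || !podUsesClaim(pod, claim) { // podUsesClaim: assumed helper
		return framework.QueueSkip, nil
	}
	return framework.Queue, nil
}
```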

@pohly (Contributor, Author):
I added something.

```
This is not possible with opaque parameters as described in this KEP. If a DRA
driver developer wants to support Cluster Autoscaler, they have to use numeric
parameters. Numeric parameters are an extension of this KEP that is defined in
```
Member:
I don't think we should phrase this case as only being an enabler for cluster autoscaler. I see that KEP as an opportunity to simplify this KEP before it goes to beta.

@pohly (Contributor, Author):
I can add that.

@pohly (Contributor, Author):
It didn't fit into this "Autoscaler" section, so I added a new section for this thought under "Alternatives".

@Huang-Wei (Member) left a comment:

The scheduling part (except for CA support) looks good. A few nits.


go through the backoff queue and the usually 5 second long delay associated
with that.

#### PreEnqueue
Member:
FWIW, PreEnqueue is likely to reach GA in 1.30.
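For readers unfamiliar with the hook: PreEnqueue lets a plugin keep a pod out of the active queue until scheduling can plausibly succeed. A minimal sketch of what a DRA check could look like (hypothetical code, not the actual plugin; `claimName` is an assumed helper that resolves generated claim names):

```go
import (
	"context"

	v1 "k8s.io/api/core/v1"
	resourcelisters "k8s.io/client-go/listers/resource/v1alpha2"
	framework "k8s.io/kubernetes/pkg/scheduler/framework"
)

type dynamicResources struct {
	claimLister resourcelisters.ResourceClaimLister
}

// PreEnqueue keeps a pod out of the active scheduling queue until every
// ResourceClaim it references exists; a later claim event re-queues the pod.
func (pl *dynamicResources) PreEnqueue(ctx context.Context, pod *v1.Pod) *framework.Status {
	for _, podClaim := range pod.Spec.ResourceClaims {
		name := claimName(pod, podClaim) // assumed helper
		if _, err := pl.claimLister.ResourceClaims(pod.Namespace).Get(name); err != nil {
			return framework.NewStatus(framework.UnschedulableAndUnresolvable, err.Error())
		}
	}
	return nil
}
```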

@@ -1889,6 +1905,12 @@ At the moment, the claim plugin has no information that might enable it to
prioritize which resource to deallocate first. Future extensions of this KEP
might attempt to improve this.

This is currently using blocking API calls. They are unlikely because this
Member:

Suggested change:
- This is currently using blocking API calls. They are unlikely because this
+ This is currently using blocking API calls. It's quite rare because this

"Numeric parameters" are now called "semantic parameters" because they are not
just about numbers.
@alculquicondor (Member) left a comment:
/approve
from sig scheduling
/hold for @johnbelamaric's LGTM


When pods fail to get scheduled, kube-scheduler reports that through events
and pod status. For DRA, that includes "waiting for resource driver to
provide information" (node not selected yet) and "waiting for resource
Member:
Do these events/status updates include information about the specific claim/driver that is blocking progress?

@pohly (Contributor, Author):
When waiting for a claim, the claim is mentioned. This allows the user to drill down and check the claim.

When waiting for PodSchedulingContext information, that object is not mentioned explicitly. It doesn't need to be named because the name is the same as for the pod.

In both cases it is assumed that users understand the concepts enough to know what "claim" and "pod scheduling context" are.
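Because the PodSchedulingContext shares namespace and name with its pod, a user or tool can go straight from the pod to that object. A minimal sketch, assuming a typed clientset for the resource.k8s.io/v1alpha2 API:

```go
import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// printSchedulingStatus looks up the PodSchedulingContext for a pod. No
// explicit reference is needed: it has the same namespace and name as the pod.
func printSchedulingStatus(ctx context.Context, clientset kubernetes.Interface, pod *v1.Pod) {
	schedCtx, err := clientset.ResourceV1alpha2().PodSchedulingContexts(pod.Namespace).Get(ctx, pod.Name, metav1.GetOptions{})
	if err != nil {
		fmt.Printf("no PodSchedulingContext yet: %v\n", err)
		return
	}
	fmt.Printf("selected node: %q, potential nodes: %v\n", schedCtx.Spec.SelectedNode, schedCtx.Spec.PotentialNodes)
}
```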

@@ -2769,6 +2828,18 @@ Why should this KEP _not_ be implemented?

## Alternatives

### Semantic Parameters instead of PodSchedulingContext

When a DRA driver uses semantic parameters, there is no DRA driver controller
Member:
I believe in the latest cut this has shifted a little? In the semantic parameter version, there is still a driver controller but its responsibility is to evaluate the claim parameters and produce the driver-neutral semantic request. Since we are skipping CEL for this (for now at least...). But the following text on "we might be able to remove PodSchedulingContext" looks right.

@pohly (Contributor, Author):
I changed the text to "there is no need for a DRA driver controller which allocates the claim and no need for communication between scheduler and such a controller".

That leaves it open whether a controller is still needed for other purposes (will be defined in the "semantic parameters KEP" and may depend on whether users are allowed to create in-tree parameter objects directly) and just focuses on the aspect that is relevant here.

@@ -2894,6 +2965,52 @@ type ResourceDriverFeature struct {
}

### Complex sharing of ResourceClaim

At the moment, the allocation result marks a claim as either "shareable" by
Member:
There's something not quite right in that model. I wonder if the underlying semantic models can express this?

Also, there are different kinds of "shareable" - for example, I know of use cases where you want to mount the GPU devices in a monitoring container - or perhaps to access it via some config tools.

No change needed, just raising some things to think about post 1.30.
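Purely to make that musing concrete, an illustrative sketch (not a proposed API) of how sharing could be expressed per kind of access instead of a single flag:

```go
// Illustrative only: a richer sharing model than a single "shareable" bool.
type AllocationSharing struct {
	Modes []SharingMode
}

type SharingMode struct {
	// Kind of access, e.g. "compute" (exclusive use) or "monitoring"
	// (a sidecar that only observes the device).
	Kind string
	// Maximum concurrent consumers for this kind; 0 means unlimited.
	MaxConsumers int
}
```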

@johnbelamaric (Member) commented:

A couple of minor comments, but overall LGTM. What we are saying here is:

  1. Main DRA stays mostly the same in 1.30 and stays in alpha.
  2. Semantic models are implemented via "DRA: structured parameters" (#4381) in 1.30 as alpha
  3. We learn from that whether we can simplify this KEP, removing PodSchedulingContext (perhaps with a new, different "escape hatch" that is part of semantic models).

@dchen1107 (Member) commented:

/lgtm from SIG Node perspective
/approve based on @johnbelamaric @alculquicondor @Huang-Wei's comments.

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, dchen1107, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Feb 8, 2024
@johnbelamaric (Member) commented:

/lgtm

I am root in this repo, so I can't approve without it going through (and I am not really a node approver).

@k8s-ci-robot added the lgtm label Feb 8, 2024
@mrunalp (Contributor) commented Feb 8, 2024

/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label Feb 8, 2024
@k8s-ci-robot merged commit 646c6c8 into kubernetes:master Feb 9, 2024
4 checks passed
SIG Node PR Triage automation moved this from Needs Reviewer to Done Feb 9, 2024
@k8s-ci-robot added this to the v1.30 milestone Feb 9, 2024
Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
kind/kep: Categorizes KEP tracking issues and PRs modifying the KEP directory.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
sig/autoscaling: Categorizes an issue or PR as relevant to SIG Autoscaling.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.