New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP 4383: add "mode switching" semantic model #4464
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pohly The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
289654f
to
dff3dfb
Compare
``` | ||
type Selector struct { | ||
Label string | ||
Op SelectorOp // LT, LEQ, EQ, NEQ, GEQ, GT, IsSet, IsNotSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left out In
because that can be expressed via label == x || label == y
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am using the names for comparison as in https://pkg.go.dev/github.com/blang/semver/v4... except that I should have used GTE instead of GEQ. Same with LTE instead of LEQ, NEQ -> NE - should have looked this up instead of writing it from memory.
We can also spell it out, if that's preferred. I wanted to keep it short because this isn't a user-facing API: a vendor generates this (somehow), the semantic model receives it.
// ClaimParameters are the parameters from the time that the claim was | ||
// allocated. Some of the fields were used by the counter controller to | ||
// allocated resources, others can be arbitrary setup parameters. | ||
ClaimParameters runtime.RawExtension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a discussion over in #4384 about using RawExtension
. Here it is unavoidable because the type of the claim parameters is the unknown vendor CRD. Same for the class parameters.
components: | ||
- kube-scheduler | ||
# kube-apiserver, kubelet and kube-controller-manager are agnostic to | ||
# which numeric model is being used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which might not be true, depending on #4384 (comment). We need to conclude that discussion and then update here.
Value string | ||
And []Selector | ||
Or []Selector | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thockin: in https://github.com/kubernetes/enhancements/pull/4384/files#r1473463075 you wrote
Encoding an expression-grammar in YAML is kind of disgusting.
So metav1.LabelSelector
is disgusting? It does exactly that. This Selector
is conceptually very similar, just with additional operations, type support, and nested expression. Okay, quite a bit more complex... But remember that this is for DRA driver vendors, not end-users.
I'm open for suggestions if there is a better way to do this while preserving versioning support in the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So metav1.LabelSelector is disgusting?
Yes, it is :) It's mitigated by the limited scope (no compound booleans, limited operators), but I feel that we should have just made it an expression (we didn't have CEL at the time, but still).
selector: In(self["example.com/my-key"], ["foo", "bar"])
vs:
selector:
matchExpressions:
- key: "example.com/my-key"
operator: NotIn
values:
- foo
- bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. I'm sold on the idea of using CEL expressions instead.
The label selector is matched against a set of label name and value pairs. In | ||
the mode switching model, this set is the combination of labels specified for | ||
the instance and the labels specified for each mode, with labels of a mode | ||
overwriting the ones of the instance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also make the label selector available as a stand-alone semantic model. Then it can be combined with #4382.
This is not in the KEP yet, but could be added.
Is this still a "draft"? The enhancements freeze is approaching. |
|
||
### Goals | ||
|
||
* Enable DRA GPU drivers to support cluster autoscaling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is not just about autoscaling.
A semantic model allows us to do faster scheduling.
It will also allow to do bulk calculations for a set of pods, like Kueue's quota system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left "faster scheduling" out of the goals because a) it's not relevant for the use cases that DRA is being considered for and b) if we make it a goal, we would have to measure whether it has been achieved.
Because we don't have a workload to measure with and no customer requirements to meet certain scheduling throughput or latency targets, we wouldn't be able to decide whether the goal has been reached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can call it standalone scheduling
, in the sense that kube-scheduler doesn't need to communicate with any other component to make a scheduling decision.
dynamic resource scheduler plugin if (and only if) the model is enabled. | ||
|
||
The semantic model uses a label selector for selecting instances and modes that | ||
is more capable than `metav1.LabelSelector`. For example, semantic version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not enhance the existing node selectors?
semantic version awareness seems like something universally useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because extending metav1.LabelSelector
is modifying other existing GA APIs. That could be solved by being extremely careful about where and when to allow new operators, but is probably not something that should be attempted immediately when we don't even know whether this new selector proposal is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not much in favor with the idea of having more than one way to express the same thing.
Is there anything else, other than semantic versioning, that this API allows that the existing node selectors doesn't support?
They already support OR (nodeSelectorTerms) and AND (matchExpressions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a similar concern about Karpenter - the EC2NodeClass API has “selectors” that are not Kubernetes selectors, they are actually AWS EC2 filters (which can match on tags
, whereas Kubernetes doesn't have tags).
However, EC2NodeClass is vendor code and not part of the Karpenter core that the Kubernetes project has adopted. So, it's not nearly as much as a problem; we don't expect vendors to follow our guidelines, though it's nice when they do.
Adding a new comparison operator would OK by me (eg SemVerGe
to compare to strings as semantic versions), because we already have some selectors that support numeric comparison and others that don't. I don't like the idea of adding type
into selectors; right now, the operator infers a type (eg, Lt
infers Quantity). We should stick with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything else, other than semantic versioning, that this API allows that the existing node selectors doesn't support?
Deeply nested expressions. With this grammar one can do a == 1 || (b > 100 && (c != 42 || d == 1000) )
.
I don't like the idea of adding type into selectors; right now, the operator infers a type (eg, Lt infers Quantity).
We need to implement ==
for strings, quantity, and semver. I can see us using Eq" for semver,
==` for quantity, and "Is" for strings.
But what do we then use for '<' for strings?
None of this is an issue when defining the type separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of using a CEL expression stored inside a string.
``` | ||
|
||
|
||
### Capacity type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these standalone API objects? what is the correspondence with Nodes? 1:1? 1:n?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these standalone API objects?
No, they get embedded in a resource.k8s.io/NodeResourceCapacity
API object. There was some back-and-forth in #4384 on how to do that ("strongly typed" vs. "RawExtension"), but the consensus now is that there will be a pointer to this type in a one-of-many union type ("strongly typed").
Each node can have multiple instances where each instance is associated with one Capacity
struct that describes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify in the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do once #4384 is a bit closer to merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh gosh... I thought I was reading the one and only KEP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alculquicondor This KEP is now obsolete. I've been working on an replacement to this model that will be added to the other KEP you have been reviewing in the next day or to.
@pohly let's go ahead and close this KEP to avoid confusion.
// Count is the total number of items in a certain mode. | ||
Count int64 | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a yaml example would be useful to visualize how this can look like.
|
||
### Allocation result | ||
|
||
The following object gets serialized as JSON and stored in a claim's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by whom?
Does this impact scheduling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is out-dated. Because we decided to do "strong typing", the serialization will be done indirectly after embedding a pointer to this AllocationResult
struct into the ResourceClaim status. Encoding then happens in the bind goroutine when the scheduler does the API call which updates the claim status.
In the sense that this was the initial version. I removed it from the PR title. But it is still very tentative and I keep asking @klueska to look at it because of the deadline. |
participating-sigs: | ||
- sig-scheduling | ||
- sig-autoscaling | ||
status: implementable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest making this provisional for now, and then bumping after the initial merge.
<!-- | ||
What other approaches did you consider, and why did you rule them out? These do | ||
not need to be as detailed as the proposal, but should include enough | ||
information to express the idea and why it was not acceptable. | ||
--> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use CEL to define an expression that evaluates down to the value we want? If we do, is that helpful; if we couldn't or don't want that, why is it worse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use CEL. I like this better than building an improved LabelSelector
grammar.
dynamic resource scheduler plugin if (and only if) the model is enabled. | ||
|
||
The semantic model uses a label selector for selecting instances and modes that | ||
is more capable than `metav1.LabelSelector`. For example, semantic version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a similar concern about Karpenter - the EC2NodeClass API has “selectors” that are not Kubernetes selectors, they are actually AWS EC2 filters (which can match on tags
, whereas Kubernetes doesn't have tags).
However, EC2NodeClass is vendor code and not part of the Karpenter core that the Kubernetes project has adopted. So, it's not nearly as much as a problem; we don't expect vendors to follow our guidelines, though it's nice when they do.
Adding a new comparison operator would OK by me (eg SemVerGe
to compare to strings as semantic versions), because we already have some selectors that support numeric comparison and others that don't. I don't like the idea of adding type
into selectors; right now, the operator infers a type (eg, Lt
infers Quantity). We should stick with that.
selector: | ||
label: capabilityFoo | ||
op: IsSet | ||
or: | ||
- and: | ||
- label: capabilityBar | ||
op: IsNotSet | ||
- label: name | ||
op: GE | ||
value: a | ||
- label: memory | ||
op: LT | ||
value: 20Gi | ||
type: Quantity | ||
- label: driver | ||
op: GT | ||
value: v1.0.0 | ||
type: Semver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selector: | |
label: capabilityFoo | |
op: IsSet | |
or: | |
- and: | |
- label: capabilityBar | |
op: IsNotSet | |
- label: name | |
op: GE | |
value: a | |
- label: memory | |
op: LT | |
value: 20Gi | |
type: Quantity | |
- label: driver | |
op: GT | |
value: v1.0.0 | |
type: Semver | |
resourceSelectorTerms: | |
- matchExpressions: | |
- key: features.vendor.example/capability-foo | |
operator: Exists | |
- matchExpressions: | |
- key: features.vendor.example/capability-bar | |
operator: NotExists | |
- key: vendor.example/xpu-memory | |
operator: Lt | |
values: ["20Gi"] | |
- key: vendor.example/driver-version | |
operator: SemVerGe | |
values: ["v1.0.0"] |
One-line PR description: initial draft
Issue link: DRA: semantic model: mode switching #4383