KEP 4383: add "mode switching" semantic model #4464

pohly · 2024-01-31T15:15:20Z

One-line PR description: initial draft
Issue link: DRA: semantic model: mode switching #4383

k8s-ci-robot · 2024-01-31T15:15:28Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign deads2k, mrunalp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/prod-readiness/OWNERS
keps/sig-node/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pohly · 2024-01-31T15:16:15Z

/assign @byako @klueska

pohly · 2024-01-31T15:41:33Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+```
+type Selector struct {
+    Label string
+    Op    SelectorOp // LT, LEQ, EQ, NEQ, GEQ, GT, IsSet, IsNotSet


I left out In because that can be expressed via label == x || label == y.

I am using the names for comparison as in https://pkg.go.dev/github.com/blang/semver/v4... except that I should have used GTE instead of GEQ. Same with LTE instead of LEQ, NEQ -> NE - should have looked this up instead of writing it from memory.

We can also spell it out, if that's preferred. I wanted to keep it short because this isn't a user-facing API: a vendor generates this (somehow), the semantic model receives it.

pohly · 2024-01-31T19:52:24Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+	// ClaimParameters are the parameters from the time that the claim was
+	// allocated. Some of the fields were used by the counter controller to
+	// allocated resources, others can be arbitrary setup parameters.
+	ClaimParameters runtime.RawExtension


There's a discussion over in #4384 about using RawExtension. Here it is unavoidable because the type of the claim parameters is the unknown vendor CRD. Same for the class parameters.

pohly · 2024-01-31T19:54:58Z

keps/sig-node/4383-dra-semantic-model-mode-switching/kep.yaml

+    components:
+      - kube-scheduler
+      # kube-apiserver, kubelet and kube-controller-manager are agnostic to
+      # which numeric model is being used.


Which might not be true, depending on #4384 (comment). We need to conclude that discussion and then update here.

pohly · 2024-02-01T09:22:20Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+    Value string
+    And   []Selector
+    Or    []Selector
+}


@thockin: in https://github.com/kubernetes/enhancements/pull/4384/files#r1473463075 you wrote

Encoding an expression-grammar in YAML is kind of disgusting.

So metav1.LabelSelector is disgusting? It does exactly that. This Selector is conceptually very similar, just with additional operations, type support, and nested expression. Okay, quite a bit more complex... But remember that this is for DRA driver vendors, not end-users.

I'm open for suggestions if there is a better way to do this while preserving versioning support in the API.

So metav1.LabelSelector is disgusting?

Yes, it is :) It's mitigated by the limited scope (no compound booleans, limited operators), but I feel that we should have just made it an expression (we didn't have CEL at the time, but still).

selector: In(self["example.com/my-key"], ["foo", "bar"])

vs:

selector: matchExpressions: - key: "example.com/my-key" operator: NotIn values: - foo - bar

Ack. I'm sold on the idea of using CEL expressions instead.

pohly · 2024-02-01T09:39:38Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+The label selector is matched against a set of label name and value pairs. In
+the mode switching model, this set is the combination of labels specified for
+the instance and the labels specified for each mode, with labels of a mode
+overwriting the ones of the instance.


We can also make the label selector available as a stand-alone semantic model. Then it can be combined with #4382.

This is not in the KEP yet, but could be added.

alculquicondor · 2024-02-01T22:39:03Z

Is this still a "draft"? The enhancements freeze is approaching.

alculquicondor · 2024-02-01T22:41:09Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+
+### Goals
+
+* Enable DRA GPU drivers to support cluster autoscaling.


Again, this is not just about autoscaling.
A semantic model allows us to do faster scheduling.
It will also allow to do bulk calculations for a set of pods, like Kueue's quota system.

I left "faster scheduling" out of the goals because a) it's not relevant for the use cases that DRA is being considered for and b) if we make it a goal, we would have to measure whether it has been achieved.

Because we don't have a workload to measure with and no customer requirements to meet certain scheduling throughput or latency targets, we wouldn't be able to decide whether the goal has been reached.

Maybe we can call it standalone scheduling, in the sense that kube-scheduler doesn't need to communicate with any other component to make a scheduling decision.

alculquicondor · 2024-02-01T22:44:39Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+dynamic resource scheduler plugin if (and only if) the model is enabled.
+
+The semantic model uses a label selector for selecting instances and modes that
+is more capable than `metav1.LabelSelector`. For example, semantic version


why not enhance the existing node selectors?

semantic version awareness seems like something universally useful.

Because extending metav1.LabelSelector is modifying other existing GA APIs. That could be solved by being extremely careful about where and when to allow new operators, but is probably not something that should be attempted immediately when we don't even know whether this new selector proposal is useful.

I'm not much in favor with the idea of having more than one way to express the same thing.

Is there anything else, other than semantic versioning, that this API allows that the existing node selectors doesn't support?
They already support OR (nodeSelectorTerms) and AND (matchExpressions).

I had a similar concern about Karpenter - the EC2NodeClass API has “selectors” that are not Kubernetes selectors, they are actually AWS EC2 filters (which can match on tags, whereas Kubernetes doesn't have tags).

However, EC2NodeClass is vendor code and not part of the Karpenter core that the Kubernetes project has adopted. So, it's not nearly as much as a problem; we don't expect vendors to follow our guidelines, though it's nice when they do.

Adding a new comparison operator would OK by me (eg SemVerGe to compare to strings as semantic versions), because we already have some selectors that support numeric comparison and others that don't. I don't like the idea of adding type into selectors; right now, the operator infers a type (eg, Lt infers Quantity). We should stick with that.

Is there anything else, other than semantic versioning, that this API allows that the existing node selectors doesn't support?

Deeply nested expressions. With this grammar one can do a == 1 || (b > 100 && (c != 42 || d == 1000) ).

I don't like the idea of adding type into selectors; right now, the operator infers a type (eg, Lt infers Quantity).

We need to implement == for strings, quantity, and semver. I can see us using Eq" for semver, ==` for quantity, and "Is" for strings.

But what do we then use for '<' for strings?

None of this is an issue when defining the type separately.

I like the idea of using a CEL expression stored inside a string.

alculquicondor · 2024-02-01T22:49:34Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+```
+
+
+### Capacity type


Are these standalone API objects? what is the correspondence with Nodes? 1:1? 1:n?

Are these standalone API objects?

No, they get embedded in a resource.k8s.io/NodeResourceCapacity API object. There was some back-and-forth in #4384 on how to do that ("strongly typed" vs. "RawExtension"), but the consensus now is that there will be a pointer to this type in a one-of-many union type ("strongly typed").

Each node can have multiple instances where each instance is associated with one Capacity struct that describes it.

Please clarify in the doc.

Will do once #4384 is a bit closer to merging.

oh gosh... I thought I was reading the one and only KEP.

@alculquicondor This KEP is now obsolete. I've been working on an replacement to this model that will be added to the other KEP you have been reviewing in the next day or to.

@pohly let's go ahead and close this KEP to avoid confusion.

alculquicondor · 2024-02-01T22:50:01Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+    // Count is the total number of items in a certain mode.
+    Count int64
+}
+```


a yaml example would be useful to visualize how this can look like.

alculquicondor · 2024-02-01T22:50:32Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+
+### Allocation result
+
+The following object gets serialized as JSON and stored in a claim's


by whom?
Does this impact scheduling?

This is out-dated. Because we decided to do "strong typing", the serialization will be done indirectly after embedding a pointer to this AllocationResult struct into the ResourceClaim status. Encoding then happens in the bind goroutine when the scheduler does the API call which updates the claim status.

pohly · 2024-02-02T07:28:57Z

Is this still a "draft"? The enhancements freeze is approaching.

In the sense that this was the initial version. I removed it from the PR title. But it is still very tentative and I keep asking @klueska to look at it because of the deadline.

sftim · 2024-02-02T10:40:14Z

keps/sig-node/4383-dra-semantic-model-mode-switching/kep.yaml

+participating-sigs:
+  - sig-scheduling
+  - sig-autoscaling
+status: implementable


I suggest making this provisional for now, and then bumping after the initial merge.

sftim · 2024-02-02T10:42:30Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+<!--
+What other approaches did you consider, and why did you rule them out? These do
+not need to be as detailed as the proposal, but should include enough
+information to express the idea and why it was not acceptable.
+-->


Could we use CEL to define an expression that evaluates down to the value we want? If we do, is that helpful; if we couldn't or don't want that, why is it worse?

We could use CEL. I like this better than building an improved LabelSelector grammar.

sftim · 2024-02-02T17:11:12Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+dynamic resource scheduler plugin if (and only if) the model is enabled.
+
+The semantic model uses a label selector for selecting instances and modes that
+is more capable than `metav1.LabelSelector`. For example, semantic version


I had a similar concern about Karpenter - the EC2NodeClass API has “selectors” that are not Kubernetes selectors, they are actually AWS EC2 filters (which can match on tags, whereas Kubernetes doesn't have tags).

However, EC2NodeClass is vendor code and not part of the Karpenter core that the Kubernetes project has adopted. So, it's not nearly as much as a problem; we don't expect vendors to follow our guidelines, though it's nice when they do.

Adding a new comparison operator would OK by me (eg SemVerGe to compare to strings as semantic versions), because we already have some selectors that support numeric comparison and others that don't. I don't like the idea of adding type into selectors; right now, the operator infers a type (eg, Lt infers Quantity). We should stick with that.

sftim · 2024-02-02T17:15:18Z

keps/sig-node/4383-dra-semantic-model-mode-switching/README.md

+selector:
+  label: capabilityFoo
+  op: IsSet
+  or:
+    - and:
+      - label: capabilityBar
+        op: IsNotSet
+      - label: name
+        op: GE
+        value: a
+      - label: memory
+        op: LT
+        value: 20Gi
+        type: Quantity
+      - label: driver
+        op: GT
+        value: v1.0.0
+        type: Semver


Suggested change

selector:

label: capabilityFoo

op: IsSet

or:

- and:

- label: capabilityBar

op: IsNotSet

- label: name

op: GE

value: a

- label: memory

op: LT

value: 20Gi

type: Quantity

- label: driver

op: GT

value: v1.0.0

type: Semver

resourceSelectorTerms:

- matchExpressions:

- key: features.vendor.example/capability-foo

operator: Exists

- matchExpressions:

- key: features.vendor.example/capability-bar

operator: NotExists

- key: vendor.example/xpu-memory

operator: Lt

values: ["20Gi"]

- key: vendor.example/driver-version

operator: SemVerGe

values: ["v1.0.0"]

pohly · 2024-02-06T07:06:16Z

/close

Will be merged into #4384, aka KEP #4381.

k8s-ci-robot · 2024-02-06T07:06:21Z

@pohly: Closed this PR.

In response to this:

/close

Will be merged into #4384, aka KEP #4381.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 31, 2024

k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jan 31, 2024

k8s-ci-robot requested review from dchen1107 and kikisdeliveryservice January 31, 2024 15:15

k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 31, 2024

KEP 4383: initial draft for "mode switching" semantic model

dff3dfb

pohly force-pushed the dra-semantic-model-mode-switching branch from 289654f to dff3dfb Compare January 31, 2024 15:15

k8s-ci-robot assigned byako and klueska Jan 31, 2024

This was referenced Jan 31, 2024

DRA: semantic model: mode switching #4383

Closed

KEP 4381: add structured parameters for dynamic resource allocation #4384

Merged

pohly commented Jan 31, 2024

View reviewed changes

SergeyKanzhelev added this to Triage in SIG Node PR Triage Jan 31, 2024

pohly commented Jan 31, 2024

View reviewed changes

pohly commented Feb 1, 2024

View reviewed changes

bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Feb 1, 2024

alculquicondor reviewed Feb 1, 2024

View reviewed changes

pohly changed the title ~~KEP 4383: initial draft for "mode switching" semantic model~~ KEP 4383: add "mode switching" semantic model Feb 2, 2024

sftim reviewed Feb 2, 2024

View reviewed changes

k8s-ci-robot closed this Feb 6, 2024

SIG Node PR Triage automation moved this from Needs Reviewer to Done Feb 6, 2024


		### Goals

		* Enable DRA GPU drivers to support cluster autoscaling.


		### Allocation result

		The following object gets serialized as JSON and stored in a claim's

KEP 4383: add "mode switching" semantic model #4464

KEP 4383: add "mode switching" semantic model #4464

Conversation

pohly commented Jan 31, 2024 • edited

k8s-ci-robot commented Jan 31, 2024

pohly commented Jan 31, 2024

Choose a reason for hiding this comment

pohly Jan 31, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alculquicondor commented Feb 1, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pohly commented Feb 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sftim Feb 2, 2024 • edited

Choose a reason for hiding this comment

pohly commented Feb 6, 2024

k8s-ci-robot commented Feb 6, 2024

pohly commented Jan 31, 2024 •

edited

pohly Jan 31, 2024 •

edited

sftim Feb 2, 2024 •

edited