Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP 4383: add "mode switching" semantic model #4464

Closed

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Jan 31, 2024

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 31, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly
Once this PR has been reviewed and has the lgtm label, please assign deads2k, mrunalp for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jan 31, 2024
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jan 31, 2024
@pohly pohly force-pushed the dra-semantic-model-mode-switching branch from 289654f to dff3dfb Compare January 31, 2024 15:15
@pohly
Copy link
Contributor Author

pohly commented Jan 31, 2024

/assign @byako @klueska

```
type Selector struct {
Label string
Op SelectorOp // LT, LEQ, EQ, NEQ, GEQ, GT, IsSet, IsNotSet
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left out In because that can be expressed via label == x || label == y.

Copy link
Contributor Author

@pohly pohly Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using the names for comparison as in https://pkg.go.dev/github.com/blang/semver/v4... except that I should have used GTE instead of GEQ. Same with LTE instead of LEQ, NEQ -> NE - should have looked this up instead of writing it from memory.

We can also spell it out, if that's preferred. I wanted to keep it short because this isn't a user-facing API: a vendor generates this (somehow), the semantic model receives it.

// ClaimParameters are the parameters from the time that the claim was
// allocated. Some of the fields were used by the counter controller to
// allocated resources, others can be arbitrary setup parameters.
ClaimParameters runtime.RawExtension
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a discussion over in #4384 about using RawExtension. Here it is unavoidable because the type of the claim parameters is the unknown vendor CRD. Same for the class parameters.

components:
- kube-scheduler
# kube-apiserver, kubelet and kube-controller-manager are agnostic to
# which numeric model is being used.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which might not be true, depending on #4384 (comment). We need to conclude that discussion and then update here.

Value string
And []Selector
Or []Selector
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thockin: in https://github.com/kubernetes/enhancements/pull/4384/files#r1473463075 you wrote

Encoding an expression-grammar in YAML is kind of disgusting.

So metav1.LabelSelector is disgusting? It does exactly that. This Selector is conceptually very similar, just with additional operations, type support, and nested expression. Okay, quite a bit more complex... But remember that this is for DRA driver vendors, not end-users.

I'm open for suggestions if there is a better way to do this while preserving versioning support in the API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So metav1.LabelSelector is disgusting?

Yes, it is :) It's mitigated by the limited scope (no compound booleans, limited operators), but I feel that we should have just made it an expression (we didn't have CEL at the time, but still).

selector: In(self["example.com/my-key"], ["foo", "bar"])

vs:

selector:
  matchExpressions:                                                                                        
    - key: "example.com/my-key"
      operator:  NotIn
      values:
      - foo
      - bar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. I'm sold on the idea of using CEL expressions instead.

The label selector is matched against a set of label name and value pairs. In
the mode switching model, this set is the combination of labels specified for
the instance and the labels specified for each mode, with labels of a mode
overwriting the ones of the instance.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also make the label selector available as a stand-alone semantic model. Then it can be combined with #4382.

This is not in the KEP yet, but could be added.

@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Feb 1, 2024
@alculquicondor
Copy link
Member

Is this still a "draft"? The enhancements freeze is approaching.


### Goals

* Enable DRA GPU drivers to support cluster autoscaling.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is not just about autoscaling.
A semantic model allows us to do faster scheduling.
It will also allow to do bulk calculations for a set of pods, like Kueue's quota system.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left "faster scheduling" out of the goals because a) it's not relevant for the use cases that DRA is being considered for and b) if we make it a goal, we would have to measure whether it has been achieved.

Because we don't have a workload to measure with and no customer requirements to meet certain scheduling throughput or latency targets, we wouldn't be able to decide whether the goal has been reached.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can call it standalone scheduling, in the sense that kube-scheduler doesn't need to communicate with any other component to make a scheduling decision.

dynamic resource scheduler plugin if (and only if) the model is enabled.

The semantic model uses a label selector for selecting instances and modes that
is more capable than `metav1.LabelSelector`. For example, semantic version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not enhance the existing node selectors?

semantic version awareness seems like something universally useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because extending metav1.LabelSelector is modifying other existing GA APIs. That could be solved by being extremely careful about where and when to allow new operators, but is probably not something that should be attempted immediately when we don't even know whether this new selector proposal is useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not much in favor with the idea of having more than one way to express the same thing.

Is there anything else, other than semantic versioning, that this API allows that the existing node selectors doesn't support?
They already support OR (nodeSelectorTerms) and AND (matchExpressions).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar concern about Karpenter - the EC2NodeClass API has “selectors” that are not Kubernetes selectors, they are actually AWS EC2 filters (which can match on tags, whereas Kubernetes doesn't have tags).

However, EC2NodeClass is vendor code and not part of the Karpenter core that the Kubernetes project has adopted. So, it's not nearly as much as a problem; we don't expect vendors to follow our guidelines, though it's nice when they do.


Adding a new comparison operator would OK by me (eg SemVerGe to compare to strings as semantic versions), because we already have some selectors that support numeric comparison and others that don't. I don't like the idea of adding type into selectors; right now, the operator infers a type (eg, Lt infers Quantity). We should stick with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything else, other than semantic versioning, that this API allows that the existing node selectors doesn't support?

Deeply nested expressions. With this grammar one can do a == 1 || (b > 100 && (c != 42 || d == 1000) ).

I don't like the idea of adding type into selectors; right now, the operator infers a type (eg, Lt infers Quantity).

We need to implement == for strings, quantity, and semver. I can see us using Eq" for semver, ==` for quantity, and "Is" for strings.

But what do we then use for '<' for strings?

None of this is an issue when defining the type separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of using a CEL expression stored inside a string.

```


### Capacity type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these standalone API objects? what is the correspondence with Nodes? 1:1? 1:n?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these standalone API objects?

No, they get embedded in a resource.k8s.io/NodeResourceCapacity API object. There was some back-and-forth in #4384 on how to do that ("strongly typed" vs. "RawExtension"), but the consensus now is that there will be a pointer to this type in a one-of-many union type ("strongly typed").

Each node can have multiple instances where each instance is associated with one Capacity struct that describes it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify in the doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do once #4384 is a bit closer to merging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh gosh... I thought I was reading the one and only KEP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alculquicondor This KEP is now obsolete. I've been working on an replacement to this model that will be added to the other KEP you have been reviewing in the next day or to.

@pohly let's go ahead and close this KEP to avoid confusion.

// Count is the total number of items in a certain mode.
Count int64
}
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a yaml example would be useful to visualize how this can look like.


### Allocation result

The following object gets serialized as JSON and stored in a claim's
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by whom?
Does this impact scheduling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out-dated. Because we decided to do "strong typing", the serialization will be done indirectly after embedding a pointer to this AllocationResult struct into the ResourceClaim status. Encoding then happens in the bind goroutine when the scheduler does the API call which updates the claim status.

@pohly pohly changed the title KEP 4383: initial draft for "mode switching" semantic model KEP 4383: add "mode switching" semantic model Feb 2, 2024
@pohly
Copy link
Contributor Author

pohly commented Feb 2, 2024

Is this still a "draft"? The enhancements freeze is approaching.

In the sense that this was the initial version. I removed it from the PR title. But it is still very tentative and I keep asking @klueska to look at it because of the deadline.

participating-sigs:
- sig-scheduling
- sig-autoscaling
status: implementable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making this provisional for now, and then bumping after the initial merge.

Comment on lines +886 to +890
<!--
What other approaches did you consider, and why did you rule them out? These do
not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use CEL to define an expression that evaluates down to the value we want? If we do, is that helpful; if we couldn't or don't want that, why is it worse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use CEL. I like this better than building an improved LabelSelector grammar.

dynamic resource scheduler plugin if (and only if) the model is enabled.

The semantic model uses a label selector for selecting instances and modes that
is more capable than `metav1.LabelSelector`. For example, semantic version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar concern about Karpenter - the EC2NodeClass API has “selectors” that are not Kubernetes selectors, they are actually AWS EC2 filters (which can match on tags, whereas Kubernetes doesn't have tags).

However, EC2NodeClass is vendor code and not part of the Karpenter core that the Kubernetes project has adopted. So, it's not nearly as much as a problem; we don't expect vendors to follow our guidelines, though it's nice when they do.


Adding a new comparison operator would OK by me (eg SemVerGe to compare to strings as semantic versions), because we already have some selectors that support numeric comparison and others that don't. I don't like the idea of adding type into selectors; right now, the operator infers a type (eg, Lt infers Quantity). We should stick with that.

Comment on lines +228 to +245
selector:
label: capabilityFoo
op: IsSet
or:
- and:
- label: capabilityBar
op: IsNotSet
- label: name
op: GE
value: a
- label: memory
op: LT
value: 20Gi
type: Quantity
- label: driver
op: GT
value: v1.0.0
type: Semver
Copy link
Contributor

@sftim sftim Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
selector:
label: capabilityFoo
op: IsSet
or:
- and:
- label: capabilityBar
op: IsNotSet
- label: name
op: GE
value: a
- label: memory
op: LT
value: 20Gi
type: Quantity
- label: driver
op: GT
value: v1.0.0
type: Semver
resourceSelectorTerms:
- matchExpressions:
- key: features.vendor.example/capability-foo
operator: Exists
- matchExpressions:
- key: features.vendor.example/capability-bar
operator: NotExists
- key: vendor.example/xpu-memory
operator: Lt
values: ["20Gi"]
- key: vendor.example/driver-version
operator: SemVerGe
values: ["v1.0.0"]

@pohly
Copy link
Contributor Author

pohly commented Feb 6, 2024

/close

Will be merged into #4384, aka KEP #4381.

@k8s-ci-robot
Copy link
Contributor

@pohly: Closed this PR.

In response to this:

/close

Will be merged into #4384, aka KEP #4381.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

SIG Node PR Triage automation moved this from Needs Reviewer to Done Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

7 participants