RFC: KEP-4381: DRA: avoid kubelet API version dependency with REST proxy #4615

Open · wants to merge 1 commit into master

Conversation

@pohly (Contributor) commented May 3, 2024

By tunneling REST requests from drivers through kubelet to the API server it becomes possible to update drivers and the control plane to a newer release than the kubelet because the kubelet doesn't need to encode/decode the resource.k8s.io API types that are used by the drivers. This simplifies cluster upgrades.
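To make the idea concrete, here is a minimal sketch of the kubelet-side forwarding this proposes. The message, package, and function names are invented for this sketch and do not come from the KEP; the point is only that kubelet moves opaque bytes, so it never has to understand which resource.k8s.io version the driver speaks.

```go
// Illustration only: names below are hypothetical, not part of the KEP or any Kubernetes API.
package restproxy

import (
	"bytes"
	"context"
	"io"
	"net/http"
)

// RESTRequest is what a DRA driver would send over the gRPC tunnel.
type RESTRequest struct {
	Method string // e.g. "PUT"
	Path   string // e.g. "/apis/resource.k8s.io/<version>/resourceslices/<name>"
	Body   []byte // serialized object, opaque to kubelet
}

// RESTResponse carries the API server's answer back to the driver.
type RESTResponse struct {
	StatusCode int
	Body       []byte
}

// forward proxies one request using kubelet's own API server client. Because the
// body is passed through as raw bytes, kubelet never decodes the resource.k8s.io
// types, which is what removes the version dependency.
func forward(ctx context.Context, client *http.Client, apiServer string, req RESTRequest) (*RESTResponse, error) {
	httpReq, err := http.NewRequestWithContext(ctx, req.Method, apiServer+req.Path, bytes.NewReader(req.Body))
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(httpReq)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return &RESTResponse{StatusCode: resp.StatusCode, Body: body}, nil
}
```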


Keeping kubelet at some old release while upgrading the control and DRA drivers
is desirable and officially supported by Kubernetes. To support the same when
using DRA, the kubelet now provides a REST proxy via gRPC that can be used by
Contributor:

This seems generally useful for a range of version skew scenarios between on-node agents and the apiserver, as well as security controls.

I'm slightly concerned as to the scalability of this approach (kubelet acting as gateway), but the upsides involved for managing the impact of per-node agents are very interesting.

Member:

> I'm slightly concerned as to the scalability of this approach (kubelet acting as gateway)

OTOH, kube-apiserver will be rate-limiting requests anyway, so I doubt that kubelet would really be the gatekeeper here; those requests would probably be throttled server-side in any case.

remain the same.

The alternative is to keep supporting older revisions for the sake of the kubelet.
<<[/UNRESOLVED]>>
pohly (author):

I think it's fine if kubelet uses a certain version of the API for its own purposes. We can ensure that the apiserver provides a version of the API that is needed by an older kubelet as far back as we want to support version skew.

For example, suppose we switch to v1alpha3 in 1.31, a breaking change. Then in 1.32, we introduce v1beta1. Normally we would remove v1alpha3, but to try out version skew support, we can keep v1alpha3. Ideally this won't be hard as we don't plan any big changes for the API in 1.32. Then kubelet from v1.31 can be combined with drivers and control plane from v1.32 where both are using v1beta1.

By tunneling REST requests from drivers through kubelet to the API server it
becomes possible to update drivers and the control plane to a newer release
than the kubelet because the kubelet doesn't need to encode/decode the
resource.k8s.io API types that are used by the drivers. This simplifies cluster
upgrades.
Comment on lines +2100 to +2101
convenience because for a PUT of a ResourceSlice, the driver has to be trusted
to not create an object for some other driver on the node. This cannot be
Member:

This is a downside of the proxy path, right? The unstructured / dynamic client aggregation approach could have checked the driver association field, right?

pohly (author):

Correct.

However, the "unstructured / dynamic client" then has other downsides. Right now, it can check this because we assume that all future ResourceSlices will be of the form "nodeName + driverName + other fields". Without knowing anything about "other fields", it's hard to write a controller that synchronizes the content in the ResourceSlice objects with the desired content - not impossible, but harder.

The other downside, if we consider apiserver traffic, is the use of JSON for the request bodies (in both directions) instead of protobuf.

Member:

Why do we want kubelet in the middle here if drivers are going to be doing more than write-only status reporting? If they are watching / getting / reading / reconciling status, that seems more like a normal client to me. Is the only reason for the node proxy so they can piggy-back on kubelet credentials and get authorized?

pohly (author):

The advantage is indeed that the node authorizer continues to work.

Member:

but... not really the way we want, since it lets devices stomp each other's status?

pohly (author):

Drivers on the same node have to trust each other. They often run with elevated privileges, so they can already do much harm locally even without this shared access to ResourceSlice objects of the node.

That's still better than having to trust all drivers anywhere in the cluster.

Member:

I feel like this is the ideal scenario for us to build out the authz capabilities of node bound SA tokens (possibly combined with #4600). We have a good story for node agents that want to perform write requests via VAP+SA node ref. Having something similar to that for reads seems broadly useful. Building an entire gRPC proxy as a one-off for DRA seems like the wrong approach.

pohly (author):

The REST proxy could be reused. But I agree, figuring out how to do this through normal auth mechanisms is the better approach. That would be out-of-scope for this KEP, though.

Instead, I'll put in some wording along the lines of:

  • DRA kubelet plugins must connect to the API server themselves to publish their own ResourceSlice objects and to read ResourceClaim objects.
  • Limiting access of the plugin is the responsibility of whoever deploys the driver. Some methods and best practices may get documented separately.

Does that sound okay? Then I'll close this PR and create a different one with that approach.

Member:

@pohly I added this to the SIG Auth meeting for May 22nd. Hopefully we can hash out a concrete path forward.

pohly (author):

It gets even more interesting with some use cases that people have brought up for DRA: a DRA driver for a NIC runs on the node and must publish network-related information, ideally to the ResourceClaim status. It must do that itself because kubelet might not know what that information is due to version skew.

The node authorizer can limit write access to ResourceClaims that are actually in use on the node because it builds the graph (node -> pod -> claim). In contrast to ResourceSlice, there isn't a specific field in a ResourceClaim that can be checked.

[CSI](https://github.com/container-storage-interface/spec/blob/master/spec.md),
with “volume” replaced by “resource” and volume specific parts removed.

#### REST proxy
Member:

/sig api-machinery auth
/cc @jpbetz @deads2k @enj

For visibility to a kube-apiserver <-- kubelet <-- client proxied API surface proposal

The read part of this is similar in some ways to what was proposed in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4188-kubelet-pod-readiness-api#proposed-api

  1. the reads / watches are passed through to kube-apiserver with the kubelet's credentials
  2. the API is a complete proxy, so client-go is usable (though tunneled over gRPC, which I'm not sure we've done before)
  3. the API allows drivers to write through to kube-apiserver using the kubelet's credentials

A lot of the discussion around that proposal seems relevant to this one (sig-arch 2023-08-24, https://docs.google.com/document/d/1BlmHq5uPyBUDlppYqAAzslVbAO8hilgjqZUTaNXUhKM/edit#bookmark=id.vud1o04xj4iv, https://www.youtube.com/watch?v=toN7t_y4zCk&t=22m45s)

pohly (author):

> The read part of this is similar in some ways to what was proposed in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4188-kubelet-pod-readiness-api#proposed-api

One key difference is that I am not proposing to add a new kubelet socket. I was worried about security implications around that. Instead, kubelet connects to plugins the same way as before and takes requests through one stream per plugin which is kept open by each plugin.
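One way to read that, purely as illustration and continuing the hypothetical types and forward helper from the sketch after the PR description: kubelet dials the plugin socket as it already does, and over the resulting long-lived bidirectional stream the roles are reversed, with the plugin pushing REST requests and kubelet answering them. The stream interface below stands in for whatever gRPC stream type would eventually be generated; it is not a real API.

```go
// Hypothetical kubelet-side loop for the per-plugin stream; "tunnel" is a stand-in
// for a generated bidirectional gRPC stream type (not yet defined by any proto).
type tunnel interface {
	Recv() (*RESTRequest, error) // next REST request pushed by the plugin
	Send(*RESTResponse) error    // answer sent back to the plugin
}

// serveTunnel blocks on the stream that the plugin keeps open, forwards each
// request to the API server with kubelet's credentials (see forward above), and
// writes the response back. One such loop would run per registered plugin.
func serveTunnel(ctx context.Context, client *http.Client, apiServer string, t tunnel) error {
	for {
		req, err := t.Recv()
		if err != nil {
			return err // plugin closed the stream or kubelet is shutting down
		}
		resp, err := forward(ctx, client, apiServer, *req)
		if err != nil {
			resp = &RESTResponse{StatusCode: http.StatusBadGateway}
		}
		if err := t.Send(resp); err != nil {
			return err
		}
	}
}
```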

Contributor:

Given a desire to read/write effectively from the kube-apiserver and the development of validatingadmissionpolicy + serviceaccount node claims + downward API for node name that can be used to self-restrict reads (for convenience) and enforce write requirements (must be from the same node), why is the proxy a better and more secure choice?

pohly (author):

I'm not familiar enough with that alternative to do a comparison. Can you point me to documentation?

The ask is to ensure that a DRA driver deployed as a daemonset gets RBAC permissions which allow it to read/create/update ResourceSlice objects where the "nodeName" field is the same as the node on which the pod is running.
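For contrast, here is a rough sketch of what that direct-access pattern looks like from the driver side with plain client-go, assuming the resource.k8s.io/v1alpha2 layout (where nodeName and driverName are top-level ResourceSlice fields, as in Kubernetes 1.30) and a made-up driver name; the node name arrives via the downward API. RBAC alone can grant the verbs used here, but it cannot express "only where nodeName equals my node", which is exactly the gap being discussed.

```go
// Sketch of a DRA driver publishing its own ResourceSlice directly, without the
// kubelet proxy. Driver name and object name are hypothetical.
package main

import (
	"context"
	"fmt"
	"os"

	resourcev1alpha2 "k8s.io/api/resource/v1alpha2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	nodeName := os.Getenv("NODE_NAME")   // injected via the downward API
	const driverName = "gpu.example.com" // hypothetical driver

	cfg, err := rest.InClusterConfig() // the daemonset pod's service account
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	slice := &resourcev1alpha2.ResourceSlice{
		ObjectMeta: metav1.ObjectMeta{Name: nodeName + "-" + driverName},
		NodeName:   nodeName,   // the field an authorization check would have to pin to this node
		DriverName: driverName, // ideally also restricted, see the driver-name discussion below
		// Actual device/model data omitted in this sketch.
	}
	created, err := client.ResourceV1alpha2().ResourceSlices().Create(context.Background(), slice, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("published ResourceSlice", created.Name)
}
```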

pohly (author):

For ResourceClaim, the ask is to ensure that the pod only gets read permission for ResourceClaims referenced by pods which have been scheduled to the node.

pohly (author):

@deads2k: I looked through https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy and https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/. It's not clear to me how I can connect the identity of the driver pod with the node that it is running on, and that identity with rules that restrict what that pod can access.

The node authorizer uses hand-crafted Go code for this.

Member:

See kubernetes/kubernetes#124711 for the VAP+SA example David is referring to.

Note that admission only covers writes, so reads cannot be restricted in this way.

pohly (author):

This seems to be the key part:
`object.metadata.name == request.userInfo.extra["authentication.kubernetes.io/node-name"][0]`

Does object provide access to any field?

Is it possible to extend what gets added to the credentials? If there was a dra.kubernetes.io/driver-name, then the same check could be done for the driverName field in ResourceSlice. That would be better than what the REST proxy can do.

I'm fine with dropping the read check on ResourceClaim. The node authorizer also has a "loophole" there because it doesn't limit what data gets returned by watches.
