RFC: KEP-4381: DRA: avoid kubelet API version dependency with REST proxy #4615
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pohly. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
> Keeping kubelet at some old release while upgrading the control plane and DRA drivers
> is desirable and officially supported by Kubernetes. To support the same when
> using DRA, the kubelet now provides a REST proxy via gRPC that can be used by …
This seems generally useful for a range of version-skew scenarios between on-node agents and the apiserver, as well as for security controls.
I'm slightly concerned about the scalability of this approach (kubelet acting as a gateway), but the upside for managing the impact of per-node agents is very interesting.
> I'm slightly concerned as to the scalability of this approach (kubelet acting as gateway)
On the other hand, kube-apiserver will be rate-limiting requests anyway, so I doubt that kubelet would really be the gatekeeper here; such requests would probably be throttled server-side regardless.
> … remain the same.
>
> The alternative is to keep supporting older revisions for the sake of the kubelet.
>
> <<[/UNRESOLVED]>>
I think it's fine if kubelet uses a certain version of the API for its own purposes. We can ensure that the apiserver provides a version of the API that is needed by an older kubelet as far back as we want to support version skew.
For example, suppose we switch to v1alpha3 in 1.31, a breaking change. Then in 1.32, we introduce v1beta1. Normally we would remove v1alpha3, but to try out version skew support, we can keep v1alpha3. Ideally this won't be hard as we don't plan any big changes for the API in 1.32. Then kubelet from v1.31 can be combined with drivers and control plane from v1.32 where both are using v1beta1.
By tunneling REST requests from drivers through the kubelet to the API server, it becomes possible to update drivers and the control plane to a newer release than the kubelet, because the kubelet doesn't need to encode/decode the resource.k8s.io API types used by the drivers. This simplifies cluster upgrades.
Force-pushed from `ccb88f4` to `1428dec`.
> … convenience because for a PUT of a ResourceSlice, the driver has to be trusted
> to not create an object for some other driver on the node. This cannot be …
This is a downside of the proxy path, right? The unstructured / dynamic client aggregation approach could have checked the driver association field, right?
Correct.
However, the "unstructured / dynamic client" approach has other downsides. Right now, it can check this because we assume that all future ResourceSlices will be of the form "nodeName + driverName + other fields". Without knowing anything about those "other fields", writing a controller that synchronizes the content of the ResourceSlice objects with the desired content is not impossible, but harder.
The other downside, if we consider apiserver traffic, is the use of JSON for the request bodies (in both directions) instead of protobuf.
Why do we want kubelet in the middle here if drivers are going to be doing more than write-only status reporting? If they are watching / getting / reading / reconciling status, that seems more like a normal client to me. Is the only reason for the node proxy so they can piggy-back on kubelet credentials and get authorized?
The advantage is indeed that the node authorizer continues to work.
but... not really the way we want, since it lets drivers stomp on each other's status?
Drivers on the same node have to trust each other. They often run with elevated privileges, so they can already do much harm locally even without this shared access to ResourceSlice objects of the node.
That's still better than having to trust all drivers anywhere in the cluster.
I feel like this is the ideal scenario for us to build out the authz capabilities of node bound SA tokens (possibly combined with #4600). We have a good story for node agents that want to perform write requests via VAP+SA node ref. Having something similar to that for reads seems broadly useful. Building an entire gRPC proxy as a one-off for DRA seems like the wrong approach.
The REST proxy could be reused. But I agree, figuring out how to do this through normal auth mechanisms is the better approach. That would be out-of-scope for this KEP, though.
Instead, I'll put in some wording along the lines of:
- DRA kubelet plugins must connect to the API server themselves to publish their own ResourceSlice objects and to read ResourceClaim objects.
- Limiting access of the plugin is the responsibility of whoever deploys the driver. Some methods and best practices may get documented separately.
Does that sound okay? Then I'll close this PR and create a different one with that approach.
@pohly I added this to the SIG Auth meeting for May 22nd. Hopefully we can hash out a concrete path forward.
It gets even more interesting with some use cases that people have brought up for DRA: a DRA driver for a NIC runs on the node and must publish network-related information, ideally in the ResourceClaim status. It must do that itself because kubelet might not know what that information is, due to version skew.
The node authorizer can limit write access to ResourceClaims that are actually in use on the node because it builds the graph (node -> pod -> claim). In contrast to ResourceSlice, there isn't a specific field in a ResourceClaim that can be checked.
> [CSI](https://github.com/container-storage-interface/spec/blob/master/spec.md),
> with “volume” replaced by “resource” and volume-specific parts removed.
>
> #### REST proxy
/sig api-machinery auth
/cc @jpbetz @deads2k @enj
For visibility: this proposes a kube-apiserver <-- kubelet <-- client proxied API surface.
The read part of this is similar in some ways to what was proposed in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4188-kubelet-pod-readiness-api#proposed-api
- the reads / watches are passed through to kube-apiserver with the kubelet's credentials
- the API is a complete proxy, so client-go is usable (though tunneled over gRPC, which I'm not sure we've done before)
- the API allows drivers to write through to kube-apiserver using the kubelet's credentials
A lot of the discussion around that proposal seems relevant to this one (sig-arch 2023-08-24, https://docs.google.com/document/d/1BlmHq5uPyBUDlppYqAAzslVbAO8hilgjqZUTaNXUhKM/edit#bookmark=id.vud1o04xj4iv, https://www.youtube.com/watch?v=toN7t_y4zCk&t=22m45s)
> The read part of this is similar in some ways to what was proposed in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4188-kubelet-pod-readiness-api#proposed-api
One key difference is that I am not proposing to add a new kubelet socket. I was worried about security implications around that. Instead, kubelet connects to plugins the same way as before and takes requests through one stream per plugin which is kept open by each plugin.
Given a desire to read/write effectively from the kube-apiserver and the development of validatingadmissionpolicy + serviceaccount node claims + downward API for node name that can be used to self-restrict reads (for convenience) and enforce write requirements (must be from the same node), why is the proxy a better and more secure choice?
I'm not familiar enough with that alternative to do a comparison. Can you point me to documentation?
The ask is to ensure that a DRA driver deployed as a DaemonSet gets RBAC permissions which allow it to read/create/update ResourceSlice objects whose `nodeName` field matches the node on which the pod is running.
For ResourceClaim, the ask is to ensure that the pod only gets read permission for ResourceClaims referenced by pods which have been scheduled to the node.
@deads2k: I looked through https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy and https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/. It's not clear to me how I can connect the identity of the driver pod with the node it is running on, and that identity with rules that restrict what the pod can access.
The node authorizer uses hand-crafted Go code for this.
See kubernetes/kubernetes#124711 for the VAP+SA example David is referring to.
Note that admission only covers writes, so reads cannot be restricted in this way.
This seems to be the key part:

`object.metadata.name == request.userInfo.extra["authentication.kubernetes.io/node-name"][0]`

Does `object` provide access to any field?

Is it possible to extend what gets added to the credentials? If there was a `dra.kubernetes.io/driver-name`, then the same check could be done for the `driverName` field in ResourceSlice. That would be better than what the REST proxy can do.

I'm fine with dropping the read check on ResourceClaim. The node authorizer also has a "loophole" there because it doesn't limit what data gets returned by watches.
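Pulling the threads of this discussion together, a ValidatingAdmissionPolicy in the style of kubernetes/kubernetes#124711 might look roughly like the sketch below. The ResourceSlice field paths, the policy name, and the `dra.kubernetes.io/driver-name` credential key are assumptions for illustration, not part of any existing API:

```yaml
# Sketch only: rejects ResourceSlice writes whose nodeName does not match the
# node recorded in the client's node-bound service account token.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: resourceslice-same-node
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["resource.k8s.io"]
        apiVersions: ["*"]
        operations: ["CREATE", "UPDATE"]
        resources: ["resourceslices"]
  validations:
    - expression: >-
        object.nodeName ==
        request.userInfo.extra["authentication.kubernetes.io/node-name"][0]
      message: "ResourceSlice nodeName must match the node of the requesting client"
    # Hypothetical extension discussed above: if a dra.kubernetes.io/driver-name
    # key were added to the credentials, driverName could be checked the same way.
```

To take effect, the policy would also need a matching ValidatingAdmissionPolicyBinding. As noted above, admission only covers writes; restricting reads would still need a separate mechanism.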
One-line PR description: REST proxy
Issue link: DRA: structured parameters #4381
Other comments:
By tunneling REST requests from drivers through the kubelet to the API server, it becomes possible to update drivers and the control plane to a newer release than the kubelet, because the kubelet doesn't need to encode/decode the resource.k8s.io API types used by the drivers. This simplifies cluster upgrades.