New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: DRA structured parameters: scheduler fixes #123903
WIP: DRA structured parameters: scheduler fixes #123903
Conversation
The API doesn't require that a resource handle contains a driver name. If unset, the name of the driver in the claim status needs to be used.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pohly The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
resource := model[structured.NodeName][handle.DriverName] | ||
driverName := handle.DriverName | ||
if driverName == "" { | ||
driverName = claim.Status.DriverName | ||
} | ||
resource := model[structured.NodeName][driverName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember this coming up in the past, but I don't remember. When might handle.DriverName == ""
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's simply how we defined our API:
// ResourceHandle holds opaque resource data for processing by a specific kubelet plugin.
type ResourceHandle struct {
// DriverName specifies the name of the resource driver whose kubelet
// plugin should be invoked to process this ResourceHandle's data once it
// lands on a node. This may differ from the DriverName set in
// ResourceClaimStatus this ResourceHandle is embedded in.
DriverName string `json:"driverName,omitempty" protobuf:"bytes,1,opt,name=driverName"`
Perhaps we should have made it required to avoid if checks like this one here. Instead, we made it optional to allow avoiding redundant values in the claim status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually... validation requires it:
allErrs = append(allErrs, validateResourceDriverName(resourceHandle.DriverName, idxPath.Child("driverName"))...) |
That'll complain if the name is empty.
All questions about whether we could change validation aside (implies API break), I'm leaning towards keeping the validation as-is and fixing the API definition. This would make this PR unnecessary for 1.30 because only a small typo fix remains and also avoids this potential pitfall in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on fixing API definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember now why this is necessary. In "traditional" DRA, the driver's controller is responsible for populating the ResourceHandle. If the driver doesn't actually use the ResourceHandle to communicate information to the kubelet plugin (as is the case with the NVIDIA driver), then it shouldn't have to instantiate one just to set this DriverName field in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the reason we have this in the kubelet:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/dra/manager.go#L135-L140
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the driver doesn't actually use the ResourceHandle to communicate information to the kubelet plugin (as is the case with the NVIDIA driver), then it shouldn't have to instantiate one just to set this DriverName field in it.
Then there is no ResourceHandle
to validate. That's unrelated to whether DriverName
must be set when there is a ResourceHandle
.
It's the reason we have this in the kubelet:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/dra/manager.go#L135-L140
Here I don't quite follow. That code only gets called if there is a ResourceHandle
. Or is that using a fake ResourceHandle
in claimInfo.ResourceHandles
that didn't actually from from the claim status? If that is so, then why not copy in the right DriverName
when faking the handle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #124075 for the API fix.
/close
@pohly: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/assign @klueska |
@pohly: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This fixes two issues in code added for structured parameters in 1.30. The typo is harmless. The other might affect real drivers once they start to use structured parameters, depending on how they use it.
Special notes for your reviewer:
No release note because the goal is to fix the code before it ever gets released with the problem.
Found while writing unit tests... those will follow in a separate PR, including coverage for for both cases.
Does this PR introduce a user-facing change?
/assign @klueska
/cc @alculquicondor