WIP: DRA structured parameters: scheduler fixes #123903

Closed
@@ -83,7 +83,11 @@ func newResourceModel(logger klog.Logger, resourceSliceLister resourcev1alpha2li
 if model[structured.NodeName] == nil {
 	model[structured.NodeName] = make(map[string]ResourceModels)
 }
-resource := model[structured.NodeName][handle.DriverName]
+driverName := handle.DriverName
+if driverName == "" {
+	driverName = claim.Status.DriverName
+}
+resource := model[structured.NodeName][driverName]
Comment on lines -86 to +90
Contributor

I remember this coming up in the past, but I don't remember the answer. When might handle.DriverName == ""?

Contributor Author

@pohly pohly Mar 12, 2024

It's simply how we defined our API:

// ResourceHandle holds opaque resource data for processing by a specific kubelet plugin.
type ResourceHandle struct {
	// DriverName specifies the name of the resource driver whose kubelet
	// plugin should be invoked to process this ResourceHandle's data once it
	// lands on a node. This may differ from the DriverName set in
	// ResourceClaimStatus this ResourceHandle is embedded in.
	DriverName string `json:"driverName,omitempty" protobuf:"bytes,1,opt,name=driverName"`

Perhaps we should have made it required to avoid if checks like this one here. Instead, we made it optional to allow avoiding redundant values in the claim status.
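The fallback that an optional DriverName forces on every consumer can be sketched in isolation. This is a minimal illustration, not the scheduler code: the two structs are trimmed-down stand-ins for the v1alpha2 types, and effectiveDriverName is a hypothetical helper name introduced only for this example.

```go
package main

import "fmt"

// ResourceHandle and ResourceClaimStatus are simplified stand-ins for the
// resource.k8s.io/v1alpha2 types, keeping only the fields discussed here.
type ResourceHandle struct {
	DriverName string
	Data       string
}

type ResourceClaimStatus struct {
	DriverName string
}

// effectiveDriverName is a hypothetical helper (not part of Kubernetes).
// It prefers the name set on the handle and falls back to the claim status
// when the handle leaves the field empty.
func effectiveDriverName(handle ResourceHandle, status ResourceClaimStatus) string {
	if handle.DriverName != "" {
		return handle.DriverName
	}
	return status.DriverName
}

func main() {
	status := ResourceClaimStatus{DriverName: "gpu.example.com"}

	// Handle without a driver name: the claim status name is used.
	fmt.Println(effectiveDriverName(ResourceHandle{}, status))

	// Handle with its own driver name: that name wins.
	fmt.Println(effectiveDriverName(ResourceHandle{DriverName: "other.example.com"}, status))
}
```

Centralizing the check in one helper would at least keep the empty-string test out of each call site, though it does not remove the underlying ambiguity in the API.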

Contributor Author

Actually... validation requires it:

allErrs = append(allErrs, validateResourceDriverName(resourceHandle.DriverName, idxPath.Child("driverName"))...)

That'll complain if the name is empty.

Setting aside whether we could change validation at all (that implies an API break), I'm leaning towards keeping the validation as-is and fixing the API definition. That would make this PR unnecessary for 1.30, because only a small typo fix remains, and it would also avoid this potential pitfall in the future.

Member

+1 on fixing API definition.

Contributor

I remember now why this is necessary. In "traditional" DRA, the driver's controller is responsible for populating the ResourceHandle. If the driver doesn't actually use the ResourceHandle to communicate information to the kubelet plugin (as is the case with the NVIDIA driver), then it shouldn't have to instantiate one just to set this DriverName field in it.

Contributor Author

> If the driver doesn't actually use the ResourceHandle to communicate information to the kubelet plugin (as is the case with the NVIDIA driver), then it shouldn't have to instantiate one just to set this DriverName field in it.

Then there is no ResourceHandle to validate. That's unrelated to whether DriverName must be set when there is a ResourceHandle.

> It's the reason we have this in the kubelet:
> https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/dra/manager.go#L135-L140

Here I don't quite follow. That code only gets called if there is a ResourceHandle. Or is that using a fake ResourceHandle in claimInfo.ResourceHandles that didn't actually come from the claim status? If that is so, then why not copy in the right DriverName when faking the handle?
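The suggestion in the last sentence could look roughly like the following. This is only a sketch of the idea, assuming a consumer that substitutes a placeholder handle when the claim status carries none; handlesForPlugins is a hypothetical name and this is not the kubelet's actual code.

```go
package main

import "fmt"

// ResourceHandle is a simplified stand-in for the v1alpha2 type.
type ResourceHandle struct {
	DriverName string
	Data       string
}

// handlesForPlugins is a hypothetical helper, not taken from the kubelet.
// When the claim status has no handles, it fabricates a single placeholder
// handle and copies the claim's driver name into it, so later code never
// sees an empty DriverName. Real handles are returned unchanged.
func handlesForPlugins(claimDriverName string, handles []ResourceHandle) []ResourceHandle {
	if len(handles) == 0 {
		return []ResourceHandle{{DriverName: claimDriverName}}
	}
	return handles
}

func main() {
	// No handles in the claim status: one placeholder with the claim's driver name.
	fmt.Printf("%+v\n", handlesForPlugins("gpu.example.com", nil))

	// Existing handles pass through untouched.
	existing := []ResourceHandle{{DriverName: "other.example.com", Data: "opaque"}}
	fmt.Printf("%+v\n", handlesForPlugins("gpu.example.com", existing))
}
```

Filling in the name at the point where the placeholder is created keeps the empty-string check in one place instead of in every loop over the handles.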

Contributor Author

See #124075 for the API fix.

/close

 for _, result := range structured.Results {
 	// Call AddAllocation for each known model. Each call itself needs to check for nil.
 	namedresourcesmodel.AddAllocation(&resource.NamedResources, result.NamedResources)
@@ -112,7 +116,7 @@ func newClaimController(logger klog.Logger, class *resourcev1alpha2.ResourceClas
 		p.parameters = append(p.parameters, request.VendorParameters)
 		p.requests = append(p.requests, request.ResourceRequestModel.NamedResources)
 	default:
-		return nil, fmt.Errorf("claim parameters %s: driverRequersts[%d].requests[%d]: no supported structured parameters found", klog.KObj(claimParameters), i, e)
+		return nil, fmt.Errorf("claim parameters %s: driverRequests[%d].requests[%d]: no supported structured parameters found", klog.KObj(claimParameters), i, e)
 	}
 }
 if len(p.requests) > 0 {