Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dra scheduler: ensure that we never have nil claim/class parameters #123828

Merged
merged 2 commits into from Mar 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion pkg/kubelet/cm/dra/manager.go
Expand Up @@ -380,7 +380,7 @@ func (m *ManagerImpl) UnprepareResources(pod *v1.Pod) error {
return fmt.Errorf("NodeUnprepareResources returned result for unknown claim UID %s", claimUID)
}
if result.Error != "" {
return fmt.Errorf("NodeUnprepareResources failed for claim %s/%s: %s", reqClaim.Namespace, reqClaim.Name, err)
return fmt.Errorf("NodeUnprepareResources failed for claim %s/%s: %s", reqClaim.Namespace, reqClaim.Name, result.Error)
}

// Delete last pod UID only if unprepare succeeds.
Expand Down
Expand Up @@ -963,13 +963,15 @@ func (pl *dynamicResources) lookupParameters(logger klog.Logger, class *resource
if status != nil {
return
}
claimParameters, status = pl.lookupClaimParameters(logger, claim)
claimParameters, status = pl.lookupClaimParameters(logger, class, claim)
return
}

func (pl *dynamicResources) lookupClassParameters(logger klog.Logger, class *resourcev1alpha2.ResourceClass) (*resourcev1alpha2.ResourceClassParameters, *framework.Status) {
defaultClassParameters := resourcev1alpha2.ResourceClassParameters{}

if class.ParametersRef == nil {
return nil, nil
return &defaultClassParameters, nil
}

if class.ParametersRef.APIGroup == resourcev1alpha2.SchemeGroupVersion.Group &&
Expand Down Expand Up @@ -1004,9 +1006,31 @@ func (pl *dynamicResources) lookupClassParameters(logger klog.Logger, class *res
return nil, statusUnschedulable(logger, fmt.Sprintf("generated class parameters for %s.%s %s not found", class.ParametersRef.Kind, class.ParametersRef.APIGroup, klog.KRef(class.Namespace, class.ParametersRef.Name)))
}

func (pl *dynamicResources) lookupClaimParameters(logger klog.Logger, claim *resourcev1alpha2.ResourceClaim) (*resourcev1alpha2.ResourceClaimParameters, *framework.Status) {
func (pl *dynamicResources) lookupClaimParameters(logger klog.Logger, class *resourcev1alpha2.ResourceClass, claim *resourcev1alpha2.ResourceClaim) (*resourcev1alpha2.ResourceClaimParameters, *framework.Status) {
defaultClaimParameters := resourcev1alpha2.ResourceClaimParameters{
Shareable: true,
DriverRequests: []resourcev1alpha2.DriverRequests{
{
DriverName: class.DriverName,
Requests: []resourcev1alpha2.ResourceRequest{
{
ResourceRequestModel: resourcev1alpha2.ResourceRequestModel{
// TODO: This only works because NamedResources is
// the only model currently implemented. We need to
// match the default to how the resources of this
// class are being advertized in a ResourceSlice.
NamedResources: &resourcev1alpha2.NamedResourcesRequest{
Selector: "true",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that we would have to parse this selector. I would imagine that a case of "no parameters" is fairly common, so we should optimize for by understanding nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the scheduler doesn't inherently know which resource request model a driver is using to advertise its resources. We only know this by parsing this object and seeing which field is set (in this case NamedResources).

As such, I think a better approach would be to allow a cluster admin to define a default ResourceClaimParameters object and link it to a resource class. However, this is a larger chunk of work (and we are past the code freeze), so I proposed adding this PR as a stop-gap for 1.30.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the scheduler doesn't inherently know which resource request model a driver is using.

Does it matter? The point is that it passes the "filter", regardless of the model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let @pohly comment on that as I'm not intimately familiar with all of the code in this plugin. All I know is some default should be set and the default should allow sharing. Whether that's through an explicit object or a branch off of detecting nil I'll leave up to Patrick.

Copy link
Contributor

@pohly pohly Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach in this PR better because the rest of the code currently expects to determine the model and thus the resources based on the parameters. Making it handle nil adds more special cases.

There's also a clear path towards what we really want (configurable default parameters). We need those because detecting the model may not be enough, we also need to figure out what the filters for a claim with no parameters should be. This depends on the driver and/or cluster, not the model.

#123858 is tracking this next step.

},
},
},
},
},
},
}

if claim.Spec.ParametersRef == nil {
return nil, nil
return &defaultClaimParameters, nil
}
if claim.Spec.ParametersRef.APIGroup == resourcev1alpha2.SchemeGroupVersion.Group &&
claim.Spec.ParametersRef.Kind == "ResourceClaimParameters" {
Expand Down
Expand Up @@ -128,12 +128,10 @@ func newClaimController(logger klog.Logger, class *resourcev1alpha2.ResourceClas
}
for driverName, perDriver := range namedresourcesRequests {
var filter *resourcev1alpha2.NamedResourcesFilter
if classParameters != nil {
for _, f := range classParameters.Filters {
if f.DriverName == driverName && f.ResourceFilterModel.NamedResources != nil {
filter = f.ResourceFilterModel.NamedResources
break
}
for _, f := range classParameters.Filters {
if f.DriverName == driverName && f.ResourceFilterModel.NamedResources != nil {
filter = f.ResourceFilterModel.NamedResources
break
}
}
controller, err := namedresourcesmodel.NewClaimController(filter, perDriver.requests)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the existing behavior equivalent in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks the same to me. classParameters is now guaranteed to not be nil (a case that I had considered, in contrast to nil claim parameters), so the if check is not needed anymore. If it had been nil before, it's now empty, so the for loop doesn't do anything, just as before.

Expand Down