
DRA: structured parameters #123516

Merged
merged 18 commits into from Mar 8, 2024

Commits on Mar 7, 2024

  1. scheduler: fix assume cache with no index

    The assume cache in the volumebinding plugin can be created with no separate
    index, but List then failed because it tried to use the empty index name
    instead of using the store's List function.
    pohly committed Mar 7, 2024
    eb1470d
  2. dra api: add structured parameters

    NodeResourceSlice will be used by kubelet to publish resource information on
    behalf of DRA drivers on the node. NodeName and DriverName in
    NodeResourceSlice are immutable. This simplifies tracking the different
    objects because their purpose cannot change after creation.
    
    The new field in ResourceClass tells the scheduler and the autoscaler that they
    are expected to handle allocation.
    
    ResourceClaimParameters and ResourceClassParameters are new types for telling
    in-tree components how to handle claims.
    pohly committed Mar 7, 2024
    39bbced
  3. node authorizer: lock down access for NodeResourceSlice

    The kubelet running on one node should not be allowed to access
    NodeResourceSlice objects belonging to some other node, as defined by the
    NodeResourceSlice.NodeName field.
    pohly committed Mar 7, 2024
    2e34e18
  4. noderestriction admission: lock down create of NodeResourceSlice

    The NodeName value must be checked here on create because the node
    authorizer cannot inspect the content of the object being created.
    pohly committed Mar 7, 2024
    a92d2a4
  5. dra scheduler: support structured parameters

    When a claim uses structured parameters, as indicated by the resource class
    flag, the scheduler is responsible for allocating it. To do this it needs to
    gather information about available node resources by watching
    NodeResourceSlices and then match the in-tree claim parameters against those
    resources.
    pohly committed Mar 7, 2024
    096e948
  6. dra: add "named resources" structured parameter model

    Like the current device plugin interface, a DRA driver using this model
    announces a list of resource instances. In contrast to device plugins, this
    list is made available to the scheduler together with attributes that can be
    used to select suitable instances when they are not all alike.
    
    Because this is the first structured parameter model, some checks that
    previously were not possible, in particular "is one structured parameter field
    set", are now enabled. Adding another structured parameter model will work
    similarly.
    
    The applyconfigs code generator assumes that all types in an API are defined in
    a single package. If it weren't for that, the "named resources" types could be
    placed in separate packages, which would make their names in the Go code more
    natural and would indicate their stability level, because the package name
    could include a version.
    pohly committed Mar 7, 2024
    d4d5ade
  7. 4ed2b3e
  8. kubelet: support structured parameters for preparing resources

    If the resource handle has data from a structured parameter model, then we need
    to pass that to the DRA driver kubelet plugin. Because Kubernetes uses
    gogo/protobuf, we cannot use "optional" for that new optional field and have to
    resort to "repeated" with a single repetition if present.
    
    This is a new, backwards-compatible field.
    
    It is unfortunate that extending the resource.k8s.io API changes the checksum
    of a kubelet checkpoint. Updating the test cases is a stop-gap measure; the
    actual solution will have to be something else before beta.
    pohly committed Mar 7, 2024
    6f1ddfc
  9. dra controller: support structured parameters

    When allocation was done by the scheduler, the controller needs to do the
    deallocation because there is no control-plane controller which could react to
    "DeallocationRequested".
    pohly committed Mar 7, 2024
    3de376e
  10. dra testing: add tests for structured parameters

    The test driver now supports a ConfigMap (as before) and the named resources
    structured parameter model. It doesn't have any instance attributes.
    pohly committed Mar 7, 2024
    5e40afc
  11. dra kubelet: publish NodeResourceSlices

    The information is received from the DRA driver plugin through a new gRPC
    streaming interface. This is backwards compatible with old DRA driver kubelet
    plugins: their gRPC server will return "not implemented", which kubelet can
    handle. Therefore no API break is needed.
    
    However, DRA drivers need to be updated because the Go API changed. They can
    return
        status.New(codes.Unimplemented, "no node resource support").Err()
    if they don't support the new ListAndWatchResources method and
    structured parameters.
    
    The controller in kubelet then synchronizes this information from the driver
    with NodeResourceSlice objects, creating, updating and deleting them as needed.
    pohly committed Mar 7, 2024
    d59676a
  12. 234dc1f
  13. dra api: implement semver attribute value type

    This adds support for semantic version comparison to the CEL support in the
    "named resources" structured parameter model. For example, it can be used to
    check that an instance supports a certain API level.
    
    To minimize the risk, the new "semver" type is only defined in the CEL
    environment for DRA expressions, not in the base library. See
    kubernetes#123664 for a PR which
    adds it to the base library.
    
    Validation of semver strings is done with the regular expression from
    semver.org. The actual evaluation at runtime then uses semver/v4.
    pohly committed Mar 7, 2024
    42ee56f
  14. dra api: rename NodeResourceSlice -> ResourceSlice

    While currently those objects only get published by the kubelet for node-local
    resources, this could change once we also support network-attached
    resources. Dropping the "Node" prefix enables such a future extension.
    
    The NodeName in ResourceSlice and StructuredResourceHandle then becomes
    optional. The kubelet still needs to provide one and it must match its own node
    name, otherwise it doesn't have permission to access ResourceSlice objects.
    pohly committed Mar 7, 2024
    0b6a0d6
  15. dra e2e: move ResourceSlice test

    This test is better run with multiple nodes because that is more realistic.
    pohly committed Mar 7, 2024
    2c6246c
  16. dra scheduler: consider in-flight allocation for resource calculation

    Storing a modified claim with allocation and the original resource version in
    the assume cache was not reliable: if an update was received, it replaced the
    modified claim and the resource that was reserved for the claim might have been
    used for some other claim.
    
    To fix this, the in-flight map now stores the claims themselves instead of just
    a boolean, and the status stored there overrides whatever is in the assume cache.
    
    Logging was extended to diagnose this problem better. The problem started to
    occur in E2E tests after the claim update was split so that first the finalizer
    is set and then the status, because setting the finalizer triggered an update.
    pohly committed Mar 7, 2024
    251b385
  17. dra e2e: enable more tests for usage with structured parameters

    This finishes the shuffling around of test scenarios so that all of them which
    make sense with structured parameters are also executed with those.
    pohly committed Mar 7, 2024
    7f5566a
  18. dra api: enable new CEL features by faking their version

    There are two approaches for making new versioned CEL features available in the
    release where they get introduced:
    - Always use the environment for "StoredExpressions".
    - Use an older version (typically 1.0) and only bump it up later.
    
    The second approach was used before, so it is also used here.
    pohly committed Mar 7, 2024
    6a361e1
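The assume cache fix in commit 1 ("scheduler: fix assume cache with no index", eb1470d) comes down to one branch: with no index configured, List should delegate to the store's full listing rather than query an index with an empty name. The sketch below models that with a hypothetical `store` and `assumeCache`; these are simplified stand-ins, not the real client-go or scheduler types.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a hypothetical, minimal stand-in for an informer store: it can
// list every object or look objects up through a named index.
type store struct {
	objects map[string]string              // key -> object
	indices map[string]map[string][]string // index name -> index value -> keys
}

// List returns all objects, sorted by key so that output is deterministic.
func (s *store) List() []string {
	keys := make([]string, 0, len(s.objects))
	for k := range s.objects {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, s.objects[k])
	}
	return out
}

// ByIndex returns the objects stored under indexValue in the named index.
func (s *store) ByIndex(indexName, indexValue string) ([]string, error) {
	idx, ok := s.indices[indexName]
	if !ok {
		return nil, fmt.Errorf("index %q does not exist", indexName)
	}
	out := make([]string, 0, len(idx[indexValue]))
	for _, k := range idx[indexValue] {
		out = append(out, s.objects[k])
	}
	return out, nil
}

// assumeCache is a hypothetical wrapper; indexName may be empty.
type assumeCache struct {
	store     *store
	indexName string
}

// List uses the store's own List when the cache has no index instead of
// querying an index with an empty name (the failure the commit describes).
func (c *assumeCache) List(indexValue string) ([]string, error) {
	if c.indexName == "" {
		return c.store.List(), nil
	}
	return c.store.ByIndex(c.indexName, indexValue)
}

func main() {
	c := &assumeCache{store: &store{objects: map[string]string{"b": "claim-b", "a": "claim-a"}}}
	objs, err := c.List("")
	fmt.Println(objs, err) // [claim-a claim-b] <nil>
}
```

Without the empty-name branch, the same call would end up in `ByIndex("", …)` and return the "does not exist" error.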
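Commit 2 ("dra api: add structured parameters", 39bbced) makes NodeName and DriverName in NodeResourceSlice immutable. As a rough illustration of what such an update check does, here is a sketch with a hypothetical, two-field `resourceSlice` type and a hypothetical `validateUpdate` function; the real API server validation uses Kubernetes' own field-path error helpers, which are not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceSlice is a hypothetical, trimmed-down stand-in for the
// NodeResourceSlice type; only the two immutable fields are modeled.
type resourceSlice struct {
	NodeName   string
	DriverName string
}

// validateUpdate rejects updates that change NodeName or DriverName, so the
// purpose of an object stays fixed after creation. It returns nil if valid.
func validateUpdate(oldObj, newObj resourceSlice) error {
	var errs []error
	if oldObj.NodeName != newObj.NodeName {
		errs = append(errs, fmt.Errorf("nodeName: field is immutable (%q -> %q)", oldObj.NodeName, newObj.NodeName))
	}
	if oldObj.DriverName != newObj.DriverName {
		errs = append(errs, fmt.Errorf("driverName: field is immutable (%q -> %q)", oldObj.DriverName, newObj.DriverName))
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	orig := resourceSlice{NodeName: "node-1", DriverName: "gpu.example.com"}
	fmt.Println(validateUpdate(orig, orig)) // <nil>
	moved := orig
	moved.NodeName = "node-2"
	fmt.Println(validateUpdate(orig, moved))
}
```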
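Commits 3 and 4 (2e34e18, a92d2a4) split the work between the node authorizer and the noderestriction admission plugin because, on create, only admission sees the object's NodeName. The sketch below shows the admission-side comparison. It relies on the `system:node:<nodeName>` username convention for kubelet credentials; the `admitCreate` function and the `resourceSlice` type are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeUserPrefix is the username prefix used by node (kubelet) credentials.
const nodeUserPrefix = "system:node:"

// resourceSlice is a hypothetical stand-in with only the checked field.
type resourceSlice struct {
	NodeName string
}

// admitCreate lets a node create a slice only for itself. Requests from
// users that are not nodes are not restricted by this check.
func admitCreate(username string, obj resourceSlice) error {
	nodeName, isNode := strings.CutPrefix(username, nodeUserPrefix)
	if !isNode {
		return nil
	}
	if obj.NodeName != nodeName {
		return fmt.Errorf("node %q is not allowed to create a NodeResourceSlice for node %q", nodeName, obj.NodeName)
	}
	return nil
}

func main() {
	fmt.Println(admitCreate("system:node:node-1", resourceSlice{NodeName: "node-1"})) // <nil>
	fmt.Println(admitCreate("system:node:node-1", resourceSlice{NodeName: "node-2"}))
}
```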
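Commit 5 ("dra scheduler: support structured parameters", 096e948) has the scheduler watch NodeResourceSlices to learn what each node offers. A minimal sketch of that bookkeeping follows, assuming a hypothetical `slice` type with a flat list of instance names and a hypothetical `sliceTracker` that applies add/modify/delete events; the real plugin works on informer events and richer resource models.

```go
package main

import (
	"fmt"
	"sort"
)

// slice is a hypothetical, trimmed-down NodeResourceSlice.
type slice struct {
	Name      string
	NodeName  string
	Instances []string
}

type eventType int

const (
	added eventType = iota
	modified
	deleted
)

// sliceTracker keeps the latest copy of every slice seen through watch events.
type sliceTracker struct {
	slices map[string]slice // object name -> latest copy
}

func newSliceTracker() *sliceTracker {
	return &sliceTracker{slices: map[string]slice{}}
}

// handle applies one watch event to the tracked state.
func (t *sliceTracker) handle(ev eventType, s slice) {
	switch ev {
	case added, modified:
		t.slices[s.Name] = s
	case deleted:
		delete(t.slices, s.Name)
	}
}

// freeInstances lists the instances published for nodeName that are not in
// inUse, sorted for deterministic results.
func (t *sliceTracker) freeInstances(nodeName string, inUse map[string]bool) []string {
	var free []string
	for _, s := range t.slices {
		if s.NodeName != nodeName {
			continue
		}
		for _, inst := range s.Instances {
			if !inUse[inst] {
				free = append(free, inst)
			}
		}
	}
	sort.Strings(free)
	return free
}

func main() {
	t := newSliceTracker()
	t.handle(added, slice{Name: "s1", NodeName: "node-1", Instances: []string{"gpu-1", "gpu-0"}})
	fmt.Println(t.freeInstances("node-1", map[string]bool{"gpu-1": true})) // [gpu-0]
}
```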
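Commit 6 ("named resources" model, d4d5ade) describes drivers that publish a list of instances with attributes, and claims that select suitable instances. The sketch below illustrates that matching under stated assumptions: a Go predicate (`selector`) replaces the CEL expression, each request receives one distinct instance, and the greedy first-fit strategy is this sketch's own choice, not necessarily the scheduler's. All names are hypothetical.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for one "named resources" instance.
type instance struct {
	Name       string
	Attributes map[string]string
}

// selector stands in for a CEL selector expression; no CEL is evaluated here.
type selector func(instance) bool

// allocate assigns one distinct, unused instance to each request, greedily in
// order. It does not backtrack, so it can fail where a different assignment
// would have succeeded. The boolean reports whether every request was satisfied.
func allocate(instances []instance, inUse map[string]bool, requests []selector) ([]string, bool) {
	taken := make(map[string]bool, len(inUse))
	for name, used := range inUse {
		if used {
			taken[name] = true
		}
	}
	result := make([]string, 0, len(requests))
	for _, sel := range requests {
		found := false
		for _, inst := range instances {
			if taken[inst.Name] || !sel(inst) {
				continue
			}
			taken[inst.Name] = true
			result = append(result, inst.Name)
			found = true
			break
		}
		if !found {
			return nil, false
		}
	}
	return result, true
}

func main() {
	instances := []instance{
		{Name: "gpu-0", Attributes: map[string]string{"model": "a"}},
		{Name: "gpu-1", Attributes: map[string]string{"model": "b"}},
	}
	wantB := func(i instance) bool { return i.Attributes["model"] == "b" }
	fmt.Println(allocate(instances, nil, []selector{wantB})) // [gpu-1] true
}
```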
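Commit 8 (6f1ddfc) notes that, with gogo/protobuf, the new optional field is declared as `repeated` and carries at most one element. The sketch below shows what the Go side of that convention can look like, using a hypothetical `claim` message with a hypothetical `Structured` field; it is not the generated kubelet plugin API.

```go
package main

import "fmt"

// structuredData is a hypothetical stand-in for the structured-parameter
// data that kubelet forwards to the driver.
type structuredData struct {
	Driver string
}

// claim mimics a generated message in which an optional field is declared
// as `repeated`, so "absent" is an empty slice and "present" is one element.
type claim struct {
	Structured []*structuredData
}

// setStructured stores d as a zero- or one-element repeated field.
func setStructured(c *claim, d *structuredData) {
	if d == nil {
		c.Structured = nil
		return
	}
	c.Structured = []*structuredData{d}
}

// getStructured reads the field back and rejects more than one repetition.
func getStructured(c *claim) (*structuredData, error) {
	switch len(c.Structured) {
	case 0:
		return nil, nil
	case 1:
		return c.Structured[0], nil
	default:
		return nil, fmt.Errorf("expected at most one entry, got %d", len(c.Structured))
	}
}

func main() {
	var c claim
	setStructured(&c, &structuredData{Driver: "gpu.example.com"})
	d, err := getStructured(&c)
	fmt.Println(d.Driver, err) // gpu.example.com <nil>
}
```

Because an old sender simply never fills the repeated field, an old plugin and a new kubelet both see "absent", which is what keeps the field backwards compatible.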
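Commit 9 ("dra controller: support structured parameters", 3de376e) explains that scheduler-made allocations must be undone by the controller itself, because no driver control-plane controller would act on "DeallocationRequested". The decision can be sketched as follows. The `claimStatus` type and especially its `StructuredParameters` flag are hypothetical simplifications, and triggering on "allocated but no longer reserved" is an assumption of this sketch.

```go
package main

import "fmt"

// claimStatus is a hypothetical, trimmed-down claim status.
type claimStatus struct {
	Allocated             bool
	StructuredParameters  bool     // hypothetical: allocation was done by the scheduler
	ReservedFor           []string // consumers (pods) still using the claim
	DeallocationRequested bool
}

// deallocate handles a claim that is allocated but no longer reserved.
func deallocate(s *claimStatus) {
	if !s.Allocated || len(s.ReservedFor) > 0 {
		return // nothing to do: not allocated, or still in use
	}
	if s.StructuredParameters {
		// No driver control-plane controller would react to
		// DeallocationRequested, so the allocation is cleared directly.
		s.Allocated = false
		s.StructuredParameters = false
		return
	}
	// Otherwise ask the driver's control-plane controller to deallocate.
	s.DeallocationRequested = true
}

func main() {
	s := claimStatus{Allocated: true, StructuredParameters: true}
	deallocate(&s)
	fmt.Println(s.Allocated, s.DeallocationRequested) // false false
}
```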
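Commit 13 ("dra api: implement semver attribute value type", 42ee56f) validates semver strings with the regular expression from semver.org and evaluates comparisons with semver/v4. The sketch below does not use that library. Instead it implements the semver.org precedence rules by hand with the Go standard library, to show what "check that an instance supports a certain API level" computes. The `parse` and `compare` helpers are hypothetical, and numeric identifiers are assumed to fit into a uint64.

```go
package main

import (
	"cmp"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// semverRE is the regular expression suggested on semver.org (numbered groups:
// 1-3 major/minor/patch, 4 pre-release, 5 build metadata).
var semverRE = regexp.MustCompile(`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)` +
	`(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?` +
	`(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$`)

type version struct {
	core [3]uint64 // major, minor, patch
	pre  []string  // dot-separated pre-release identifiers, nil if none
}

// parse validates s and splits it into its parts; build metadata is dropped
// because it does not affect precedence.
func parse(s string) (version, error) {
	m := semverRE.FindStringSubmatch(s)
	if m == nil {
		return version{}, fmt.Errorf("%q is not a valid semantic version", s)
	}
	var v version
	for i := range v.core {
		n, err := strconv.ParseUint(m[i+1], 10, 64)
		if err != nil {
			return version{}, err // only possible on overflow
		}
		v.core[i] = n
	}
	if m[4] != "" {
		v.pre = strings.Split(m[4], ".")
	}
	return v, nil
}

// compareIdent orders two pre-release identifiers: numeric ones numerically,
// numeric before alphanumeric, alphanumeric ones by ASCII order.
func compareIdent(x, y string) int {
	xn, xErr := strconv.ParseUint(x, 10, 64)
	yn, yErr := strconv.ParseUint(y, 10, 64)
	switch {
	case xErr == nil && yErr == nil:
		return cmp.Compare(xn, yn)
	case xErr == nil:
		return -1
	case yErr == nil:
		return 1
	}
	return strings.Compare(x, y)
}

// compare returns -1, 0 or +1 according to semver.org precedence.
func compare(a, b version) int {
	for i := range a.core {
		if c := cmp.Compare(a.core[i], b.core[i]); c != 0 {
			return c
		}
	}
	switch {
	case len(a.pre) == 0 && len(b.pre) == 0:
		return 0
	case len(a.pre) == 0:
		return 1 // a release ranks above any of its pre-releases
	case len(b.pre) == 0:
		return -1
	}
	for i := 0; i < len(a.pre) && i < len(b.pre); i++ {
		if c := compareIdent(a.pre[i], b.pre[i]); c != 0 {
			return c
		}
	}
	return cmp.Compare(len(a.pre), len(b.pre))
}

func main() {
	have, _ := parse("1.2.3")
	need, _ := parse("1.2.0")
	fmt.Println(compare(have, need) >= 0) // true
}
```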
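Commit 16 (251b385) fixes a race by keeping in-flight claims in a map whose entries override the assume cache, so an unrelated update (such as the one caused by adding a finalizer) cannot hide a pending allocation. A minimal sketch of that layering follows. `claimView` and its methods are hypothetical names, and a plain map stands in for the assume cache.

```go
package main

import (
	"fmt"
	"sync"
)

// claim is a hypothetical, trimmed-down ResourceClaim.
type claim struct {
	UID       string
	Allocated []string // instances allocated to this claim
}

// claimView combines an assume-cache-like view, which informer updates may
// overwrite at any time, with the allocations the scheduler is still writing.
type claimView struct {
	mu       sync.Mutex
	cache    map[string]*claim // UID -> latest object from the API server
	inFlight map[string]*claim // UID -> claim with the pending allocation
}

func newClaimView() *claimView {
	return &claimView{cache: map[string]*claim{}, inFlight: map[string]*claim{}}
}

// onUpdate stores an object received from the API server.
func (v *claimView) onUpdate(c *claim) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.cache[c.UID] = c
}

// startAllocation records a claim whose allocation is being written.
func (v *claimView) startAllocation(c *claim) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.inFlight[c.UID] = c
}

// finishAllocation drops the in-flight entry once the write is done.
func (v *claimView) finishAllocation(uid string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	delete(v.inFlight, uid)
}

// inUse collects all allocated instances; an in-flight entry overrides
// whatever the cache currently holds for the same claim.
func (v *claimView) inUse() map[string]bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	used := map[string]bool{}
	for uid, c := range v.cache {
		if _, ok := v.inFlight[uid]; ok {
			continue // counted below from the in-flight copy
		}
		for _, inst := range c.Allocated {
			used[inst] = true
		}
	}
	for _, c := range v.inFlight {
		for _, inst := range c.Allocated {
			used[inst] = true
		}
	}
	return used
}

func main() {
	v := newClaimView()
	v.onUpdate(&claim{UID: "c1"})
	v.startAllocation(&claim{UID: "c1", Allocated: []string{"gpu-0"}})
	v.onUpdate(&claim{UID: "c1"}) // e.g. the update caused by adding a finalizer
	fmt.Println(v.inUse()["gpu-0"]) // true
}
```

With only the cache, the second `onUpdate` would have made `gpu-0` look free again, which is the double allocation the commit describes.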