LEP: add "Share Manager Scheduling" #8294

65 changes: 65 additions & 0 deletions enhancements/20240402-share-manager-scheduling.md
@@ -0,0 +1,65 @@
# Share Manager Scheduling

## Summary

In the Longhorn storage system, a share manager pod of an RWX volume is created on a random node, without the ability for users to specify a preferred locality. The purpose of this feature is to enhance the locality of an RWX volume and its share manager pod.

### Related Issues

https://github.com/longhorn/longhorn/issues/7872
https://github.com/longhorn/longhorn/issues/4863
https://github.com/longhorn/longhorn/issues/8255
https://github.com/longhorn/longhorn/issues/2335

## Motivation

### Goals

- Share manager pod respects the node selector specified in `storageClass.parameters["shareManagerNodeSelector"]`
- Share manager pod complies with the affinity rules defined in `storageClass.allowedTopologies`
- Share manager pod respects the newly introduced `storageClass.parameters["shareManagerTolerations"]`

### Non-goals [optional]

`None`

## Proposal

### User Stories

Currently, a share manager pod of an RWX volume cannot adhere to a user-specified node selector or affinity rules, which can lead to inefficiencies and performance issues. This feature addresses that by ensuring share manager pods are scheduled according to the given node selector and affinity rules.

### User Experience In Detail

The introduction of node selector and affinity rule capabilities in a storage class significantly enhances user control over the scheduling of share manager pods for RWX volumes.
Member


If no share manager locality parameters are specified, will we respect the corresponding parameters for replica scheduling, such as node selector, taint, and topology? This would have the added benefit of placing the SM pod, the engine, and one replica together on the same node.

Respect: setting in storage class for share manager -> settings in storage class for replica scheduling -> system managed locality settings

Member Author


> If no share manager locality parameters are specified, will we respect the corresponding parameters for replica scheduling?

Yes. The share manager locality has nothing to do with replica scheduling. It controls only the share-manager pod of an RWX volume. Thus, users can apply the two localities simultaneously to put all the components on the same node.


- Node Selector:
Users can define node selectors directly within the storage class configuration by setting `storageClass.parameters["shareManagerNodeSelector"]`. When a share manager pod is to be scheduled, Kubernetes evaluates the node selectors specified by the user. The scheduler then ensures that the pod is placed only on nodes that match all of the specified labels. This mechanism provides a straightforward way to guide pod placement towards nodes that meet certain criteria, such as hardware capabilities, geographical location, or any other user-defined characteristic.

- Allowed Topologies:
Users can restrict where share manager pods are placed in the cluster by using `storageClass.allowedTopologies`. This setting is translated into affinity rules that are applied to the share manager pod. The affinity determines which nodes the pod can be placed on, based on the labels of those nodes.

- Tolerations:
Users can define tolerations for share manager pods within the storage class by setting `storageClass.parameters["shareManagerTolerations"]`. These tolerations allow share manager pods to be scheduled on nodes with matching taints.
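
The selection semantics described in the list above can be illustrated with a minimal sketch. It models the standard Kubernetes rules with plain dictionaries: a node selector requires every listed label to match, and a taint with a blocking effect excludes a node unless a toleration matches it. The node names, labels, and helper functions are hypothetical and are not part of Longhorn.

```python
# Hypothetical illustration of node selector and toleration semantics.
# Only NoSchedule and NoExecute taints block scheduling; PreferNoSchedule is a soft hint.
BLOCKING_EFFECTS = {"NoSchedule", "NoExecute"}


def matches_node_selector(node_labels, node_selector):
    """A node qualifies only if it carries every label in the selector."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def tolerates(toleration, taint):
    """Simplified Kubernetes toleration matching for a single taint."""
    # An empty effect on the toleration matches every taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key with operator Exists matches every taint key.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def schedulable(node, node_selector, tolerations):
    """True if the node matches the selector and every blocking taint is tolerated."""
    if not matches_node_selector(node["labels"], node_selector):
        return False
    blocking = [t for t in node.get("taints", []) if t["effect"] in BLOCKING_EFFECTS]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


nodes = [
    {"name": "node-a", "labels": {"disk": "ssd", "zone": "z1"}, "taints": []},
    {"name": "node-b", "labels": {"disk": "ssd", "zone": "z2"},
     "taints": [{"key": "dedicated", "value": "storage", "effect": "NoSchedule"}]},
    {"name": "node-c", "labels": {"disk": "hdd", "zone": "z1"}, "taints": []},
]
selector = {"disk": "ssd"}
tolerations = [{"key": "dedicated", "operator": "Equal",
                "value": "storage", "effect": "NoSchedule"}]

eligible_without = [n["name"] for n in nodes if schedulable(n, selector, [])]
eligible_with = [n["name"] for n in nodes if schedulable(n, selector, tolerations)]
```

In this sketch the selector alone leaves the tainted `node-b` out; adding the toleration makes it eligible as well, which is why the node selector and tolerations are separate parameters.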

### API changes

`None`

## Design

### Implementation Overview

When the share manager controller reconciles a share manager, it looks up the associated storage class using `persistentVolume.spec.StorageClassName`. If the storage class does not exist, no node selectors or allowed topologies are available, so no locality is applied to the share manager pod.

If the associated storage class is present, the system reads `storageClass.parameters["shareManagerNodeSelector"]` and `storageClass.allowedTopologies`. The node selectors specified in `storageClass.parameters["shareManagerNodeSelector"]` are combined with the global selectors from the `system-managed-components-node-selector` setting and applied to the share manager pod. Additionally, the system translates `storageClass.allowedTopologies` into affinity rules, which are also applied to the share manager pod.
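
The combination and translation steps in the paragraph above can be sketched as follows. This is an illustration only, using plain dictionaries shaped like Kubernetes objects. The function names are hypothetical, and two details are assumptions not stated in this proposal: that a storage class selector overrides a global selector on a key conflict, and that the affinity is emitted as a required (not preferred) node affinity.

```python
def merge_node_selectors(global_selector, storage_class_selector):
    """Combine global and storage class node selectors into one mapping."""
    merged = dict(global_selector)
    # Assumption: on a key conflict, the storage class value wins.
    merged.update(storage_class_selector)
    return merged


def allowed_topologies_to_node_affinity(allowed_topologies):
    """Translate StorageClass allowedTopologies into a node affinity fragment.

    Each topology term becomes one nodeSelectorTerm (terms are ORed), and each
    matchLabelExpression becomes an "In" matchExpression (expressions are ANDed).
    """
    terms = []
    for topology_term in allowed_topologies:
        expressions = [
            {"key": expr["key"], "operator": "In", "values": list(expr["values"])}
            for expr in topology_term.get("matchLabelExpressions", [])
        ]
        if expressions:
            terms.append({"matchExpressions": expressions})
    if not terms:
        return None  # nothing to constrain
    return {
        "nodeAffinity": {
            # Assumption: required affinity; the proposal does not say which kind.
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": terms
            }
        }
    }


allowed_topologies = [
    {"matchLabelExpressions": [
        {"key": "topology.kubernetes.io/zone", "values": ["zone-a", "zone-b"]},
    ]},
]
pod_spec = {
    "nodeSelector": merge_node_selectors({"node-role": "storage"}, {"disk": "ssd"}),
    "affinity": allowed_topologies_to_node_affinity(allowed_topologies),
}
```

The one-to-one mapping works because both structures share the same boolean shape: alternatives between terms, and conjunction within a term.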

For tolerations, users can specify them in the storage class through `storageClass.parameters["shareManagerTolerations"]`. These tolerations are combined with the global tolerations defined in the `taint-toleration` setting, allowing share manager pods to be scheduled on nodes with matching taints.
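
A minimal sketch of combining the two toleration sources is shown below. The function name is hypothetical, the tolerations are assumed to be already parsed into Kubernetes-style dictionaries (the string format of the parameter is not specified here), and dropping exact duplicates is an assumption about how the combination might behave.

```python
def merge_tolerations(global_tolerations, storage_class_tolerations):
    """Concatenate global and storage class tolerations, dropping exact duplicates."""
    merged, seen = [], set()
    for tol in list(global_tolerations) + list(storage_class_tolerations):
        # Identify a toleration by the fields that affect taint matching.
        identity = (tol.get("key", ""), tol.get("operator", "Equal"),
                    tol.get("value", ""), tol.get("effect", ""))
        if identity not in seen:
            seen.add(identity)
            merged.append(tol)
    return merged


global_tolerations = [
    {"key": "dedicated", "operator": "Exists", "effect": "NoSchedule"},
]
storage_class_tolerations = [
    {"key": "dedicated", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoExecute"},
]
pod_tolerations = merge_tolerations(global_tolerations, storage_class_tolerations)
```

Because tolerations only widen the set of nodes a pod may land on, taking the union of both sources never makes a previously eligible node ineligible.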

Once the share manager pod is scheduled to a suitable node, the node's name is set in the volume attachment ticket.

### Test plan

1. Test RWX volumes from a storage class with `storageClass.parameters["shareManagerNodeSelector"]`.
Member


What's the difference between 1 and 3?

Member Author


`shareManagerNodeSelector` doesn't handle taints. If a node is tainted, users can use the toleration `storageClass.parameters["shareManagerTolerations"]` to allow the share-manager pod to be scheduled on the tainted node.

1. Test RWX volumes from a storage class with `storageClass.allowedTopologies`.
1. Test RWX volumes from a storage class with `storageClass.parameters["shareManagerTolerations"]`.