
[occm] Add taint out-of-service:NoExecute when node shutdown #2556

Closed
sebastienmusso opened this issue Feb 21, 2024 · 10 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@sebastienmusso

/kind feature

What happened:
Currently, only the node.cloudprovider.kubernetes.io/shutdown:NoSchedule taint is added to a shut-down node.
Pods with PVCs remain stuck in the Terminating state.

What you expected to happen:
When an instance is shut down, the node.kubernetes.io/out-of-service:NoExecute taint should be added to the node so that its volumes can be unmapped and detached.

The Azure cloud provider seems to update this taint (https://github.com/kubernetes-sigs/cloud-provider-azure/blob/v1.29.0/pkg/provider/azure.go#L1319C18-L1319C33).

Could the OpenStack cloud provider handle this, and how?
Should this code go into the openstack.go package or the instances one (https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/openstack.go)?
I could make the PR if needed.
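
For illustration only, here is a minimal client-go sketch of what applying this taint from a controller might look like. The applyOutOfServiceTaint helper and its wiring are hypothetical, not existing CPO code; the taint value "nodeshutdown" just follows the example in the Kubernetes docs.

```go
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// applyOutOfServiceTaint is a hypothetical helper: it adds the
// node.kubernetes.io/out-of-service:NoExecute taint to a Node if it is
// not already present.
func applyOutOfServiceTaint(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}

	taint := v1.Taint{
		Key:    "node.kubernetes.io/out-of-service",
		Value:  "nodeshutdown",
		Effect: v1.TaintEffectNoExecute,
	}

	// Do nothing if the taint is already there.
	for _, t := range node.Spec.Taints {
		if t.Key == taint.Key && t.Effect == taint.Effect {
			return nil
		}
	}

	node.Spec.Taints = append(node.Spec.Taints, taint)
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```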

Environment:
openstack-cloud-controller-manager version: any
OpenStack version: any
Kubernetes version: 1.24+

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 21, 2024
@sebastienmusso
Author

Is nobody interested in this feature?

@dulek
Contributor

dulek commented Feb 28, 2024

Looks like CAPZ creates an Informer for Node objects to populate its Node cache, and that's how the taint manipulations are done. We would need to do the same here; InstancesV2 doesn't really have an interface allowing that.

I think I'm supportive of the feature, though I wonder whether this is really a responsibility of CPO rather than CAPO. @mdbooth, @zetaab, what do you think?
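
To make that concrete, a minimal sketch of the informer pattern in question, using standard client-go machinery. The watchNodes helper and the empty handler body are hypothetical and only show the shape of what InstancesV2 does not currently expose.

```go
package sketch

import (
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchNodes is a hypothetical helper: it wires up a shared Node informer so
// a provider could observe node state changes and manage taints itself.
func watchNodes(client kubernetes.Interface, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	nodeInformer := factory.Core().V1().Nodes().Informer()

	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			node := newObj.(*v1.Node)
			// Here the provider could inspect conditions/taints on the
			// updated Node and decide whether any action is needed.
			_ = node
		},
	})

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}
```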

@dulek
Contributor

dulek commented Feb 28, 2024

According to [1], this taint should be set manually by the user, which makes me question even more whether its place is in CPO.

[1] https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service

@sebastienmusso
Author

How should we handle non-graceful shutdowns on vanilla clusters? How can StatefulSets be recovered and PVCs detached automatically? Deleting the node from the cluster and from the IaC provider, together with this manually applied taint, seems to be the only way at the moment.

@zetaab
Member

zetaab commented Feb 28, 2024

https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/instancesv2.go#L102-L103 — is this what you are trying to find?

Actually, that already adds a taint: https://github.com/kubernetes/cloud-provider/blob/71c6ce7b171ddf09943cb588f6a9427dd3282042/controllers/nodelifecycle/node_lifecycle_controller.go#L190

node.cloudprovider.kubernetes.io/shutdown:NoSchedule

The detach/attach logic for the CSI part does not come from this repository. So if that taint is not enough, I would say it's better to fix things outside of just OpenStack. It does not make sense for every Kubernetes cloud provider to do its own magic instead of fixing it in a central place (cloud-provider/CSI core).
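
For readers following the links, a simplified sketch of the kind of check the linked InstanceShutdown performs, assuming a gophercloud compute client. This is an illustration of the mechanism, not a copy of the actual CPO code.

```go
package sketch

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/servers"
)

// instanceShutdown is an illustrative stand-in for InstancesV2.InstanceShutdown:
// when it reports true, the shared node lifecycle controller in
// k8s.io/cloud-provider taints the Node with
// node.cloudprovider.kubernetes.io/shutdown:NoSchedule.
func instanceShutdown(compute *gophercloud.ServiceClient, serverID string) (bool, error) {
	server, err := servers.Get(compute, serverID).Extract()
	if err != nil {
		return false, err
	}
	// A server in SHUTOFF state is treated as shut down.
	return server.Status == "SHUTOFF", nil
}
```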

@jichenjc
Contributor

jichenjc commented Mar 6, 2024

I agree with @zetaab. If there's something that needs to be done on the CSI/cloud provider side, we need to ask them to handle it.
At a minimum, they should provide a spec/guide to each provider as a contract, if the implementation has to be done only in the provider.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 4, 2024
@sebastienmusso
Author

/close

@k8s-ci-robot
Contributor

@sebastienmusso: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mdbooth
Contributor

mdbooth commented Jun 4, 2024

I didn't see this first time round. I do actually think we could potentially add something like this, but the feature as implemented by Azure looks dangerous to me and we should not do that.

The NodeOutOfServiceVolumeDetach feature looks interesting and useful. The problem is that any safe implementation requires fencing. The node.kubernetes.io/out-of-service:NoExecute taint marks the node as fenced, but it makes no attempt to ensure that it is fenced. This is why it's a user action: we don't implement fencing. Only the fencing agent can safely assert that the node is fenced, so only the fencing agent can set this taint.

'Fenced' in this context means:

  • The Node is not running. Not just 'not ready', but switched off.
  • The Node is forced to remain in this state while it is marked as 'fenced'.

Both requirements exist because there must be no possibility that the node accesses shared data while it is fenced.

An example of where the first is important: some control plane malfunction that does not affect the data plane, e.g. kubelet is crash-looping, but the pods it previously started are still running and continuing to access shared data.

For the second, consider any situation which may cause the Node to become unresponsive for a period of time before starting again. For example, the Node may be not ready because somebody bumped a cable or misconfigured a switch, or a CPU lockup caused the node to become completely unresponsive for several minutes. The Node can recover from these and become Ready again.

We mark it as fenced because we are going to perform actions which will compromise data integrity if the fenced node continues to access data.

The Azure code appears to fail this safety test on both counts:

  • the Node may not be Ready, but we have no assurance that it is not still running workloads
  • there is nothing to prevent the Node from starting or continuing previously-running workloads

I have thought for a while that cloud providers could implement a useful fencing API, though. A 'recovery agent' could set a 'fence request' on the Node. The cloud provider could then shut the Node down, take cloud-specific actions to ensure it does not restart without a deliberate action, and then indicate that the Node is fenced. At that point either the cloud provider or the 'recovery agent' could safely set node.kubernetes.io/out-of-service:NoExecute.
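
Purely to illustrate that idea, a hypothetical sketch of such a flow. The annotation keys, the reconcileFencing helper, and the overall API are invented for this sketch; nothing like this exists in CPO today, and a real design would need much more care around ordering and verification.

```go
package sketch

import (
	"context"
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/compute/v2/extensions/startstop"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Both annotation keys are invented for this sketch; no such API exists today.
const (
	fenceRequestedAnnotation = "example.openstack.org/fence-requested"
	fencedAnnotation         = "example.openstack.org/fenced"
)

// reconcileFencing is a hypothetical flow: on a fence request it powers the
// server off, and only then marks the Node as fenced so that a recovery agent
// (or the provider) can safely apply node.kubernetes.io/out-of-service:NoExecute.
func reconcileFencing(ctx context.Context, client kubernetes.Interface, compute *gophercloud.ServiceClient, node *v1.Node, serverID string) error {
	if node.Annotations[fenceRequestedAnnotation] != "true" {
		return nil // nothing requested, nothing to do
	}

	// 1. Shut the server down via Nova so its workloads cannot keep running.
	if err := startstop.Stop(compute, serverID).ExtractErr(); err != nil {
		return fmt.Errorf("stopping server: %w", err)
	}

	// 2. A real implementation would also take cloud-specific action here to
	//    ensure the server cannot restart without a deliberate step
	//    (e.g. locking the instance) and confirm the power state.

	// 3. Only once the node is known to be off and held off, record that it
	//    is fenced.
	node.Annotations[fencedAnnotation] = "true"
	_, err := client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```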
