Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[occm] Add taint out-of-service:NoExecute when node shutdown #2556

Open
sebastienmusso opened this issue Feb 21, 2024 · 6 comments
Open

[occm] Add taint out-of-service:NoExecute when node shutdown #2556

sebastienmusso opened this issue Feb 21, 2024 · 6 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@sebastienmusso
Copy link

/kind feature

What happened:
Actually only node.cloudprovider.kubernetes.io/shutdown:NoSchedule taint is added to shutdown node.
pod with pvc still int terminating state

What you expected to happen:
When instance is shut taint node.kubernetes.io/out-of-service:NoExecute should be added to node in order to unmap and detach volume

azure cloud provider seems to update taint (https://github.com/kubernetes-sigs/cloud-provider-azure/blob/v1.29.0/pkg/provider/azure.go#L1319C18-L1319C33)

Could the openstack cloud provider handle this function and how ?
Does this code should go into openstack.go package or instances one (https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/openstack.go)
I could made the pr if needed

Environment:
openstack-cloud-controller-manager version: any
OpenStack version: any
Kubernetes version 1.24+

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 21, 2024
@sebastienmusso
Copy link
Author

Nobody interested by this feature ?

@dulek
Copy link
Contributor

dulek commented Feb 28, 2024

Looks like CAPZ creates Informer for Nodes objects to populate its Node cache and that's how taint manipulations are done. We would need to do the same here, InstancesV2 doesn't really have an interface allowing that.

I think I'm supportive of the feature, though wonder if this is exactly a responsibility of CPO and not CAPO. @mdbooth, @zetaab, what do you think?

@dulek
Copy link
Contributor

dulek commented Feb 28, 2024

According to [1] this taint should be set manually by the user, which makes me question even more if its place is in CPO.

[1] https://kubernetes.io/docs/reference/labels-annotations-taints/#node-kubernetes-io-out-of-service

@sebastienmusso
Copy link
Author

How should we handle non graceful shutdown on vanilla clusters ? How to automatically recover StatefulSet and pvc detach ? Node deletion from the cluster and Iac provider seems to be the only way for the moment with this mannualy applied taint.

@zetaab
Copy link
Member

zetaab commented Feb 28, 2024

https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/instancesv2.go#L102-L103 this is what you are trying to find?

actually that adds taint already https://github.com/kubernetes/cloud-provider/blob/71c6ce7b171ddf09943cb588f6a9427dd3282042/controllers/nodelifecycle/node_lifecycle_controller.go#L190

node.cloudprovider.kubernetes.io/shutdown:NoSchedule

the detach / attach logic is not coming from this repository for the CSI part. So if that taint is not enough, I would say its better to fix things outside of just openstack. It does not make sense that every kubernetes cloudprovider should do their own magic, instead of fixing it in central place (cloud-provider/csi core).

@jichenjc
Copy link
Contributor

jichenjc commented Mar 6, 2024

agree on @zetaab ,if there' something need to be done at CSI/cloud provider side, we need ask them to handle
at least they need provide some spec/guide to each provider if the implementation need to be done in provider only as contract

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

5 participants