Fix: Mounted volumes on downed nodes are de-facto unmounted, treat them as such #2270
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: briantopping
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @briantopping. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@gnufied, @lipingxue: This patch should close out Gardener's need to ship with a fork of the CSI driver. How can I help here? 😊
cc: @chethanv28 Hopefully this is something that can make it into the 3.0.0 release. The new code simply searches for nodes and is only called by this check. |
@divyenpatel / @chethanv28 Would still like to see if we can get this reviewed for 3.0.0 release. What are the chances? 🤓
@divyenpatel / @chethanv28 I have updated the PR problem statement. Could I get some feedback please?
Force-pushed from 7627b48 to 83f3ba2
"Marking ControllerUnpublishVolume for Volume: %q as successful.", req.NodeId, req.VolumeId) | ||
return &csi.ControllerUnpublishVolumeResponse{}, "", nil | ||
} else { | ||
if err == cnsnode.ErrNodeNotFound { |
When we get cnsvsphere.ErrVMNotFound at line 2268, it is already confirmed that the VM is not present in the VC inventory. Why do we need to check it again using the DNS name?
When the UseCSINodeId feature is enabled and we get a call to detach the volume, the ControllerUnpublishVolume request carries the node VM UUID. Using the node VM UUID we can look up whether the node VM is present on the system or not; if it is present we get the VM MoRef, which we use to issue the detach call on the vCenter.
Using DNS to look up a node VM is not a reliable method, especially if there are multiple VMs with the same DNS name on the vCenter. The VM UUID is a unique identifier for each VM, making it the better way to verify the presence of a specific node VM on the vCenter, so it is recommended to look up the node VM by UUID instead of DNS name.
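For illustration only, here is a minimal sketch of a UUID-based VM lookup through govmomi's SearchIndex; the helper name and the client plumbing are assumptions, not the driver's actual wiring (the driver itself goes through its node manager, shown further below):

package sketch // illustration only, not part of the driver

import (
	"context"
	"fmt"

	"github.com/vmware/govmomi/object"
	"github.com/vmware/govmomi/vim25"
)

// findVMByUUID is a hypothetical helper: it looks up a VM by its instance UUID via the
// vSphere SearchIndex. A UUID match is unambiguous, whereas several VMs can share the
// same DNS name, which is why the UUID is the preferred key.
func findVMByUUID(ctx context.Context, c *vim25.Client, uuid string) (*object.VirtualMachine, error) {
	idx := object.NewSearchIndex(c)
	instanceUUID := true
	// dc == nil searches the whole inventory; vmSearch == true restricts the search to VMs.
	ref, err := idx.FindByUuid(ctx, nil, uuid, true, &instanceUUID)
	if err != nil {
		return nil, err
	}
	if ref == nil {
		return nil, fmt.Errorf("no VM found with UUID %s", uuid)
	}
	return object.NewVirtualMachine(c, ref.Reference()), nil
}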
Regarding the first issue, the transitive closure of calls under c.nodeMgr.GetNodeByName() may return errors for a large number of reasons outside of node failure. The code should not respond that the volume has been released unless it is confirmed that the node was actually lost.
I will update the patch with the UUID changes, thank you for that feedback!
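To make that distinction concrete, here is a minimal sketch of the error handling being argued for, not the driver's actual code (the cnsvsphere import path is assumed from the package alias used above): only the node-not-found sentinel triggers the deeper vCenter lookup, while any other error keeps the call failing so it is retried.

package sketch // illustration only, not part of the driver

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node"
	cnsvsphere "sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere" // import path assumed
)

// lookupNodeForUnpublish is a hypothetical helper: only the node.ErrNodeNotFound
// sentinel triggers the fallback lookup by UUID; any other error (API-server hiccup,
// transient vCenter failure) is surfaced so the container orchestrator retries instead
// of the volume being treated as released.
func lookupNodeForUnpublish(ctx context.Context, nodeMgr *node.Nodes,
	req *csi.ControllerUnpublishVolumeRequest) (*cnsvsphere.VirtualMachine, error) {
	nodevm, err := nodeMgr.GetNodeByName(ctx, req.NodeId)
	if err == node.ErrNodeNotFound {
		// The Node object is gone from the API server; check vCenter directly by UUID
		// before deciding that the node VM itself is lost.
		nodevm, err = nodeMgr.GetNodeByUuid(ctx, req.NodeId)
	}
	if err != nil {
		return nil, status.Errorf(codes.Internal, "failed to look up node %q: %v", req.NodeId, err)
	}
	return nodevm, nil
}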
Regarding the use of DNS vs. node UUID, this code has been in production for some time and I had forgotten about the mechanics mentioned at line 2271: "As the VM UUID is not known anymore, need to search by using NodeID as DNS Name". What are your thoughts on handling this state?
If the Node cannot be returned using c.nodeMgr.GetNodeByName, because the Node object is deleted from the API server, we do a deeper look into vCenter using c.nodeMgr.GetNodeByUuid(ctx, req.NodeId).
Refer to the following implementation and let us know what is missing in checking Node VM existence before we mark the detach as successful.
nodevm, err = c.nodeMgr.GetNodeByName(ctx, req.NodeId)
if err == node.ErrNodeNotFound {
	log.Infof("Performing node VM lookup using node VM UUID: %q", req.NodeId)
	nodevm, err = c.nodeMgr.GetNodeByUuid(ctx, req.NodeId)
}
func (nodes *Nodes) GetNodeByUuid(ctx context.Context, nodeUuid string) (*cnsvsphere.VirtualMachine, error) {
	return nodes.cnsNodeManager.GetNode(ctx, nodeUuid, nil)
}
func (m *defaultManager) GetNode(ctx context.Context,
	nodeUUID string, dc *vsphere.Datacenter) (*vsphere.VirtualMachine, error) {
	log := logger.GetLogger(ctx)
	vmInf, discovered := m.nodeVMs.Load(nodeUUID)
	if !discovered {
		log.Infof("Node hasn't been discovered yet with nodeUUID %s", nodeUUID)
		var vm *vsphere.VirtualMachine
		var err error
		if dc != nil {
			vm, err = dc.GetVirtualMachineByUUID(ctx, nodeUUID, false)
			if err != nil {
				log.Errorf("failed to find node with nodeUUID %s on datacenter: %+v with err: %v", nodeUUID, dc, err)
				return nil, err
			}
			m.nodeVMs.Store(nodeUUID, vm)
		} else {
			if err = m.DiscoverNode(ctx, nodeUUID); err != nil {
				log.Errorf("failed to discover node with nodeUUID %s with err: %v", nodeUUID, err)
				return nil, err
			}
			vmInf, _ = m.nodeVMs.Load(nodeUUID)
			vm = vmInf.(*vsphere.VirtualMachine)
		}
		log.Infof("Node was successfully discovered with nodeUUID %s in vm %v", nodeUUID, vm)
		return vm, nil
	}
	vm := vmInf.(*vsphere.VirtualMachine)
	log.Debugf("Renewing virtual machine %v with nodeUUID %q", vm, nodeUUID)
	if err := vm.Renew(ctx, true); err != nil {
		log.Errorf("failed to renew VM %v with nodeUUID %q with err: %v", vm, nodeUUID, err)
		return nil, err
	}
	log.Debugf("VM %v was successfully renewed with nodeUUID %q", vm, nodeUUID)
	return vm, nil
}
Thanks @divyenpatel I misread your initial response. Understand what you are after now and will get that resolved. Switching this to draft.
Signed-off-by: Brian Topping <brian.topping@sap.com>
Force-pushed from abf04f8 to b50770b
@briantopping {"level":"error","time":"2023-03-28T10:16:46.643922378Z","caller":"node/manager.go:226","msg":"failed to discover node with nodeUUID 4207bfa2-96c2-345b-90a1-5404602c4788 with err: virtual machine wasn't found","TraceId":"98fadc37-c821-44e9-bd53-b6c444992a77","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*defaultManager).GetNode\n\t/build/pkg/common/cns-lib/node/manager.go:226\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node.(*Nodes).GetNodeByUuid\n\t/build/pkg/common/cns-lib/node/nodes.go:203\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).ControllerUnpublishVolume.func1\n\t/build/pkg/csi/service/vanilla/controller.go:2262\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).ControllerUnpublishVolume\n\t/build/pkg/csi/service/vanilla/controller.go:2285\ngithub.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerUnpublishVolume_Handler\n\t/go/pkg/mod/github.com/container-storage-interface/spec@v1.7.0/lib/go/csi/csi.pb.go:5725\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922"}
{"level":"info","time":"2023-03-28T10:16:46.643984979Z","caller":"vanilla/controller.go:2269","msg":"Virtual Machine for Node ID: 4207bfa2-96c2-345b-90a1-5404602c4788 is not present in the VC Inventory. Marking ControllerUnpublishVolume for Volume: \"e331b82f-48c6-439a-bf12-39fcb1f09e63\" as successful.","TraceId":"98fadc37-c821-44e9-bd53-b6c444992a77"}
{"level":"info","time":"2023-03-28T10:16:46.644047899Z","caller":"vanilla/controller.go:2294","msg":"Volume \"e331b82f-48c6-439a-bf12-39fcb1f09e63\" detached successfully from node \"4207bfa2-96c2-345b-90a1-5404602c4788\".","TraceId":"98fadc37-c821-44e9-bd53-b6c444992a77"} |
@MartinWeindel this is excellent news. I was in the process of setting up a test rig for this and now I guess that won't be immediately necessary. Will set it up anyway for next time. Thanks a bunch for your efforts on this issue over the last year!!
Problem
If a node is running on a virtual machine that is shut down, halted, or crashed, Kubernetes is unaware and does not release a pod's claim (PVC) on the PV. This is a very common occurrence in clusters with default shutdown configurations. A node's systemd (for example) sees the node CRI as a monolithic process, does not know that containers need to shut down gracefully to release a PV claim, and typically just kills the CRI and all the containers far too quickly. The PVC is the binding of the PV to a pod in etcd, and that state is persisted beyond the destruction of the node. At some point, the vSphere CSI driver will be sent a ControllerUnpublishVolume request. This will of course fail, and without this fix the CSI driver will propagate the error that it was unable to unmount the volume at the ESX level. This is partially correct, but the call will be repeated and will halt graceful shutdown of a cluster indefinitely.
What this PR does / why we need it:
A CSI volume is in fact unmounted if it was mounted on a VM hosted by a downed node. In controller.ControllerUnpublishVolume(), when an unmount request is received and we cannot find the node that the volume should be mounted through, the volume is in fact unmounted (albeit ungracefully) and should be reported as such without error. This will cause the control plane to properly delete the PVC, and another pod may then bind with it or the cluster may continue to shut down. If the unmount is unsuccessful, this fix confirms the node is lost and returns that the volume is unmounted as requested.
If the node is backed by some form of HA and was migrated elsewhere, the ControllerUnpublishVolume event (once issued) will succeed without this fix, because the node it was issued to will have been transparently recovered. This fix only activates when the unmount is unsuccessful, and then only if the ESX inquiry indicates the node (VM) is gone.
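A rough sketch of that flow, with detachVolume and isVMNotFound invented purely for illustration (the real change lives inside the existing ControllerUnpublishVolume implementation; the cnsvsphere import path is assumed):

package sketch // illustration only, not part of the driver

import (
	"context"
	"errors"

	"github.com/container-storage-interface/spec/lib/go/csi"

	"sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/node"
	cnsvsphere "sigs.k8s.io/vsphere-csi-driver/v3/pkg/common/cns-lib/vsphere" // import path assumed
)

// detachVolume stands in for the driver's normal CNS detach path; it is invented here
// purely so the sketch is self-contained.
func detachVolume(ctx context.Context, req *csi.ControllerUnpublishVolumeRequest) error {
	return errors.New("detach failed: node VM unreachable") // placeholder behaviour
}

// isVMNotFound stands in for however the driver decides the VM is truly gone from the
// vCenter inventory (e.g. cnsvsphere.ErrVMNotFound); also invented for illustration.
func isVMNotFound(err error) bool {
	return errors.Is(err, cnsvsphere.ErrVMNotFound)
}

// unpublishSketch outlines the behaviour described above: a failed detach is reported
// as success only when the node VM is confirmed absent, so the control plane can
// release the PVC instead of retrying ControllerUnpublishVolume forever.
func unpublishSketch(ctx context.Context, nodeMgr *node.Nodes,
	req *csi.ControllerUnpublishVolumeRequest) (*csi.ControllerUnpublishVolumeResponse, error) {
	detachErr := detachVolume(ctx, req)
	if detachErr == nil {
		return &csi.ControllerUnpublishVolumeResponse{}, nil
	}
	if _, err := nodeMgr.GetNodeByUuid(ctx, req.NodeId); isVMNotFound(err) {
		// Node VM is gone: the volume is de-facto unmounted, report success.
		return &csi.ControllerUnpublishVolumeResponse{}, nil
	}
	// Otherwise propagate the original failure so the call is retried.
	return nil, detachErr
}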
Works around kubernetes/kubernetes#51835 which appears to remain unresolved and administratively closed.
Which issue this PR fixes
fixes #15
Release note: