Trident-csi crashloops on invalid snapshot references #490
Comments
Workaround is to delete the Trident transactions manually and restart the Trident pods. This is still a bug that needs to be addressed. First, list the stuck transactions:

oc get tridenttransaction -n trident
oc get tridenttransaction -n trident -o json
oc get ttx -n trident
To edit the CR, use an editor (remember to save the change):

kubectl (oc) edit tridenttransaction pvc-63836175-1515-4326-b73f-cae3e0963be7-snapshot-0462e2ea-f167-4846-86a3-10cc6599b4a8 -n trident

Delete the line under finalizers containing the entry "trident.netapp.io".
oc delete ttx pvc-63836175-1515-4326-b73f-cae3e0963be7-snapshot-0462e2ea-f167-4846-86a3-10cc6599b4a8 -n trident

Confirm the tridenttransaction has been deleted:

oc get ttx -n trident
Then restart the Trident pods:

oc get pods -n trident
oc delete pod <pod-name> -n trident
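The steps above can be put together in one script. This is a hedged sketch, not a tested procedure: the `oc patch` call is a non-interactive alternative to the manual `oc edit` of the finalizer, `NS`/`TTX`/`<trident-pod>` are placeholders, and the commands are echoed as a dry run so nothing touches a live cluster.

```shell
#!/bin/sh
# Dry-run sketch of the workaround: oc commands are printed, not executed.
# NS and TTX are placeholders for the namespace and the stuck transaction.
NS=trident
TTX=pvc-63836175-1515-4326-b73f-cae3e0963be7-snapshot-0462e2ea-f167-4846-86a3-10cc6599b4a8

# Non-interactive alternative to `oc edit`: a merge patch that clears the
# finalizers list so the CR can actually be deleted.
PATCH='{"metadata":{"finalizers":null}}'

echo "oc patch tridenttransaction $TTX -n $NS --type=merge -p '$PATCH'"
echo "oc delete ttx $TTX -n $NS"
echo "oc get ttx -n $NS"                # confirm the transaction is gone
echo "oc delete pod <trident-pod> -n $NS"  # restart the Trident pods
```

Removing a finalizer by hand like this is a last resort; it skips whatever cleanup the controller would normally perform, which is acceptable here only because the referenced snapshot no longer exists.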
Will there be a patch release to fix this bug?
Hi @promothp, this bug, like all other bugs, is evaluated based on severity and prioritized to be fixed accordingly. If possible, a fix for this issue will be included in the Trident v21.01 release at the end of January.
Add my vote for having a patch sooner rather than later!
This fix is included in the Trident v21.01 release with commit 0ce1aaf.
I had this very same problem with Trident 22.10.0.
Confirmed still happening in 23.04.0 as well.
This issue is present in 23.10 as well.
I'm still getting this issue as well. It rarely blocks PVC provisioning and complains again about:
Describe the bug
The trident-csi deployment was crashlooping because of:
For some reason, trident-csi was following leftover references to a snapshot on a PVC backend that didn't support snapshots.
The snapshot itself no longer existed for that PVC (it may have been mistakenly created in Kubernetes in the past, more than a month ago). I had even deleted the original PVC. But the real problem was that deleting the VolumeSnapshot object for that snapshot did not delete all the other references to it.
The backend was "ontap-nas-economy" (https://netapp-trident.readthedocs.io/en/latest/kubernetes/operations/tasks/backends/ontap/drivers.html, the qtree one).
It looked as if CSI was trying to locate a snapshot for a PV (qtree) provisioned on an 'economy' backend, but it was actually checking for the PVC's volume in the regular ontap-nas backend, which is also our default. I suspect this issue happened because the default storage class was the economy one when the snapshot was created. The default was later changed to ontap-nas, which does support snapshots, but the references were probably broken or not properly cleaned up at that point.
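To check for this kind of mismatch, it can help to compare which storage class is marked as default against which backend a volume actually landed on. A hypothetical diagnostic sequence (a dry run: the commands are echoed, not executed; the PVC name is a placeholder, and `tridentbackends`/`tridentvolumes` are assumed to be the Trident CRDs available in the namespace):

```shell
#!/bin/sh
# Dry-run diagnostics for a backend / storage-class mismatch.
# PVC is a placeholder; commands are echoed so nothing is executed.
NS=trident
PVC=my-pvc

echo "oc get storageclass"              # which class is marked (default)?
echo "oc get tridentbackends -n $NS"    # backends Trident knows about
echo "oc get tridentvolumes -n $NS"     # which backend each volume landed on
echo "oc describe pvc $PVC"             # storage class the PVC actually used
```

If the volume's backend does not match the storage class the snapshot was created against, that is consistent with the broken-reference scenario described above.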
Those other references include:
Deleting the past references fixes the issue.
Initial state:
Some time ago, a VolumeSnapshot resource was created for a backend that didn't support it. Deleting that VolumeSnapshot didn't seem to delete the references to it on other Trident CRD objects.
What triggered the bug:
Restarting some nodes restarted the kubelet on one of the nodes that had the VolumeAttachment for the snapshot/PVC.
Environment
Openshift 4.6.*
To Reproduce
Expected behavior
No crashlooping on broken TridentTransaction references.
Print a warning and continue, OR clean up the broken reference, OR add a flag to toggle this behaviour on/off.