This repository has been archived by the owner on Jan 19, 2023. It is now read-only.

When a resource is deleted, if finalizers get stuck, Octant is unclear about remedy. #2015

Closed
mklanjsek opened this issue Feb 17, 2021 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@mklanjsek
Contributor

Looks like deleting resources is broken in 0.17. After deleting a resource, it turns red in the UI but is never actually deleted and stays red forever. Verified that deleting with kubectl works fine on that cluster.

[Screenshot: Screen Shot 2021-02-17 at 8 55 32 AM]

@wwitzel3
Contributor

Odd, yeah, I'm seeing this under Linux as well.

2021-02-17T11:31:47.018-0500    DEBUG   configuration/delete_object.go:30       deleting object {"action": "action.octant.dev/deleteObject", "payload": {"action":"action.octant.dev/deleteObject","apiVersion":"apps/v1","kind":"Deployment","name":"hello-kubernetes","namespace":"default"}}

The action is being handled, so we'll have to dig into our Delete method on the DynamicCache itself.

@wwitzel3 wwitzel3 added the bug Something isn't working label Feb 17, 2021
@wwitzel3 wwitzel3 added this to To do in 0.18 via automation Feb 17, 2021
@wwitzel3 wwitzel3 added this to Groomed in Product Excellence Feb 17, 2021
@wwitzel3 wwitzel3 moved this from To do to In progress in 0.18 Feb 18, 2021
@wwitzel3 wwitzel3 self-assigned this Feb 18, 2021
@GuessWhoSamFoo
Contributor

GuessWhoSamFoo commented Feb 20, 2021

xref: kubernetes/kubernetes#51835

Running on kind, deleting a replica set, and then exporting the kubelet logs, I can see:

Feb 20 23:33:07 kind-control-plane kubelet[719]: E0220 23:33:07.486114     719 kubelet_pods.go:1256] Failed killing the pod "nginx-6d4cf56db6-xhbwp": failed to "KillContainer" for "nginx" with KillContainerError: "rpc error: code = NotFound desc = an error occurred when try to find container \"870485e8f2d28f1afef9073cc965983eb760f78bcb3e8a050197bf472d2b0844\": not found"
Feb 20 23:33:07 kind-control-plane kubelet[719]: E0220 23:33:07.486164     719 kubelet_pods.go:1256] Failed killing the pod "nginx-6d4cf56db6-lfx4n": failed to "KillContainer" for "nginx" with KillContainerError: "rpc error: code = NotFound desc = an error occurred when try to find container \"342c38990554a250206d0e988a91964c7c0c2499985d4c907da2f0e9523441c0\": not found"
Feb 20 23:33:07 kind-control-plane kubelet[719]: E0220 23:33:07.486215     719 kubelet_pods.go:1256] Failed killing the pod "nginx-6d4cf56db6-km287": failed to "KillContainer" for "nginx" with KillContainerError: "rpc error: code = NotFound desc = an error occurred when try to find container \"328109bdafba8a5fed021218105f2d388c722419fa2552ff84d6379761f4d5e3\": not found"
Feb 20 23:33:07 kind-control-plane kubelet[719]: E0220 23:33:07.486428     719 kubelet_pods.go:1256] Failed killing the pod "nginx-6d4cf56db6-wfbml": failed to "KillContainer" for "nginx" with KillContainerError: "rpc error: code = NotFound desc = an error occurred when try to find container \"b9f465f608ccb748bce664845df9139c3a8dcd3c68f9a1868ba16f74b498ac14\": not found"
Feb 20 23:33:07 kind-control-plane kubelet[719]: E0220 23:33:07.486641     719 kubelet_pods.go:1256] Failed killing the pod "nginx-6d4cf56db6-dx26m": failed to "KillContainer" for "nginx" with KillContainerError: "rpc error: code = NotFound desc = an error occurred when try to find container \"208cb109d90cea25a67ace1b998d5c428b0de9b278edded26e20e7c05e4da6d3\": not found"

In my case, the reconciler eventually handles this and recreates the ReplicaSet in 3-4 minutes.

Edit: Also worth checking whether this particular object has finalizers, which can also cause it to appear "stuck".

@wwitzel3 wwitzel3 removed this from In progress in 0.18 Feb 24, 2021
@wwitzel3
Contributor

I removed this from 0.18. Octant is deleting exactly as required via the API.

I explored this more just to be sure: this was a result of stuck finalizers for the workload. If you use the YAML tab to edit the resource and remove the finalizer, the deletion completes as expected.

This is essentially what happens when you issue a kubectl delete again.
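For readers who want to see what removing a finalizer looks like outside the YAML tab, here is a minimal sketch that builds a JSON merge patch body leaving out one finalizer. It is not Octant's code and does not use client-go. The function name `removeFinalizerPatch` and the finalizer `example.com/cleanup` are hypothetical, and it assumes the resulting body would be sent to the API server as a merge patch (content type `application/merge-patch+json`). Keep in mind that removing a finalizer skips whatever cleanup its controller was meant to perform.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// removeFinalizerPatch is a hypothetical helper: it returns a JSON merge
// patch body that sets metadata.finalizers to the current list minus drop.
// A merge patch replaces lists wholesale, so the full remaining list is sent.
func removeFinalizerPatch(current []string, drop string) ([]byte, error) {
	remaining := make([]string, 0, len(current))
	for _, f := range current {
		if f != drop {
			remaining = append(remaining, f)
		}
	}
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{"finalizers": remaining},
	}
	return json.Marshal(patch)
}

func main() {
	// "example.com/cleanup" is an illustrative finalizer name, not one taken
	// from the workload discussed in this issue.
	body, err := removeFinalizerPatch([]string{"example.com/cleanup"}, "example.com/cleanup")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```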

That said, we may be able to provide better insight into this in Octant. We know when deletion was scheduled from the deletionTimestamp, and we can see whether finalizers are still pending. So if deletionTimestamp < now() - someAmountOfTime and finalizers are present, show some useful hints.
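The check proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation that landed in Octant: `objectMeta` is a hypothetical, trimmed-down stand-in for the Kubernetes object metadata, and `stuckFinalizerHint`, `sampleObject`, and the five-minute threshold are made up for the example.

```go
package main

import (
	"fmt"
	"time"
)

// objectMeta is a hypothetical stand-in holding only the metadata fields
// this check needs.
type objectMeta struct {
	Name              string
	DeletionTimestamp *time.Time // nil means deletion has not been requested
	Finalizers        []string
}

// stuckFinalizerHint returns a hint and true when deletion was requested at
// least threshold ago and finalizers are still present.
func stuckFinalizerHint(m objectMeta, now time.Time, threshold time.Duration) (string, bool) {
	if m.DeletionTimestamp == nil || len(m.Finalizers) == 0 {
		return "", false
	}
	waiting := now.Sub(*m.DeletionTimestamp)
	if waiting < threshold {
		return "", false
	}
	return fmt.Sprintf("%s has been terminating for %s; pending finalizers: %v",
		m.Name, waiting.Round(time.Second), m.Finalizers), true
}

// sampleObject returns illustrative data: an object whose deletion was
// requested ten minutes before the returned "now".
func sampleObject() (objectMeta, time.Time) {
	now := time.Date(2021, 2, 24, 12, 0, 0, 0, time.UTC)
	deleted := now.Add(-10 * time.Minute)
	return objectMeta{
		Name:              "hello-kubernetes",
		DeletionTimestamp: &deleted,
		Finalizers:        []string{"foregroundDeletion"},
	}, now
}

func main() {
	m, now := sampleObject()
	if hint, ok := stuckFinalizerHint(m, now, 5*time.Minute); ok {
		fmt.Println(hint)
	}
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside the function, keeps the check deterministic and easy to test.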

@wwitzel3 wwitzel3 changed the title from "Deleting resources not working in 0.17" to "When a resource is deleted, if finalizers get stuck, Octant is unclear about remedy." Feb 24, 2021
@GuessWhoSamFoo
Contributor

xref: #1408

@wwitzel3
Contributor

This is covered by #1408.

Product Excellence automation moved this from Groomed to Done Apr 12, 2021