
PV is stuck at terminating after PVC is deleted #69697

Closed
leakingtapan opened this issue Oct 11, 2018 · 78 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@leakingtapan

leakingtapan commented Oct 11, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
I was testing the EBS CSI driver. I created a PV through a PVC, then deleted the PVC. However, the PV is stuck in the Terminating state. Both the PVC and the underlying EBS volume were deleted without any issue. The CSI driver keeps being called with DeleteVolume, even though it returns success when the volume is not found (because it is already gone).

CSI Driver log:

I1011 20:37:29.778380       1 controller.go:175] ControllerGetCapabilities: called with args &csi.ControllerGetCapabilitiesRequest{XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
I1011 20:37:29.780575       1 controller.go:91] DeleteVolume: called with args: &csi.DeleteVolumeRequest{VolumeId:"vol-0ea6117ddb69e78fb", ControllerDeleteSecrets:map[string]string(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
I1011 20:37:29.930091       1 controller.go:99] DeleteVolume: volume not found, returning with success

external attacher log:

I1011 19:15:14.931769       1 controller.go:167] Started VA processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931794       1 csi_handler.go:76] CSIHandler: processing VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931808       1 csi_handler.go:103] Attaching "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931823       1 csi_handler.go:208] Starting attach operation for "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931905       1 csi_handler.go:179] PV finalizer is already set on "pvc-069128c6ccdc11e8"
I1011 19:15:14.931947       1 csi_handler.go:156] VA finalizer is already set on "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931962       1 connection.go:235] GRPC call: /csi.v0.Controller/ControllerPublishVolume
I1011 19:15:14.931966       1 connection.go:236] GRPC request: volume_id:"vol-0ea6117ddb69e78fb" node_id:"i-06d0e08c9565c4db7" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_attributes:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1539123546345-8081-com.amazon.aws.csi.ebs" >
I1011 19:15:14.935053       1 controller.go:197] Started PV processing "pvc-069128c6ccdc11e8"
I1011 19:15:14.935072       1 csi_handler.go:350] CSIHandler: processing PV "pvc-069128c6ccdc11e8"
I1011 19:15:14.935106       1 csi_handler.go:386] CSIHandler: processing PV "pvc-069128c6ccdc11e8": VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53" found
I1011 19:15:14.952590       1 controller.go:197] Started PV processing "pvc-069128c6ccdc11e8"
I1011 19:15:14.952613       1 csi_handler.go:350] CSIHandler: processing PV "pvc-069128c6ccdc11e8"
I1011 19:15:14.952654       1 csi_handler.go:386] CSIHandler: processing PV "pvc-069128c6ccdc11e8": VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53" found
I1011 19:15:15.048026       1 controller.go:197] Started PV processing "pvc-069128c6ccdc11e8"
I1011 19:15:15.048048       1 csi_handler.go:350] CSIHandler: processing PV "pvc-069128c6ccdc11e8"
I1011 19:15:15.048167       1 csi_handler.go:386] CSIHandler: processing PV "pvc-069128c6ccdc11e8": VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53" found
I1011 19:15:15.269955       1 connection.go:238] GRPC response:
I1011 19:15:15.269986       1 connection.go:239] GRPC error: rpc error: code = Internal desc = Could not attach volume "vol-0ea6117ddb69e78fb" to node "i-06d0e08c9565c4db7": could not attach volume "vol-0ea6117ddb69e78fb" to node "i-06d0e08c9565c4db7": InvalidVolume.NotFound: The volume 'vol-0ea6117ddb69e78fb' does not exist.
        status code: 400, request id: 634b33d1-71cb-4901-8ee0-98933d2a5b47
I1011 19:15:15.269998       1 csi_handler.go:320] Saving attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274440       1 csi_handler.go:330] Saved attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274464       1 csi_handler.go:86] Error processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53": failed to attach: rpc error: code = Internal desc = Could not attach volume "vol-0ea6117ddb69e78fb" to node "i-06d0e08c9565c4db7": could not attach volume "vol-0ea6117ddb69e78fb" to node "i-06d0e08c9565c4db7": InvalidVolume.NotFound: The volume 'vol-0ea6117ddb69e78fb' does not exist.
        status code: 400, request id: 634b33d1-71cb-4901-8ee0-98933d2a5b47
I1011 19:15:15.274505       1 controller.go:167] Started VA processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274516       1 csi_handler.go:76] CSIHandler: processing VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274522       1 csi_handler.go:103] Attaching "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274528       1 csi_handler.go:208] Starting attach operation for "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274536       1 csi_handler.go:320] Saving attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.278318       1 csi_handler.go:330] Saved attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.278339       1 csi_handler.go:86] Error processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53": failed to attach: PersistentVolume "pvc-069128c6ccdc11e8" is marked for deletion
I1011 20:37:23.328696       1 controller.go:167] Started VA processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.328709       1 csi_handler.go:76] CSIHandler: processing VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.328715       1 csi_handler.go:103] Attaching "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.328721       1 csi_handler.go:208] Starting attach operation for "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.328730       1 csi_handler.go:320] Saving attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.330919       1 reflector.go:286] github.com/kubernetes-csi/external-attacher/vendor/k8s.io/client-go/informers/factory.go:87: forcing resync
I1011 20:37:23.330975       1 controller.go:197] Started PV processing "pvc-069128c6ccdc11e8"
I1011 20:37:23.330990       1 csi_handler.go:350] CSIHandler: processing PV "pvc-069128c6ccdc11e8"
I1011 20:37:23.331030       1 csi_handler.go:386] CSIHandler: processing PV "pvc-069128c6ccdc11e8": VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53" found
I1011 20:37:23.346007       1 csi_handler.go:330] Saved attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346033       1 csi_handler.go:86] Error processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53": failed to attach: PersistentVolume "pvc-069128c6ccdc11e8" is marked for deletion
I1011 20:37:23.346069       1 controller.go:167] Started VA processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346077       1 csi_handler.go:76] CSIHandler: processing VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346082       1 csi_handler.go:103] Attaching "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346088       1 csi_handler.go:208] Starting attach operation for "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346096       1 csi_handler.go:320] Saving attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.351068       1 csi_handler.go:330] Saved attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.351090       1 csi_handler.go:86] Error processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53": failed to attach: PersistentVolume "pvc-069128c6ccdc11e8" is marked for deletion
>> kk get pv
NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM            STORAGECLASS   REASON   AGE
pvc-069128c6ccdc11e8   4Gi        RWO            Delete           Terminating   default/claim1   late-sc                 22h

>> kk describe pv
Name:            pvc-069128c6ccdc11e8
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: com.amazon.aws.csi.ebs
Finalizers:      [external-attacher/com-amazon-aws-csi-ebs]
StorageClass:    late-sc
Status:          Terminating (lasts <invalid>)
Claim:           default/claim1
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        4Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            com.amazon.aws.csi.ebs
    VolumeHandle:      vol-0ea6117ddb69e78fb
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1539123546345-8081-com.amazon.aws.csi.ebs
Events:                <none>

Storageclass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: late-sc
provisioner: com.amazon.aws.csi.ebs
volumeBindingMode: WaitForFirstConsumer

Claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: late-sc
  resources:
    requests:
      storage: 4Gi

What you expected to happen:
After the PVC is deleted, the PV should be deleted along with the EBS volume (since my reclaim policy is Delete).

How to reproduce it (as minimally and precisely as possible):
Non-deterministic so far

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): client: v1.12.0 server: v1.12.1
  • Cloud provider or hardware configuration: aws
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: cluster is set up using kops
  • Others:
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 11, 2018
@leakingtapan
Author

Several questions:

  1. How can I get out of this situation?
  2. Should the PV be deleted successfully once the driver returns success, even though the volume is already gone?

@leakingtapan
Author

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 11, 2018
@msau42
Member

msau42 commented Oct 11, 2018

It looks like your PV still has a finalizer from the attacher. Can you verify that the volume got successfully detached from the node?
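For reference, both things can be checked quickly from the CLI (the PV name below is the one from this issue; adjust as needed):

kubectl get pv pvc-069128c6ccdc11e8 -o jsonpath='{.metadata.finalizers}{"\n"}'   # finalizers still set on the PV
kubectl get volumeattachment                                                      # attachment objects left for the volume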

@msau42
Member

msau42 commented Oct 11, 2018

It may be good to get logs from the external-attacher, and also the A/D controller.

@msau42
Member

msau42 commented Oct 11, 2018

cc @jsafrane

@msau42
Member

msau42 commented Oct 11, 2018

What version of the external-attacher are you using?

@leakingtapan
Author

It's v0.3.0, and all the other sidecars are at v0.3.0 as well. I was using v0.4.0 earlier, and this issue happened after I recreated the sidecars at v0.3.0.

@leakingtapan
Author

Updated the description with attacher log

@leakingtapan
Author

It looks like you PV still has a finalizer from the attacher. Can you verify that the volume got successfully detached from the node?

The volume should have been detached successfully, since it was deleted from AWS (I don't think it could be deleted without being detached first). I also verified on the node that the device is gone using lsblk.
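If it helps, the deletion can also be confirmed from the AWS side (assuming the AWS CLI is set up for the same account and region); once the volume is gone this returns an InvalidVolume.NotFound error:

aws ec2 describe-volumes --volume-ids vol-0ea6117ddb69e78fb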

@msau42
Member

msau42 commented Oct 11, 2018

It looks like the volume was marked for deletion before an attach ever succeeded. Maybe there is some bug with handling that scenario.

Do you still see a VolumeAttachment object?

@leakingtapan
Author

Do you still see a VolumeAttachment object?

How can I check this?

@msau42
Member

msau42 commented Oct 11, 2018

kubectl get volumeattachment

@leakingtapan
Author

Yep. It's still there:

>> kubectl get volumeattachment
NAME                                                                   CREATED AT
csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53   2018-10-10T22:30:09Z
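(Side note: the VolumeAttachment itself carries the external-attacher finalizer, so once you are sure the volume is really detached it can be cleared as a last-resort workaround; a sketch, not an official fix:)

# inspect the stuck attachment and its finalizers
kubectl describe volumeattachment csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53
# last resort: drop the finalizer, then delete the object
kubectl patch volumeattachment csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53 --type=merge -p '{"metadata":{"finalizers":null}}'
kubectl delete volumeattachment csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53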

@jsafrane
Member

Reading the logs, it seems like the A/D controller tried to attach the volume and got an error from the external attacher. Why did it not delete the VolumeAttachment afterwards? Do you still have a pod that uses the volume? If so, it blocks PV deletion.

@leakingtapan
Author

There is no pod using the volume, and the PVC is gone as well. How can I find the A/D controller log?

@msau42
Member

msau42 commented Oct 12, 2018

It's on the master node, controller-manager.log. You can try to filter by searching for the volume name.

@leakingtapan
Author

leakingtapan commented Oct 12, 2018

Here is the controller log:

E1011 19:14:10.336074       1 daemon_controller.go:304] default/csi-node failed with : error storing status for daemon set &v1.DaemonSet{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, O
bjectMeta:v1.ObjectMeta{Name:"csi-node", GenerateName:"", Namespace:"default", SelfLink:"/apis/apps/v1/namespaces/default/daemonsets/csi-node", UID:"d4e56145-cd89-11e8-9e90-0abab70c948
0", ResourceVersion:"467814", Generation:1, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63674882050, loc:(*time.Location)(0x5b9b560)}}, DeletionTimestamp:(*v1.Time)(nil), De
letionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"deprecated.daemonset.template.generation":"1"}, OwnerReferences:[]v1.OwnerReferenc
e(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1.DaemonSetSpec{Selector:(*v1.LabelSelector)(0xc4233ac360), Template:v1.PodTemplateSpec{O
bjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*t
ime.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"app":"csi-node"}, Annotations:map[string]string(nil), Owner
References:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1.PodSpec{Volumes:[]v1.Volume{v1.Volume{Name:"kubelet-dir",
VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(0xc4233ac380), EmptyDir:(*v1.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AW
SElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1
.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v
1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(
nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(nil), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil)}}, v1.Volume{Name:"plugin-dir", VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(0xc4233ac3a0), EmptyDir:(*v1.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(nil), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil)}}, v1.Volume{Name:"device-dir", VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(0xc4233ac3c0), EmptyDir:(*v1.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(nil), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil)}}}, InitContainers:[]v1.Container(nil), Containers:[]v1.Container{v1.Container{Name:"csi-driver-registrar", Image:"quay.io/k8scsi/driver-registrar:v0.3.0", Command:[]string(nil), Args:[]string{"--v=5", "--csi-address=$(ADDRESS)"}, WorkingDir:"", Ports:[]v1.ContainerPort(nil), EnvFrom:[]v1.EnvFromSource(nil), Env:[]v1.EnvVar{v1.EnvVar{Name:"ADDRESS", Value:"/csi/csi.sock", ValueFrom:(*v1.EnvVarSource)(nil)}, v1.EnvVar{Name:"KUBE_NODE_NAME", Value:"", ValueFrom:(*v1.EnvVarSource)(0xc4233ac400)}}, 
Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList(nil)}, VolumeMounts:[]v1.VolumeMount{v1.VolumeMount{Name:"plugin-dir", ReadOnly:false, MountPath:"/csi", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}}, VolumeDevices:[]v1.VolumeDevice(nil), LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"Always", SecurityContext:(*v1.SecurityContext)(0xc422ccc050), Stdin:false, StdinOnce:false, TTY:false}, v1.Container{Name:"ebs-plugin", Image:"quay.io/bertinatto/ebs-csi-driver:testing", Command:[]string(nil), Args:[]string{"--endpoint=$(CSI_ENDPOINT)", "--logtostderr", "--v=5"}, WorkingDir:"", Ports:[]v1.ContainerPort(nil), EnvFrom:[]v1.EnvFromSource(nil), Env:[]v1.EnvVar{v1.EnvVar{Name:"CSI_ENDPOINT", Value:"unix:/csi/csi.sock", ValueFrom:(*v1.EnvVarSource)(nil)}, v1.EnvVar{Name:"AWS_ACCESS_KEY_ID", Value:"", ValueFrom:(*v1.EnvVarSource)(0xc4233ac460)}, v1.EnvVar{Name:"AWS_SECRET_ACCESS_KEY", Value:"", ValueFrom:(*v1.EnvVarSource)(0xc4233ac480)}}, Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList(nil)}, VolumeMounts:[]v1.VolumeMount{v1.VolumeMount{Name:"kubelet-dir", ReadOnly:false, MountPath:"/var/lib/kubelet", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(0xc422c717e0)}, v1.VolumeMount{Name:"plugin-dir", ReadOnly:false, MountPath:"/csi", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}, v1.VolumeMount{Name:"device-dir", ReadOnly:false, MountPath:"/dev", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}}, VolumeDevices:[]v1.VolumeDevice(nil), LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"Always", SecurityContext:(*v1.SecurityContext)(0xc422ccc0f0), Stdin:false, StdinOnce:false, TTY:false}}, RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0xc422d68b30), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"csi-node-sa", DeprecatedServiceAccount:"csi-node-sa", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", HostNetwork:true, HostPID:false, HostIPC:false, ShareProcessNamespace:(*bool)(nil), SecurityContext:(*v1.PodSecurityContext)(0xc42325ec60), ImagePullSecrets:[]v1.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*v1.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]v1.Toleration(nil), HostAliases:[]v1.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), DNSConfig:(*v1.PodDNSConfig)(nil), ReadinessGates:[]v1.PodReadinessGate(nil), RuntimeClassName:(*string)(nil)}}, UpdateStrategy:v1.DaemonSetUpdateStrategy{Type:"RollingUpdate", RollingUpdate:(*v1.RollingUpdateDaemonSet)(0xc424139a40)}, MinReadySeconds:0, RevisionHistoryLimit:(*int32)(0xc422d68b38)}, Status:v1.DaemonSetStatus{CurrentNumberScheduled:2, NumberMisscheduled:0, DesiredNumberScheduled:3, NumberReady:0, ObservedGeneration:1, UpdatedNumberScheduled:2, NumberAvailable:0, NumberUnavailable:3, CollisionCount:(*int32)(nil), Conditions:[]v1.DaemonSetCondition(nil)}}: Operation cannot be fulfilled on daemonsets.apps "csi-node": the object has been modified; please apply your changes to the latest version and try again
I1011 19:15:14.740106       1 pv_controller.go:601] volume "pvc-069128c6ccdc11e8" is released and reclaim policy "Delete" will be executed
I1011 19:15:14.756316       1 pv_controller.go:824] volume "pvc-069128c6ccdc11e8" entered phase "Released"
I1011 19:15:14.759557       1 pv_controller.go:1294] isVolumeReleased[pvc-069128c6ccdc11e8]: volume is released
I1011 19:15:14.939461       1 pv_controller.go:1294] isVolumeReleased[pvc-069128c6ccdc11e8]: volume is released
I1011 19:15:14.954828       1 pv_controller.go:1294] isVolumeReleased[pvc-069128c6ccdc11e8]: volume is released

The last line keeps repeating indefinitely.

@leakingtapan
Author

leakingtapan commented Oct 13, 2018

I have encountered this issue two more times now, all on v1.12.

@chandraprakash1392

I got rid of this issue by performing the following actions:

pvc-5124cf7a-e9dc-11e8-93a1-02551748eea0   1Gi        RWO            Retain           Bound         kafka/data-pzoo-0                                         kafka-zookeeper             21h
pvc-639023b2-e9dc-11e8-93a1-02551748eea0   1Gi        RWO            Retain           Bound         kafka/data-pzoo-1                                         kafka-zookeeper             21h
pvc-7d88b184-e9dc-11e8-93a1-02551748eea0   1Gi        RWO            Retain           Bound         kafka/data-pzoo-2                                         kafka-zookeeper             21h
pvc-9ea68541-e9dc-11e8-93a1-02551748eea0   100Gi      RWO            Delete           Terminating   kafka/data-kafka-0                                        kafka-broker                21h
pvc-ae795177-e9dc-11e8-93a1-02551748eea0   100Gi      RWO            Delete           Terminating   kafka/data-kafka-1                                        kafka-broker                21h

Then I manually edited each PV and removed the finalizers, which looked something like this:

  - kubernetes.io/pv-protection

Once done, the PVs that were in the Terminating state were all gone!
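For anyone who prefers not to open an editor, the same finalizer removal can be done non-interactively (a sketch; the PV name is a placeholder):

# equivalent to editing the PV and deleting the finalizers block
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'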

@abdennour

@chandraprakash1392's answer is also valid when a PVC gets stuck in the Terminating status.
You just need to edit the PVC object and remove its finalizers field.

@msau42
Member

msau42 commented Dec 14, 2018

Removing the finalizers is just a workaround. @bertinatto @leakingtapan could you help repro this issue and save detailed CSI driver and controller-manager logs?

@murdav

murdav commented Dec 19, 2018

Examples of removing finalizers:

kubectl patch pvc db-pv-claim -p '{"metadata":{"finalizers":null}}'
kubectl patch pod db-74755f6698-8td72 -p '{"metadata":{"finalizers":null}}'

Then you can delete them.

!!! IMPORTANT !!!
Read also #78106.
The patch commands are a workaround and something is not working properly.
The volumes are still attached: check kubectl get volumeattachments!

@bertinatto
Member

bertinatto commented Dec 21, 2018

Removing the finalizers is just a workaround. @bertinatto @leakingtapan could you help repro this issue and save detailed CSI driver and controller-manager logs?

I managed to reproduce it after a few tries, although the log messages seem a bit different from the ones reported by @leakingtapan:

Plugin (provisioner): https://gist.github.com/bertinatto/16f5c1f76b1c2577cd66dbedfa4e0c7c
Plugin (attacher): https://gist.github.com/bertinatto/25ebd591ffc88d034f5b4419c0bfa040
Controller manager: https://gist.github.com/bertinatto/a2d82fdbccbf7ec0bb5e8ab65d47dcf3

@pulpbill

pulpbill commented Jan 6, 2019

Same here, had to delete the finalizer, here's a describe for the pv:

[root@ip-172-31-44-98 stateful]# k describe pv pvc-1c6625e2-1157-11e9-a8fc-0275b365cbce
Name:            pvc-1c6625e2-1157-11e9-a8fc-0275b365cbce
Labels:          failure-domain.beta.kubernetes.io/region=us-east-1
                 failure-domain.beta.kubernetes.io/zone=us-east-1a
Annotations:     kubernetes.io/createdby: aws-ebs-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller: yes
                 pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    default
Status:          Terminating (lasts <invalid>)
Claim:           monitoring/storage-es-data-0
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        12Gi
Node Affinity:   <none>
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1a/vol-0a20e4f50b60df855
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:          <none>

@leakingtapan
Author

Reading the logs, it seems like A/D controller tried to attach the volume and got error from external attacher. Why it did not delete the VolumeAttachment afterwards? Do you still have a pod that uses the volume? If so, it blocks PV deletion.

@jsafrane I only have one pod, and I deleted the PVC after the pod was deleted.

@DMXGuru

DMXGuru commented Sep 25, 2020 via email

@wolfewicz

I had multiple PVCs stuck in the Terminating status.

kubectl describe pvc <pvcname>   (to get the pod it is attached to)
kubectl patch pvc <pvcname> -p '{"metadata":{"finalizers":null}}'
kubectl patch pod <podname> -p '{"metadata":{"finalizers":null}}'

This worked in my K8s cluster.

Thanks for posting these commands to get rid of the PVC.

@jingxu97
Contributor

jingxu97 commented Oct 8, 2020

@wolfewicz @DMXGuru if the pods are deleted, the PVC should not get stuck in the Terminating state. Users should not need to remove the finalizer manually.
Could you reproduce your case and give some details here, so that we can help triage?

@DMXGuru

DMXGuru commented Oct 8, 2020 via email

@jingxu97
Contributor

jingxu97 commented Oct 8, 2020

@DMXGuru the first thing I want to verify is that there are no pods running and no VolumeSnapshots being taken while the PVC/PV is terminating.

kubectl describe pod | grep ClaimName
kubectl describe volumesnapshot | grep persistentVolumeClaimName

Second, could you describe the sequence in which you performed the pod and PVC deletion? Thanks!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 5, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

@misanthropicat: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@misanthropicat

So, still no solution? I am hit by this too.

@chrisdoherty4
Member

We've hit what seems to be the same problem using Amazon EBS (we have all the symptoms at least). Should this be raised as a new issue?

@jleni

jleni commented Sep 15, 2021

/reopen

@k8s-ci-robot
Contributor

@jleni: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jvleminc

Still happening with 1.21: PVs stuck in Terminating after deleting the corresponding pods, the PVs themselves, and the VolumeAttachments. The storage driver is Longhorn.
What worked was patching manually:

kubectl get pv --> to get names
kubectl patch pv <pvname> -p '{"metadata":{"finalizers":null}}'

@Omniscience619

Encountered this error in v1.21.4 too. @chandraprakash1392's guide worked!

Although, I had been deleting and creating a high number of PVs and PVCs in a short span of time. Maybe something gets bottlenecked and results in this bug?

@pixelicous

Encountered this as well on 1.18 with EBS.

@getkub

getkub commented Dec 4, 2021

A more hands-free option, in case someone finds it useful:

# Current situation where it was Stuck
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                                      STORAGECLASS   REASON   AGE
pvc-d1a578d8-a120-4b4c-b18c-54f594ed28c9   8Gi        RWO            Delete           Terminating   default/data-my-release-mariadb-galera-0   standard                40m
pvc-e31f7ae2-b421-489e-8cc1-a3e2e7606dcc   8Gi        RWO            Delete           Terminating   default/data-my-release-mariadb-galera-1   standard                39m

Find the PVs that are in Terminating and patch them in a loop:

for mypv in $(kubectl get pv -o jsonpath="{.items[*].metadata.name}" | grep -v Terminating);
do
  kubectl patch pv $mypv -p '{"metadata":{"finalizers":null}}'
done

Check to see if anything still remains

kubectl get pv

@stasyanich

stasyanich commented Jan 7, 2022

kubectl get pv -> <pvname>
kubectl patch pv <pvname> -p "{\"metadata\":{\"finalizers\":null}}"

  1. escape the inner quotes
  2. use double quotes instead of single quotes (for shells that do not accept single-quoted arguments)

@InonS

InonS commented May 31, 2022

@getkub, the code you gave patches all PVs: the grep -v Terminating acts on the PV name, not its status...

My suggestion:

# Loop over all PV names
for mypv in $(kubectl get pv -o jsonpath="{.items[*].metadata.name}");
do
    # If the Status of a given PV isn't Terminating, skip to the next one
    if [ -z $(kubectl get pv $mypv -o jsonpath="{.status.phase}" | grep Terminating) ] ; then continue ; fi

    # Patch the PV in Terminating Status
    kubectl patch pv $mypv -p "{\"metadata\":{\"finalizers\":null}}"
done

And you should also take care of the Lost PVCs which were Bound to those PVs
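A quick way to find those (a sketch that just filters the standard kubectl output):

# list PVCs left in the Lost phase across all namespaces
kubectl get pvc --all-namespaces | grep Lost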

@VGerris

VGerris commented Oct 4, 2022

Apologies for responding to a closed ticket but I noticed a similar issue and have some questions.

Was this ever solved? I experienced similar behaviour using the Cinder driver and OpenStack for storage.
The issue occurs only in a certain order:

  • when a PVC is created in OKD and a Pod is set up to use it, a PV is created in OKD and in OpenStack (as a volume)
  • then, when the PV is deleted from OKD, and then the PVC, the volume in OpenStack is not deleted, but gets status available instead of ready
  • when instead one deletes the PVC first and then the PV, things work as expected and the volume gets deleted

I am not sure if this is as designed and if not, if it should be reported as a bug/improvement here or at OpenStack.
It seems the calls are initiated from the Kubernetes side.
Thanks for any info regarding this.
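For comparison, the deletion order that lets a Delete reclaim policy clean up the backing volume is roughly the following (a sketch; names are placeholders):

# delete the pod first so the volume can be detached
kubectl delete pod <pod-using-the-claim>
# then delete only the PVC; the PV controller sees the claim released and,
# with reclaimPolicy: Delete, removes the PV and the backing volume
kubectl delete pvc <claim-name>
# do not delete the PV by hand; just watch it go away
kubectl get pv -w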

@zolovin2022

Removing the finalizer worked for me.

@brianbraunstein

Earlier comments seem to point at the right answer: delete the pod using it.

Reposting links to those comments for better visibility:

@DMXGuru

DMXGuru commented Apr 9, 2024 via email
