
PV is stuck at terminating after PVC is deleted #69697

Closed
leakingtapan opened this issue Oct 11, 2018 · 78 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@leakingtapan

leakingtapan commented Oct 11, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

/kind feature

What happened:
I was testing the EBS CSI driver. I created a PV through a PVC, then deleted the PVC. However, the PV is stuck in the Terminating state. Both the PVC and the underlying EBS volume were deleted without any issue. The CSI driver keeps being called with DeleteVolume, even though it returns success when the volume is not found (because it is already gone).

CSI Driver log:

I1011 20:37:29.778380       1 controller.go:175] ControllerGetCapabilities: called with args &csi.ControllerGetCapabilitiesRequest{XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
I1011 20:37:29.780575       1 controller.go:91] DeleteVolume: called with args: &csi.DeleteVolumeRequest{VolumeId:"vol-0ea6117ddb69e78fb", ControllerDeleteSecrets:map[string]string(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
I1011 20:37:29.930091       1 controller.go:99] DeleteVolume: volume not found, returning with success

external attacher log:

I1011 19:15:14.931769       1 controller.go:167] Started VA processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931794       1 csi_handler.go:76] CSIHandler: processing VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931808       1 csi_handler.go:103] Attaching "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931823       1 csi_handler.go:208] Starting attach operation for "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931905       1 csi_handler.go:179] PV finalizer is already set on "pvc-069128c6ccdc11e8"
I1011 19:15:14.931947       1 csi_handler.go:156] VA finalizer is already set on "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:14.931962       1 connection.go:235] GRPC call: /csi.v0.Controller/ControllerPublishVolume
I1011 19:15:14.931966       1 connection.go:236] GRPC request: volume_id:"vol-0ea6117ddb69e78fb" node_id:"i-06d0e08c9565c4db7" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_attributes:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1539123546345-8081-com.amazon.aws.csi.ebs" >
I1011 19:15:14.935053       1 controller.go:197] Started PV processing "pvc-069128c6ccdc11e8"
I1011 19:15:14.935072       1 csi_handler.go:350] CSIHandler: processing PV "pvc-069128c6ccdc11e8"
I1011 19:15:14.935106       1 csi_handler.go:386] CSIHandler: processing PV "pvc-069128c6ccdc11e8": VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53" found
I1011 19:15:14.952590       1 controller.go:197] Started PV processing "pvc-069128c6ccdc11e8"
I1011 19:15:14.952613       1 csi_handler.go:350] CSIHandler: processing PV "pvc-069128c6ccdc11e8"
I1011 19:15:14.952654       1 csi_handler.go:386] CSIHandler: processing PV "pvc-069128c6ccdc11e8": VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53" found
I1011 19:15:15.048026       1 controller.go:197] Started PV processing "pvc-069128c6ccdc11e8"
I1011 19:15:15.048048       1 csi_handler.go:350] CSIHandler: processing PV "pvc-069128c6ccdc11e8"
I1011 19:15:15.048167       1 csi_handler.go:386] CSIHandler: processing PV "pvc-069128c6ccdc11e8": VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53" found
I1011 19:15:15.269955       1 connection.go:238] GRPC response:
I1011 19:15:15.269986       1 connection.go:239] GRPC error: rpc error: code = Internal desc = Could not attach volume "vol-0ea6117ddb69e78fb" to node "i-06d0e08c9565c4db7": could not attach volume "vol-0ea6117ddb69e78fb" to node "i-06d0e08c9565c4db7": InvalidVolume.NotFound: The volume 'vol-0ea6117ddb69e78fb' does not exist.
        status code: 400, request id: 634b33d1-71cb-4901-8ee0-98933d2a5b47
I1011 19:15:15.269998       1 csi_handler.go:320] Saving attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274440       1 csi_handler.go:330] Saved attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274464       1 csi_handler.go:86] Error processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53": failed to attach: rpc error: code = Internal desc = Could not attach volume "vol-0ea6117ddb69e78fb" to node "i-06d0e08c9565c4db7": could not attach volume "vol-0ea6117ddb69e78fb" to node "i-06d0e08c9565c4db7": InvalidVolume.NotFound: The volume 'vol-0ea6117ddb69e78fb' does not exist.
        status code: 400, request id: 634b33d1-71cb-4901-8ee0-98933d2a5b47
I1011 19:15:15.274505       1 controller.go:167] Started VA processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274516       1 csi_handler.go:76] CSIHandler: processing VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274522       1 csi_handler.go:103] Attaching "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274528       1 csi_handler.go:208] Starting attach operation for "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.274536       1 csi_handler.go:320] Saving attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.278318       1 csi_handler.go:330] Saved attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 19:15:15.278339       1 csi_handler.go:86] Error processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53": failed to attach: PersistentVolume "pvc-069128c6ccdc11e8" is marked for deletion
I1011 20:37:23.328696       1 controller.go:167] Started VA processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.328709       1 csi_handler.go:76] CSIHandler: processing VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.328715       1 csi_handler.go:103] Attaching "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.328721       1 csi_handler.go:208] Starting attach operation for "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.328730       1 csi_handler.go:320] Saving attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.330919       1 reflector.go:286] github.com/kubernetes-csi/external-attacher/vendor/k8s.io/client-go/informers/factory.go:87: forcing resync
I1011 20:37:23.330975       1 controller.go:197] Started PV processing "pvc-069128c6ccdc11e8"
I1011 20:37:23.330990       1 csi_handler.go:350] CSIHandler: processing PV "pvc-069128c6ccdc11e8"
I1011 20:37:23.331030       1 csi_handler.go:386] CSIHandler: processing PV "pvc-069128c6ccdc11e8": VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53" found
I1011 20:37:23.346007       1 csi_handler.go:330] Saved attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346033       1 csi_handler.go:86] Error processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53": failed to attach: PersistentVolume "pvc-069128c6ccdc11e8" is marked for deletion
I1011 20:37:23.346069       1 controller.go:167] Started VA processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346077       1 csi_handler.go:76] CSIHandler: processing VA "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346082       1 csi_handler.go:103] Attaching "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346088       1 csi_handler.go:208] Starting attach operation for "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.346096       1 csi_handler.go:320] Saving attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.351068       1 csi_handler.go:330] Saved attach error to "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53"
I1011 20:37:23.351090       1 csi_handler.go:86] Error processing "csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53": failed to attach: PersistentVolume "pvc-069128c6ccdc11e8" is marked for deletion
>> kk get pv
NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM            STORAGECLASS   REASON   AGE
pvc-069128c6ccdc11e8   4Gi        RWO            Delete           Terminating   default/claim1   late-sc                 22h

>> kk describe pv
Name:            pvc-069128c6ccdc11e8
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: com.amazon.aws.csi.ebs
Finalizers:      [external-attacher/com-amazon-aws-csi-ebs]
StorageClass:    late-sc
Status:          Terminating (lasts <invalid>)
Claim:           default/claim1
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        4Gi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            com.amazon.aws.csi.ebs
    VolumeHandle:      vol-0ea6117ddb69e78fb
    ReadOnly:          false
    VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1539123546345-8081-com.amazon.aws.csi.ebs
Events:                <none>

Storageclass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: late-sc
provisioner: com.amazon.aws.csi.ebs
volumeBindingMode: WaitForFirstConsumer

Claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: late-sc
  resources:
    requests:
      storage: 4Gi

What you expected to happen:
After the PVC is deleted, the PV should be deleted along with the EBS volume (since my reclaim policy is Delete).

How to reproduce it (as minimally and precisely as possible):
Non-deterministic so far

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): client: v1.12.0 server: v1.12.1
  • Cloud provider or hardware configuration: aws
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools: cluster is set up using kops
  • Others:
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 11, 2018
@leakingtapan
Author

Several questions:

  1. How can I get out of this situation?
  2. Should the PV be deleted successfully once the driver returns success, even though the volume is already gone?

@leakingtapan
Author

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 11, 2018
@msau42
Member

msau42 commented Oct 11, 2018

It looks like your PV still has a finalizer from the attacher. Can you verify that the volume got successfully detached from the node?
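For reference, both things can be checked quickly from the CLI (the PV name below is the one from this issue; adjust as needed):

kubectl get pv pvc-069128c6ccdc11e8 -o jsonpath='{.metadata.finalizers}{"\n"}'   # finalizers still set on the PV
kubectl get volumeattachment                                                      # attachment objects left for the volume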

@msau42
Member

msau42 commented Oct 11, 2018

It may be good to get logs from the external-attacher, and also the A/D controller.

@msau42
Member

msau42 commented Oct 11, 2018

cc @jsafrane

@msau42
Member

msau42 commented Oct 11, 2018

What version of the external-attacher are you using?

@leakingtapan
Author

It's v0.3.0, and all the other sidecars are at v0.3.0 as well. I was using v0.4.0 earlier, and this issue happened after I recreated the sidecars at v0.3.0.

@leakingtapan
Author

Updated the description with attacher log

@leakingtapan
Author

It looks like you PV still has a finalizer from the attacher. Can you verify that the volume got successfully detached from the node?

The volume should have been detached successfully, since it was deleted from AWS (I don't think it could be deleted without being detached first). I also verified on the node that the device is gone using lsblk.
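If it helps, the deletion can also be confirmed from the AWS side (assuming the AWS CLI is set up for the same account and region); once the volume is gone this returns an InvalidVolume.NotFound error:

aws ec2 describe-volumes --volume-ids vol-0ea6117ddb69e78fb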

@msau42
Member

msau42 commented Oct 11, 2018

It looks like the volume was marked for deletion before an attach ever succeeded. Maybe there is some bug with handling that scenario.

Do you still see a VolumeAttachment object?

@leakingtapan
Author

Do you still see a VolumeAttachment object?

How can I check this?

@msau42
Member

msau42 commented Oct 11, 2018

kubectl get volumeattachment

@leakingtapan
Author

Yep. It's still there:

>> kubectl get volumeattachment
NAME                                                                   CREATED AT
csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53   2018-10-10T22:30:09Z
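(Side note: the VolumeAttachment itself carries the external-attacher finalizer, so once you are sure the volume is really detached it can be cleared as a last-resort workaround; a sketch, not an official fix:)

# inspect the stuck attachment and its finalizers
kubectl describe volumeattachment csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53
# last resort: drop the finalizer, then delete the object
kubectl patch volumeattachment csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53 --type=merge -p '{"metadata":{"finalizers":null}}'
kubectl delete volumeattachment csi-3b15269e725f727786c5aec3b4da3f2eebc2477dec53d3480a3fe1dd01adea53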

@jsafrane
Member

Reading the logs, it seems like the A/D controller tried to attach the volume and got an error from the external attacher. Why did it not delete the VolumeAttachment afterwards? Do you still have a pod that uses the volume? If so, it blocks PV deletion.

@leakingtapan
Author

There is no pod using the volume, and the PVC is gone as well. How can I find the A/D controller log?

@msau42
Member

msau42 commented Oct 12, 2018

It's on the master node, controller-manager.log. You can try to filter by searching for the volume name.

@leakingtapan
Author

leakingtapan commented Oct 12, 2018

Here is the controller log:

E1011 19:14:10.336074       1 daemon_controller.go:304] default/csi-node failed with : error storing status for daemon set &v1.DaemonSet{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, O
bjectMeta:v1.ObjectMeta{Name:"csi-node", GenerateName:"", Namespace:"default", SelfLink:"/apis/apps/v1/namespaces/default/daemonsets/csi-node", UID:"d4e56145-cd89-11e8-9e90-0abab70c948
0", ResourceVersion:"467814", Generation:1, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:63674882050, loc:(*time.Location)(0x5b9b560)}}, DeletionTimestamp:(*v1.Time)(nil), De
letionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string{"deprecated.daemonset.template.generation":"1"}, OwnerReferences:[]v1.OwnerReferenc
e(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1.DaemonSetSpec{Selector:(*v1.LabelSelector)(0xc4233ac360), Template:v1.PodTemplateSpec{O
bjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*t
ime.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string{"app":"csi-node"}, Annotations:map[string]string(nil), Owner
References:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Spec:v1.PodSpec{Volumes:[]v1.Volume{v1.Volume{Name:"kubelet-dir",
VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(0xc4233ac380), EmptyDir:(*v1.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AW
SElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1
.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v
1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(
nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(nil), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil)}}, v1.Volume{Name:"plugin-dir", VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(0xc4233ac3a0), EmptyDir:(*v1.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(nil), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil)}}, v1.Volume{Name:"device-dir", VolumeSource:v1.VolumeSource{HostPath:(*v1.HostPathVolumeSource)(0xc4233ac3c0), EmptyDir:(*v1.EmptyDirVolumeSource)(nil), GCEPersistentDisk:(*v1.GCEPersistentDiskVolumeSource)(nil), AWSElasticBlockStore:(*v1.AWSElasticBlockStoreVolumeSource)(nil), GitRepo:(*v1.GitRepoVolumeSource)(nil), Secret:(*v1.SecretVolumeSource)(nil), NFS:(*v1.NFSVolumeSource)(nil), ISCSI:(*v1.ISCSIVolumeSource)(nil), Glusterfs:(*v1.GlusterfsVolumeSource)(nil), PersistentVolumeClaim:(*v1.PersistentVolumeClaimVolumeSource)(nil), RBD:(*v1.RBDVolumeSource)(nil), FlexVolume:(*v1.FlexVolumeSource)(nil), Cinder:(*v1.CinderVolumeSource)(nil), CephFS:(*v1.CephFSVolumeSource)(nil), Flocker:(*v1.FlockerVolumeSource)(nil), DownwardAPI:(*v1.DownwardAPIVolumeSource)(nil), FC:(*v1.FCVolumeSource)(nil), AzureFile:(*v1.AzureFileVolumeSource)(nil), ConfigMap:(*v1.ConfigMapVolumeSource)(nil), VsphereVolume:(*v1.VsphereVirtualDiskVolumeSource)(nil), Quobyte:(*v1.QuobyteVolumeSource)(nil), AzureDisk:(*v1.AzureDiskVolumeSource)(nil), PhotonPersistentDisk:(*v1.PhotonPersistentDiskVolumeSource)(nil), Projected:(*v1.ProjectedVolumeSource)(nil), PortworxVolume:(*v1.PortworxVolumeSource)(nil), ScaleIO:(*v1.ScaleIOVolumeSource)(nil), StorageOS:(*v1.StorageOSVolumeSource)(nil)}}}, InitContainers:[]v1.Container(nil), Containers:[]v1.Container{v1.Container{Name:"csi-driver-registrar", Image:"quay.io/k8scsi/driver-registrar:v0.3.0", Command:[]string(nil), Args:[]string{"--v=5", "--csi-address=$(ADDRESS)"}, WorkingDir:"", Ports:[]v1.ContainerPort(nil), EnvFrom:[]v1.EnvFromSource(nil), Env:[]v1.EnvVar{v1.EnvVar{Name:"ADDRESS", Value:"/csi/csi.sock", ValueFrom:(*v1.EnvVarSource)(nil)}, v1.EnvVar{Name:"KUBE_NODE_NAME", Value:"", ValueFrom:(*v1.EnvVarSource)(0xc4233ac400)}}, 
Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList(nil)}, VolumeMounts:[]v1.VolumeMount{v1.VolumeMount{Name:"plugin-dir", ReadOnly:false, MountPath:"/csi", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}}, VolumeDevices:[]v1.VolumeDevice(nil), LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"Always", SecurityContext:(*v1.SecurityContext)(0xc422ccc050), Stdin:false, StdinOnce:false, TTY:false}, v1.Container{Name:"ebs-plugin", Image:"quay.io/bertinatto/ebs-csi-driver:testing", Command:[]string(nil), Args:[]string{"--endpoint=$(CSI_ENDPOINT)", "--logtostderr", "--v=5"}, WorkingDir:"", Ports:[]v1.ContainerPort(nil), EnvFrom:[]v1.EnvFromSource(nil), Env:[]v1.EnvVar{v1.EnvVar{Name:"CSI_ENDPOINT", Value:"unix:/csi/csi.sock", ValueFrom:(*v1.EnvVarSource)(nil)}, v1.EnvVar{Name:"AWS_ACCESS_KEY_ID", Value:"", ValueFrom:(*v1.EnvVarSource)(0xc4233ac460)}, v1.EnvVar{Name:"AWS_SECRET_ACCESS_KEY", Value:"", ValueFrom:(*v1.EnvVarSource)(0xc4233ac480)}}, Resources:v1.ResourceRequirements{Limits:v1.ResourceList(nil), Requests:v1.ResourceList(nil)}, VolumeMounts:[]v1.VolumeMount{v1.VolumeMount{Name:"kubelet-dir", ReadOnly:false, MountPath:"/var/lib/kubelet", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(0xc422c717e0)}, v1.VolumeMount{Name:"plugin-dir", ReadOnly:false, MountPath:"/csi", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}, v1.VolumeMount{Name:"device-dir", ReadOnly:false, MountPath:"/dev", SubPath:"", MountPropagation:(*v1.MountPropagationMode)(nil)}}, VolumeDevices:[]v1.VolumeDevice(nil), LivenessProbe:(*v1.Probe)(nil), ReadinessProbe:(*v1.Probe)(nil), Lifecycle:(*v1.Lifecycle)(nil), TerminationMessagePath:"/dev/termination-log", TerminationMessagePolicy:"File", ImagePullPolicy:"Always", SecurityContext:(*v1.SecurityContext)(0xc422ccc0f0), Stdin:false, StdinOnce:false, TTY:false}}, RestartPolicy:"Always", TerminationGracePeriodSeconds:(*int64)(0xc422d68b30), ActiveDeadlineSeconds:(*int64)(nil), DNSPolicy:"ClusterFirst", NodeSelector:map[string]string(nil), ServiceAccountName:"csi-node-sa", DeprecatedServiceAccount:"csi-node-sa", AutomountServiceAccountToken:(*bool)(nil), NodeName:"", HostNetwork:true, HostPID:false, HostIPC:false, ShareProcessNamespace:(*bool)(nil), SecurityContext:(*v1.PodSecurityContext)(0xc42325ec60), ImagePullSecrets:[]v1.LocalObjectReference(nil), Hostname:"", Subdomain:"", Affinity:(*v1.Affinity)(nil), SchedulerName:"default-scheduler", Tolerations:[]v1.Toleration(nil), HostAliases:[]v1.HostAlias(nil), PriorityClassName:"", Priority:(*int32)(nil), DNSConfig:(*v1.PodDNSConfig)(nil), ReadinessGates:[]v1.PodReadinessGate(nil), RuntimeClassName:(*string)(nil)}}, UpdateStrategy:v1.DaemonSetUpdateStrategy{Type:"RollingUpdate", RollingUpdate:(*v1.RollingUpdateDaemonSet)(0xc424139a40)}, MinReadySeconds:0, RevisionHistoryLimit:(*int32)(0xc422d68b38)}, Status:v1.DaemonSetStatus{CurrentNumberScheduled:2, NumberMisscheduled:0, DesiredNumberScheduled:3, NumberReady:0, ObservedGeneration:1, UpdatedNumberScheduled:2, NumberAvailable:0, NumberUnavailable:3, CollisionCount:(*int32)(nil), Conditions:[]v1.DaemonSetCondition(nil)}}: Operation cannot be fulfilled on daemonsets.apps "csi-node": the object has been modified; please apply your changes to the latest version and try again
I1011 19:15:14.740106       1 pv_controller.go:601] volume "pvc-069128c6ccdc11e8" is released and reclaim policy "Delete" will be executed
I1011 19:15:14.756316       1 pv_controller.go:824] volume "pvc-069128c6ccdc11e8" entered phase "Released"
I1011 19:15:14.759557       1 pv_controller.go:1294] isVolumeReleased[pvc-069128c6ccdc11e8]: volume is released
I1011 19:15:14.939461       1 pv_controller.go:1294] isVolumeReleased[pvc-069128c6ccdc11e8]: volume is released
I1011 19:15:14.954828       1 pv_controller.go:1294] isVolumeReleased[pvc-069128c6ccdc11e8]: volume is released

The last line keeps repeating indefinitely.

@leakingtapan
Author

leakingtapan commented Oct 13, 2018

I have encountered this issue two more times now, all on v1.12.

@chandraprakash1392

I got rid of this issue by performing the following actions:

pvc-5124cf7a-e9dc-11e8-93a1-02551748eea0   1Gi        RWO            Retain           Bound         kafka/data-pzoo-0                                         kafka-zookeeper             21h
pvc-639023b2-e9dc-11e8-93a1-02551748eea0   1Gi        RWO            Retain           Bound         kafka/data-pzoo-1                                         kafka-zookeeper             21h
pvc-7d88b184-e9dc-11e8-93a1-02551748eea0   1Gi        RWO            Retain           Bound         kafka/data-pzoo-2                                         kafka-zookeeper             21h
pvc-9ea68541-e9dc-11e8-93a1-02551748eea0   100Gi      RWO            Delete           Terminating   kafka/data-kafka-0                                        kafka-broker                21h
pvc-ae795177-e9dc-11e8-93a1-02551748eea0   100Gi      RWO            Delete           Terminating   kafka/data-kafka-1                                        kafka-broker                21h

Then I manually edited each PV and removed the finalizers, which looked something like this:

  - kubernetes.io/pv-protection

Once done, the PVs that were in the Terminating state were all gone!
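For anyone who prefers not to open an editor, the same finalizer removal can be done non-interactively (a sketch; the PV name is a placeholder):

# equivalent to editing the PV and deleting the finalizers block
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'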

@abdennour

@chandraprakash1392's answer is also valid when a PVC gets stuck in the Terminating status.
You just need to edit the PVC object and remove its finalizers field.

@msau42
Member

msau42 commented Dec 14, 2018

Removing the finalizers is just a workaround. @bertinatto @leakingtapan could you help repro this issue and save detailed CSI driver and controller-manager logs?

@murdav

murdav commented Dec 19, 2018

Examples of removing finalizers:

kubectl patch pvc db-pv-claim -p '{"metadata":{"finalizers":null}}'
kubectl patch pod db-74755f6698-8td72 -p '{"metadata":{"finalizers":null}}'

Then you can delete them.

!!! IMPORTANT !!!
Read also #78106.
The patch commands are a workaround and something is not working properly.
The volumes are still attached: check kubectl get volumeattachments!

@bertinatto
Member

bertinatto commented Dec 21, 2018

Removing the finalizers is just a workaround. @bertinatto @leakingtapan could you help repro this issue and save detailed CSI driver and controller-manager logs?

I managed to reproduce it after a few tries, although the log messages seem a bit different from the ones reported by @leakingtapan:

Plugin (provisioner): https://gist.github.com/bertinatto/16f5c1f76b1c2577cd66dbedfa4e0c7c
Plugin (attacher): https://gist.github.com/bertinatto/25ebd591ffc88d034f5b4419c0bfa040
Controller manager: https://gist.github.com/bertinatto/a2d82fdbccbf7ec0bb5e8ab65d47dcf3

@pulpbill

pulpbill commented Jan 6, 2019

Same here, had to delete the finalizer, here's a describe for the pv:

[root@ip-172-31-44-98 stateful]# k describe pv pvc-1c6625e2-1157-11e9-a8fc-0275b365cbce
Name:            pvc-1c6625e2-1157-11e9-a8fc-0275b365cbce
Labels:          failure-domain.beta.kubernetes.io/region=us-east-1
                 failure-domain.beta.kubernetes.io/zone=us-east-1a
Annotations:     kubernetes.io/createdby: aws-ebs-dynamic-provisioner
                 pv.kubernetes.io/bound-by-controller: yes
                 pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    default
Status:          Terminating (lasts <invalid>)
Claim:           monitoring/storage-es-data-0
Reclaim Policy:  Delete
Access Modes:    RWO
Capacity:        12Gi
Node Affinity:   <none>
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1a/vol-0a20e4f50b60df855
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:          <none>

@leakingtapan
Author

Reading the logs, it seems like A/D controller tried to attach the volume and got error from external attacher. Why it did not delete the VolumeAttachment afterwards? Do you still have a pod that uses the volume? If so, it blocks PV deletion.

@jsafrane I only have one pod, and I deleted the PVC after the pod was deleted.

@DMXGuru

DMXGuru commented Sep 25, 2020 via email

@wolfewicz

I had multiple PVCs stuck in the Terminating status.

kubectl describe pvc <pvcname>   (to get the pod it is attached to)
kubectl patch pvc <pvcname> -p '{"metadata":{"finalizers":null}}'
kubectl patch pod <podname> -p '{"metadata":{"finalizers":null}}'

This worked in my K8s cluster.

Thanks for posting these commands to get rid of the PVC.

@jingxu97
Contributor

jingxu97 commented Oct 8, 2020

@wolfewicz @DMXGuru if the pods are deleted, the PVC should not get stuck in the Terminating state. Users should not need to remove the finalizer manually.
Could you reproduce your case and give some details here, so that we can help triage?

@DMXGuru

DMXGuru commented Oct 8, 2020 via email

@jingxu97
Contributor

jingxu97 commented Oct 8, 2020

@DMXGuru the first thing I want to verify is that there are no pods running and no VolumeSnapshots being taken while the PVC/PV is terminating.

kubectl describe pod | grep ClaimName
kubectl describe volumesnapshot | grep persistentVolumeClaimName

Second, could you describe the sequence in which you performed the pod and PVC deletion? Thanks!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 5, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

@misanthropicat: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@misanthropicat

So, still no solution? I am hit by this too.

@chrisdoherty4
Member

We've hit what seems to be the same problem using Amazon EBS (we have all the symptoms at least). Should this be raised as a new issue?

@jleni

jleni commented Sep 15, 2021

/reopen

@k8s-ci-robot
Contributor

@jleni: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jvleminc

Still happening with 1.21: PVs stuck in Terminating after deleting the corresponding pods, the PVs themselves, and the VolumeAttachments. The storage driver is Longhorn.
What worked was patching manually:

kubectl get pv --> to get names
kubectl patch pv <pvname> -p '{"metadata":{"finalizers":null}}'

@Omniscience619

Encountered this error in v1.21.4 too. @chandraprakash1392's guide worked!

Although, I had been deleting and creating a high number of PVs and PVCs in a short span of time. Maybe something gets bottlenecked and results in this bug?

@pixelicous

Encountered this as well on 1.18 with EBS.

@getkub

getkub commented Dec 4, 2021

A more hands-free option, in case someone finds it useful:

# Current situation where it was Stuck
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                                      STORAGECLASS   REASON   AGE
pvc-d1a578d8-a120-4b4c-b18c-54f594ed28c9   8Gi        RWO            Delete           Terminating   default/data-my-release-mariadb-galera-0   standard                40m
pvc-e31f7ae2-b421-489e-8cc1-a3e2e7606dcc   8Gi        RWO            Delete           Terminating   default/data-my-release-mariadb-galera-1   standard                39m

Find the PVs that are in Terminating and patch them in a loop:

for mypv in $(kubectl get pv -o jsonpath="{.items[*].metadata.name}" | grep -v Terminating);
do
  kubectl patch pv $mypv -p '{"metadata":{"finalizers":null}}'
done

Check to see if anything still remains

kubectl get pv

@stasyanich

stasyanich commented Jan 7, 2022

kubectl get pv -> <pvname>
kubectl patch pv <pvname> -p "{\"metadata\":{\"finalizers\":null}}"

  1. escape the inner quotes
  2. use double quotes instead of single quotes (for shells that do not accept single-quoted arguments)

@InonS

InonS commented May 31, 2022

@getkub, the code you gave patches all PVs: the grep -v Terminating acts on the PV name, not its status...

My suggestion:

# Loop over all PV names
for mypv in $(kubectl get pv -o jsonpath="{.items[*].metadata.name}");
do
    # If the Status of a given PV isn't Terminating, skip to the next one
    if [ -z $(kubectl get pv $mypv -o jsonpath="{.status.phase}" | grep Terminating) ] ; then continue ; fi

    # Patch the PV in Terminating Status
    kubectl patch pv $mypv -p "{\"metadata\":{\"finalizers\":null}}"
done

And you should also take care of the Lost PVCs which were Bound to those PVs
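A quick way to find those (a sketch that just filters the standard kubectl output):

# list PVCs left in the Lost phase across all namespaces
kubectl get pvc --all-namespaces | grep Lost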

@VGerris

VGerris commented Oct 4, 2022

Apologies for responding to a closed ticket but I noticed a similar issue and have some questions.

Was this ever solved? I experienced similar behaviour using the Cinder driver and OpenStack for storage.
The issue occurs only in a certain order:

  • when a PVC is created in OKD and a Pod is set up to use it, a PV is created in OKD and in OpenStack (as a volume)
  • then, when the PV is deleted from OKD, and then the PVC, the volume in OpenStack is not deleted, but gets status available instead of ready
  • when instead one deletes the PVC first and then the PV, things work as expected and the volume gets deleted

I am not sure if this is as designed and if not, if it should be reported as a bug/improvement here or at OpenStack.
It seems the calls are initiated from the Kubernetes side.
Thanks for any info regarding this.
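For comparison, the deletion order that lets a Delete reclaim policy clean up the backing volume is roughly the following (a sketch; names are placeholders):

# delete the pod first so the volume can be detached
kubectl delete pod <pod-using-the-claim>
# then delete only the PVC; the PV controller sees the claim released and,
# with reclaimPolicy: Delete, removes the PV and the backing volume
kubectl delete pvc <claim-name>
# do not delete the PV by hand; just watch it go away
kubectl get pv -w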

@zolovin2022

Removing the finalizer worked for me.

@brianbraunstein

Earlier comments seem to point at the right answer: delete the pod using it.

Reposting links to those comments for better visibility:

@DMXGuru

DMXGuru commented Apr 9, 2024 via email
