
All Persistent Volumes fail permanently after NAS reboot #232

Open
cbc02009 opened this issue Aug 31, 2022 · 16 comments

Comments

@cbc02009

Whenever I reboot the OS on the NAS that hosts my iSCSI democratic-csi volumes, all containers that rely on those volumes fail consistently, even after the NAS comes back online, with the following error:

  Warning  FailedMount  37s               kubelet            MountVolume.MountDevice failed for volume "pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount  34s               kubelet            Unable to attach or mount volumes: unmounted volumes=[config], unattached volumes=[config media transcode kube-api-access-2c2w7 backup]: timed out waiting for the condition
  Warning  FailedMount  5s (x6 over 37s)  kubelet            MountVolume.MountDevice failed for volume "pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e" : rpc error: code = Aborted desc = operation locked due to in progress operation(s): ["volume_id_pvc-da280e70-9bcb-41ba-bbbd-cbf973580c6e"]

I have tried suspending all pods with kubectl scale -n media deploy/plex --replicas 0 to ensure that nothing is using the volume during the reboot.

Unfortunately, I know almost nothing about iSCSI, so it's entirely possible this is 100% my fault. What is the proper process with iSCSI for rebooting either the NAS or the nodes using PVs on the NAS, so as to prevent this lockup? Is there an iscsiadm command I can use to remove this deadlock and let the new container access the PV?

my democratic-csi config is:

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: csi-iscsi
  namespace: storage
spec:
  interval: 5m
  chart:
    spec:
      chart: democratic-csi
      version: 0.13.4
      sourceRef:
        kind: HelmRepository
        name: democratic-csi-charts
        namespace: flux-system
      interval: 5m
  values:
    csiDriver:
      name: "org.democratic-csi.iscsi"

    storageClasses:
    - name: tank-iscsi-csi
      defaultClass: true
      reclaimPolicy: Delete
      ## For testing
      # reclaimPolicy: Retain
      volumeBindingMode: Immediate
      allowVolumeExpansion: true
      parameters:
        fsType: ext4

    driver:
      image: docker.io/democraticcsi/democratic-csi:v1.7.6
      imagePullPolicy: IfNotPresent
      config:
        driver: zfs-generic-iscsi
      existingConfigSecret: zfs-generic-iscsi-config

and the driver config is:

apiVersion: v1
kind: Secret
metadata:
    name: zfs-generic-iscsi-config
    namespace: storage
stringData:
    driver-config-file.yaml: |
        driver: zfs-generic-iscsi
        sshConnection:
            host: ${UIHARU_IP}
            port: 22
            username: root
            privateKey: |
                -----BEGIN OPENSSH PRIVATE KEY-----
                ...
                -----END OPENSSH PRIVATE KEY-----
        zfs:
            datasetParentName: sltank/k8s/iscsiv
            detachedSnapshotsDatasetParentName: sltank/k8s/iscsis
        iscsi:
            shareStrategy: "targetCli"
            shareStrategyTargetCli:
                basename: "iqn.2016-04.com.open-iscsi:a6b73d4196"
                tpg:
                    attributes:
                        authentication: 0
                        generate_node_acls: 1
                        cache_dynamic_acls: 1
                        demo_mode_write_protect: 0
            targetPortal: "${UIHARU_IP}"

Not sure what other info is important, but I'd be happy to provide anything else that might help troubleshoot the issue.

@travisghansen
Member

Ah, this is a tricky one, and I'm glad you opened this. There are a couple of issues at play here:

  • democratic-csi ensures that no two (possibly conflicting) operations happen at the same time, and to do so it creates an in-memory lock
  • iSCSI as a protocol will generally not handle this situation well, and in practice it requires all your pods using iSCSI volumes to restart

The first can be remedied by deleting all the democratic-csi pods and just letting them restart. The latter requires you to handle each workload on a case-by-case basis.

Essentially, if the NAS goes down and comes back up, the iSCSI sessions on the node (assuming they recover) go read-only. The only way to remedy that (via k8s) is to restart the pods as appropriate...and even then, in some cases that may not be enough and would require forcing the workload to a new node. I'll do some research on possible ways to go straight to the CLI of the nodes and get them back into a rw state manually, without any other intervention at the k8s layer.
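
For reference, a minimal sketch of that pod-bounce approach (the namespaces and the label selector here are assumptions based on the config above and the chart's usual labels, not something confirmed in this thread):

# clear democratic-csi's in-memory operation locks by recreating its pods
kubectl -n storage delete pod -l app.kubernetes.io/name=democratic-csi

# then bounce the affected workload so it re-attaches the volume cleanly
kubectl -n media rollout restart deployment/plex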

@cbc02009
Author

cbc02009 commented Sep 3, 2022

For the record, deleting all democratic-csi pods and the pod using the PVC did not solve the issue.

Would the NFS version have the same issue? I'm hesitant to use it for something like plex because of the hundreds of thousands of small files, but if it doesn't break on reboot it may be worth it.

@travisghansen
Member

I haven't been able to find an iscsiadm command that will take a device that's become ro and make it rw (maybe it's not needed). I don't recall the exact behavior in this case...does the mount show as ro? If so, perhaps simply remounting the fs as rw will make the existing connections clear up.
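
As a rough illustration of what that would look like on a node (the mount point below is a placeholder, not a path from this cluster):

# list mounts that have flipped to read-only
mount | grep '(ro[,)]'

# attempt an in-place remount of an affected filesystem
mount -o remount,rw /var/lib/kubelet/plugins/kubernetes.io/csi/<driver>/<volume-hash>/globalmount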

There is some generic k8s/csi work being done that would hopefully help correct these situations automatically, but all the pieces haven't come together yet.

To mitigate the issue, you could tweak some settings like these: https://wiki.archlinux.org/title/ISCSI/Boot#Make_the_iSCSI_daemon_resilient_to_network_problems
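
The core tweak from that article is roughly the following iscsid.conf change (the value is illustrative; verify against your distro's defaults before applying):

# /etc/iscsi/iscsid.conf
# keep queueing I/O instead of failing it while the target is unreachable
node.session.timeo.replacement_timeout = 86400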

Regarding NFS: it recovers from reboots of the NAS much better, yes, but file-based storage does have performance implications in certain scenarios versus block-based.

@cbc02009
Author

cbc02009 commented Sep 3, 2022

I haven't been able to find an iscsiadm command that will take a device that's become ro and make it rw (maybe it's not needed). I don't recall the exact behavior in this case...does the mount show as ro? If so, perhaps simply remounting the fs as rw will make the existing connections clear up.

I would definitely try that out and get you the information, but I'm completely clueless about iscsiadm. Could you let me know what command I should run to get the output for you and to remount the volumes?

@travisghansen
Member

Can you send me the output of the mount command from a node with a volume that is currently read only?

@cbc02009
Author

cbc02009 commented Sep 5, 2022

There were hundreds of lines, so I grepped it down to only the democratic-csi ones. Let me know if you need the whole output.

❯ mount | grep csi
/dev/sda on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/6a7617911e8723d36cf2ce2d4761552ec9fc45909df51b38650ac81fbe1da466/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sda on /var/lib/kubelet/pods/c65aa595-96f9-4d5c-8b49-b8f31dfab417/volumes/kubernetes.io~csi/pvc-4beb4d11-a72c-4e64-872a-d4964de2dedc/mount type ext4 (rw,relatime,stripe=4)
/dev/sdb on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/9c1a1d9f6c298ef3438d4061d2c8e667ae891384e602e95f36afaaa7a5eadd98/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sdb on /var/lib/kubelet/pods/e383b3c7-7f05-4d0e-818c-422736df9a6b/volumes/kubernetes.io~csi/pvc-ec386a17-e734-4d8f-a8d6-d8d87354c0c0/mount type ext4 (rw,relatime,stripe=4)
/dev/sdc on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/9097a1f0acddce7985644883763118cd78d31ed9ae97136f99a5d63e952badff/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sdc on /var/lib/kubelet/pods/43c5146a-3ac6-4fd0-95fd-4c9924eae010/volumes/kubernetes.io~csi/pvc-f8832fc7-1cff-4faf-9e46-e6a73a24eae2/mount type ext4 (rw,relatime,stripe=4)
/dev/sdd on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/1a30803184ddc3874711e3f48b3e5d328680e443ee128d2320a1702f3cf47a0a/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sdd on /var/lib/kubelet/pods/ab99ad4b-7062-433b-a0e5-a0a69543719c/volumes/kubernetes.io~csi/pvc-91213a25-0e9b-4ff1-8ae1-1afbfd59dfe9/mount type ext4 (rw,relatime,stripe=4)
/dev/sde on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/5b7345571bc8bcae92cd9663f382e958ff021773a204777294ab82f0ceb09910/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sde on /var/lib/kubelet/pods/a6f20f40-a274-4ac4-8914-6be9417f9b37/volumes/kubernetes.io~csi/pvc-995f8b55-25e4-4ea7-8f08-28e2513558cf/mount type ext4 (rw,relatime,stripe=4)
/dev/sdf on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/3616fabfaeee654e2adbcb545e495673d0254b1eb35479dab243c75b0945de00/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sdf on /var/lib/kubelet/pods/235a485b-15d3-4b53-8171-da0a19b40e82/volumes/kubernetes.io~csi/pvc-126e0a9e-2b55-485c-963e-e7cd3e034012/mount type ext4 (rw,relatime,stripe=4)
/dev/sdg on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/3d3eb6033f51ac88ae8fcd05424eeb50c5af2148140218f118871a7e7dc25aa7/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sdg on /var/lib/kubelet/pods/d9af64fc-23df-4f2b-96f5-7381a0170e5e/volumes/kubernetes.io~csi/pvc-31882371-9879-4ff4-80ee-6c911a8d063a/mount type ext4 (rw,relatime,stripe=4)
/dev/sdh on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/4be6f7fb94031f3b7178fd31ba56bf4f9c747aa87a11a8a7a5b5f90acd4a6804/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sdh on /var/lib/kubelet/pods/a219c677-af88-4067-9141-31d52f967f8b/volumes/kubernetes.io~csi/pvc-a968d1ac-43f4-417a-a408-6914215fe73b/mount type ext4 (rw,relatime,stripe=4)
/dev/sdi on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/8658ea4d3f685c05833b8b0d7348a22c4bb4f2a6d47fec418ea681d2cef16597/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sdi on /var/lib/kubelet/pods/6a55b542-69b4-41b9-81b4-85a3ef5d5eeb/volumes/kubernetes.io~csi/pvc-4e11ae12-65ec-45ac-bbf5-38a5a19d7e09/mount type ext4 (rw,relatime,stripe=4)
/dev/sdj on /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/5bcf2cf492e91a2da6f903ed7491887ece9f3849a7570628157f9864e14f0cd7/globalmount type ext4 (rw,relatime,stripe=4)
/dev/sdj on /var/lib/kubelet/pods/d62a3627-840a-4dfc-8297-fa4c1305d46f/volumes/kubernetes.io~csi/pvc-ba9d436e-87b3-4166-9b7d-804d523c6635/mount type ext4 (rw,relatime,stripe=4)

@travisghansen
Member

Those mounts are currently non-writable/non-functional?

@cbc02009
Author

cbc02009 commented Sep 5, 2022

Yes, although the new pod got assigned to another host:

Normal   Scheduled    3m43s               default-scheduler  Successfully assigned organizarrs/sonarr-6b58cd8764-ft5mm to uiharu
  Warning  FailedMount  103s                kubelet            MountVolume.MountDevice failed for volume "pvc-c7c23d7e-8fe1-4ca1-8bb8-718c436e2212" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount  100s                kubelet            Unable to attach or mount volumes: unmounted volumes=[config], unattached volumes=[kube-api-access-k5cg6 backup bittorrent config media]: timed out waiting for the condition
  Warning  FailedMount  39s (x7 over 102s)  kubelet            MountVolume.MountDevice failed for volume "pvc-c7c23d7e-8fe1-4ca1-8bb8-718c436e2212" : rpc error: code = Aborted desc = operation locked due to in progress operation(s): ["volume_id_pvc-c7c23d7e-8fe1-4ca1-8bb8-718c436e2212"]

The volume is still attached to the old host:

❯ mount | grep c7c23d7e-8fe1-4ca1-8bb8-718c436e2212
/dev/sdj on /var/lib/kubelet/pods/3dfa970d-3abe-4766-8279-ee2eaa424448/volumes/kubernetes.io~csi/pvc-c7c23d7e-8fe1-4ca1-8bb8-718c436e2212/mount type ext4 (rw,relatime,stripe=4)

and the CPU on the old host is now going crazy:
[screenshot: CPU usage spiking on the old host]

I haven't tested to see what happens if it re-mounts to the same host (I wasn't paying attention to the host during the rest of the tests...)

Also, this is after making the changes to iscsid.conf from the article that you recommended to me, in case that makes a difference.

@travisghansen
Member

Yeah, that's a dangerous situation (which is why, when iSCSI goes down, the volumes go into ro mode). Two nodes using the same block device simultaneously is not something you want happening. I would use something like kured (https://github.com/weaveworks/kured) or similar to simply trigger all your nodes to cycle so the workloads shift around and everything comes up clean.

@theautomation

Yeah, that's a dangerous situation (which is why, when iSCSI goes down, the volumes go into ro mode). Two nodes using the same block device simultaneously is not something you want happening. I would use something like kured (https://github.com/weaveworks/kured) or similar to simply trigger all your nodes to cycle so the workloads shift around and everything comes up clean.

Any tips on how to tell kured to reboot as soon as an iSCSI mount goes read-only?

@travisghansen
Member

That may not be a great assumption either (there are legitimate cases for read-only iSCSI). I probably wouldn't fully automate that, but if I were to do so, I would use your IaC tool of choice to just touch the reboot-required file on all the nodes when it's clear the storage system was rebooted. For example, I have an Ansible playbook that does only that...but I only run it manually when I know an outage has occurred.

If you really wish to detect read-only iSCSI connections, however, I would probably write up a little script to detect that and put it on a cron/systemd timer right on the nodes.
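
A hypothetical sketch of such a detector (not from this thread), assuming kured's default sentinel file at /var/run/reboot-required:

#!/bin/sh
# run from cron or a systemd timer on each node:
# if any CSI-backed mount has gone read-only, flag the node so kured cycles it
SENTINEL=/var/run/reboot-required
if awk '$2 ~ /kubernetes\.io.csi/ && $4 ~ /(^|,)ro(,|$)/ {found=1} END {exit !found}' /proc/mounts; then
    echo "read-only CSI mount detected; requesting node reboot"
    touch "$SENTINEL"
fi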

@djjudas21

Maybe not the answer you were hoping for, but the best solution I've found is to do a rolling reboot of all my kube nodes whenever I reboot my NAS. It's a pain, but it's also an opportunity for patching etc.

@rouke-broersma

@travisghansen It looks like Longhorn has chosen to support simply deleting the pods managed by a controller when it identifies that a volume is no longer available. Could this be something you would be willing to support in democratic-csi?

See: https://longhorn.io/docs/archives/1.2.0/references/settings/#automatically-delete-workload-pod-when-the-volume-is-detached-unexpectedly

@travisghansen
Member

Interesting, something to consider for sure. I think this could be handled by the health service endpoint. I am hesitant to get into such a thing but think it merits some discussion for sure.

@eaglesemanation

eaglesemanation commented Mar 29, 2024

The Longhorn solution looks promising for my use case, and I would appreciate it getting implemented; unfortunately, I'm too intimidated by the huge JS codebase to try to contribute anything. Instead, I wrote a small HTTP server that deletes all pods mounting PVCs of a given storage class, given a hardcoded bearer token in the Authorization header. I'm using a startup script on the TrueNAS side to trigger it whenever the NAS turns on.

If anyone wants to use it, here is the server itself: https://github.com/eaglesemanation/k8s-csi-restarter
Here is an example configuration for my k8s cluster: https://github.com/eaglesemanation/ops.emnt.dev/tree/main/k8s/apps/storage/k8s-csi-restarter
And on the TrueNAS side it's basically curl --header 'Authorization: Bearer password' http://ingress-or-loadbalancerip/delete

Edit: This does not delete the democratic-csi controller and node pods themselves; I didn't think about that. I will add that functionality soon.

@travisghansen
Member

@eaglesemanation thanks for sharing!
