NFS mount in node manual mode ends up stuck when node reboots uncleanly. #344

Open
tobz opened this issue Nov 18, 2023 · 17 comments

Comments

@tobz

tobz commented Nov 18, 2023

(I'll backfill more details in here, just jotting down the skeleton of the problem.)

Problem

I've run into an issue with the node-manual driver a few times now: an unclean node reboot leaves NFS-backed PV/PVCs stuck in an inconsistent state, and the PVCs can then no longer be mounted into their respective pods when the node comes back online.

Context

My setup uses the node-manual driver to mount specific NFS paths into application pods via dedicated PV/PVCs. This is all fine and good when things are working.

When one of these reboots happens, the pods are naturally still hanging around in the API, and when the node eventually comes back it tries to bring those pods up again since it is still marked as responsible for them.

When this occurs, describing the pod shows an error (which I've unfortunately lost at this point) containing the string "staging path is not mounted" and referring to a path on the node ending in globalmount. This seems to be related to the two-step process where a volume is first staged on a node and then "published" so it can actually be used by a workload? Checking the node in question, the path it was referring to indeed didn't exist, although the directory it referenced did.
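
(For anyone following along: a quick way to check whether the staging step actually stuck is to ask the node about the globalmount path directly; a minimal sketch, with the hash below being a placeholder for whatever appears in your error message.)

# substitute the hash from the "staging path is not mounted" error
STAGING=/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/<volume-hash>/globalmount
findmnt --mountpoint "$STAGING"   # prints the mount details if staged, nothing if not mounted
ls -ld "$STAGING"                 # the directory can exist even when nothing is mounted on it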

I ended up deleting the PV/PVCs (which were stuck in Terminating until the pods went away) and then deleting the pods using the stuck volumes, which cleared everything out and allowed me to recreate the pods, which were then able to properly mount the volumes.
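
Roughly, the manual recovery boiled down to the following (resource names are placeholders for my own PV/PVC/pods; the PVC hangs in Terminating because of its protection finalizer until the consuming pods are gone):

kubectl -n <namespace> delete pvc <stuck-pvc>               # sits in Terminating while pods still reference it
kubectl delete pv <stuck-pv>
kubectl -n <namespace> delete pod <pod-using-the-volume>    # lets the PVC/PV deletions complete
# then recreate the PV/PVC manifests and the pods, which mount cleanly afterwards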

@Routhinator

I'm seeing this with a Samba mount on one node as well. It works on one node but not the other even though the volume should be RWX; it's almost as if the driver treats them like RWO, mounting them successfully on one node or the other, but not both. They work so long as all the pods that need them pile onto one node.

@travisghansen
Member

Is this possibly related to the volume attachment 'issue' I need to resolve as well?

@Routhinator

Possibly. Since the OP here had success with deleting and recreating the manual mounts, I just tried that, and it once again led to a race condition: the first node that had the manual SMB mount up got it, and the pods trying to mount it on the other node are once again stuck with the same message:

20s         Warning   FailedMount              pod/calibre-web-8b456d9b-z79gv        MountVolume.SetUp failed for volume "truenas-manual-smb-file-server" : rpc error: code = FailedPrecondition desc = staging path is not mounted: /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/8d553e2ff90718153acf837afe96e657b79b5d01799daf80a1253483f79799bc/globalmount

@travisghansen
Member

Hmm, that's kinda odd :( each node should stage before publish. Is stage even being invoked on the problematic node(s)?
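
Something like this should show whether stage is being called at all (namespace and pod names are placeholders for wherever the node plugin daemonset runs on the affected node):

# find the democratic-csi node plugin pod scheduled on the problem node
kubectl get pods -A -o wide | grep democratic-csi | grep <problem-node>
# then grep its logs for stage/unstage activity on the volume
kubectl -n <namespace> logs <node-plugin-pod> --all-containers | grep -E 'NodeStageVolume|NodeUnstageVolume'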

@Routhinator

This seems to be the source of the pain on the problem node; the last two failures have now consistently been on the same one.

{"host":"andromeda02.home.routh.io","level":"info","message":"new request - driver: NodeManualDriver method: NodeUnpublishVolume call: {\"metadata\":{\"user-agent\":[\"grpc-go/1.51.0\"],\"x-forwarded-host\":[\"localhost\"]},\"request\":{\"volume_id\":\"samba-share-csi-file-server-volume\",\"target_path\":\"/var/lib/kubelet/pods/f0b94372-c2d7-4c47-9e43-c59cf72decbf/volumes/kubernetes.io~csi/truenas-manual-smb-file-server/mount\"},\"cancelled\":false}","service":"democratic-csi","timestamp":"2024-01-29T03:06:24.827Z"}
executing mount command: findmnt --mountpoint /var/lib/kubelet/pods/f0b94372-c2d7-4c47-9e43-c59cf72decbf/volumes/kubernetes.io~csi/truenas-manual-smb-file-server/mount --output source,target,fstype,label,options -b -J --nofsroot
executing filesystem command: rmdir /var/lib/kubelet/pods/f0b94372-c2d7-4c47-9e43-c59cf72decbf/volumes/kubernetes.io~csi/truenas-manual-smb-file-server/mount
failed to execute filesystem command: rmdir /var/lib/kubelet/pods/f0b94372-c2d7-4c47-9e43-c59cf72decbf/volumes/kubernetes.io~csi/truenas-manual-smb-file-server/mount, response: {"code":1,"stdout":"","stderr":"rmdir: failed to remove '/var/lib/kubelet/pods/f0b94372-c2d7-4c47-9e43-c59cf72decbf/volumes/kubernetes.io~csi/truenas-manual-smb-file-server/mount': Directory not empty\n","timeout":false}
retry - failed condition, not trying again

It's trying to unstage the mount so it can stage it again, but the mount is already there. Checking the dir now, it looks like the mount may have disconnected at some point in its lifecycle, and the directories for the sub-path mounts of the pods using this volume were written into the /mount dir itself, which causes the unstage to fail.
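
A quick way to confirm that state (the target path is a placeholder for the one from the error):

TARGET=/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount
mountpoint "$TARGET"    # reports "is not a mountpoint" -> the share has dropped off
ls -la "$TARGET"        # yet it still contains the sub-path directories, so rmdir fails with "Directory not empty"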

I managed to get it out of that loop by running an rm -rf * in the mount directory of that pod, and then the storage driver just spun on the staging step, spitting out the below over and over:

{"host":"andromeda02.home.routh.io","level":"info","message":"new request - driver: NodeManualDriver method: NodePublishVolume call: {\"metadata\":{\"user-agent\":[\"grpc-go/1.51.0\"],\"x-forwarded-host\":[\"localhost\"]},\"request\":{\"publish_context\":{},\"secrets\":\"redacted\",\"volume_context\":{\"server\":\"truenas01\",\"csi.storage.k8s.io/pod.name\":\"radarr-77f97bb447-r2mdj\",\"csi.storage.k8s.io/pod.uid\":\"bd832e28-8641-45b8-b065-996b628fa82b\",\"csi.storage.k8s.io/serviceAccount.name\":\"default\",\"node_attach_driver\":\"smb\",\"provisioner_driver\":\"node-manual\",\"share\":\"file-server\",\"csi.storage.k8s.io/pod.namespace\":\"media-server\",\"csi.storage.k8s.io/ephemeral\":\"false\"},\"volume_id\":\"samba-share-csi-file-server-volume\",\"staging_target_path\":\"/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/8d553e2ff90718153acf837afe96e657b79b5d01799daf80a1253483f79799bc/globalmount\",\"target_path\":\"/var/lib/kubelet/pods/bd832e28-8641-45b8-b065-996b628fa82b/volumes/kubernetes.io~csi/truenas-manual-smb-file-server/mount\",\"volume_capability\":{\"access_mode\":{\"mode\":\"MULTI_NODE_MULTI_WRITER\"},\"mount\":{\"mount_flags\":[\"username=csi\",\"password=SCRUBBED\",\"uid=568\",\"gid=568\",\"nobrl\"],\"fs_type\":\"cifs\",\"volume_mount_group\":\"\"},\"access_type\":\"mount\"},\"readonly\":false},\"cancelled\":false}","service":"democratic-csi","timestamp":"2024-01-29T03:48:09.079Z"}
retry - failed condition, not trying again
executing filesystem command: mkdir -p -m 0750 /var/lib/kubelet/pods/bd832e28-8641-45b8-b065-996b628fa82b/volumes/kubernetes.io~csi/truenas-manual-smb-file-server/mount

Bouncing the node-manual CSI pod on that node did not resolve the spin, but rebooting the node after that did, and pods are now finally able to mount that PVC on that node again.

@travisghansen
Member

That is very helpful info. It may also be very tricky to figure out how to properly cope with that scenario. Did you happen to capture what files were in the dir before the rm -rf? On the host, did the mount list show the global mount as active, and was it actually active in reality?
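
For next time, roughly this (paths are placeholders for the globalmount and pod mount paths from the errors) would capture what I'm after before anything gets cleaned up:

findmnt --mountpoint <staging-globalmount-path>             # what findmnt thinks is mounted there
findmnt --mountpoint <pod-target-path>
grep globalmount /proc/self/mountinfo                       # what the kernel itself thinks
ls -laR <pod-target-path> > /tmp/stale-mount-contents.txt   # snapshot of the leftover files before any rm -rf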

@Routhinator

Routhinator commented Jan 29, 2024

It was not active. There were no files in the mount dir, just an empty directory structure mapping to the mount paths of the pods mounting that PVC on that node.

i.e.:

mount/
    audio/
        music/
        books/
    documents/
        ebooks/

Nothing else. It seems like the pods somehow started and some part of the scheduling process created the mount points even though the mount was not there.

The mount list on the host showed it was not mounted.

@tobz
Author

tobz commented Feb 23, 2024

I've hit this again, and I'm posting this contemporaneously:

  1. observed an issue with pods trying to mount their node-manual-based NFS mounts after a node restarts

  2. delete workload pod and observe the following in the node-manual controller logs:

{"host":"tobyplex-gabagool","level":"info","message":"new request - driver: NodeManualDriver method: NodePublishVolume call: {\"metadata\":{\"user-agent\":[\"grpc-go/1.40.0\"],\"x-forwarded-host\":[\"/var/lib/kubelet/plugins/org.democratic-csi.node-manual/csi.sock\"]},\"request\":{\"publish_context\":{},\"secrets\":\"redacted\",\"volume_context\":{\"csi.storage.k8s.io/pod.namespace\":\"tobyplex-prod\",\"csi.storage.k8s.io/ephemeral\":\"false\",\"share\":\"/mnt/tank/storage/tobyplex/prod/media\",\"csi.storage.k8s.io/pod.name\":\"radarr-radarr-6b64fdc78-9qdw2\",\"server\":\"larder.catdad.science\",\"csi.storage.k8s.io/pod.uid\":\"b86f809e-0652-4e52-a123-22805ab846f9\",\"csi.storage.k8s.io/serviceAccount.name\":\"default\",\"node_attach_driver\":\"nfs\",\"provisioner_driver\":\"node-manual\"},\"volume_id\":\"radarr-mnt-media-pv-prod\",\"staging_target_path\":\"/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b/globalmount\",\"target_path\":\"/var/lib/kubelet/pods/b86f809e-0652-4e52-a123-22805ab846f9/volumes/kubernetes.io~csi/radarr-mnt-media-pv-prod/mount\",\"volume_capability\":{\"access_mode\":{\"mode\":\"MULTI_NODE_MULTI_WRITER\"},\"mount\":{\"mount_flags\":[\"nconnect=8\",\"nfsvers=4.1\"],\"fs_type\":\"nfs\",\"volume_mount_group\":\"\"},\"access_type\":\"mount\"},\"readonly\":false},\"cancelled\":false}","service":"democratic-csi","timestamp":"2024-02-23T00:26:03.353Z"}
retry - failed condition, not trying again
executing filesystem command: mkdir -p -m 0750 /var/lib/kubelet/pods/b86f809e-0652-4e52-a123-22805ab846f9/volumes/kubernetes.io~csi/radarr-mnt-media-pv-prod/mount
executing mount command: findmnt --mountpoint /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b/globalmount --output source,target,fstype,label,options -b -J --nofsroot
{"host":"tobyplex-gabagool","level":"error","message":"handler error - driver: NodeManualDriver method: NodePublishVolume error: {\"name\":\"GrpcError\",\"code\":9,\"message\":\"staging path is not mounted: /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/9e07259c243649e629bc2d569e1996291209071ed73aba54da8843d870338e66/globalmount\"}","service":"democratic-csi","timestamp":"2024-02-23T00:26:03.364Z"}
{"host":"tobyplex-gabagool","level":"error","message":"handler error - driver: NodeManualDriver method: NodePublishVolume error: {\"name\":\"GrpcError\",\"code\":9,\"message\":\"staging path is not mounted: /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b/globalmount\"}","service":"democratic-csi","timestamp":"2024-02-23T00:26:03.366Z"}
  3. observe that the path referenced does exist on the host, and looks like this:
tobyplex-gabagool [~]# ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b
total 8
drwxr-x--- 2 root root 4096 Dec  5 15:33 globalmount
-rw-r--r-- 1 root root  115 Jan  6 01:49 vol_data.json
  4. try deleting globalmount and even the whole directory (the one above) and then deleting the workload pod to jumpstart things... to no avail

  5. scale down the workload deployment to zero, delete the PV/PVC in question, and scale back up

  6. node-manual controller does the right thing, and workload pod now running

This jibes with my original experience, and seems to point to stale/cached data for the given PV/PVC that democratic-csi keeps trying to use, to no avail, and that can ultimately only be cleared by deleting the PV/PVC.
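
In sketch form, the recovery that works for me (names are placeholders for my deployment/PV/PVC):

kubectl -n <namespace> scale deployment <workload> --replicas=0
kubectl -n <namespace> delete pvc <pvc-name>
kubectl delete pv <pv-name>
# re-apply the PV/PVC manifests, then bring the workload back
kubectl -n <namespace> scale deployment <workload> --replicas=1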

@travisghansen
Member

Can you run ls on the globalmount dir next time? I am interested in what’s in there.

@Routhinator

Routhinator commented Feb 24, 2024

I hit this on the freenas-nfs driver (not node-manual) recently as well. It was similar to the node-manual case: the mountpoints were there in the mount dir, but no files.

So I can now confirm this issue is not exclusive to the node-manual driver.

@Routhinator

Hit this again on the freenas-nfs driver today, with no node reboot; it actually looks like the gigabit adapter on the node got overloaded with writes from files moving from one mount to another, which triggered a volume remount.

The remount wasn't clean, and the application ended up seeing a /mount directory that was local to the host, without the remote NFS mount attached to it. This caused the application to initialize its directory structure and config in the local /mount dir, which then caused the NFS controller to spin on the same message we've seen before:

executing filesystem command: rmdir /var/lib/kubelet/pods/a16c4e84-a133-471e-8908-7a153712e6e8/volumes/kubernetes.io~csi/pvc-d832c53e-e18e-40b9-90e9-dd0789a40bdb/mount
failed to execute filesystem command: rmdir /var/lib/kubelet/pods/a16c4e84-a133-471e-8908-7a153712e6e8/volumes/kubernetes.io~csi/pvc-d832c53e-e18e-40b9-90e9-dd0789a40bdb/mount, response: {"code":1,"stdout":"","stderr":"rmdir: failed to remove '/var/lib/kubelet/pods/a16c4e84-a133-471e-8908-7a153712e6e8/volumes/kubernetes.io~csi/pvc-d832c53e-e18e-40b9-90e9-dd0789a40bdb/mount': Directory not empty\n","timeout":false}

This was happening for multiple pods attempting this mount; the above is just one example.

Contents of the /mount dir:

root@andromeda01:/home/routhinator# ls -lha /var/lib/kubelet/pods/a16c4e84-a133-471e-8908-7a153712e6e8/volumes/kubernetes.io~csi/pvc-d832c53e-e18e-40b9-90e9-dd0789a40bdb/mount
total 20K
drwxr-x--- 5 root root 4.0K Feb 29 14:33 .
drwxr-x--- 3 root root 4.0K Feb 29 04:56 ..
drwxr-x--- 2  568  568 4.0K Feb 29 14:36 config
drwxr-x--- 3  568  568 4.0K Feb 29 14:36 server
drwxr-x--- 2  568  568 4.0K Feb 29 14:33 transcode
root@andromeda01:/home/routhinator# ls -lha /var/lib/kubelet/pods/a16c4e84-a133-471e-8908-7a153712e6e8/volumes/kubernetes.io~csi/pvc-d832c53e-e18e-40b9-90e9-dd0789a40bdb/mount/config
total 12K
drwxr-x--- 2  568  568 4.0K Feb 29 14:36 .
drwxr-x--- 5 root root 4.0K Feb 29 14:33 ..
-rw-rw-r-- 1  568  568  238 Feb 29 14:36 Tdarr_Server_Config.json
root@andromeda01:/home/routhinator# ls -lha /var/lib/kubelet/pods/a16c4e84-a133-471e-8908-7a153712e6e8/volumes/kubernetes.io~csi/pvc-d832c53e-e18e-40b9-90e9-dd0789a40bdb/mount/server
total 20K
drwxr-x--- 3  568  568 4.0K Feb 29 14:36 .
drwxr-x--- 5 root root 4.0K Feb 29 14:33 ..
drwxrwxr-x 8  568  568 4.0K Feb 29 14:37 Tdarr
-rw-r--r-- 1 root root    4 Feb 29 14:35 pgid
-rw-r--r-- 1 root root    4 Feb 29 14:35 puid

Which is what prevents the staging from happening. There was nothing in the globalmount dir.

I fixed this by scaling down the pods trying to use the volume, running rm -Rf against the mount dir of one of the pods complaining about the mount, and then waiting until the csi-node pod on that node stopped spinning on the other pods. Removing the files once was enough to resolve it for all of them, and it also allowed pods that were stuck in Terminating to finally terminate.

Then I scaled the pods back up, and things were back to normal.

@tobz
Author

tobz commented Mar 12, 2024

Yet another predictable failure when a node is rebooted. -_-

From the node-manual controller pod on the affected node (tobyplex-gabagool):

{"host":"tobyplex-gabagool","level":"info","message":"new request - driver: NodeManualDriver method: NodePublishVolume call: {\"metadata\":{\"user-agent\":[\"grpc-go/1.40.0\"],\"x-forwarded-host\":[\"/var/lib/kubelet/plugins/org.democratic-csi.node-manual/csi.sock\"]},\"request\":{\"publish_context\":{},\"secrets\":\"redacted\",\"volume_context\":{\"csi.storage.k8s.io/pod.namespace\":\"tobyplex-prod\",\"csi.storage.k8s.io/serviceAccount.name\":\"default\",\"csi.storage.k8s.io/ephemeral\":\"false\",\"node_attach_driver\":\"nfs\",\"provisioner_driver\":\"node-manual\",\"server\":\"larder.catdad.science\",\"share\":\"/mnt/tank/storage/tobyplex/prod/media\",\"csi.storage.k8s.io/pod.name\":\"plex-plex-f76b7d777-qbf5x\",\"csi.storage.k8s.io/pod.uid\":\"943f14dc-c19b-42c3-ad9a-89ef4030f3ce\"},\"volume_id\":\"plex-mnt-media-pv-prod\",\"staging_target_path\":\"/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/9e07259c243649e629bc2d569e1996291209071ed73aba54da8843d870338e66/globalmount\",\"target_path\":\"/var/lib/kubelet/pods/943f14dc-c19b-42c3-ad9a-89ef4030f3ce/volumes/kubernetes.io~csi/plex-mnt-media-pv-prod/mount\",\"volume_capability\":{\"access_mode\":{\"mode\":\"MULTI_NODE_MULTI_WRITER\"},\"mount\":{\"mount_flags\":[\"nconnect=8\",\"nfsvers=4.1\"],\"fs_type\":\"nfs\",\"volume_mount_group\":\"\"},\"access_type\":\"mount\"},\"readonly\":false},\"cancelled\":false}","service":"democratic-csi","timestamp":"2024-03-12T14:24:21.151Z"}
retry - failed condition, not trying again
executing filesystem command: mkdir -p -m 0750 /var/lib/kubelet/pods/943f14dc-c19b-42c3-ad9a-89ef4030f3ce/volumes/kubernetes.io~csi/plex-mnt-media-pv-prod/mount
{"host":"tobyplex-gabagool","level":"info","message":"new request - driver: NodeManualDriver method: NodePublishVolume call: {\"metadata\":{\"user-agent\":[\"grpc-go/1.40.0\"],\"x-forwarded-host\":[\"/var/lib/kubelet/plugins/org.democratic-csi.node-manual/csi.sock\"]},\"request\":{\"publish_context\":{},\"secrets\":\"redacted\",\"volume_context\":{\"server\":\"larder.catdad.science\",\"share\":\"/mnt/tank/storage/tobyplex/prod/media\",\"csi.storage.k8s.io/pod.name\":\"radarr-radarr-6b64fdc78-h6cmd\",\"csi.storage.k8s.io/serviceAccount.name\":\"default\",\"csi.storage.k8s.io/ephemeral\":\"false\",\"node_attach_driver\":\"nfs\",\"provisioner_driver\":\"node-manual\",\"csi.storage.k8s.io/pod.namespace\":\"tobyplex-prod\",\"csi.storage.k8s.io/pod.uid\":\"fe7d9ae5-1d54-49ce-9bda-420fc6b9fb66\"},\"volume_id\":\"radarr-mnt-media-pv-prod\",\"staging_target_path\":\"/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b/globalmount\",\"target_path\":\"/var/lib/kubelet/pods/fe7d9ae5-1d54-49ce-9bda-420fc6b9fb66/volumes/kubernetes.io~csi/radarr-mnt-media-pv-prod/mount\",\"volume_capability\":{\"access_mode\":{\"mode\":\"MULTI_NODE_MULTI_WRITER\"},\"mount\":{\"mount_flags\":[\"nconnect=8\",\"nfsvers=4.1\"],\"fs_type\":\"nfs\",\"volume_mount_group\":\"\"},\"access_type\":\"mount\"},\"readonly\":false},\"cancelled\":false}","service":"democratic-csi","timestamp":"2024-03-12T14:24:21.160Z"}
retry - failed condition, not trying again
executing filesystem command: mkdir -p -m 0750 /var/lib/kubelet/pods/fe7d9ae5-1d54-49ce-9bda-420fc6b9fb66/volumes/kubernetes.io~csi/radarr-mnt-media-pv-prod/mount
executing mount command: findmnt --mountpoint /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/9e07259c243649e629bc2d569e1996291209071ed73aba54da8843d870338e66/globalmount --output source,target,fstype,label,options -b -J --nofsroot
{"host":"tobyplex-gabagool","level":"error","message":"handler error - driver: NodeManualDriver method: NodePublishVolume error: {\"name\":\"GrpcError\",\"code\":9,\"message\":\"staging path is not mounted: /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/9e07259c243649e629bc2d569e1996291209071ed73aba54da8843d870338e66/globalmount\"}","service":"democratic-csi","timestamp":"2024-03-12T14:24:21.184Z"}
executing mount command: findmnt --mountpoint /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b/globalmount --output source,target,fstype,label,options -b -J --nofsroot
{"host":"tobyplex-gabagool","level":"error","message":"handler error - driver: NodeManualDriver method: NodePublishVolume error: {\"name\":\"GrpcError\",\"code\":9,\"message\":\"staging path is not mounted: /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b/globalmount\"}","service":"democratic-csi","timestamp":"2024-03-12T14:24:21.321Z"}

Output from the failed node related to the aforementioned directories:

tobyplex-gabagool [~]# ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b/globalmount
total 0
tobyplex-gabagool [~]# ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/19983d5207f27959a9619459a97cd8a528e302bd1094e18c524569c6f7328f8b
total 8
drwxr-x--- 2 root root 4096 Feb 23 00:29 globalmount
-rw-r--r-- 1 root root  115 Feb 23 00:29 vol_data.json
tobyplex-gabagool [~]# ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/9e07259c243649e629bc2d569e1996291209071ed73aba54da8843d870338e66/globalmount
total 0
tobyplex-gabagool [~]# ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.node-manual/9e07259c243649e629bc2d569e1996291209071ed73aba54da8843d870338e66
total 8
drwxr-x--- 2 root root 4096 Feb 23 00:28 globalmount
-rw-r--r-- 1 root root  113 Feb 23 00:28 vol_data.json

This is all prior to doing any manual operations, and no, I can't run any more commands on this environment to debug, because I've already gone ahead and fixed it manually as described previously. 😅

@Routhinator

@tobz have you tried the updated node-manual config example in #324 (comment)?

I've been using it for a bit now with no issues from unclean reboots.

@travisghansen
Member

I think the volume attachments created by the new config may help with that, yes.

@tobz
Author

tobz commented Mar 14, 2024

I updated to the latest Helm chart today, including the aforementioned node-manual example... not that I want a node to crash, but we'll see what happens next time.

@travisghansen
Member

Ok, did you update the csiDriver to ensure attachRequired is set, etc.?

@tobz
Author

tobz commented Mar 15, 2024

I didn't specify attachRequired explicitly, but it seems to be set to true by default?

toby@consigliera:~/src/catdad-science-infra$ k get csidriver
NAME                             ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
driver.longhorn.io               true             true             false             <unset>         false               Persistent   124d
org.democratic-csi.node-manual   true             true             false             <unset>         false               Persistent   24h
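
For reference, that column maps to the attachRequired field on the CSIDriver object itself (the Helm chart manages that object); a couple of checks along those lines, as a sketch:

# confirm on the live object; attachRequired=true means volumes go through the VolumeAttachment/attach flow
kubectl get csidriver org.democratic-csi.node-manual -o jsonpath='{.spec.attachRequired}{"\n"}'
# the VolumeAttachment objects themselves can be listed too, which is handy to inspect after a node reboot
kubectl get volumeattachments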
