Windows CSI with iSCSI not reboot proof #342

Open
aarnaud opened this issue Nov 9, 2023 · 3 comments
Comments

aarnaud commented Nov 9, 2023

Hi,

Thanks for the Windows support. I don't like Windows, but sometimes we don't have a choice ;-)

I noticed that the CSI driver doesn't survive a Windows node reboot without a drain (an unplanned reboot), and I can reproduce the issue.

  Warning  FailedMount       28m (x15 over 38m)        kubelet            MountVolume.MountDevice failed for volume "pvc-6f282245-6dda-4001-8103-e9782848d217" : kubernetes.io/csi: attacher.MountDevice failed to create dir "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\org.democratic-csi.iscsi\\4eb42bec25479cbedfb550e626d766fcd90f97d452b7a11b0bf3331e3ab26cd2\\globalmount":  mkdir \var\lib\kubelet\plugins\kubernetes.io\csi\org.democratic-csi.iscsi\4eb42bec25479cbedfb550e626d766fcd90f97d452b7a11b0bf3331e3ab26cd2\globalmount: Cannot create a file when that file already exists.

The state folder under C:\var\lib\kubelet\plugins\kubernetes.io\csi\org.democratic-csi.iscsi is still present after the reboot, which blocks the container from being respawned.

In my case, just removing the folder 4eb42bec25479cbedfb550e626d766fcd90f97d452b7a11b0bf3331e3ab26cd2 solved the issue: Kubernetes then reconciled successfully and started the container.

Maybe a check on the globalmount link, to see whether the disk is actually mounted, would help clean this up? Or remove every folder on startup?
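
To make the "check the globalmount link" idea concrete, here is a minimal sketch in Python. It is not part of democratic-csi or csi-proxy. It assumes the leftover `globalmount` entry is a link whose target disappeared with the reboot, which is a guess based on the mkdir error above rather than something confirmed in this thread. The function names are hypothetical.

```python
import os


def is_stale_globalmount(path: str) -> bool:
    """True if the entry exists on disk but its target cannot be resolved."""
    # lexists() looks at the entry itself; exists() follows it to the target.
    # A dangling link therefore gives lexists=True and exists=False.
    return os.path.lexists(path) and not os.path.exists(path)


def find_stale_staging_dirs(driver_dir: str) -> list[str]:
    """List staging directories under driver_dir whose globalmount is dangling."""
    stale = []
    for name in sorted(os.listdir(driver_dir)):
        staging = os.path.join(driver_dir, name)
        if is_stale_globalmount(os.path.join(staging, "globalmount")):
            stale.append(staging)
    return stale
```

This only reports candidates and deletes nothing, so the result can be reviewed before any cleanup is done.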

[image: Screenshot from 2023-11-09 15-52-08]
[image: Screenshot from 2023-11-09 15-58-47]

Have a good day.

aarnaud commented Nov 9, 2023

Some additional details:

  • Kubernetes 1.28.3
  • Windows Server 2022
  • helm.sh/chart=democratic-csi-0.14.1
  • docker.io/democraticcsi/csi-grpc-proxy:v0.5.3
  • docker.io/democraticcsi/democratic-csi:latest => sha256:57eb874d0619987b67c6c0a1a04a644479a1a72a3bc717a318ed6f53a26266fe

travisghansen (Member) commented

Welcome! I feel your pain trying to use Windows nodes ;) That may need to be a startup job, handled outside the project, that cleans those up on boot :(

Do you have the logs for the driver when the error was happening?
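
For illustration, a startup job along those lines could look roughly like the sketch below. It is an assumption-laden example, not something shipped by the project. `DRIVER_DIR` is the path reported earlier in this issue, `clean_staging_dirs` is a hypothetical helper, and the script is assumed to run before kubelet starts, so nothing is staged yet. It defaults to a dry run because deleting a staging directory that is in use would be destructive.

```python
import os
import shutil

# Path taken from the report above; adjust for other drivers or kubelet roots.
DRIVER_DIR = r"C:\var\lib\kubelet\plugins\kubernetes.io\csi\org.democratic-csi.iscsi"


def clean_staging_dirs(driver_dir: str, dry_run: bool = True) -> list[str]:
    """Remove (or, in dry-run mode, only list) leftover staging directories."""
    selected = []
    if not os.path.isdir(driver_dir):
        return selected
    for entry in sorted(os.scandir(driver_dir), key=lambda e: e.name):
        if not entry.is_dir(follow_symlinks=False):
            continue
        # Only touch directories that look like staging dirs (contain globalmount).
        if not os.path.lexists(os.path.join(entry.path, "globalmount")):
            continue
        if not dry_run:
            shutil.rmtree(entry.path)
        selected.append(entry.path)
    return selected


if __name__ == "__main__":
    for path in clean_staging_dirs(DRIVER_DIR, dry_run=True):
        print("would remove", path)
```

Switching to `dry_run=False` performs the same removal that was done by hand in the original report.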

aarnaud commented Nov 9, 2023

From my point of view, I didn't find anything relevant in the csi-driver log:

zfs-iscsi-democratic-csi-node-windows

Defaulted container "csi-driver" out of: csi-driver, csi-proxy, driver-registrar
Warning: Ignoring extra certs from `/tmp/certs/extra-ca-certs.crt`, load failed: error:02001003:system library:fopen:No such process
grpc implementation: @grpc/grpc-js
failed finding config file realpath: Error: ENOENT: no such file or directory, lstat 'C:\??'
info: initializing csi driver: zfs-generic-iscsi {"timestamp":"2023-11-09T21:46:34.268Z"}
info: starting csi server - node version: v16.18.0, package version: 1.8.3, config file: C:\C\9a5a80f6e88fff38906e028579fadd837c8937896e891d0b960d525e16a5946f\\config\driver-config-file.yaml, csi-name: org.democratic-csi.iscsi, csi-driver: zfs-generic-iscsi, csi-mode: node, csi-version: 1.5.0, address: , socket: unix:////./pipe/democratic-csi/org.democratic-csi.iscsi/csi.sock {"timestamp":"2023-11-09T21:46:37.769Z"}
info: new request - driver: ControllerZfsGenericDriver method: GetPluginInfo call: {"metadata":{"user-agent":["grpc-go/1.54.0"],"x-forwarded-host":["localhost"]},"request":{},"cancelled":false} {"timestamp":"2023-11-09T21:46:38.444Z"}
info: new response - driver: ControllerZfsGenericDriver method: GetPluginInfo response: {"name":"org.democratic-csi.iscsi","vendor_version":"1.8.3"} {"timestamp":"2023-11-09T21:46:38.445Z"}
info: new request - driver: ControllerZfsGenericDriver method: NodeGetInfo call: {"metadata":{"user-agent":["grpc-go/1.54.0"],"x-forwarded-host":["localhost"]},"request":{},"cancelled":false} {"timestamp":"2023-11-09T21:46:38.982Z"}
info: new response - driver: ControllerZfsGenericDriver method: NodeGetInfo response: {"node_id":"win-gecksl1t6us","max_volumes_per_node":0} {"timestamp":"2023-11-09T21:46:38.983Z"}
info: new request - driver: ControllerZfsGenericDriver method: NodeGetCapabilities call: {"metadata":{"x-forwarded-host":["localhost"],"user-agent":["grpc-go/1.54.0"]},"request":{},"cancelled":false} {"timestamp":"2023-11-09T21:46:43.090Z"}
info: new response - driver: ControllerZfsGenericDriver method: NodeGetCapabilities response: {"capabilities":[{"rpc":{"type":"STAGE_UNSTAGE_VOLUME"}},{"rpc":{"type":"GET_VOLUME_STATS"}},{"rpc":{"type":"EXPAND_VOLUME"}},{"rpc":{"type":"SINGLE_NODE_MULTI_WRITER"}}]} {"timestamp":"2023-11-09T21:46:43.091Z"}
info: new request - driver: ControllerZfsGenericDriver method: NodeGetCapabilities call: {"metadata":{"user-agent":["grpc-go/1.54.0"],"x-forwarded-host":["localhost"]},"request":{},"cancelled":false} {"timestamp":"2023-11-09T21:46:59.214Z"}
info: new response - driver: ControllerZfsGenericDriver method: NodeGetCapabilities response: {"capabilities":[{"rpc":{"type":"STAGE_UNSTAGE_VOLUME"}},{"rpc":{"type":"GET_VOLUME_STATS"}},{"rpc":{"type":"EXPAND_VOLUME"}},{"rpc":{"type":"SINGLE_NODE_MULTI_WRITER"}}]} {"timestamp":"2023-11-09T21:46:59.214Z"}
info: new request - driver: ControllerZfsGenericDriver method: NodeGetCapabilities call: {"metadata":{"user-agent":["grpc-go/1.54.0"],"x-forwarded-host":["localhost"]},"request":{},"cancelled":false} {"timestamp":"2023-11-09T21:47:31.266Z"}
info: new response - driver: ControllerZfsGenericDriver method: NodeGetCapabilities response: {"capabilities":[{"rpc":{"type":"STAGE_UNSTAGE_VOLUME"}},{"rpc":{"type":"GET_VOLUME_STATS"}},{"rpc":{"type":"EXPAND_VOLUME"}},{"rpc":{"type":"SINGLE_NODE_MULTI_WRITER"}}]} {"timestamp":"2023-11-09T21:47:31.268Z"}
info: new request - driver: ControllerZfsGenericDriver method: Probe call: {"metadata":{"user-agent":["grpc-node-js/1.8.13"]},"request":{},"cancelled":false} {"timestamp":"2023-11-09T21:47:37.504Z"}
info: new response - driver: ControllerZfsGenericDriver method: Probe response: {"ready":{"value":true}} {"timestamp":"2023-11-09T21:47:37.505Z"}
logging memory usages due to LOG_MEMORY_USAGE env var
info: new request - driver: ControllerZfsGenericDriver method: NodeGetCapabilities call: {"metadata":{"user-agent":["grpc-go/1.54.0"],"x-forwarded-host":["localhost"]},"request":{},"cancelled":false} {"timestamp":"2023-11-09T21:48:35.728Z"}
info: new response - driver: ControllerZfsGenericDriver method: NodeGetCapabilities response: {"capabilities":[{"rpc":{"type":"STAGE_UNSTAGE_VOLUME"}},{"rpc":{"type":"GET_VOLUME_STATS"}},{"rpc":{"type":"EXPAND_VOLUME"}},{"rpc":{"type":"SINGLE_NODE_MULTI_WRITER"}}]} {"timestamp":"2023-11-09T21:48:35.728Z"}
info: new request - driver: ControllerZfsGenericDriver method: Probe call: {"metadata":{"user-agent":["grpc-node-js/1.8.13"]},"request":{},"cancelled":false} {"timestamp":"2023-11-09T21:48:37.560Z"}
info: new response - driver: ControllerZfsGenericDriver method: Probe response: {"ready":{"value":true}} {"timestamp":"2023-11-09T21:48:37.560Z"}

csi-proxy:

Log file created at: 2023/11/09 13:46:20
Running on machine: WIN-GECKSL1T6US
Binary: Built with gc go1.21.4 for windows/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I1109 13:46:20.425886    2784 main.go:154] Windows Service initialized through SCM
I1109 13:46:20.676352    2784 main.go:144] Running as a Windows service.
I1109 13:46:20.689482    2784 main.go:65] Starting CSI-Proxy Server ...
I1109 13:46:20.689482    2784 main.go:66] Version: v1.1.3-12-g2cfaa0b
I1109 13:46:20.738519    2784 main.go:85] Working directories: [C:\var\lib\kubelet]
I1109 13:46:20.738519    2784 main.go:86] Require privacy: true

kubelet log:

I1109 13:46:59.446599    4652 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-6f282245-6dda-4001-8103-e9782848d217\" (UniqueName: \"kubernetes.io/csi/org.democratic-csi.iscsi^pvc-6f282245-6dda-4001-8103-e9782848d217\") pod \"windows-pod-74ff8b6dc4-7kk29\" (UID: \"a99ed9da-d567-449c-8a2b-f37e4105c7a9\") DevicePath \"\"" pod="default/windows-pod-74ff8b6dc4-7kk29"
I1109 13:46:59.467856    4652 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-6f282245-6dda-4001-8103-e9782848d217\" (UniqueName: \"kubernetes.io/csi/org.democratic-csi.iscsi^pvc-6f282245-6dda-4001-8103-e9782848d217\") pod \"windows-pod-74ff8b6dc4-7kk29\" (UID: \"a99ed9da-d567-449c-8a2b-f37e4105c7a9\") DevicePath \"csi-661e6007a8782f0355b37158dd09ff6919b8c4bf3d70380f5a678f8b5e811617\"" pod="default/windows-pod-74ff8b6dc4-7kk29"
E1109 13:46:59.481563    4652 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/org.democratic-csi.iscsi^pvc-6f282245-6dda-4001-8103-e9782848d217 podName: nodeName:}" failed. No retries permitted until 2023-11-09 13:47:31.4815634 -0800 PST m=+67.094733201 (durationBeforeRetry 32s). Error: MountVolume.MountDevice failed for volume "pvc-6f282245-6dda-4001-8103-e9782848d217" (UniqueName: "kubernetes.io/csi/org.democratic-csi.iscsi^pvc-6f282245-6dda-4001-8103-e9782848d217") pod "windows-pod-74ff8b6dc4-7kk29" (UID: "a99ed9da-d567-449c-8a2b-f37e4105c7a9") : kubernetes.io/csi: attacher.MountDevice failed to create dir "\\var\\lib\\kubelet\\plugins\\kubernetes.io\\csi\\org.democratic-csi.iscsi\\4eb42bec25479cbedfb550e626d766fcd90f97d452b7a11b0bf3331e3ab26cd2\\globalmount":  mkdir \var\lib\kubelet\plugins\kubernetes.io\csi\org.democratic-csi.iscsi\4eb42bec25479cbedfb550e626d766fcd90f97d452b7a11b0bf3331e3ab26cd2\globalmount: Cannot create a file when that file already exists.
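
If the affected folder has to be located from the log rather than by browsing the node, the kubelet error above carries the full path. The sketch below pulls the staging directory (the parent of `globalmount`) out of such a message. The regular expression is written against the exact wording seen in this log, and `staging_dir_from_error` is a hypothetical helper name, so treat it as an example rather than a stable parser.

```python
import ntpath
import re

# Matches the "mkdir <path>: Cannot create a file ..." tail of the kubelet error.
_MKDIR_RE = re.compile(
    r"mkdir (\S+): Cannot create a file when that file already exists"
)


def staging_dir_from_error(message: str):
    """Return the parent directory of the failing globalmount path, or None."""
    match = _MKDIR_RE.search(message)
    if match is None:
        return None
    # ntpath handles backslash-separated paths on any platform.
    return ntpath.dirname(match.group(1))
```

The returned path is relative to the drive root, as in the kubelet message, so the drive letter (C: in this report) still has to be prepended.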
