Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k8s-csi-cleaner empty string error #301

Open
valkenrath opened this issue May 17, 2023 · 21 comments
Open

k8s-csi-cleaner empty string error #301

valkenrath opened this issue May 17, 2023 · 21 comments

Comments

@valkenrath
Copy link

K8s distro = Talos 1.4.x
TrueNAS distro = SCALE Bluefin (22.12.2-1)
Pod: democratic-csi/csi-driver

/home/csi/app# ./bin/k8s-csi-cleaner

101 k8s PVs discovered

Error: 13 INTERNAL: empty string
    at callErrorFromStatus (/home/csi/app/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/home/csi/app/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
    at Object.onReceiveStatus (/home/csi/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/home/csi/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /home/csi/app/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78
    at processTicksAndRejections (node:internal/process/task_queues:78:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/home/csi/app/node_modules/@grpc/grpc-js/build/src/client.js:160:34)
    at ServiceClientImpl.<anonymous> (/home/csi/app/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /home/csi/app/bin/k8s-csi-cleaner:71:37
    at new Promise (<anonymous>)
    at executeRPC (/home/csi/app/bin/k8s-csi-cleaner:70:10)
    at runControllerListVolumes (/home/csi/app/bin/k8s-csi-cleaner:90:10)
    at main (/home/csi/app/bin/k8s-csi-cleaner:120:17)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async /home/csi/app/bin/k8s-csi-cleaner:172:7 {
  code: 13,
  details: 'empty string',
  metadata: Metadata {
    internalRepr: Map(3) {
      'content-type' => [Array],
      'date' => [Array],
      'content-length' => [Array]
    },
    options: {}
  }
}

Has been true for a while (many minor Talos/SCALE revisions), just been lazy with logging the fault.

This is particularly painful when paired with the lack of an onDelete param for the storageClass; many dangling child datasets, no programmatic way to clean them up.

Happy to provide more info/logs, please advise!

@travisghansen
Copy link
Member

#289

Can you ls -l /csi-data/ inside that container?

@valkenrath
Copy link
Author

/home/csi/app# ls -l /csi-data/
total 0
srwxr-xr-x 1 root root 0 May  2 20:22 csi.sock
srw------- 1 root root 0 May  2 20:22 csi.sock.internal

@travisghansen
Copy link
Member

Do any interesting logs show up from the respective csi-driver container of that pod?

@valkenrath
Copy link
Author

Apologies for letting this linger. Even after rebuilding on TrueNAS-SCALE-Cobia (latest release), ie importing the zpools onto a fresh install, this is still an issue.

Current environment:
democratic-csi chart v0.14.2
Talos v1.5.4
Kubernetes v1.27.6
TrueNAS SCALE v22.12.3

The error (triggered in same way) seems slightly different.

root@democratic-csi-nfs-controller-6f6c645d46-5z564:/home/csi/app/bin# ./k8s-csi-cleaner
150 k8s PVs discovered
Error: 13 INTERNAL: TypeError: response.body.children is not iterable TypeError: response.body.children is not iterable
    at FreeNASApiDriver.ListVolumes (/home/csi/app/src/driver/freenas/api.js:3397:39)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async requestHandlerProxy (/home/csi/app/bin/democratic-csi:217:18)
    at callErrorFromStatus (/home/csi/app/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
    at Object.onReceiveStatus (/home/csi/app/node_modules/@grpc/grpc-js/build/src/client.js:192:76)
    at Object.onReceiveStatus (/home/csi/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:360:141)
    at Object.onReceiveStatus (/home/csi/app/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
    at /home/csi/app/node_modules/@grpc/grpc-js/build/src/resolving-call.js:94:78
    at processTicksAndRejections (node:internal/process/task_queues:78:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/home/csi/app/node_modules/@grpc/grpc-js/build/src/client.js:160:34)
    at ServiceClientImpl.<anonymous> (/home/csi/app/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
    at /home/csi/app/bin/k8s-csi-cleaner:71:37
    at new Promise (<anonymous>)
    at executeRPC (/home/csi/app/bin/k8s-csi-cleaner:70:10)
    at runControllerListVolumes (/home/csi/app/bin/k8s-csi-cleaner:90:10)
    at main (/home/csi/app/bin/k8s-csi-cleaner:120:17)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async /home/csi/app/bin/k8s-csi-cleaner:172:7 {
  code: 13,
  details: 'TypeError: response.body.children is not iterable TypeError: response.body.children is not iterable\n' +
    '    at FreeNASApiDriver.ListVolumes (/home/csi/app/src/driver/freenas/api.js:3397:39)\n' +
    '    at runMicrotasks (<anonymous>)\n' +
    '    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n' +
    '    at async requestHandlerProxy (/home/csi/app/bin/democratic-csi:217:18)',
  metadata: Metadata {
    internalRepr: Map(3) {
      'content-type' => [Array],
      'date' => [Array],
      'content-length' => [Array]
    },
    options: {}
  }
}

To very belatedly answer your question, that command results in the following csi-driver pod output:

{"host":"democratic-csi-nfs-controller-6f6c645d46-5z564","level":"info","message":"new request - driver: FreeNASApiDriver method: ListVolumes call: {\"metadata\":{\"x-forwarded-host\":[\"localhost\"],\"user-agent\":[\"grpc-node-js/1.8.13\"]},\"request\":{\"max_entries\":0,\"starting_token\":\"\"},\"cancelled\":false}","service":"democratic-csi","timestamp":"2023-11-08T04:14:38.023Z"}

{"host":"democratic-csi-nfs-controller-6f6c645d46-5z564","level":"error","message":"handler error - driver: FreeNASApiDriver method: ListVolumes error: TypeError: response.body.children is not iterable TypeError: response.body.children is not iterable\n    at FreeNASApiDriver.ListVolumes (/home/csi/app/src/driver/freenas/api.js:3397:39)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at async requestHandlerProxy (/home/csi/app/bin/democratic-csi:217:18)","service":"democratic-csi","timestamp":"2023-11-08T04:14:38.156Z"}

@travisghansen
Copy link
Member

Can you turn on debug logging? It will show the the actual request/response from the TrueNAS API.

@travisghansen
Copy link
Member

This is an example response from CI:

{
  "id": "tank/ci/ce25fabf/v",
  "type": "FILESYSTEM",
  "name": "tank/ci/ce25fabf/v",
  "pool": "tank",
  "encrypted": false,
  "encryption_***": null,
  "key_loaded": false,
  "children": [],
...

If you search ListVolume here: https://github.com/democratic-csi/democratic-csi/actions/runs/6645915657/job/18061660688 you can see what the request/response should look like. The codepath is programmed relatively defensively so it's checking for a response code of 200 before hand. The combination of a 200 response code and an invalid value for .children seems quite odd.

@valkenrath
Copy link
Author

Debug logging output (from nfs-controller/csi-driver, using aforementioned versions) is here

@travisghansen
Copy link
Member

I can't get that link to open :(

@valkenrath
Copy link
Author

Think that pastebin choked on syntax highlighting the (lengthy) debug response.
Another link, this one backed by your project :)

@travisghansen
Copy link
Member

hmm, I don't see that error in the output. Can you recreate the error and send over the respective logs?

@valkenrath
Copy link
Author

valkenrath commented Feb 5, 2024

Apologies again, time to lab is fleeting at present :(

Debug output captured from csi-driver pod (democraticcsi/democratic-csi:v1.8.4), running against TrueNAS SCALE 23.10.1, though this has been consistent across many TrueNAS (SCALE) and Democratic CSI releases.

[2024-02-12 edit: pastebin url without expiry]

@travisghansen
Copy link
Member

We need to turn on debug logging so you see the raw requests/responses :( this issue seems really odd. I can add a small snippet of code to maybe try and catch it with more robust detail.

@valkenrath
Copy link
Author

My fault! Had node.driver.logLevel = debug, didn't realise I needed to do the same for controller.driver.logLevel.
With that change made, new logs, inclusive of "level":"debug" lines → here.

@travisghansen
Copy link
Member

Well that response seems to have completed successfully. Can you run those logs out further to the point where the cleaner breaks?

@travisghansen
Copy link
Member

Make sure to run the cleaner from the controller pod if that's not clear.

@valkenrath
Copy link
Author

:( Sorry this is so back and forth, really not sure what I'm doing wrong to upload healthy logs when the the error is so repeatable. Will try to be verbose in hope of it helping you identify my misstep:

  • exec into democratic-csi/democratic-csi-nfs-controller/csi-driver
  • run ./bin/k8s-csi-cleaner (or cd and run it from /home/csi/app/bin, same outcome, ditto for prefixing DRY_RUN=1)
  • get the error found here (inside shell session)
  • pod logs, from same 1min timespan as command was run, found here

Relevant 5min timespan of the entire democratic-csi/democratic-csi-nfs-controller Deployment can be found here.

If I can provide more context, logs, anything, please do let me know.
Thanks again :)

@valkenrath
Copy link
Author

valkenrath commented Mar 3, 2024

Switching from driver: freenas-api-nfs to driver: freenas-nfs and adding a driver.config.sshConnection block in values.yaml 'resolves' this; k8s-csi-cleaner works fine both in DRY_RUN and AUTO_DELETE modes.

My freenas-api-iscsi deployment still fails when running the binary. NFS and iSCSI deployments share an API key, and both provisioned datasets/PVC's just fine prior to the change, so it seems the API key and HTTPS connectivity are both in working order.

Is it possible k8s-csi-cleaner requires a sshConnection block for some reason, ie incompatible with pure API mode config/driver?

@travisghansen
Copy link
Member

No. I looked through the previous logs sent and still cannot find a response where the children attribute is not present. Crazy stuff. If you actually did the clean operation with the ssh config have you tried going back to the api config and testing if it still blows up?

@valkenrath
Copy link
Author

Just tried, no dice.
Similarly, making the same two changes (and none other) to the freenas-api-iscsi deployment 'fixed' k8s-csi-cleaner running from it's csi-driver pod, in the same way it did for the freenas-api-nfs one.

  • driver: freenas-api-iscsidriver: freenas-iscsi
  • adding sshConnection block

If there are no significant downsides to using the SSH reliant drivers over the API-only ones (with TrueNAS SCALE), perhaps this working configuration can be considered a resolution?

@travisghansen
Copy link
Member

There are no downsides if you don't mind allowing ssh access to the node. I would like to discover the issue. The last set of logs you sent do not include a respective ListVolumes calls. Can you run the tool again against the api configuration and look for the ListVolumes request coming through along with the http debug immediately following it?

@travisghansen
Copy link
Member

Can you also send over your full config (cleansed of secrets of course)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants