
CSINode does not contain driver csi.trident.netapp.io #473

Closed
alexandru-ersenie opened this issue Oct 21, 2020 · 9 comments


alexandru-ersenie commented Oct 21, 2020

Description
Using the Trident operator and Kubernetes 1.17.6, I am able to create persistent volumes, but not able to mount them into pods.

When getting the pod description, the following error is returned:
CSINode does not contain driver csi.trident.netapp.io

Environment

  • Trident version: [20.07.1]
  • Trident installation flags used: [no custom flags, since we use the default /var/lib/kubelet location]
  • Container runtime: [Docker 19.3.12]
  • Kubernetes version: [ 1.17.6]
  • Kubernetes orchestrator: [none]
  • Kubernetes enabled feature gates: [none needed]
  • OS: [Centos 7 - 3.10.0-1062.12.1.el7.x86_64]
  • NetApp backend types: [ OnTap 9.7 ]
  • Other:

To Reproduce
Install operator as provided here: https://netapp-trident.readthedocs.io/en/stable-v20.07/kubernetes/deploying/operator-deploy.html

After creating storage class, and consumer, the pv gets bound, but the pod cannot attach the volume locally to the worker
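The reproduction can be sketched with minimal manifests (a sketch only; the StorageClass name, namespace, and claim name here are hypothetical, and real StorageClass parameters depend on your TridentBackend):

```yaml
# Hypothetical StorageClass backed by Trident's CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-ontap
provisioner: csi.trident.netapp.io
---
# PVC that gets bound, while the pod consuming it stays in "Pending"
# with "CSINode ... does not contain driver csi.trident.netapp.io"
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: www-web-0
  namespace: test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: trident-ontap
  resources:
    requests:
      storage: 1Gi
```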

Expected behavior
Pod was expected to mount the volume and be running. Instead it just remains in "Pending".
Additional context
Pod description:

Warning  FailedScheduling    11s (x2 over 12s)              default-scheduler        error while running "VolumeBinding" filter plugin for pod "web-0": pod has unbound immediate PersistentVolumeClaims
  Normal   Scheduled           9s                             default-scheduler        Successfully assigned test/web-0 to hh1kbw02x
  Warning  FailedAttachVolume  <invalid> (x6 over <invalid>)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-934230b9-900c-4539-bb0c-8feff6e18628" : CSINode hh1kbw02x does not contain driver csi.trident.netapp.io
  Warning  FailedAttachVolume  <invalid> (x6 over <invalid>)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-f4c2b654-ff73-4dd5-84ef-a31491b83f26" : CSINode hh1kbw02x does not contain driver csi.trident.netapp.io

Logs from trident on this worker:

kubectl -n trident logs trident-csi-9sgrt -c trident-main -f
time="2020-10-21T17:15:31Z" level=debug msg="\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\nPUT https://10.111.4.90:34571/trident/v1/node/hh1kbw02x\nHeaders: map[Content-Type:[application/json]]\nBody: {\n  \"name\": \"hh1kbw02x\",\n  \"ips\": [\n    \"10.49.12.102\",\n    \"172.17.0.1\"\n  ]\n}\n--------------------------------------------------------------------------------"
time="2020-10-21T17:15:32Z" level=debug msg="\n<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<\nStatus: 201 Created\nHeaders: map[Content-Length:[21] Content-Type:[application/json; charset=UTF-8] Date:[Wed, 21 Oct 2020 17:15:32 GMT]]\nBody: {\n  \"name\": \"hh1kbw02x\"\n}\n\n================================================================================"
time="2020-10-21T17:15:32Z" level=debug msg="Communication with controller established, node registered." node=hh1kbw02x

Logs from registrar sidecar on this worker:

kubectl -n trident logs trident-csi-9sgrt -c driver-registrar
I1021 17:14:18.636803    6672 main.go:110] Version: v1.3.0-0-g6e9fff3e
I1021 17:14:18.636888    6672 main.go:120] Attempting to open a gRPC connection with: "/plugin/csi.sock"
I1021 17:14:18.636908    6672 connection.go:151] Connecting to unix:///plugin/csi.sock
I1021 17:14:18.637420    6672 main.go:127] Calling CSI driver to discover driver name
I1021 17:14:18.637435    6672 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I1021 17:14:18.637442    6672 connection.go:181] GRPC request: {}
I1021 17:14:18.639851    6672 connection.go:183] GRPC response: {"name":"csi.trident.netapp.io","vendor_version":"20.07.1"}
I1021 17:14:18.640235    6672 connection.go:184] GRPC error: <nil>
I1021 17:14:18.640242    6672 main.go:137] CSI driver name: "csi.trident.netapp.io"
I1021 17:14:18.648537    6672 node_register.go:51] Starting Registration Server at: /registration/csi.trident.netapp.io-reg.sock
I1021 17:14:18.648666    6672 node_register.go:60] Registration Server started at: /registration/csi.trident.netapp.io-reg.sock

Description of csi node

kubectl get csinode hh1kbw02x  -n trident -o yaml
apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  creationTimestamp: 2020-09-10T07:58:40Z
  name: hh1kbw02x
  ownerReferences:
  - apiVersion: v1
    kind: Node
    name: hh1kbw02x
    uid: d3db28d6-e2be-4ad4-8534-c853b2e025b5
  resourceVersion: "30914526"
  selfLink: /apis/storage.k8s.io/v1/csinodes/hh1kbw02x
  uid: a764977a-be67-4ee9-8b7e-9aac304e0890
spec:
  drivers: null 
Kubelet logs:
Nov  6 10:14:18 hh1kbw01x kubelet: I1106 10:14:18.883059    2393 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-3bcc4e38-2e69-4541-9910-711d2c086671" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-3bcc4e38-2e69-4541-9910-711d2c086671") pod "web-0" (UID: "a235d5f9-05bf-4c77-8a84-e48f2f657d98")
Nov  6 10:14:18 hh1kbw01x kubelet: E1106 10:14:18.883223    2393 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.trident.netapp.io^pvc-3bcc4e38-2e69-4541-9910-711d2c086671 podName: nodeName:}" failed. No retries permitted until 2020-11-06 10:14:19.383183856 +0100 CET m=+1353591.737508759 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-3bcc4e38-2e69-4541-9910-711d2c086671\" (UniqueName: \"kubernetes.io/csi/csi.trident.netapp.io^pvc-3bcc4e38-2e69-4541-9910-711d2c086671\") pod \"web-0\" (UID: \"a235d5f9-05bf-4c77-8a84-e48f2f657d98\") "
Nov  6 10:14:18 hh1kbw01x kubelet: I1106 10:14:18.983580    2393 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-f138b6cc-988b-455c-bb2e-fce022755634" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-f138b6cc-988b-455c-bb2e-fce022755634") pod "web-0" (UID: "a235d5f9-05bf-4c77-8a84-e48f2f657d98")
Nov  6 10:14:18 hh1kbw01x kubelet: E1106 10:14:18.983662    2393 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.trident.netapp.io^pvc-f138b6cc-988b-455c-bb2e-fce022755634 podName: nodeName:}" failed. No retries permitted until 2020-11-06 10:14:19.483629729 +0100 CET m=+1353591.837954619 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-f138b6cc-988b-455c-bb2e-fce022755634\" (UniqueName: \"kubernetes.io/csi/csi.trident.netapp.io^pvc-f138b6cc-988b-455c-bb2e-fce022755634\") pod \"web-0\" (UID: \"a235d5f9-05bf-4c77-8a84-e48f2f657d98\") "
Nov  6 10:14:19 hh1kbw01x kubelet: I1106 10:14:19.385072    2393 reconciler.go:209] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-3bcc4e38-2e69-4541-9910-711d2c086671" (UniqueName: "kubernetes.io/csi/csi.trident.netapp.io^pvc-3bcc4e38-2e69-4541-9910-711d2c086671") pod "web-0" (UID: "a235d5f9-05bf-4c77-8a84-e48f2f657d98")
@smangelkramer

Identical issue here with Rancher RKE and K8s (v1.18.10), nodes running Ubuntu 18.04.4 LTS with Docker 19.3.13; the rest matches the environment stated above.

@AndreasDeCrinis

Same here with upstream k8s on Ubuntu 18.04.

@AndreasDeCrinis

I was able to "solve" it by redeploying both the trident-csi daemonset and deployment, and restarting kubelet afterwards.

@smangelkramer

Yep. I used tridentctl instead of the operator.

@alexandru-ersenie
Author

So I fixed it, if I may say so. After reading how external storage provisioners work, and understanding the concept of driver registration via the sidecar container, I reviewed our setup.

It was very misleading, since we configure our kubelets to start with the configuration files residing under /var/lib/kubelet, which is the default root-dir.

A couple of months ago we decided to split the brain and move the pods and containers into a separate storage location, so we split management from operation.

Therefore we changed the root-dir in the configuration file to point to /containers instead of /var/lib/kubelet.

The default Trident provisioner will look in the default location and "embed" the plugins there, so to say.

So you need to check on two things:

  1. ps aux | grep kubelet | grep -e 'root-dir' -> take the configured folder (in my case it was /container)
  2. Change trident_provisioner_cr.yaml and customize it by adding the parameter "kubeletDir":

     apiVersion: trident.netapp.io/v1
     kind: TridentProvisioner
     metadata:
       name: trident
       namespace: trident
     spec:
       debug: true
       kubeletDir: /container
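The root-dir check in step 1 can be sketched as follows (the kubelet command line below is a simulated example; on a real node you would take it from the `ps` output):

```shell
# Simulated kubelet command line; on a real node obtain it via:
#   ps aux | grep kubelet
KUBELET_CMDLINE='/usr/bin/kubelet --config=/var/lib/kubelet/config.yaml --root-dir=/container --v=2'

# Extract the --root-dir flag value; no output means the default /var/lib/kubelet
echo "$KUBELET_CMDLINE" | tr ' ' '\n' | grep -- '--root-dir' | cut -d= -f2
```

After redeploying Trident with kubeletDir set, the node driver should show up in `kubectl get csinode <node> -o yaml` under spec.drivers instead of `drivers: null`.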

Good luck. I'm closing this.

@balaramesh
Contributor

Observed something similar when the trident-csi daemonset pods are unable to communicate with the Trident controller. In this case, it was due to a network policy that prevented that communication.
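As a starting point for debugging, a permissive policy like the following would rule out NetworkPolicy as the cause (a sketch only; the policy name is hypothetical, it assumes Trident runs in the trident namespace, and a real policy should be much narrower than allow-all):

```yaml
# Hypothetical allow-all ingress policy for the trident namespace,
# so the node pods can register with the controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-trident
  namespace: trident
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - {}
```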

@eselvam

eselvam commented Aug 4, 2022

Could you share how to fix it? Is it a network policy issue? What needs to be added in the Trident namespace/project?

@balaramesh
Contributor

Hello @eselvam

It is unlikely that you need to add anything to the Trident namespace/project. As pointed out by @alexandru-ersenie, I would recommend you check the path for your kubelet directory. If your k8s distribution uses a path other than /var/lib/kubelet, you will need to deploy Trident with the kubeletDir parameter.

@lindhe

lindhe commented Nov 3, 2023

In my case (running RKE2), I do have it in /var/lib/kubelet, yet I run into the same issue. 😕

EDIT: Turns out NetworkPolicies were the culprit for me! I'm running Rancher and have activated Project Network Isolation. 😄 Great. I'll just try to figure out what I should open for things to work.

Possibly related to #638

Turns out I'm affected by this three year old bug in Cilium: cilium/cilium#12277
