Crashing vsphere-csi-controller with RWX (ReadWriteMany) PV #2755

Open
dzanto opened this issue Jan 13, 2024 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@dzanto
dzanto commented Jan 13, 2024

/kind bug

What happened:

I installed the vSphere Cloud Provider Interface (CPI) and vSphere Container Storage Interface (CSI) into a Kubernetes cluster from Rancher Apps. Mounting an RWO (ReadWriteOnce) volume works fine.
When I try to create a PVC (PersistentVolumeClaim) with RWX (ReadWriteMany) access mode, csi-provisioner and vsphere-csi-controller start restarting with these logs:

csi-provisioner:

controller.go:860] Started provisioner controller csi.vsphere.vmware.com_vsphere-csi-controller-568b9cb986-vpv7c_aa38ac7f-1fce-4826-bf03-38744d6cbf38!
controller.go:1337] provision "default/rwx-storage" class "vsphere-csi-sc": started
controller.go:568] skip translation of storage class for plugin: csi.vsphere.vmware.com
event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"rwx-storage", UID:"72ae5017-3dfc-4e2d-ba54-d228584064ab", APIVersion:"v1", ResourceVersion:"22586568", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/rwx-storage"
controller.go:1082] Temporary error received, adding PVC 72ae5017-3dfc-4e2d-ba54-d228584064ab to claims in progress
controller.go:934] Retrying syncing claim "72ae5017-3dfc-4e2d-ba54-d228584064ab", failure 0
controller.go:957] error syncing claim "72ae5017-3dfc-4e2d-ba54-d228584064ab": failed to provision volume with StorageClass "vsphere-csi-sc": rpc error: code = Unavailable desc = error reading from server: EOF
event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"rwx-storage", UID:"72ae5017-3dfc-4e2d-ba54-d228584064ab", APIVersion:"v1", ResourceVersion:"22586568", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "vsphere-csi-sc": rpc error: code = Unavailable desc = error reading from server: EOF
controller.go:1337] provision "default/rwx-storage" class "vsphere-csi-sc": started
controller.go:568] skip translation of storage class for plugin: csi.vsphere.vmware.com
connection.go:132] Lost connection to unix:///csi/csi.sock.
event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"rwx-storage", UID:"72ae5017-3dfc-4e2d-ba54-d228584064ab", APIVersion:"v1", ResourceVersion:"22586568", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/rwx-storage"
connection.go:87] Lost connection to CSI driver, exiting

vsphere-csi-controller:

2024-01-13T15:56:11.197184179Z {"level":"info","time":"2024-01-13T15:56:11.197118888Z","caller":"vanilla/controller.go:2718","msg":"ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"f14c9ed4-b8b4-4994-8ea7-acc4e868637b"}
{"level":"info","time":"2024-01-13T15:56:31.050911804Z","caller":"vanilla/controller.go:1805","msg":"CreateVolume: called with args {Name:pvc-72ae5017-3dfc-4e2d-ba54-d228584064ab CapacityRange:required_bytes:11534336  VolumeCapabilities:[mount:<fs_type:\"ext4\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"fd9fdf44-45fa-4112-ae25-1b0efd285d4d"}
panic: runtime error: invalid memory address or nil pointer dereference
2024-01-13T15:56:31.059233875Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1b22a25]
2024-01-13T15:56:31.059236584Z 
2024-01-13T15:56:31.059238504Z goroutine 419 [running]:
2024-01-13T15:56:31.059240375Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).createFileVolume(0xc0000a20a0, {0x26af658, 0xc000247920}, 0xc0004ec2a0)
2024-01-13T15:56:31.059242305Z 	/build/pkg/csi/service/vanilla/controller.go:1736 +0xd05
2024-01-13T15:56:31.059244155Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).CreateVolume.func1()
2024-01-13T15:56:31.059245885Z 	/build/pkg/csi/service/vanilla/controller.go:1848 +0x3d7
2024-01-13T15:56:31.059247605Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).CreateVolume(0xc0000a20a0, {0x26af658, 0xc000b07b60}, 0xc0004ec2a0)
2024-01-13T15:56:31.059249395Z 	/build/pkg/csi/service/vanilla/controller.go:1858 +0x1bb
2024-01-13T15:56:31.059253975Z github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler({0x229bba0?, 0xc0000a20a0}, {0x26af658, 0xc000b07b60}, 0xc00022ef00, 0x0)
2024-01-13T15:56:31.059256265Z 	/go/pkg/mod/github.com/container-storage-interface/spec@v1.7.0/lib/go/csi/csi.pb.go:5671 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00022aa80, {0x26b6218, 0xc000d16d00}, 0xc00052fd40, 0xc000d48d20, 0x38db8a0, 0x0)
2024-01-13T15:56:31.059260375Z 	/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283 +0xcfe
google.golang.org/grpc.(*Server).handleStream(0xc00022aa80, {0x26b6218, 0xc000d16d00}, 0xc00052fd40, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620 +0xa2f
2024-01-13T15:56:31.059270455Z google.golang.org/grpc.(*Server).serveStreams.func1.2()
2024-01-13T15:56:31.059272495Z 	/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922 +0x98
2024-01-13T15:56:31.059274495Z created by google.golang.org/grpc.(*Server).serveStreams.func1
2024-01-13T15:56:31.059276525Z 	/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:920 +0x28a

Environment:

  • csi-vsphere version: 3.0.1
  • vsphere-cloud-controller-manager version: 1.24.5
  • Kubernetes version: RKE v1.24.17
  • vSphere version: 8.0.2
  • OS (e.g. from /etc/os-release): Oracle Linux Server 8.8
  • Kernel (e.g. uname -a): 5.15.0-106.131.4.el8uek.x86_64
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 13, 2024
@chethanv28
Collaborator

@dzanto Did you disable the multi-vcenter-csi-topology flag in the internal-feature-states.csi.vsphere.vmware.com ConfigMap after the driver was first initialized?

@dzanto
Author

dzanto commented Jan 17, 2024

The multi-vcenter-csi-topology option is absent from the internal-feature-states.csi.vsphere.vmware.com ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  annotations:
    meta.helm.sh/release-name: vsphere-csi
    meta.helm.sh/release-namespace: kube-system
  labels:
    app.kubernetes.io/managed-by: Helm
  name: internal-feature-states.csi.vsphere.vmware.com
  namespace: kube-system
data:
  async-query-volume: 'false'
  block-volume-snapshot: 'false'
  cnsmgr-suspend-create-volume: 'false'
  csi-auth-check: 'false'
  csi-migration: 'false'
  csi-windows-support: 'false'
  improved-csi-idempotency: 'false'
  improved-volume-topology: 'false'
  list-volumes: 'false'
  max-pvscsi-targets-per-vm: 'false'
  online-volume-extend: 'false'
  pv-to-backingdiskobjectid-mapping: 'false'
  topology-preferential-datastores: 'false'
  trigger-csi-fullsync: 'false'
  use-csinode-id: 'true'

I created a custom StorageClass with csi.storage.k8s.io/fstype: nfs4, and the crashes went away. The default StorageClass (from the Rancher Helm chart) does not contain this parameter.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-nfs
parameters:
  csi.storage.k8s.io/fstype: nfs4
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

But when I create a PVC, no PV appears.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-client-pvc
spec:
  storageClassName: vsphere-nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Mi

RWX works only when I create the PVC and PV manually, as in https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/example/vanilla-k8s-RWM-filesystem-volumes/example-static-fileshare-provisioning.yaml

How can the PV be created automatically?

@shalini-b
Collaborator

It looks like the vSphere CSI driver was not deployed properly. vSphere CSI driver v3.0.1 has the multi-vcenter-csi-topology feature gate set to true.
Refer to https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/v3.0.1/manifests/vanilla/vsphere-csi-driver.yaml#L164C39-L164C39

@dzanto
Author

dzanto commented Jan 19, 2024

@shalini-b
Collaborator

shalini-b commented Jan 19, 2024

The topology flag in provisioner is set to false by default in our YAML as well. It is only set to true when a customer chooses to use topology in their environment.

The multi-vcenter-csi-topology feature gate we are talking about is present in a ConfigMap named internal-feature-states.csi.vsphere.vmware.com in the vmware-system-csi namespace. It should be set to true if you are using vSphere CSI driver v3.0.1.
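
For reference, a minimal sketch of how that flag could appear in the ConfigMap. The namespace follows the comment above; the Helm-based install shown earlier in this thread keeps the ConfigMap in kube-system instead, so adjust it for your deployment. ConfigMap data values are strings, so the value needs quotes; an unquoted true is parsed by YAML as a boolean and is expected to be rejected by the API.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: internal-feature-states.csi.vsphere.vmware.com
  # Assumption: namespace used by the upstream manifests; the Rancher chart
  # install in this thread uses kube-system.
  namespace: vmware-system-csi
data:
  # Only the key under discussion is shown; existing keys stay unchanged.
  multi-vcenter-csi-topology: 'true'
```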

@dzanto
Author

dzanto commented Jan 23, 2024

I added multi-vcenter-csi-topology: true to the ConfigMap, but it didn't help. I also tried multi-vcenter-csi-topology: false.
vsphere-csi-controller crashed again after I created the PVC.

@shalini-b
Collaborator

Can you post the logs when you set multi-vcenter-csi-topology to true in configmap?

@dzanto
Author

dzanto commented Jan 30, 2024

{"level":"info","time":"2024-01-30T06:39:53.042383033Z","caller":"vanilla/controller.go:1805","msg":"CreateVolume: called with args {Name:pvc-13e4061b-61d6-4f6a-ad6a-ef7d1425dc4e CapacityRange:required_bytes:10485760  VolumeCapabilities:[mount:<fs_type:\"nfs4\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"2177c87f-8ff3-406f-8ffa-b366a1d14a12"}
panic: runtime error: invalid memory address or nil pointer dereference
2024-01-30T06:39:53.059969372Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1a4f182]
2024-01-30T06:39:53.059974632Z 
2024-01-30T06:39:53.059977523Z goroutine 654 [running]:
2024-01-30T06:39:53.059980332Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common.(*AuthManager).GetFsEnabledClusterToDsMap(0x0, {0x26af658?, 0xc0000585b8?})
2024-01-30T06:39:53.059983223Z 	/build/pkg/csi/service/common/authmanager.go:137 +0x62
2024-01-30T06:39:53.059986273Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).createFileVolume(0xc00034a190, {0x26af658, 0xc000632300}, 0xc0001b2770)
2024-01-30T06:39:53.059989283Z 	/build/pkg/csi/service/vanilla/controller.go:1734 +0xcf3
2024-01-30T06:39:53.059992373Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).CreateVolume.func1()
2024-01-30T06:39:53.059995143Z 	/build/pkg/csi/service/vanilla/controller.go:1836 +0x2c5
2024-01-30T06:39:53.060001453Z sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).CreateVolume(0xc00034a190, {0x26af658, 0xc000fd8db0}, 0xc0001b2770)
2024-01-30T06:39:53.060004383Z 	/build/pkg/csi/service/vanilla/controller.go:1858 +0x1bb
2024-01-30T06:39:53.060007163Z github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler({0x229bba0?, 0xc00034a190}, {0x26af658, 0xc000fd8db0}, 0xc0003bb800, 0x0)
	/go/pkg/mod/github.com/container-storage-interface/spec@v1.7.0/lib/go/csi/csi.pb.go:5671 +0x170
2024-01-30T06:39:53.060013863Z google.golang.org/grpc.(*Server).processUnaryRPC(0xc000196a80, {0x26b6218, 0xc000bea9c0}, 0xc0002797a0, 0xc000a37860, 0x38db8a0, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283 +0xcfe
google.golang.org/grpc.(*Server).handleStream(0xc000196a80, {0x26b6218, 0xc000bea9c0}, 0xc0002797a0, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620 +0xa2f
2024-01-30T06:39:53.060029353Z google.golang.org/grpc.(*Server).serveStreams.func1.2()
	/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:920 +0x28a

@Midaxess

Hi, have you found a solution?

I have the same issue on my RKE2 and K3s clusters with the Helm chart rancher-vsphere-csi:103.0.0+up3.0.2-rancher1.

{"level":"info","time":"2024-03-15T15:34:05.026383613Z","caller":"vanilla/controller.go:2719","msg":"ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"91f15cdb-ed37-4014-b79e-ebd785e54684"}
{"level":"info","time":"2024-03-15T15:34:24.880189533Z","caller":"vanilla/controller.go:1806","msg":"CreateVolume: called with args {Name:pvc-2a06d694-903f-41da-85bb-1475e20d2ff9 CapacityRange:required_bytes:1073741824  VolumeCapabilities:[mount:<fs_type:\"nfs4\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > ] Parameters:map[] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}","TraceId":"a2fd7e91-f23e-44f9-8037-2a0c13595c03"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1b24e65]

goroutine 275 [running]:
sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).createFileVolume(0xc0000a1c20, {0x26b2d98, 0xc000874660}, 0xc0004e2380)
        /build/pkg/csi/service/vanilla/controller.go:1737 +0xd05
sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).CreateVolume.func1()
        /build/pkg/csi/service/vanilla/controller.go:1849 +0x3d7
sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/vanilla.(*controller).CreateVolume(0xc0000a1c20, {0x26b2d98, 0xc00085ce10}, 0xc0004e2380)
        /build/pkg/csi/service/vanilla/controller.go:1859 +0x1bb
github.com/container-storage-interface/spec/lib/go/csi._Controller_CreateVolume_Handler({0x229ed80?, 0xc0000a1c20}, {0x26b2d98, 0xc00085ce10}, 0xc0004fc660, 0x0)
        /go/pkg/mod/github.com/container-storage-interface/spec@v1.7.0/lib/go/csi/csi.pb.go:5671 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0002cfc00, {0x26b9978, 0xc000557ba0}, 0xc0002eea20, 0xc000875ad0, 0x38e08a0, 0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1283 +0xcfe
google.golang.org/grpc.(*Server).handleStream(0xc0002cfc00, {0x26b9978, 0xc000557ba0}, 0xc0002eea20, 0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:1620 +0xa2f
google.golang.org/grpc.(*Server).serveStreams.func1.2()
        /go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /go/pkg/mod/google.golang.org/grpc@v1.47.0/server.go:920 +0x28a

Anything else we need to know?:

The vSAN File Service is working: I created an NFS file share and was able to mount it manually on a node of the k8s cluster.

Environment:

  • csi-vsphere version: 3.0.2
  • Kubernetes version: RKE v1.25.13+rke2r1
  • vSphere version: 8.0.2
  • OS (e.g. from /etc/os-release): Ubuntu 22.04.3
  • Kernel (e.g. uname -a): 5.15.0-100-generic

@Midaxess

OK, after a few weeks I found the solution.

Edit your ConfigMap:

csi-auth-check: 'false' -> csi-auth-check: 'true'

Then restart the vSphere plugin Pods and recreate the PVC.

@dzanto Let me know if this helps you.
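
A minimal sketch of the edited ConfigMap, assuming the name and namespace from the ConfigMap posted earlier in this thread (kube-system for the Rancher chart; the upstream manifests use vmware-system-csi). Only the changed key is shown.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: internal-feature-states.csi.vsphere.vmware.com
  # Assumption: Rancher Helm chart install; adjust for your deployment.
  namespace: kube-system
data:
  # Changed from 'false'; all other keys stay as they are.
  csi-auth-check: 'true'
```

This would be consistent with the second stack trace above, where GetFsEnabledClusterToDsMap is called on a nil *AuthManager (receiver 0x0), which suggests the auth manager is only initialized when csi-auth-check is enabled. That reading is an inference from the trace, not something the maintainers confirmed in this thread.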
