initiating ISCSI connection step takes ages #105

Open
Raoul555 opened this issue Mar 26, 2024 · 5 comments

@Raoul555

Raoul555 commented Mar 26, 2024

Describe the bug

Mounting a volume hosted on a Dell PowerVault disk array in Kubernetes pods takes more than a minute, using the seagate-exos-x-csi driver.

To Reproduce

Create a StorageClass:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
provisioner: csi-exos-x.seagate.com # Check pkg/driver.go. Required for the plugin to recognize this storage class as handled by itself.
volumeBindingMode: Immediate # Prefer this value to avoid unschedulable pods (https://kubernetes.io/docs/concepts/storage/storage-classes/#volume-binding-mode)
allowVolumeExpansion: true
metadata:
  name: dell-storage # Choose the name that fits best with your StorageClass.
parameters:
  # Secret name and namespace; they can be the same for the provisioner, controller-publish and controller-expand sections.
  csi.storage.k8s.io/provisioner-secret-name: seagate-exos-x-csi-secrets
  csi.storage.k8s.io/provisioner-secret-namespace: seagate
  csi.storage.k8s.io/controller-publish-secret-name: seagate-exos-x-csi-secrets
  csi.storage.k8s.io/controller-publish-secret-namespace: seagate
  csi.storage.k8s.io/controller-expand-secret-name: seagate-exos-x-csi-secrets
  csi.storage.k8s.io/controller-expand-secret-namespace: seagate
  csi.storage.k8s.io/fstype: ext4 # Desired filesystem
  pool: A # Pool to use on the IQN to provision volumes
  volPrefix: tools
  storageProtocol: iscsi # The storage interface (iscsi, fc, sas) being used for storage i/o
```
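For completeness, the secret referenced by the class follows the driver's example secret; a minimal sketch assuming the usual apiAddress/username/password keys (the address and credentials below are placeholders, not values from this issue):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: seagate-exos-x-csi-secrets
  namespace: seagate
stringData:
  apiAddress: https://10.0.0.1 # array management endpoint (placeholder)
  username: manage             # array API user (placeholder)
  password: "changeme"         # array API password (placeholder)
```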

Then create a pod with a persistent volume:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-test
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: dell-storage
  resources:
    requests:
      storage: 10Mi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-test
spec:
  nodeName: kube-tool-worker-01
  containers:
    - image: alpine
      command: ["sleep", "3600"]
      name: pod-test
      volumeMounts:
        - mountPath: /vol
          name: volume
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: claim-test
```

The pod waits for its persistent volume to be mounted, but the PV takes more than a minute to become available.
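To watch where the time goes from the cluster side, the standard pod events are enough (plain kubectl, nothing driver-specific):

```
kubectl describe pod pod-test   # the Events section shows attach/mount timing
kubectl get events --field-selector involvedObject.name=pod-test --sort-by=.lastTimestamp
```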

Logs from one of the seagate-exos-x-csi-node-server pods:

```
I0320 14:53:39.483341       1 driver.go:125] === [ROUTINE REQUEST] [0] /csi.v1.Node/NodePublishVolume (49730e675962) <0s> ===
I0320 14:53:39.483348       1 driver.go:132] === [ROUTINE START] [1] /csi.v1.Node/NodePublishVolume (49730e675962) <661ns> ===
I0320 14:53:39.483365       1 node.go:192] "NodePublishVolume call" volumeName="pro_52897734d4e9ed81d0b20c5ac87"
I0320 14:53:39.483391       1 iscsiNode.go:65] "iSCSI connection info:" iqn="iqn.1988-11.com.dell:01.array.bc305b5dd35b" portals=["10.14.11.201","10.14.11.202","10.14.11.203","10.14.12.201","10.14.12.202","10.14.12.203"]
I0320 14:53:39.483401       1 iscsiNode.go:68] "LUN:" lun=13
I0320 14:53:39.483407       1 iscsiNode.go:70] "initiating ISCSI connection..."
I0320 14:53:54.909789       1 node.go:96] >>> /csi.v1.Node/NodeGetCapabilities
I0320 14:54:08.048697       1 node.go:96] >>> /csi.v1.Node/NodeGetCapabilities
I0320 14:54:22.159686       1 node.go:96] >>> /csi.v1.Node/NodeGetCapabilities
I0320 14:54:26.466144       1 node.go:96] >>> /csi.v1.Node/NodeGetCapabilities
I0320 14:54:26.851810       1 node.go:96] >>> /csi.v1.Node/NodeGetCapabilities
I0320 14:54:32.982583       1 node.go:96] >>> /csi.v1.Node/NodeGetCapabilities
I0320 14:54:33.975557       1 node.go:96] >>> /csi.v1.Identity/Probe
I0320 14:54:34.831015       1 node.go:96] >>> /csi.v1.Node/NodeGetCapabilities
I0320 14:54:53.217063       1 iscsiNode.go:128] "attached device:" path="/dev/dm-0"
I0320 14:54:53.218059       1 iscsiNode.go:159] "saving ISCSI connection info" connectorInfoPath="/var/run/csi-exos-x.seagate.com/iscsi-pro_52897734d4e9ed81d0b20c5ac87.json"
I0320 14:54:53.226942       1 storageService.go:239] Creating ext4 filesystem on device /dev/dm-0
I0320 14:54:53.243779       1 storageService.go:333] isVolumeInUse: findmnt /dev/dm-0, err=exit status 1
I0320 14:54:53.243801       1 storageService.go:149] Checking filesystem (e2fsck -n /dev/dm-0) [Publish]
I0320 14:54:53.253809       1 storageService.go:283] "successfully mounted volume" targetPath="/var/lib/kubelet/pods/a8cc4167-7d7a-4350-a2c0-203f3ab82941/volumes/kubernetes.io~csi/pvc-fbc4e528-9773-4d4e-9ed8-1d0b20c5ac87/mount"
I0320 14:54:53.253831       1 driver.go:136] === [ROUTINE END] [0] /csi.v1.Node/NodePublishVolume (49730e675962) <1m13.77048279s> ===
```

Description of the created PV:

```
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: csi-exos-x.seagate.com
Finalizers:      [kubernetes.io/pv-protection external-attacher/csi-exos-x-seagate-com]
StorageClass:    hub-saas-storage
Status:          Bound
Claim:           product/claim-test-1
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        10Mi
Node Affinity:   <none>
Message:
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            csi-exos-x.seagate.com
    FSType:            ext4
    VolumeHandle:      pro_1f30c09448884c5ffc3702c5f7e##iscsi##600c0ff0006e11815105fb6501000000
    ReadOnly:          false
    VolumeAttributes:      iqn=iqn.1988-11.com.dell:01.array.bc305b5dd35b
                           pool=A
                           portals=10.14.11.201,10.14.11.202,10.14.11.203,10.14.12.201,10.14.12.202,10.14.12.203
                           storage.kubernetes.io/csiProvisionerIdentity=1710850235982-8081-csi-exos-x.seagate.com
                           storageProtocol=iscsi
                           volPrefix=prod
Events:                <none>
```

Expected behavior

The PV should be available nearly immediately.

Storage System (please complete the following information):

  • Vendor: Dell
  • Model: PowerVault ME5012
  • Firmware Version: ME5.1.2.0.1

Environment:

  • Kubernetes version: Server Version: v1.27.10+rke2r1
  • Host OS: Ubuntu 22.04.4 LTS

Additional context

My k8s cluster is installed on 3 servers.
The server-to-array links are point-to-point connections, i.e. each server is physically directly connected to the storage:

  • server A has connectivity only to 10.14.11.201 and 10.14.12.201 storage ips
  • server B has connectivity only to 10.14.11.202 and 10.14.12.202 storage ips
  • server C has connectivity only to 10.14.11.203 and 10.14.12.203 storage ips
@seagate-chris
Collaborator

I suspect the problem is related to the IP addresses you're using. Can you change the iSCSI IP addresses so the iSCSI ports don't all appear to be on the same 2 subnets? In other words, define 6 unique subnets, one for each point-to-point connection, so that if you try on host A to ping the iSCSI ports connected to host B, you get a "No route to host" error immediately. I think each host is trying to connect to all 6 iSCSI ports and retries for a while on each port because they all appear to be on locally attached subnets.
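For illustration, one possible addressing scheme (these /31 subnets and interface names are made up, not from this issue; any six non-overlapping subnets would do):

```
# On server A: one unique /31 per point-to-point link
ip addr add 10.14.21.1/31 dev ens1f0   # array port for A, link 1, would be 10.14.21.0
ip addr add 10.14.22.1/31 dev ens1f1   # array port for A, link 2, would be 10.14.22.0
# Servers B and C would get 10.14.23.0/31 ... 10.14.26.0/31 on their links
```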

@Raoul555
Author

Raoul555 commented Mar 27, 2024

Well, I've tried it, and indeed, without a default route I get a "No route to host" immediately when node A pings an iSCSI port of node B.

But I need to keep a default route. In that case the ICMP ping is forwarded via the default route, and so does not fail with "No route to host".

Does the CSI driver rely only on a ping test to determine which ports it can use?

So I don't have a solution so far...

@seagate-chris
Collaborator

Oops, I should've seen that coming. To fix that, you can use a special route like this (on server A, using your original B and C IPs as an example):

```
ip route add unreachable 10.14.11.202/32  # block B's iSCSI ports
ip route add unreachable 10.14.12.202/32
ip route add unreachable 10.14.11.203/32  # block C's iSCSI ports
ip route add unreachable 10.14.12.203/32
```

This will cause attempts by A to reach those ports to fail immediately rather than timing out after some delay, without requiring you to delete your default route. Of course, you'll need to do something similar on B and C.
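Since the hosts run Ubuntu 22.04, one way to make those routes persistent would be netplan's route type `unreachable`; a sketch under the assumption that the iSCSI NICs are netplan-managed (the file name and interface name are placeholders):

```yaml
# /etc/netplan/60-iscsi-block.yaml (placeholder name), on server A
network:
  version: 2
  ethernets:
    ens1f0:                  # placeholder interface name
      routes:
        - to: 10.14.11.202/32
          type: unreachable
        - to: 10.14.12.202/32
          type: unreachable
        - to: 10.14.11.203/32
          type: unreachable
        - to: 10.14.12.203/32
          type: unreachable
```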

The CSI driver doesn't care which ports are reachable per se, but the iSCSI initiator automatically discovers all six ports, and the unreachable ones slow things down.
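To confirm what the initiator has discovered, and optionally prune node records for portals a host can never reach, the standard open-iscsi tooling can be used; a sketch for server A, reusing the array IQN from the logs above (note the driver may re-create records on the next discovery):

```
# List all node records the initiator currently knows about
iscsiadm -m node

# Remove stale records for B's and C's portals, which A cannot reach
iscsiadm -m node -o delete -T iqn.1988-11.com.dell:01.array.bc305b5dd35b -p 10.14.11.202
iscsiadm -m node -o delete -T iqn.1988-11.com.dell:01.array.bc305b5dd35b -p 10.14.12.202
```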

If this doesn't help (or doesn't help enough), please attach a complete set of host logs (node and controller logs, and kernel messages).

@Raoul555
Author

Ah, that works.
Now I'm able to provision 16 volumes in 4 minutes.

Is that volume-creation time in line with what's expected, or can it be even better?

On another topic: the volumes are not deleted on the disk-array side when I delete the Kubernetes PV and PVC. Is that normal?

@David-T-White
Collaborator

Hello, volume creation time can be influenced by many factors, but 16 volumes in 4 minutes seems like reasonable, expected performance.

> On another topic: the volumes are not deleted on the disk-array side when I delete the Kubernetes PV and PVC. Is that normal?

CSI-managed volumes are expected to be visible on the array side. If they persist after the PV and PVC have been deleted from the cluster, that would not be normal behavior.
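One generic thing worth ruling out (standard Kubernetes behavior, not specific to this driver): the backing volume is only deleted when the PV's reclaim policy is Delete; with Retain, the PV and the array volume are left in place. The PV above already shows `Reclaim Policy: Delete`, but it's cheap to confirm, and to check the controller logs for DeleteVolume errors (the deployment name below is a guess; adjust it to your install):

```
kubectl get sc dell-storage -o jsonpath='{.reclaimPolicy}{"\n"}'
kubectl -n seagate logs deploy/seagate-exos-x-csi-controller-server | grep -i deletevolume
```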
