iSCSI volumes mounted with root permissions on OpenShift 4.5 #556
My guess is that you didn't specify the fsType parameter in your storage class. Delete the existing storage class (no impact to existing volumes), add the fsType parameter to your storage class YAML, and re-create it.
Some background: When a pod starts, it can specify an fsGroup as part of its securityContext. Kubernetes applies this group ID to all files/folders on the volume (chown/chmod) and adds the group as a supplemental group to the user the app runs as. This ensures that permissions are set correctly. Because this is not possible in all scenarios (depending on the storage used), Kubernetes tries to detect whether this step should run. One of the indicators is the fsType: if it exists, Kubernetes assumes it will be able to chown/chmod. In older releases the fsType defaulted to ext4. This changed to "" with the more recent versions of the CSI sidecar containers included in Trident 20.07.1 and higher. So on older versions, if you didn't specify fsType in your storage class, the default "ext4" was used and everything worked. In more recent versions the default is empty, so if you don't set it in your storage class it will be blank, causing Kubernetes to believe there is no fsType and to skip the permission step.
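For reference, a Trident storage class with fsType set might look like the following minimal sketch (the class name is hypothetical; only the fsType parameter is the point here):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: san-gold                # hypothetical name
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-san"
  fsType: "ext4"                # without this, recent CSI sidecars leave fsType
                                # empty and Kubernetes skips the fsGroup chown/chmod
```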
I deleted and recreated the iSCSI storage class, adding the parameter fsType: ext4.
I restarted one of the Prometheus and Elasticsearch pods afterwards, but the issue remains.
Did I miss a step?
Sorry I wasn't clear about this. The parameters of the storage class are only applied to new volumes you create. Any existing PV will not get this set; I'm afraid you cannot modify this for an already existing PV. Could you please try with a new PVC/PV?
Hello, that solves the issue! Thanks a lot! After adding fsType: ext4 the Prometheus pods start correctly. Same for the Elasticsearch pods, except that I didn't have to recreate their PVCs; they were created automatically. I verified the permissions of the iSCSI file systems inside the pods: they are no longer owned by root.
@timvandevoort this is related to how Kubernetes applies fsGroups, as can be seen here: https://github.com/kubernetes/kubernetes/blob/f137c4777095b3972e2dd71a01365d47be459389/pkg/volume/csi/csi_mounter.go#L415
As long as your StorageClass has an fsType set, the fsGroup will be applied. I am marking this issue as "Closed". Thanks for using Trident!
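To illustrate the other half of the mechanism: the fsGroup that Kubernetes applies comes from the pod's securityContext. A minimal sketch (pod name, image, and claim name are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo            # hypothetical name
spec:
  securityContext:
    fsGroup: 1000260000         # kubelet chowns/chmods the volume to this GID,
                                # but only when the CSI volume reports an fsType
  containers:
    - name: app
      image: registry.access.redhat.com/ubi8/ubi-minimal
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc     # hypothetical claim backed by the iSCSI class
```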
With version 2.0 the external provisioner has removed the default fsType [1]. With no filesystem set, Kubernetes does not set the group ID, which causes volumes to be mounted with root permissions only. Some distributions like OpenShift disallow access in this case. [1] https://github.com/kubernetes-csi/external-provisioner/blob/master/CHANGELOG/CHANGELOG-2.0.md#changelog-since-v160 [2] NetApp/trident#556 (comment)
Describe the bug
A new OpenShift 4.5 cluster has been deployed together with NetApp Trident installer v20.10.1.
NFS storage served by NetApp works fine on the OpenShift cluster, but iSCSI volumes result in 'permission denied' errors. Because of this, the Prometheus and Elasticsearch pods cannot start.
$ oc logs -f prometheus-k8s-0 -c prometheus
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:330 msg="Starting Prometheus" version="(version=2.15.2, branch=rhaos-4.5-rhel-7, revision=c3b41963fbe48114e54396ac05b56b02cb3e4a0a)"
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:331 build_context="(go=go1.13.4, user=root@3d590aab9ed6, date=20200810-04:36:52)"
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:332 host_details="(Linux 4.18.0-193.14.3.el8_2.x86_64 #1 SMP Mon Jul 20 15:02:29 UTC 2020 x86_64 prometheus-k8s-0 (none))"
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:333 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-03-26T10:26:30.595Z caller=main.go:334 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2021-03-26T10:26:30.595Z caller=query_logger.go:85 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log
$ oc logs -f elasticsearch-cdm-6uixr0l5-1-574cf77c5c-fbbgj -c elasticsearch
...
[2021-03-26 10:27:10,098][INFO ][container.run ] ES_JAVA_OPTS: ' -Xms4096m -Xmx4096m -XX:HeapDumpPath=/elasticsearch/persistent/heapdump.hprof -Xloggc:/elasticsearch/persistent/elasticsearch/logs/gc.log -XX:ErrorFile=/elasticsearch/persistent/elasticsearch/logs/error.log'
[2021-03-26 10:27:10,099][INFO ][container.run ] Checking if Elasticsearch is ready
mkdir: cannot create directory '/elasticsearch/persistent/elasticsearch': Permission denied
We figured out the iSCSI volumes are mounted with root permissions on the OpenShift nodes:
sh-4.4# df -h | grep pvc-17029cbf-27eb-4f29-bad2-e0608a518072
/dev/mapper/3600a098056303030303f4f77477a3276 9.8G 37M 9.3G 1% /var/lib/kubelet/pods/565ca5b8-d1fd-4dc5-b118-cf224e226dc1/volumes/kubernetes.io~csi/pvc-17029cbf-27eb-4f29-bad2-e0608a518072/mount
sh-4.4# ls -lRt /var/lib/kubelet/pods/565ca5b8-d1fd-4dc5-b118-cf224e226dc1/volumes/kubernetes.io~csi/pvc-17029cbf-27eb-4f29-bad2-e0608a518072/mount
/var/lib/kubelet/pods/565ca5b8-d1fd-4dc5-b118-cf224e226dc1/volumes/kubernetes.io~csi/pvc-17029cbf-27eb-4f29-bad2-e0608a518072/mount:
total 20
drwxr-xr-x. 2 root root 4096 Mar 24 16:46 prometheus-db <=====================================
drwx------. 2 root root 16384 Mar 24 11:25 lost+found
On another OpenShift 4.5 cluster that doesn't have the issue, the prometheus volumes are mounted like this:
sh-4.4# df -h | grep pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd
/dev/mapper/3600a098056303030302b4f7733775552 15G 9.5G 4.6G 68% /var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount
sh-4.4# ls -lRt /var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount
/var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount:
total 20
drwxrwsr-x. 32 root 1000260000 4096 Mar 25 16:00 prometheus-db <==============================
drwxrws---. 2 root 1000260000 16384 Oct 16 15:28 lost+found
The only difference here is that Trident v20.07.0, an older version than v20.10.1, is used.
Environment
Provide accurate information about the environment to help us reproduce the issue.
To Reproduce
A test pod created with an iSCSI PVC also fails to start.
'oc debug pod/test_pod' shows the iSCSI volume is mounted, but there is no permission to write to the file system.
'oc debug pod/test_pod --as-root' allows us to write files/directories inside the iSCSI file system.
This is not expected behavior; OpenShift containers should not run as the root account.
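The reproduction above can be sketched as a minimal PVC plus pod (all names are hypothetical; substitute your own iSCSI storage class):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc               # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: san-gold   # hypothetical iSCSI (ontap-san) storage class
---
apiVersion: v1
kind: Pod
metadata:
  name: test-pod               # hypothetical name
spec:
  containers:
    - name: writer
      image: registry.access.redhat.com/ubi8/ubi-minimal
      # The touch fails with 'Permission denied' when the volume is root-owned
      command: ["sh", "-c", "touch /data/probe && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
```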
Expected behavior
On another OpenShift 4.5 cluster that doesn't have the issue, the prometheus volumes are mounted like this:
sh-4.4# df -h | grep pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd
/dev/mapper/3600a098056303030302b4f7733775552 15G 9.5G 4.6G 68% /var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount
sh-4.4# ls -lRt /var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount
/var/lib/kubelet/pods/bdb57dac-8202-4ec2-8c97-2f9367c7ed40/volumes/kubernetes.io~csi/pvc-0bad4ec4-2e30-472b-913b-28a29cfa0bcd/mount:
total 20
drwxrwsr-x. 32 root 1000260000 4096 Mar 25 16:00 prometheus-db <==============================
drwxrws---. 2 root 1000260000 16384 Oct 16 15:28 lost+found
The only difference is that Trident v20.07.0 is used on this OpenShift cluster, instead of v20.10.1.
Additional context
NetApp support case: 2008704377 -> still no feedback received