
stderr":"xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device\n" #349

Open
sfxworks opened this issue Nov 28, 2023 · 15 comments


@sfxworks

Not sure how exactly this is happening or what the source of the issue is, but kubelet is reporting "no space left on device" when trying to mount this ZFS iSCSI volume for a pod against a TrueNAS SCALE pool. There is plenty of space in the pool, the zvol, and the node. Not sure what's going on here.

  Warning  FailedMount  16m (x622 over 21h)  kubelet  MountVolume.MountDevice failed for volume "pvc-d30fdf1e-f843-4f09-b961-9ae3f5f7c7fe" : rpc error: code = Internal desc = {"code":1,"stdout":"meta-data=/dev/sdm               isize=512    agcount=32, agsize=32767999 blks\n         =                       sectsz=512   attr=2, projid32bit=1\n         =                       crc=1        finobt=1, sparse=1, rmapbt=0\n         =                       reflink=1    bigtime=0\ndata     =                       bsize=4096   blocks=1048575968, imaxpct=5\n         =                       sunit=1      swidth=4 blks\nnaming   =version 2              bsize=4096   ascii-ci=0, ftype=1\nlog      =internal log           bsize=4096   blocks=511999, version=2\n         =                       sectsz=512   sunit=1 blks, lazy-count=1\nrealtime =none                   extsz=4096   blocks=0, rtextents=0\n","stderr":"xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device\n","timeout":false}
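
For reference, the failing call can be reproduced by hand against the staged mount. A sketch (the <volume-hash> directory is a placeholder for whatever kubelet staged for the PVC):

# Re-run the same grow the driver attempts during staging:
xfs_growfs /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/<volume-hash>/globalmount
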
@sfxworks
Author

I mean, it's there and mounted OK:

[root@epyc7713 ~]# ls -lah /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/0ac47b27eb0001f9ada27a050b580f00eb74e8053272308902d4d0d1078dde3c/globalmount/
total 4.0K
drwxrwsr-x 3 root adm    20 Oct 22 01:45 .
drwxr-x--- 3 root root 4.0K Nov 13 10:31 ..
drwx--S--- 3  999 adm    18 Oct 22 01:45 pgdata
[root@epyc7713 ~]# lsblk | grep sdg
sdg           8:96   0     1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/0ac47b27eb0001f9ada27a050b580f00eb74e8053272308902d4d0d1078dde3c/globalmount

Though it just looks like the CSI driver tries to run xfs_growfs on it and fails.

@travisghansen
Member

Filesystem grow operations are attempted by democratic-csi every time a volume is 'staged' on a node. Was that an intermittent issue? Seems pretty odd.
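
Conceptually the stage step boils down to something like this after the mount; a rough sketch of the behavior, not the driver's actual code (see src/utils/filesystem.js):

# Sketch of what staging does after mounting (placeholders: /dev/sdX, mount path)
FSTYPE=$(blkid -o value -s TYPE /dev/sdX)
case "$FSTYPE" in
  xfs) xfs_growfs /path/to/globalmount ;;     # grow the fs to fill the device
  ext3|ext4|ext4dev) resize2fs /dev/sdX ;;    # same idea for ext*
esac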

@sfxworks
Author

sfxworks commented Nov 28, 2023

Unfortunately it is occurring every time, but only on certain nodes, and only for certain pods on those nodes. I'm currently working around it by switching to ext4, since I don't see those grow operations in https://github.com/democratic-csi/democratic-csi/blob/master/src/utils/filesystem.js#L728 and things are OK.

I will say that, between my homelab and my colo cluster, the nodes that have been affected have admittedly been through forceful power operations.
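
For anyone curious, the switch was just a storage class handing a different fsType to the driver; roughly (names here are examples, parameter per the democratic-csi chart):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: truenas-iscsi-ext4
provisioner: org.democratic-csi.iscsi
parameters:
  fsType: ext4
EOF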

@sfxworks
Author

I tried checking whether this was related by manually moving files to /tmp and back and restarting the pod, but with no success: https://xfs.org/index.php/XFS_FAQ#Q:_Why_do_I_receive_No_space_left_on_device_after_xfs_growfs.3F
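
For the record, that attempt amounted to roughly this (pod stopped first; <volume-hash> is a placeholder for the staged volume's directory):

MNT=/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/<volume-hash>/globalmount
cp -a "$MNT"/pgdata /tmp/pgdata.bak    # copy the data off the volume
rm -rf "$MNT"/pgdata                   # free the original extents
cp -a /tmp/pgdata.bak "$MNT"/pgdata    # copy it back, reallocating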

@travisghansen
Member

Hmm, was there an unclean shutdown or something? As an FYI, ext4 also resizes upon mount; that code simply does the same thing for ext3, ext4, and ext4dev.

@sfxworks
Author

Unclean, yes. Though the files are still there, so there was no data corruption as far as I can tell.

@travisghansen
Member

Can you do a df on the mount point? Also, what OS and kernel version are running?

@sfxworks
Author

sfxworks commented Nov 30, 2023

kubectl describe pod -n harbor harbor-jobservice-cb75c9878-b5glj
...
  Warning  FailedMount  4m39s (x2224 over 3d4h)  kubelet  MountVolume.MountDevice failed for volume "pvc-ff03bf6c-dfea-47d1-8d6d-e8f9ad473c5b" : rpc error: code = Internal desc = {"code":1,"stdout":"meta-data=/dev/sdf               isize=512    agcount=8, agsize=32767 blks\n         =                       sectsz=512   attr=2, projid32bit=1\n         =                       crc=1        finobt=1, sparse=1, rmapbt=0\n         =                       reflink=1    bigtime=0\ndata     =                       bsize=4096   blocks=262136, imaxpct=25\n         =                       sunit=1      swidth=4 blks\nnaming   =version 2              bsize=4096   ascii-ci=0, ftype=1\nlog      =internal log           bsize=4096   blocks=1032, version=2\n         =                       sectsz=512   sunit=1 blks, lazy-count=1\nrealtime =none                   extsz=4096   blocks=0, rtextents=0\n","stderr":"xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device\n","timeout":false}
[root@epyc7713 ~]# lsblk | grep sdf
sdf           8:80   0     1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount

[root@epyc7713 ~]# ls -lah /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount/*.log | wc -l
737

[root@epyc7713 ~]# df /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sdf         1044416 43612   1000804   5% /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount

[root@epyc7713 ~]# uname -r
6.1.63-1-lts

[root@epyc7713 ~]# cat /etc/os-release 
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo

This has also happened on Arch with kernels 6.5.8-arch1-1 and 6.1.54-1-lts.
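
Since df says the filesystem already spans the device, comparing the raw device size against what XFS reports might narrow things down. A diagnostic sketch:

MNT=/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount
blockdev --getsize64 /dev/sdf    # raw device size in bytes
xfs_info "$MNT"                  # filesystem size = data blocks x bsize
# If the two match, the grow should be a no-op; if the device is only a few
# blocks larger, xfs_growfs seemingly has nothing usable to grow into.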

@sfxworks
Author

Agh, my photoprism pods have the same issue as of late. I did not force-restart. This node had to go through a few clean restarts to get an AMD graphics driver working for ROCm.

NAME                          READY   STATUS              RESTARTS   AGE   IP       NODE       NOMINATED NODE   READINESS GATES
mariadb-0                     0/1     ContainerCreating   4          14d   <none>   epyc7713   <none>           <none>
photoprism-64fb9b4745-22sgj   0/1     ContainerCreating   4          14d   <none>   epyc7713   <none>           <none>

So the issue now triggers with shutdown -r. Though this node was not cordoned and drained first, systemd should have stopped the processes using the volumes, and any iSCSI volumes should have been unmounted.

@sfxworks
Author

sfxworks commented Dec 7, 2023

Any update on this? I'm migrating to ext4, so far with success. It seems to be some issue with xfs_growfs. It's broken multiple clusters of mine.

@desmo999r

Hello,

Got the exact same issue. It happens with brand-new volumes.
I managed to work around it by unmounting the iSCSI drive on the node where the container is starting and formatting the volume again.
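
Concretely, the workaround was something like this (a sketch; only safe because the volume was brand new and empty, /dev/sdX per lsblk):

umount /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/<volume-hash>/globalmount
mkfs.xfs -f /dev/sdX    # re-create the filesystem on the iSCSI LUN
# kubelet retries the stage and the mount then succeeds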

I run K3s on Raspberry Pis with the latest democratic-csi (v1.8.4):

root@kmaster:~# uname -r
6.1.21-v8+

root@kmaster:~# k3s --version
k3s version v1.29.1+k3s2 (57482a1c)
go version go1.21.6

root@kmaster:~# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

I'll also take the opportunity to thank you for the democratic-csi driver. It really is like having my own datacenter at home, and I'm having quite some fun.

@travisghansen
Member

Does this help at all? https://superuser.com/questions/816627/xfs-incorrect-statement-of-no-space-left-on-device
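
If it is the same class of problem, the checks from that thread would be something like this (a sketch, not verified against this setup):

xfs_db -r -c freesp /dev/sdX    # read-only histogram of free extents
df -i /path/to/globalmount      # inode usage
# some of those reports were resolved by the inode64 mount option:
mount -o remount,inode64 /path/to/globalmount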

As an FYI, ext4 also does a grow operation on each mount.

@desmo999r

I experimented a bit more.
I'm running TrueNAS-SCALE-23.10.1.3, and I changed driver.config.iscsi.extentBlocksize from 512 to 4096 in the values.yaml file I feed to the democratic-csi Helm chart.
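
For anyone else hitting this, the setting maps to these keys in the chart values (excerpt; the comment is my understanding only):

driver:
  config:
    iscsi:
      extentBlocksize: 4096   # was 512; only affects volumes created after the change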

Now xfs_growfs does not complain anymore.

Not sure I understand exactly what it changed...

@travisghansen
Member

To be clear, that would only impact new volumes; did you delete and recreate the volume for testing?

@desmo999r

Yes, that's what I did. I changed the extentBlocksize setting and then deleted and recreated the volume.
