
stderr":"xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device\n" #349

Open
sfxworks opened this issue Nov 28, 2023 · 15 comments


@sfxworks

Not sure how exactly this is happening or what the source of the issue is, but kubelet is reporting "no space left on device" when trying to mount this ZFS iSCSI volume for a pod against a TrueNAS SCALE pool. There is plenty of space in the pool, the zvol, and the node. Not sure what's going on here.

  Warning  FailedMount  16m (x622 over 21h)  kubelet  MountVolume.MountDevice failed for volume "pvc-d30fdf1e-f843-4f09-b961-9ae3f5f7c7fe" : rpc error: code = Internal desc = {"code":1,"stdout":"meta-data=/dev/sdm               isize=512    agcount=32, agsize=32767999 blks\n         =                       sectsz=512   attr=2, projid32bit=1\n         =                       crc=1        finobt=1, sparse=1, rmapbt=0\n         =                       reflink=1    bigtime=0\ndata     =                       bsize=4096   blocks=1048575968, imaxpct=5\n         =                       sunit=1      swidth=4 blks\nnaming   =version 2              bsize=4096   ascii-ci=0, ftype=1\nlog      =internal log           bsize=4096   blocks=511999, version=2\n         =                       sectsz=512   sunit=1 blks, lazy-count=1\nrealtime =none                   extsz=4096   blocks=0, rtextents=0\n","stderr":"xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device\n","timeout":false}
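
For reference, the failing call can be reproduced by hand against the staged mount. A sketch (the <volume-hash> directory is a placeholder for whatever kubelet staged for the PVC):

# Re-run the same grow the driver attempts during staging:
xfs_growfs /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/<volume-hash>/globalmount
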
@sfxworks
Author

I mean, it's there and mounted OK:

[root@epyc7713 ~]# ls -lah /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/0ac47b27eb0001f9ada27a050b580f00eb74e8053272308902d4d0d1078dde3c/globalmount/
total 4.0K
drwxrwsr-x 3 root adm    20 Oct 22 01:45 .
drwxr-x--- 3 root root 4.0K Nov 13 10:31 ..
drwx--S--- 3  999 adm    18 Oct 22 01:45 pgdata
[root@epyc7713 ~]# lsblk | grep sdg
sdg           8:96   0     1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/0ac47b27eb0001f9ada27a050b580f00eb74e8053272308902d4d0d1078dde3c/globalmount

Though it just looks like the CSI driver tries to run xfs_growfs on it and fails.

@travisghansen
Member

Filesystem grow operations are attempted by democratic-csi every time a volume is 'staged' on a node. Was that an intermittent issue? Seems pretty odd.
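
Conceptually the stage step boils down to something like this after the mount; a rough sketch of the behavior, not the driver's actual code (see src/utils/filesystem.js):

# Sketch of what staging does after mounting (placeholders: /dev/sdX, mount path)
FSTYPE=$(blkid -o value -s TYPE /dev/sdX)
case "$FSTYPE" in
  xfs) xfs_growfs /path/to/globalmount ;;     # grow the fs to fill the device
  ext3|ext4|ext4dev) resize2fs /dev/sdX ;;    # same idea for ext*
esac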

@sfxworks
Author

sfxworks commented Nov 28, 2023

Unfortunately it is occurring every time, but only on certain nodes, and only for certain pods on those nodes. I'm currently working around it by switching to ext4, since I don't see those grow operations in https://github.com/democratic-csi/democratic-csi/blob/master/src/utils/filesystem.js#L728 and things are OK.

I will say that, between my homelab and my colo cluster, the nodes that have been affected have admittedly been through forceful power operations.
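
For anyone curious, the switch was just a storage class handing a different fsType to the driver; roughly (names here are examples, parameter per the democratic-csi chart):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: truenas-iscsi-ext4
provisioner: org.democratic-csi.iscsi
parameters:
  fsType: ext4
EOF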

@sfxworks
Author

I tried checking whether this was related by manually moving files to /tmp and back and restarting the pod, but with no success: https://xfs.org/index.php/XFS_FAQ#Q:_Why_do_I_receive_No_space_left_on_device_after_xfs_growfs.3F
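
For the record, that attempt amounted to roughly this (pod stopped first; <volume-hash> is a placeholder for the staged volume's directory):

MNT=/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/<volume-hash>/globalmount
cp -a "$MNT"/pgdata /tmp/pgdata.bak    # copy the data off the volume
rm -rf "$MNT"/pgdata                   # free the original extents
cp -a /tmp/pgdata.bak "$MNT"/pgdata    # copy it back, reallocating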

@travisghansen
Member

Hmm, was there an unclean shutdown or something? As an FYI, ext4 also resizes upon mount; that code simply does the same thing for ext3, ext4, and ext4dev.

@sfxworks
Author

Unclean, yes. Though the files are still there, so there was no data corruption as far as I can tell.

@travisghansen
Member

Can you do a df on the mount point? Also, what OS and kernel version are running?

@sfxworks
Author

sfxworks commented Nov 30, 2023

kubectl describe pod -n harbor harbor-jobservice-cb75c9878-b5glj
...
  Warning  FailedMount  4m39s (x2224 over 3d4h)  kubelet  MountVolume.MountDevice failed for volume "pvc-ff03bf6c-dfea-47d1-8d6d-e8f9ad473c5b" : rpc error: code = Internal desc = {"code":1,"stdout":"meta-data=/dev/sdf               isize=512    agcount=8, agsize=32767 blks\n         =                       sectsz=512   attr=2, projid32bit=1\n         =                       crc=1        finobt=1, sparse=1, rmapbt=0\n         =                       reflink=1    bigtime=0\ndata     =                       bsize=4096   blocks=262136, imaxpct=25\n         =                       sunit=1      swidth=4 blks\nnaming   =version 2              bsize=4096   ascii-ci=0, ftype=1\nlog      =internal log           bsize=4096   blocks=1032, version=2\n         =                       sectsz=512   sunit=1 blks, lazy-count=1\nrealtime =none                   extsz=4096   blocks=0, rtextents=0\n","stderr":"xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: No space left on device\n","timeout":false}
[root@epyc7713 ~]# lsblk | grep sdf
sdf           8:80   0     1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount

[root@epyc7713 ~]# ls -lah /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount/*.log | wc -l
737

[root@epyc7713 ~]# df /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sdf         1044416 43612   1000804   5% /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount

[root@epyc7713 ~]# uname -r
6.1.63-1-lts

[root@epyc7713 ~]# cat /etc/os-release 
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo

This has also happened on Arch with kernels 6.5.8-arch1-1 and 6.1.54-1-lts.
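
Since df says the filesystem already spans the device, comparing the raw device size against what XFS reports might narrow things down. A diagnostic sketch:

MNT=/var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/cee870bc14447f00d4f3097179f83477fedfb79cd785fcaa73311b3c4023e060/globalmount
blockdev --getsize64 /dev/sdf    # raw device size in bytes
xfs_info "$MNT"                  # filesystem size = data blocks x bsize
# If the two match, the grow should be a no-op; if the device is only a few
# blocks larger, xfs_growfs seemingly has nothing usable to grow into.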

@sfxworks
Author

Agh, my photoprism pods have the same issue as of late. I did not force-restart. This node had to go through a few clean restarts to get an AMD graphics driver working for ROCm.

NAME                          READY   STATUS              RESTARTS   AGE   IP       NODE       NOMINATED NODE   READINESS GATES
mariadb-0                     0/1     ContainerCreating   4          14d   <none>   epyc7713   <none>           <none>
photoprism-64fb9b4745-22sgj   0/1     ContainerCreating   4          14d   <none>   epyc7713   <none>           <none>

So the issue now triggers with shutdown -r. Though this node was not cordoned and drained first, systemd should have stopped the processes using the volumes, and any iSCSI volumes should have been unmounted.

@sfxworks
Author

sfxworks commented Dec 7, 2023

Any update on this? I'm migrating to ext4, so far with success. It seems to be some issue with xfs_growfs. It's broken multiple clusters of mine.

@desmo999r

Hello,

Got the exact same issue. It happens with brand-new volumes.
I managed to work around it by unmounting the iSCSI drive on the node where the container is starting and formatting the volume again.
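
Concretely, the workaround was something like this (a sketch; only safe because the volume was brand new and empty, /dev/sdX per lsblk):

umount /var/lib/kubelet/plugins/kubernetes.io/csi/org.democratic-csi.iscsi/<volume-hash>/globalmount
mkfs.xfs -f /dev/sdX    # re-create the filesystem on the iSCSI LUN
# kubelet retries the stage and the mount then succeeds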

I run K3s on Raspberry Pis with the latest democratic-csi (v1.8.4):

root@kmaster:~# uname -r
6.1.21-v8+

root@kmaster:~# k3s --version
k3s version v1.29.1+k3s2 (57482a1c)
go version go1.21.6

root@kmaster:~# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

I'll also take the opportunity to thank you for the democratic-csi driver. It really is like having my own datacenter at home, and I'm having quite some fun.

@travisghansen
Member

Does this help at all? https://superuser.com/questions/816627/xfs-incorrect-statement-of-no-space-left-on-device
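
If it is the same class of problem, the checks from that thread would be something like this (a sketch, not verified against this setup):

xfs_db -r -c freesp /dev/sdX    # read-only histogram of free extents
df -i /path/to/globalmount      # inode usage
# some of those reports were resolved by the inode64 mount option:
mount -o remount,inode64 /path/to/globalmount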

As an FYI, ext4 also does a grow operation on each mount.

@desmo999r

I experimented a bit more.
I'm running TrueNAS-SCALE-23.10.1.3, and I changed driver.config.iscsi.extentBlocksize from 512 to 4096 in the values.yaml file I feed to the democratic-csi Helm chart.
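
For anyone else hitting this, the setting maps to these keys in the chart values (excerpt; the comment is my understanding only):

driver:
  config:
    iscsi:
      extentBlocksize: 4096   # was 512; only affects volumes created after the change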

Now xfs_growfs does not complain anymore.

Not sure I understand exactly what it changed...

@travisghansen
Member

To be clear, that would only impact new volumes; did you delete and recreate the volume for testing?

@desmo999r

Yes, that's what I did. I changed the extentBlocksize setting and then deleted and recreated the volume.
