rawhide kernel 6.10.0 >=20240514 - podman update device-read-bps = 0 #22701
Failed again, kernel-6.10.0-0.rc0.20240516git3c999d1ae3c7.5.fc41
Can you create a simple reproducer? AFAIK the cgroup setup depends on podman -> crun -> systemd -> kernel, so maybe check whether the other components changed too.
That has been my goal, as you might have predicted. However,
@edsantiago there is no testing repo for Rawhide, so if an update fails gating there isn't really a proper repo to get it from, unfortunately; you have to get it from Koji. openQA does record logs, but we don't happen to pipe the output of this specific test command to a file at present. It would be easy to do that if it's useful, though. @Luap99 it's the kernel that is causing this. The same test is passing just fine on every other Rawhide update; it fails only on kernel updates, which means the kernel is the cause.
Thanks @AdamWill. I guess we then have to get a simple reproducer and file a kernel bug.
I'm being lazy again: the failure is on a 0514 kernel build. I see a 0517 Koji build and have not seen any openQA error emails about it. Until I have reason to suspect otherwise, I'll assume the problem is fixed. (And I'll save myself the time of pulling the kernel and looking for a reproducer.)
sigh... never mind. 0517 did fail in openQA. Reproducer:

```
# uname -r
6.9.0-0.rc7.20240510git448b3fe5a0ea.62.fc41.x86_64
# dnf -y install podman-tests
# podman run -d --name foo quay.io/libpod/testimage:20240123 sleep inf
<cid>
# podman exec foo cat /sys/fs/cgroup/io.max
# podman update --device-read-bps=/dev/zero:10mb foo
<cid>
# podman exec foo cat /sys/fs/cgroup/io.max
1:5 rbps=10485760 wbps=max riops=max wiops=max    <<<<< THIS IS GOOD
```

Then:

```
# wget https://kojipkgs.fedoraproject.org//packages/kernel/6.10.0/0.rc0.20240517gitea5f6ad9ad96.6.fc41/x86_64/kernel{,-core,-modules,-modules-core}-6.10.0-0.rc0.20240517gitea5f6ad9ad96.6.fc41.x86_64.rpm
# dnf install kern*rpm; reboot
```

Then:

```
# uname -r
6.10.0-0.rc0.20240517gitea5f6ad9ad96.6.fc41.x86_64
# podman rm -f -a
[repeat the podman run/update/exec from above]
1:5 rbps=0 wbps=0 riops=0 wiops=0    <<<<<< THIS IS NOT GOOD
```
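As a sanity check on the good output above, the rbps figure is just the `10mb` limit interpreted in binary (1024-based) units:

```shell
# 10mb is parsed as 10 * 1024 * 1024 bytes, which matches the
# rbps=10485760 seen in the working io.max output.
expected_rbps=$((10 * 1024 * 1024))
echo "$expected_rbps"
```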
Filed rhbz2281805
Does this still happen with 6.10 rc1? |
If by rc1 you mean 6.10.0-0.rc1.17, then yes |
A CLI reproducer should be something like this
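A sketch of such a reproducer, reusing the image, device, and limit from the commands earlier in the thread (the function name is just illustrative; needs root and podman installed):

```shell
# Hypothetical CLI reproducer for the device-read-bps regression.
reproduce() {
    podman run -d --name foo quay.io/libpod/testimage:20240123 sleep inf
    podman update --device-read-bps=/dev/zero:10mb foo
    # On an affected kernel this prints "... rbps=0 wbps=0 riops=0 wiops=0"
    # instead of "... rbps=10485760 wbps=max riops=max wiops=max".
    podman exec foo cat /sys/fs/cgroup/io.max
    podman rm -f foo
}
```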
I tried to get a Rawhide VM going to test the install myself, but it seems like something with dnf is terribly broken there, as I cannot install anything due to checksum errors. I tried several VMs; all fail in the same way...
huh, that seems odd? I'm running Rawhide here and not seeing anything like that, and our automated tests aren't either. |
On 1mt, a minute or two ago, I saw a ton of red checksum errors but
I do see this mail, which might be relevant. I hadn't updated to that yet. But openQA did pass its tests today... which includes doing quite a lot of package installs...
Yeah, it seems to be working again now; not sure what happened.
Tried 6.10.0-0.rc1.20240528git2bfcfd584ff5.18 and can reproduce with the shell commands above; you may need to add the io controller first on a fresh boot.
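On cgroups v2, "adding the io controller" means enabling it for child cgroups via cgroup.subtree_control; a minimal sketch, assuming the standard cgroup v2 mount at /sys/fs/cgroup (needs root):

```shell
# Enable the io controller for child cgroups so that io.max exists there.
# Assumes cgroup v2 mounted at /sys/fs/cgroup; run as root.
enable_io_controller() {
    echo "+io" > /sys/fs/cgroup/cgroup.subtree_control
}
```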
I think this must be reported to kernel upstream; I don't see this getting solved just sitting in the Fedora Bugzilla.
well, @jmflinuxtx - the Fedora kernel maintainer - is aware of the issue, so I was kinda leaving it to him to report it to the appropriate upstream venues. I find it pretty impossible to know where to send kernel issues. |
Yes, I am aware; I passed this on to Waiman Long. He thought there was a patch for it, but that turned out not to cover this case, so he was looking again. In the meantime, we just hit rc1, so bug fixes are coming in fast, and it is possible that someone else has a fix. Worst case, I can bisect later this week.
Commit bf20ab5 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW") attempts to revert the code change introduced by commit cd5ab1b ("blk-throttle: add .low interface"). However, it leaves behind the bps_conf[] and iops_conf[] fields in the throtl_grp structure, which aren't set anywhere in the new blk-throttle.c code but are still being used by tg_prfill_limit() to display the limits in io.max. Now io.max always displays the following values if a block queue is used:

```
<m>:<n> rbps=0 wbps=0 riops=0 wiops=0
```

Fix this problem by removing bps_conf[] and iops_conf[] and using bps[] and iops[] instead to complete the revert.

Fixes: bf20ab5 ("blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW")
Reported-by: Justin Forbes <jforbes@redhat.com>
Closes: containers/podman#22701 (comment)
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20240530134547.970075-1-longman@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Seen in openQA. No logs available; it's a weird thing that only records movies, and I don't have the desire to hand-type all the errors. It basically looks like
This is just a placeholder for now. Smells like a kernel bug to me, but it could also be a bug on our end (including in tests). If I see this blowing up (as measured by openQA emails) I will explore further. Until then, nothing to do.