Xbian repeatedly crash: Kernel panic - system is deadlocked on memory. swapper/0 tainted , HW: BCM2711 #928

Open
slrslr opened this issue May 3, 2023 · 11 comments

Comments

@slrslr

slrslr commented May 3, 2023

Linux xbian 6.1.24+ ... armv7l
XBian 11.0 - Bullseye - 20230419-0 - Bleeding Edge, 2012-2023
Raspberry Pi 4

I updated the XBian software a few times in the last month (inside Kodi, System update...) and set up 2 new Kodi add-ons from 2 new repositories.

Every couple of days (I think it happened multiple times on one day) I find that the internet has stopped working on the home LAN; I cannot even connect to or ping the router or XBian. Restarting the router does not help, and I find XBian deadlocked, as in the attached screenshot:

[Screenshot: Xbian kernel panic deadlocked on memory - internet fail]
(this deadlock is what happens repeatedly, as mentioned)

So I power the XBian device off and on. After that, the router and the internet somehow start working again.

@mkreisl
Contributor

mkreisl commented May 3, 2023

I recommend installing the previous version of the kernel package (linux-image-bcm2836) to see whether the kernel panic disappears with it.

@slrslr
Author

slrslr commented May 4, 2023

install the previous version of the kernel package (linux-image-bcm2836)

# apt install linux-image-bcm2836*
linux-image-bcm2836 is already the newest version (6.1.24+-1681410943)

# apt search linux-image-bcm2*

linux-image-bcm2836/stable,now 6.1.24+-1681410943 armhf [installed]
  Latest XBian kernel (rpi2/6.1.y 6.1.24+)

linux-image-bcm2837/stable 6.1.24+-1681414802 armhf
  Latest XBian kernel (rpi3/6.1.y 6.1.24+)

# dpkg --list 'linux-image-*'

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                Version            Architecture Description
+++-===================-==================-============-========================================
un  linux-image-armmp   <none>             <none>       (no description available)
ii  linux-image-bcm2836 6.1.24+-1681410943 armhf        Latest XBian kernel (rpi2/6.1.y 6.1.24+)
un  linux-image-bcm2837 <none>             <none>       (no description available)

Which command should I run to install the previous version, please, and what should I do if the issue does (or does not) appear again?

@mkreisl
Contributor

mkreisl commented May 4, 2023

Why am I not surprised?
I find it sad that nowadays fewer and fewer are able or want to be able to solve problems themselves.
There are certainly countless examples on the net of how to do this.

Here again in short form:

apt-cache policy linux-image-bcm2836
gives you a list of all available versions of the package

sudo apt-get install linux-image-bcm2836=<theversionyouwant>
installs the desired version

sudo apt-mark hold linux-image-bcm2836
protects the package from being overwritten
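
The three steps above could be supported by a small helper that extracts just the version strings from `apt-cache policy` output, so the value to paste after `=` is easy to spot. This is a minimal sketch: the function name `list_versions` is hypothetical, and it assumes the usual `Version table:` layout, in which version lines are indented by five characters and the installed one is marked with `***`.

```shell
# Print every version listed in the "Version table" of `apt-cache policy`
# output read from stdin, in the order apt prints them.
list_versions() {
    awk '
        /Version table:/             { in_table = 1; next }
        in_table && /^ \*\*\* [^ ]/  { print $2; next }   # installed version, marked with ***
        in_table && /^     [^ ]/     { print $1 }         # other available versions
    '
}

# Usage (not run here):
#   apt-cache policy linux-image-bcm2836 | list_versions
#   sudo apt-get install linux-image-bcm2836=<theversionyouwant>
#   sudo apt-mark hold linux-image-bcm2836
```

If only one version is printed, the repository no longer offers an older package and the downgrade step cannot be done this way.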

@mkreisl
Contributor

mkreisl commented May 7, 2023

I assume the cause is the same problem as in this issue: raspberrypi/linux#5395

Kernel 6.1 has a new feature called MGLRU. It seems to be very buggy, but is unfortunately enabled by default.
I was not able to detect this until today, maybe because I am running the 64-bit kernel on my development Pi 4.

But today I was working with an installation that had the 32-bit 6.1.24 kernel installed, and I got these kernel panics all the time.

To disable MGLRU, a simple command
sudo echo 0 > /sys/kernel/mm/lru_gen/enabled
should disable this feature on a running system
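
One caveat with the command above: in `sudo echo 0 > file` the redirection is performed by the calling shell, not by root, so it only succeeds from a shell that can already write the file (for example a root shell). A hedged sketch of a variant that works in both cases follows; the function name `disable_mglru` and its optional path argument are illustrative assumptions.

```shell
# Write 0 to the MGLRU switch. An alternative path may be passed as $1,
# which is mainly useful for trying the function out on a scratch file.
disable_mglru() {
    file="${1:-/sys/kernel/mm/lru_gen/enabled}"
    if [ -w "$file" ]; then
        # The current shell may write the file (for example a root shell).
        echo 0 > "$file"
    else
        # Let root do the write: tee runs under sudo and opens the file itself.
        echo 0 | sudo tee "$file" > /dev/null
    fi
}

# Usage (not run here):
#   disable_mglru
#   cat /sys/kernel/mm/lru_gen/enabled
```

The setting is a runtime switch only, so it is lost at the next reboot.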

@yuzhaogoogle

I assume the cause is the same problem as in this issue: raspberrypi/linux#5395

Kernel 6.1 has a new feature called MGLRU. It seems to be very buggy, but is unfortunately enabled by default. I was not able to detect this until today, maybe because I am running the 64-bit kernel on my development Pi 4.

There is only one known problem: MGLRU caused OOM kills on Pi 4 when running 32-bit kernels, because Pi 4 has to use CONFIG_VMSPLIT_3G. This was fixed by disabling MGLRU by default on Pi 4 32-bit kernels. The recommendation is to switch to 64-bit kernels on Pi 4, which have MGLRU on by default.

See raspberrypi/linux#5395 (comment).
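
To check whether a given system is in the affected combination described here (32-bit kernel with MGLRU switched on), something like the following sketch could be used. The function names are hypothetical. It assumes that `uname -m` reports `armv6l` or `armv7l` for a 32-bit ARM kernel, and that bit 0x0001 of `/sys/kernel/mm/lru_gen/enabled` is the main MGLRU switch, as described in the kernel's multi-gen LRU admin guide.

```shell
# Succeed if the main MGLRU switch (bit 0x0001) is set in a value as printed
# by /sys/kernel/mm/lru_gen/enabled, for example 0x0001 or 0x0000.
mglru_main_switch_on() {
    [ $(( $1 & 1 )) -ne 0 ]
}

# Succeed if the `uname -m` string names a 32-bit ARM kernel.
is_32bit_arm() {
    case "$1" in
        armv6l|armv7l) return 0 ;;
        *)             return 1 ;;
    esac
}

# Print a one-line verdict for a machine string ($1) and an lru_gen value ($2).
mglru_verdict() {
    if is_32bit_arm "$1" && mglru_main_switch_on "$2"; then
        echo "32-bit kernel with MGLRU enabled"
    else
        echo "not the 32-bit + MGLRU combination"
    fi
}

# Usage (not run here):
#   mglru_verdict "$(uname -m)" "$(cat /sys/kernel/mm/lru_gen/enabled)"
```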

@slrslr
Author

slrslr commented Jul 8, 2023

To disable MGLRU, a simple command sudo echo 0 > /sys/kernel/mm/lru_gen/enabled should disable this feature on a running system

I tried this and probably rebooted afterwards. It then crashed again with the same or a similar kernel panic, and during the crash the router stopped responding to most of the LAN computers. After booting, I found that the lru_gen option had been reset back to:

cat /sys/kernel/mm/lru_gen/enabled
0x0001

I have failed to find out how to make "0x0000" persistent so that it survives a reboot. Can you suggest a command for this, please?
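
One possible way to reapply the setting at every boot is a cron `@reboot` entry. This is only a sketch under assumptions the thread does not confirm: that a cron daemon which reads `/etc/cron.d` and supports `@reboot` is installed and running, and that the commands are run as root. The function name `install_mglru_cron` and the file name `disable-mglru` are hypothetical.

```shell
# Create a cron.d entry that writes 0 to the MGLRU switch at every boot.
# $1 is the target directory (default /etc/cron.d).
install_mglru_cron() {
    dir="${1:-/etc/cron.d}"
    printf '%s\n' \
        '# Disable multi-gen LRU at boot' \
        '@reboot root echo 0 > /sys/kernel/mm/lru_gen/enabled' \
        > "$dir/disable-mglru"
}

# Usage (as root, not run here):
#   install_mglru_cron
#   reboot
#   cat /sys/kernel/mm/lru_gen/enabled
```

On a kernel that already ships with MGLRU disabled by default, such an entry would be unnecessary.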

@yuzhaogoogle

To disable MGLRU, a simple command sudo echo 0 > /sys/kernel/mm/lru_gen/enabled should disable this feature on a running system

I tried this and probably rebooted afterwards. It then crashed again with the same or a similar kernel panic, and during the crash the router stopped responding to most of the LAN computers. After booting, I found that the lru_gen option had been reset back to:

Your crash doesn't seem to be related to this issue.

cat /sys/kernel/mm/lru_gen/enabled 0x0001

I have failed to find out how to make "0x0000" persistent so that it survives a reboot. Can you suggest a command for this, please?

The latest 32-bit kernel disabled MGLRU by default. If you are using the 32-bit kernel, please make sure it's the latest. And please also make sure you don't have modifications to the initscript or services that enable MGLRU, e.g., grep -r lru_gen /etc/.
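
The search suggested above can be wrapped so that several directories are checked at once and only the matching file names are printed. A minimal sketch; the function name `find_lru_gen_refs` is hypothetical, and which directories beyond `/etc` hold init scripts or service files on a given system is an assumption to adapt.

```shell
# Print the name of every file under the given directories that mentions
# lru_gen. -r recurses into subdirectories, -l prints file names only.
find_lru_gen_refs() {
    grep -rl 'lru_gen' "$@" 2>/dev/null
}

# Usage (not run here):
#   find_lru_gen_refs /etc
```

No output means that no file in the searched directories refers to the MGLRU switch.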

@slrslr
Author

slrslr commented Jul 9, 2023

The latest 32-bit kernel disabled MGLRU by default.

# uname -r
6.1.24+
# uname -m
armv7l
People there say this means I am on a 32-bit kernel, where you say that MGLRU should be disabled in the latest kernel. I am unsure whether I am on the latest version, and whether/how @mkreisl suggests upgrading or proceeding.
# apt search linux-|grep -i installed

binutils-arm-linux-gnueabihf/oldstable,now 2.35.2-2 armhf [installed,automatic]
libpam0g/oldstable,now 1.4.0-9+deb11u1 armhf [installed]
libselinux1/oldstable,now 3.1-3 armhf [installed]
linux-base/oldstable,now 4.6 all [installed,automatic]
linux-libc-dev/now 6.1.24-1681410943 armhf [installed,local] <-------
parted/oldstable,now 3.4-1 armhf [installed,automatic]

# grep -r lru_gen /etc/
..empty result..

I do not know about the initscript (I do not know the right command), but I tried "grep -Ria lru /etc/" and found nothing that looks like a config option or similar.

What do you suggest?

@mkreisl
Contributor

mkreisl commented Jul 9, 2023

@slrslr
You are not on the latest kernel

@slrslr
Author

slrslr commented Jul 9, 2023

After "apt update; apt upgrade", I may be on the latest:

Linux xbian 6.1.28+
linux-image-bcm2836/stable,now 6.1.28+-1684360121 armhf [installed]
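
To confirm that the running kernel is the one from the installed package (that is, that the machine was rebooted after the upgrade), the two strings can be compared. This sketch assumes, based only on the two examples in this thread, that the XBian package version is the kernel release followed by `-` and a build number; the function names are hypothetical.

```shell
# Strip the trailing "-<build number>" from a kernel package version,
# for example 6.1.28+-1684360121 -> 6.1.28+
pkg_kernel_release() {
    echo "${1%-*}"
}

# Succeed if the running kernel release ($1) matches the package version ($2).
running_matches_pkg() {
    [ "$1" = "$(pkg_kernel_release "$2")" ]
}

# Usage (not run here):
#   running_matches_pkg "$(uname -r)" \
#       "$(dpkg-query -W -f='${Version}' linux-image-bcm2836)" \
#       && echo "running the installed kernel" \
#       || echo "reboot needed or versions differ"
```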

This question of mine is unanswered:

I have failed to find out how to make "0x0000" persistent so that it survives a reboot. Can you suggest a command for this, please?

Yet I see that after a reboot under this newer kernel, it is already disabled:
cat /sys/kernel/mm/lru_gen/enabled
0x0000

I will keep you updated on the next crash, if you have no further instructions on what to do now.

@mkreisl
Contributor

mkreisl commented Jul 9, 2023

Yes, of course; this was fixed promptly after your issue and no longer occurs with 6.1.28.
The kernel update is correct. I was wrong: a newer kernel is only available for Debian Bookworm.
