Running X/LXDE causes packet loss/dmesg errors on networking #29

marsman2020 · 2012-05-31T16:24:30Z

I am seeing extreme packet loss/networking issues when launching X/LXDE, but not at the console. In an attempt to rule out an issue with my power supply setup, I have gone through the troubleshooting steps below.

Hardware configuration:
-Pi Power Supply -> HP Touchpad 5.3V/2A supply, 24AWG USB A to Micro B cable
-Pi Storage -> SanDisk Extreme HD Video 4GB 20MB/s Class 6 SDHC Card
-Ethernet connected directly to router
-USB Hub -> Belkin F5U307 w/PS0538 5V 3.5A power supply
--USB Mouse (connected to hub) -> Logitech Wireless, 100mA receiver
--Keyboard Adapter (connected to hub) -> Belkin F5U119vE PS/2 to USB Adapter
--PS/2 Keyboard (connected to keyboard adapter) -> IBM KPD8923
-Pi Monitor - Lenovo L220x connected via HDMI->DVI Cable

Software Configurations:
I used the latest Debian Squeeze image, with the latest kernel (PREEMPT #89) installed via rpi-update and the Debian system fully updated using apt-get upgrade.

I did the following:

1. On another machine on the network - Used "python -m http.server" in Python 3.2.2 to create an HTTP server. Picked a random ~90MB file on the machine as a test download over HTTP.
2. On the Pi - Installed the 'stress' utility using apt-get. Used "stress -c 15" to load the Pi's CPU to 100%. Started htop at the console.
3. On the other machine on the network - SSHed into the Pi and ran wget in the ssh terminal to download the 90MB test.zip onto the Pi using HTTP over the local network. Download speeds were ~2.9-3.0 MB/s and ssh window responsiveness was stable. Repeated over a period of 10 minutes to allow the Pi's CPU to reach a steady-state temperature/power draw at 100% CPU load. Started a final instance of the download.
4. On the Pi (do this while the download is running in the ssh window!) - Exited htop. Ran "startx".
5. On the other machine - Observed that the download running in the ssh window on the Pi slowed from ~3MB/s to ~44KB/s as soon as X/LXDE loaded. (The 'stress' command is still running in the background, loading the CPU to the same constant 100% as with the console download tests in Step 3.)
6. On the Pi - Exited X/LXDE ("Logout").
7. On the other machine - Observed that the download returned to its original ~3MB/s speed as soon as the system returned to the console.
8. On the Pi - Observed error messages in the dmesg log related to the eth0 driver - see http://paste.debian.net/172123
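
A rough way to put numbers on the slowdown described in the steps above is to time the download from a script instead of reading wget's progress display. This is only a minimal sketch and not what was used in the tests above; `measure_throughput` and `slowdown_factor` are hypothetical helpers, and the self-check uses an in-memory stream so it needs no network.

```python
import io
import time


def measure_throughput(stream, chunk_size=64 * 1024):
    """Read `stream` to the end and return (total_bytes, seconds, bytes_per_second).

    `stream` is any object with a read(n) method, for example the response
    returned by urllib.request.urlopen() or an open binary file.
    """
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very small inputs.
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate


def slowdown_factor(rate_before, rate_after):
    """How many times slower the second rate is than the first."""
    return rate_before / rate_after


if __name__ == "__main__":
    # Self-check on an in-memory stream.
    total, seconds, rate = measure_throughput(io.BytesIO(b"x" * 1000000))
    print(total, "bytes read")
    # Rates reported above: ~3 MB/s at the console, ~44 KB/s under X/LXDE.
    print(round(slowdown_factor(3000000, 44000)), "x slower")
```

To time a real transfer, the stream could be the object returned by `urllib.request.urlopen()` pointed at the machine running "python -m http.server"; the host name and file path are whatever your own setup uses.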

I do not believe this is solely a power-related issue, which is what others on this forum have repeatedly said when users have asked about it. I can peg my Pi's CPU at 100% for > 10 minutes at the console and Ethernet works just fine until I start X/LXDE (with the same stress programs still running). Only then does the Ethernet drop out. There is no change in hardware configuration between the two cases (100% CPU at console, 100% CPU with X/LXDE running).

Other reports of similar issues on the Raspberry Pi forums:
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=6928
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=6042
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=66&t=6827#p87855
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=6677
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=7075
http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=7445
http://forum.stmlabs.com/showthread.php?tid=441&pid=3258#pid3258

XECDesign · 2012-06-01T04:22:19Z

Does this happen if you go to console and start gpm?

marsman2020 · 2012-06-01T16:06:53Z

XECDesign: Yes! I did some more testing last night. Same hardware configuration.

-100% CPU load with "stress" applied for all cases
-At console, mouse+keyboard->powered hub->pi w/ 5.3V 2A power supply => Network OK
-LXDE, mouse+keyboard->powered hub->pi w/ 5.3V 2A power supply => Network DROPS
-LXDE, powered hub->pi w/ 5.3V 2A power supply => Network OK (no mouse/keyboard attached)
-LXDE, keyboard only ->powered hub->pi w/ 5.3V 2A power supply => Network OK
-LXDE, mouse only->powered hub->pi w/ 5.3V 2A power supply => Network OK
-LXDE, keyboard only ->powered hub->pi w/ 5.3V 2A power supply; mouse->pi usb port => Network DROPS
-LXDE, mouse+keyboard->pi USB ports w/ 5.3V 2A power supply => Network OK (mouse eventually became unreliable, probably a genuine power issue in this single case)

-At console
-Without GPM, mouse+keyboard->powered hub->pi w/ 5.3V 2A power supply => Network OK
-GPM, mouse+keyboard->powered hub->pi w/ 5.3V 2A power supply => Network DROPS
-Without GPM, mouse+keyboard->pi w/ 5.3V 2A power supply => Network OK
-GPM, mouse+keyboard->pi w/ 5.3V 2A power supply => Network DROPS

In LXDE, it seems like having 1 low speed device on a hub and 1 low speed device elsewhere or 2 on the hub causes the problem.

With GPM, it seems like just reading the mouse period is enough to cause the problem.

(Note that my PS/2 -> USB adapter I am using for the keyboard is a dual adapter - there is a mouse port I'm not using - so it shows up as 2 USB devices)

I've got plain jane wired USB keyboard/mouse on the way from Monoprice so I can test with less power-hungry devices....

Edit - another user has done some testing here - http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=7075#p89311

XECDesign · 2012-06-01T17:36:50Z

I have the same issues, I haven't narrowed it down yet, but some things which you might want to check:

Interrupt collision between smsc95xx and USB storage drivers under heavy load firmware#9
Test the voltage between TP1 and TP2 before and after. If you have a scope, see if it oscillates, or oscillates differently.
Try a different powered hub, or try plugging things in a different order, into different ports (sometimes seems to work)
Try a different power supply
Try a different (simpler) keyboard and mouse

marsman2020 · 2012-06-01T17:51:45Z

I do not believe this is a power supply issue.

I can load the Pi's CPU to 100% for 10+ minutes with no impact on networking until LXDE is started.

Conversely, starting GPM does not cause a major CPU load on the system, and yet it kills networking as well.

I think there is a software bug in the USB code that has to do with devices that require real-time updates like mice. I don't have the linux kernel skills to chase it down myself.

As I said, I have another mouse/keyboard set en route.

XECDesign · 2012-06-01T17:59:37Z

Power does not equate directly to CPU load. But if activating your mouse draws another 100mA on top of that, that can make all the difference.

I have found that I can use any two of the keyboard, mouse or internet, but not all three, regardless of CPU load. And I have found that there IS a voltage drop when all three are connected.

Either way, it's worth checking if only to rule it out.

XECDesign · 2012-06-01T22:18:57Z

Also, from http://elinux.org/R-Pi_Troubleshooting#Ethernet_connection_is_lost_when_a_USB_device_is_plugged_in :
Ethernet connection is lost when a USB device is plugged in
This is caused by inadequate power. Use a good power supply and a good power cable. Some cheap cables that work with a cell phone, cannot fully power the R-Pi. Some USB devices require a lot of power (>100 mA), so they must be used with a powered USB hub. Some cheap USB hubs suck power from the Raspberry Pi even if a USB power supply is connected.

marsman2020 · 2012-06-03T19:28:23Z

So, I got a hold of an oscilloscope, since I don't own one, and I can verify by checking across TP1-TP2 that this is not a power issue.

I have also re-run some more tests. Hardware configuration as follows:
-Pi Power Supply -> HP Touchpad 5.3V/2A supply, 24AWG USB A to Micro B cable
-Pi Storage -> SanDisk Extreme HD Video 4GB 20MB/s Class 6 SDHC Card
-Ethernet connected directly to router
-Powered USB Hub -> Belkin F5U307 w/PS0538 5V 3.5A power supply
--USB Mouse (connected to hub) -> Logitech Wireless, 100mA receiver
--Keyboard Adapter (connected to hub) -> Belkin F5U119vE PS/2 to USB Adapter
--PS/2 Keyboard (connected to keyboard adapter) -> IBM KPD8923
-Pi Monitor - Lenovo L220x connected via HDMI->DVI Cable
-Tektronix TDS1012 measuring across TP1-TP2, with cursors at 4.75V and 5.25V and set to trigger on falling edges at 4.75V. Measuring average, min, and max voltage.

-At console, with no extra CPU load
-Without GPM, mouse+keyboard->powered hub->pi w/ 5.3V 2A power supply => 4.86V min 4.88V mean 4.93V max, Network OK
-GPM, mouse+keyboard->powered hub->pi w/ 5.3V 2A power supply => 4.93V min 4.96V mean 4.98V max, Network DROPS ("eth0: kevent 4 may have been dropped")
-Without GPM, mouse+keyboard->pi w/ 5.3V 2A power supply => 4.88V min 4.91V mean 4.94V max, Network OK
-GPM, mouse+keyboard->pi w/ 5.3V 2A power supply => 4.88V min 4.91V mean 4.94V max, Network OK

-LXDE started, without GPM, no extra CPU load
-mouse+keyboard->pi w/ 5.3V 2A power supply => 4.85V min 4.89V mean 4.95V max, Network OK
(I logged out of LXDE to change to the powered hub)
-mouse+keyboard->powered hub->pi w/ 5.3V 2A power supply => 4.91V min 4.96V mean 5.00V max, Network DROPS
-unplug hub - network returns

The result with the mouse+keyboard directly attached to the Pi is different with GPM from last time, but consistent with the LXDE result.

As a final test of the possibility of any power supply issues in the previous testing I have done:
-LXDE running, dual USB->PS2 adapter with keyboard attached (no mouse) & USB Wireless Mouse Receiver attached to Pi USB ports, 100% CPU load via "stress -c 13" => 4.86V min 4.90V mean 4.95V max, Network WORKS
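
The readings above can be checked against the 4.75V to 5.25V window that the scope cursors were set to. This is a small sketch that simply re-enters the numbers from this comment by hand; the tuple layout and the `in_spec` helper are made up for illustration and are not part of any tool.

```python
V_MIN, V_MAX = 4.75, 5.25  # scope cursor positions used for the measurements

# (configuration, min V, mean V, max V, network OK?) re-entered from the results above
READINGS = [
    ("console, no GPM, via powered hub", 4.86, 4.88, 4.93, True),
    ("console, GPM, via powered hub", 4.93, 4.96, 4.98, False),
    ("console, no GPM, direct to Pi", 4.88, 4.91, 4.94, True),
    ("console, GPM, direct to Pi", 4.88, 4.91, 4.94, True),
    ("LXDE, direct to Pi", 4.85, 4.89, 4.95, True),
    ("LXDE, via powered hub", 4.91, 4.96, 5.00, False),
    ("LXDE, direct to Pi, stress -c 13", 4.86, 4.90, 4.95, True),
]


def in_spec(v_low, v_high):
    """True when the whole measured range sits inside the cursor window."""
    return V_MIN <= v_low and v_high <= V_MAX


if __name__ == "__main__":
    for name, v_low, v_mean, v_high, network_ok in READINGS:
        verdict = "in spec" if in_spec(v_low, v_high) else "OUT OF SPEC"
        status = "network OK" if network_ok else "network DROPS"
        print("{:<34} {:.2f}-{:.2f} V  {:<11} {}".format(
            name, v_low, v_high, verdict, status))
```

Every configuration stays inside the window, and the two configurations where the network drops are the ones with the highest minimum voltage, which is hard to square with an undervoltage explanation.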

Paste of today's session dmesg log - http://paste.ubuntu.com/1021879/

Conclusions:
-The HP Touchpad power supply (5.3V/2A) is an excellent choice for the Pi, when paired with a 6ft 24AWG power wire USB cable. It keeps the TP1-TP2 voltage within spec even with multiple USB devices attached and using power, X running, and 100% CPU load
-There is a software issue related to (some? all? unclear as I only have Belkin hubs to test) powered hubs that interferes with operation of the USB-attached Ethernet port on the Pi. The same issue may possibly manifest with USB-attached wireless adapters (I do not have one to test). Based on GPM triggering the issue, it would appear to be related to the mouse.

So, maybe we can get on with solving this issue?

marsman2020 · 2012-06-04T21:05:15Z

I'm continuing to monitor the forums for apparent cases of this issue and am updating the first post of this issue with links to the threads as they appear.

NickBT · 2012-06-08T15:55:04Z

In support of this issue and to raise its profile, I would like to point out that I have similar errors which are related to a wireless dongle failing to connect after LXDE is started. I believe they may have a common cause, so I've cut and pasted some input from me in a couple of threads on the RPi forum.

I've managed to get my pi connected wirelessly with a Belkin FD7050 and the ralink drivers when at the command line. I can ping the BBC website and also ssh into the pi from my PC. I'm powering a Logik keyboard and a mouse from a Logik powered hub. Once I start LXDE with 'startx', I have no network connectivity from the pi and a ping from my PC shows it as unreachable (as is the router itself). When I quit LXDE, I still have no connectivity. LXDE works fine if I connect via the Ethernet cable.

The dmesg output after LXDE is started shows a lot of failed to read/write errors such as

smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118

I can fix it with a kludge of a workaround

A Google search for 'smsc95xx' brings up a few hits, many of them describing behaviour in the Rpi. It's a kernel module and it doesn't seem to like a mixture of high and low speed USB devices on the same hub.

The (fix):
I've got a Belkin wifi dongle, a keyboard, mouse and the recommended Logik powered (2.0A) hub. I've also got a crappy Asda unpowered 4 port hub which I wasn't using until now. Originally I had mouse, kbd, wifi all plugged into the powered hub and the wifi worked until I started LXDE. I then got identical errors to you. The wifi won't run off the Rpi port as there's not enough current capacity by the way.

So mindful of the hint not to mix devices on the same hub I got it going with the following sequence:

Wifi in powered hub in one Rpi USB, Asda hub (NO DEVICES CONNECTED) in the other RPi USB, turn on power
Wait till login prompt, plug kbd into Asda hub, log in
Type 'startx', wait for GUI, plug mouse into Asda hub

Browser works and if you quit to the cli, wifi is still up. If you plug the kbd in when you power up, or the mouse before LXDE starts, then you get errors or the wifi connectivity doesn't work. I reckon that two powered hubs, or a low power wifi dongle connected to the Rpi, would probably work OK from power up with all devices connected.

I can confirm that this error is present even after an rpi-update. It also happens with the other lightweight desktop, XFCE. The 5V rail holds up at over 4.85 volts under all conditions tested.

jstsch · 2012-06-10T11:56:03Z

Similar issues here. Also without running X. Setup:

Pi with the RS Electronics Micro USB Euro power supply
Belkin 7-Port USB hub
Logitech Pilot Mouse and Logitech K120 keyboard
TP-Link TL WN821N USB Wireless N Dongle

Using just the keyboard and mouse, with or without the hub gives no issues. When the wireless dongle is plugged in I get errors such as:

smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
Jun 9 15:05:23 raspberrypi kernel: DEBUG:handle_hc_chhltd_intr_dma:: XactErr without NYET/NAK/ACK
Jun 9 15:05:23 raspberrypi kernel: hub 1-1.2:1.0: hub_port_status failed (err = -71)
Jun 9 15:05:23 raspberrypi kernel: hub 1-1.2:1.0: Cannot enable port 4. Maybe the USB cable is bad?
Jan 1 01:07:55 raspberrypi kernel: INFO:: periodic_channel_available: Total channels: 8, Periodic: 4, Non-periodic: 4
Jan 1 01:07:55 raspberrypi kernel: INFO:: schedule_periodic: No host channel available for periodic transfer.
Jan 1 01:07:55 raspberrypi kernel: ERROR::dwc_otg_hcd_urb_enqueue:487: DWC OTG HCD URB Enqueue failed adding QTD. Error status -4008
Jan 1 01:07:55 raspberrypi kernel: usb 1-1.3.3: reset low speed USB device number 5 using dwc_otg

Last night everything was working. Had everything plugged into the hub. Browsing using Midori over the wireless connection. This morning the Pi had crashed (nothing on the screen). Booting with everything plugged in gave me a kernel panic just now after a stream of similar error messages.

Hopefully some useful information in here!

jstsch · 2012-06-10T11:58:53Z

Some more:

Jun 9 23:16:39 raspberrypi kernel: mmc0: missed completion of cmd 17 DMA (512/512 [1]/[1]) - ignoring it
Jun 9 23:16:39 raspberrypi kernel: mmc0: DMA IRQ 6 ignored - results were reset
Jun 9 23:16:39 raspberrypi kernel: mmc0: missed completion of cmd 17 DMA (512/512 [1]/[1]) - ignoring it
Jun 9 23:16:39 raspberrypi kernel: mmc0: DMA IRQ 6 ignored - results were reset

Upon plugging in the wireless dongle:

Jun 10 00:43:00 raspberrypi kernel: usb 1-1.3.6: new high speed USB device number 9 using dwc_otg
Jun 10 00:43:01 raspberrypi kernel: usb 1-1.3.6: New USB device found, idVendor=0cf3, idProduct=7015
Jun 10 00:43:01 raspberrypi kernel: usb 1-1.3.6: New USB device strings: Mfr=16, Product=32, SerialNumber=48
Jun 10 00:43:01 raspberrypi kernel: usb 1-1.3.6: Product: USB WLAN
Jun 10 00:43:01 raspberrypi kernel: usb 1-1.3.6: Manufacturer: ATHEROS
Jun 10 00:43:01 raspberrypi kernel: usb 1-1.3.6: SerialNumber: 12345
Jun 10 00:43:01 raspberrypi kernel: usb 1-1.3.6: ath9k_htc: Transferred FW: htc_7010.fw, size: 72992
Jun 10 00:43:02 raspberrypi kernel: usb 1-1.3.6: Service connection timeout for: 256
Jun 10 00:43:02 raspberrypi kernel: ath9k_htc 1-1.3.6:1.0: ath9k_htc: Unable to initialize HTC services
Jun 10 00:43:02 raspberrypi kernel: Failed to initialize the device
Jun 10 00:43:02 raspberrypi kernel: ath9k_htc: probe of 1-1.3.6:1.0 failed with error -22

marsman2020 · 2012-06-11T16:18:41Z

I also was able to observe the issue in Arch Linux with their custom kernel.

However, changing mice seems to have semi-resolved my immediate issue. I went from a Logitech Wireless to a "Monoprice special" wired optical mouse.

That Logitech mouse has worked fine on multiple computers with several Linux distributions, Windows XP, and Windows 7 for the last ~4 years. It should work fine with a powered hub on the Pi. Based on the many, many other threads on issues with USB devices on this forum, I think it comes down to the USB driver provided by the company that made the USB IP core that Broadcom bought for the Pi's SoC just sucking, and until someone rewrites a complete new USB driver we will continue to have these types of issues.

It's unfortunate because with these kinds of issues, the Pi doesn't really meet the as-advertised "plug in whatever USB keyboard/mouse/hub you have in the house and start coding" that has been put forth by the Foundation for the last 6 months.

I will be trying to add an Arduino Uno and Open Bench Logic Sniffer to the same powered USB hub soon, to test out using the Pi as a low-cost host computer for open source hardware tools. I'll report back on what happens when those devices are added...

NickBT · 2012-06-11T16:31:52Z

I've done some investigation into what fails when I startx and one of the peripherals plugged into the powered hub fails.
I did a fairly rough and ready hack to find out where the first error that I can locate is. I altered dwc_otg_hcd_queue.c's function

static int periodic_channel_available(dwc_otg_hcd_t * hcd)
{
    /*
     * Currently assuming that there is a dedicated host channnel for each
     * periodic transaction plus at least one host channel for
     * non-periodic transactions.
     */
    int status;
    int num_channels;

    num_channels = hcd->core_if->core_params->host_channels;
    if ((hcd->periodic_channels + hcd->non_periodic_channels < num_channels) &&
        (hcd->periodic_channels < num_channels - 1)) {
        status = 0;
    } else {
        //DWC_INFO("%s: Total channels: %d, Periodic: %d, Non-periodic: %d\n",
        //  __func__, num_channels, hcd->periodic_channels, hcd->non_periodic_channels);    //NOTICE
        DWC_ERROR("%s: NBT Total channels: %d, Periodic: %d, Non-periodic: %d\n",
            __func__, num_channels, hcd->periodic_channels, hcd->non_periodic_channels);    //NOTICE
        status = -DWC_E_NO_SPACE;
    }

    return status;
}

(The 'NBT' is my marker for grepping)

When it fails it gives error in dmesg:

smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
ERROR::periodic_channel_available:342: periodic_channel_available: NBT Total channels: 8, Periodic: 5, Non-periodic: 3

Leaving aside that I may have given __func__ as an incorrect argument to DWC_ERROR, we can see that the first condition will fail, as 5 + 3 is not less than 8.
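
The condition can be tried out against the channel counts that appear in the logs in this thread (Periodic/Non-periodic of 5/3, 6/2 and 4/4, all with 8 total). The following is a Python transcription of the check for experimenting only, not driver code; the numeric value of `DWC_E_NO_SPACE` is an assumption inferred from the "Error status -4008" lines in the dmesg output.

```python
DWC_E_NO_SPACE = 4008  # assumed from the "Error status -4008" lines in the logs


def periodic_channel_available(periodic, non_periodic, total=8):
    """Python transcription of the condition in the C function.

    Returns 0 when a periodic transfer may take a host channel, otherwise
    a negative error code.
    """
    if periodic + non_periodic < total and periodic < total - 1:
        return 0
    return -DWC_E_NO_SPACE


if __name__ == "__main__":
    # (periodic, non-periodic) pairs; the first three are taken from the dmesg
    # output in this thread, the last is a made-up case with a free channel.
    for periodic, non_periodic in [(5, 3), (6, 2), (4, 4), (4, 3)]:
        print(periodic, non_periodic, periodic_channel_available(periodic, non_periodic))
```

All three logged combinations already occupy all 8 channels, so the first half of the condition rejects any further periodic transfer no matter how the channels are split.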

I hope this sheds some light on the matter.

P.S. the dmesg output is crammed full of such messages after the failure
P.P.S - sorry if this doesn't format right.

marsman2020 · 2012-06-14T23:54:53Z

Another potential case of this issue - http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=8391
And another - http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=8352

marsman2020 · 2012-06-16T20:51:40Z

Another potential case - http://www.raspberrypi.org/phpBB3/viewtopic.php?f=50&t=8453

guisacouto · 2012-06-21T13:13:58Z

I wouldn't be surprised if this had something to do with issue 9. My guess is that when you startx you start opening and reading a lot of files into memory, and that is where the problem is. Heavy I/O causes the issues, not the 100% CPU by itself.

If you combine high CPU usage with some sort of heavy USB usage (whether it is Ethernet using the USB bus, wifi, or USB storage) that generates a lot of I/O opening, writing, and reading files, that is what causes all these issues, I think.

XECDesign · 2012-06-21T13:26:49Z

@guisacouto

Starting X is not a requirement to replicate this. Starting gpm (which wouldn't affect I/O much) also triggers it.

USB interrupts are something that might be worth looking into.

guisacouto · 2012-06-21T13:33:43Z

I missed that. Then yes, I guess it's all about USB interrupts. What bothers me is that there are a lot of people reporting this kind of problem, but I don't see updates in the repos to try to solve it. It always ends with 'it must be a power supply issue' or something. I hope this gets more attention soon.

jstsch · 2012-06-21T13:47:02Z

Ditto. Right now I am recommending that people hold off on getting an R-Pi until these issues are resolved. I wrote a short review the other day: http://jstsch.com/post/24_hours_with_the_raspberry_pi

marsman2020 · 2012-06-22T14:03:10Z

Another case - http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=9077&p=106318#p106318

User bought a hub listed in the "verified peripherals" section of the Wiki...

fbutler · 2012-06-22T15:58:15Z

I'm seeing the same type of issue with a Logitech Di Novo Edge keyboard without running startx. I get variations of the same type of errors whether the device is plugged in at boot time or plugged in after booting has completed.

My Setup is:

Wheezy beta distro fully updated
A logilink 10 port powered hub
A USB Y cable powering the Pi from the Logilink hub
Voltage across TP1 and TP2 is 4.91V
Additional USB cable between the hub and the Pi with the red wire snipped to prevent back powering to the Pi
Logitech wireless adapter (Part Number 832243-0000) plugged into hub
Keyboard Part Number YRAY-81
No other USB devices plugged in

Here is a portion of the dmesg output from the point when the device is detected:

[ 940.096845] usb 1-1.2: new full speed USB device number 12 using dwc_otg
[ 940.200190] usb 1-1.2: New USB device found, idVendor=046d, idProduct=0b04
[ 940.200237] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 940.200259] usb 1-1.2: Product: Logitech BT Mini-Receiver
[ 940.200275] usb 1-1.2: Manufacturer: Logitech
[ 940.207521] hub 1-1.2:1.0: USB hub found
[ 940.207888] hub 1-1.2:1.0: 3 ports detected
[ 940.486952] usb 1-1.2.2: new full speed USB device number 13 using dwc_otg
[ 940.595807] usb 1-1.2.2: New USB device found, idVendor=046d, idProduct=c713
[ 940.595864] usb 1-1.2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 940.595889] usb 1-1.2.2: Product: Logitech BT Mini-Receiver
[ 940.595907] usb 1-1.2.2: Manufacturer: Logitech
[ 940.595924] usb 1-1.2.2: SerialNumber: 001F2039B887
[ 940.615500] input: Logitech Logitech BT Mini-Receiver as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2.2/1-1.2.2:1.0/input/input4
[ 940.618982] generic-usb 0003:046D:C713.0005: input: USB HID v1.11 Keyboard [Logitech Logitech BT Mini-Receiver] on usb-bcm2708_usb-1.2.2/input0
[ 940.707062] usb 1-1.2.3: new full speed USB device number 14 using dwc_otg
[ 940.815761] usb 1-1.2.3: New USB device found, idVendor=046d, idProduct=c714
[ 940.815799] usb 1-1.2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 940.815821] usb 1-1.2.3: Product: Logitech BT Mini-Receiver
[ 940.815852] usb 1-1.2.3: Manufacturer: Logitech
[ 940.815870] usb 1-1.2.3: SerialNumber: 001F2039B887
[ 940.849750] input: Logitech Logitech BT Mini-Receiver as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2.3/1-1.2.3:1.0/input/input5
[ 940.854898] logitech 0003:046D:C714.0006: input,hiddev0: USB HID v1.11 Mouse [Logitech Logitech BT Mini-Receiver] on usb-bcm2708_usb-1.2.3/input0
[ 945.923909] hub 1-1.2:1.0: hub_port_status failed (err = -110)
[ 946.256616] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
[ 946.256665] smsc95xx 1-1.1:1.0: eth0: MII is busy in smsc95xx_mdio_read
[ 951.248243] usb 1-1.2.2: USB disconnect, device number 13
[ 951.255861] usb 1-1.2.3: USB disconnect, device number 14
[ 951.346828] usb 1-1.2: reset full speed USB device number 12 using dwc_otg
[ 951.746821] usb 1-1.2.2: new full speed USB device number 15 using dwc_otg
[ 951.855081] usb 1-1.2.2: New USB device found, idVendor=046d, idProduct=c713
[ 951.855133] usb 1-1.2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 951.855155] usb 1-1.2.2: Product: Logitech BT Mini-Receiver
[ 951.855174] usb 1-1.2.2: Manufacturer: Logitech
[ 951.855190] usb 1-1.2.2: SerialNumber: 001F2039B887
[ 951.875358] input: Logitech Logitech BT Mini-Receiver as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2.2/1-1.2.2:1.0/input/input6
[ 951.879026] generic-usb 0003:046D:C713.0007: input: USB HID v1.11 Keyboard [Logitech Logitech BT Mini-Receiver] on usb-bcm2708_usb-1.2.2/input0
[ 951.966973] usb 1-1.2.3: new full speed USB device number 16 using dwc_otg
[ 952.075734] usb 1-1.2.3: New USB device found, idVendor=046d, idProduct=c714
[ 952.075783] usb 1-1.2.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 952.075806] usb 1-1.2.3: Product: Logitech BT Mini-Receiver
[ 952.075823] usb 1-1.2.3: Manufacturer: Logitech
[ 952.075840] usb 1-1.2.3: SerialNumber: 001F2039B887
[ 952.103900] input: Logitech Logitech BT Mini-Receiver as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2.3/1-1.2.3:1.0/input/input7
[ 952.111451] logitech 0003:046D:C714.0008: input,hiddev0: USB HID v1.11 Mouse [Logitech Logitech BT Mini-Receiver] on usb-bcm2708_usb-1.2.3/input0
[ 957.106472] hub 1-1:1.0: hub_port_status failed (err = -110)
[ 957.246478] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
[ 962.676429] hub 1-1.2:1.0: hub_port_status failed (err = -110)
[ 966.206348] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
[ 967.676337] hub 1-1.2:1.0: hub_port_status failed (err = -110)
[ 971.206295] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
[ 972.676267] hub 1-1.2:1.0: hub_port_status failed (err = -110)
[ 977.306200] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
[ 977.676223] hub 1-1.2:1.0: hub_port_status failed (err = -110)
[ 982.306165] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
[ 982.676153] hub 1-1.2:1.0: hub_port_status failed (err = -110)
[ 991.286029] hub 1-1.2:1.0: hub_port_status failed (err = -110)
[ 991.286066] hub 1-1.2:1.0: connect-debounce failed, port 1 disabled
[ 996.245957] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
[ 996.285980] hub 1-1.2:1.0: hub_port_status failed (err = -110)
[ 1001.245917] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
[ 1006.245838] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
[ 1011.245784] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
[ 1020.475652] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
[ 1020.475703] smsc95xx 1-1.1:1.0: eth0: MII is busy in smsc95xx_mdio_read
[ 1025.475568] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
[ 1025.475618] smsc95xx 1-1.1:1.0: eth0: MII is busy in smsc95xx_mdio_read
[ 1029.401439] usb 1-1.2.2: USB disconnect, device number 15
[ 1029.414868] usb 1-1.2.3: USB disconnect, device number 16
[ 1029.515768] usb 1-1.2: reset full speed USB device number 12 using dwc_otg
[ 1029.915780] usb 1-1.2.1: new full speed USB device number 17 using dwc_otg
[ 1030.021981] usb 1-1.2.1: New USB device found, idVendor=046d, idProduct=c709
[ 1030.022018] usb 1-1.2.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1030.022040] usb 1-1.2.1: Product: Logitech BT Mini-Receiver
[ 1030.022057] usb 1-1.2.1: Manufacturer: Logitech
[ 1030.022085] usb 1-1.2.1: SerialNumber: 001F2039B887
[ 1030.136063] usb 1-1.2.2: new full speed USB device number 18 using dwc_otg
[ 1030.247815] usb 1-1.2.2: New USB device found, idVendor=046d, idProduct=c713
[ 1030.247878] usb 1-1.2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1030.247923] usb 1-1.2.2: Product: Logitech BT Mini-Receiver
[ 1030.247968] usb 1-1.2.2: Manufacturer: Logitech
[ 1030.248008] usb 1-1.2.2: SerialNumber: 001F2039B887
[ 1030.286157] input: Logitech Logitech BT Mini-Receiver as /devices/platform/bcm2708_usb/usb1/1-1/1-1.2/1-1.2.2/1-1.2.2:1.0/input/input8
[ 1030.286316] INFO:: periodic_channel_available: Total channels: 8, Periodic: 6, Non-periodic: 2
[ 1030.286330]
[ 1030.286346] INFO:: schedule_periodic: No host channel available for periodic transfer.
[ 1030.286358]
[ 1030.286378] ERROR::dwc_otg_hcd_urb_enqueue:487: DWC OTG HCD URB Enqueue failed adding QTD. Error status -4008
[ 1030.286391]
[ 1030.287573] generic-usb 0003:046D:C713.0009: input: USB HID v1.11 Keyboard [Logitech Logitech BT Mini-Receiver] on usb-bcm2708_usb-1.2.2/input0
[ 1030.305539] INFO:: periodic_channel_available: Total channels: 8, Periodic: 6, Non-periodic: 2
[ 1030.305557]
[ 1030.305581] INFO:: schedule_periodic: No host channel available for periodic transfer.
[ 1030.305594]
[ 1030.305616] ERROR::dwc_otg_hcd_urb_enqueue:487: DWC OTG HCD URB Enqueue failed adding QTD. Error status -4008
[ 1030.305629]
[ 1030.335596] INFO:: periodic_channel_available: Total channels: 8, Periodic: 6, Non-periodic: 2
[ 1030.335614]
[ 1030.335638] INFO:: schedule_periodic: No host channel available for periodic transfer.
[ 1030.335650]
[ 1030.335671] ERROR::dwc_otg_hcd_urb_enqueue:487: DWC OTG HCD URB Enqueue failed adding QTD. Error status -4008
[ 1030.335684]
[ 1030.366111] usb 1-1.2.3: new full speed USB device number 19 using dwc_otg
[ 1030.395543] INFO:: periodic_channel_available: Total channels: 8, Periodic: 6, Non-periodic: 2
[ 1030.395561]
[ 1030.395584] INFO:: schedule_periodic: No host channel available for periodic transfer.

g4eml · 2012-06-22T17:24:25Z

I too can confirm that I am seeing similar issues with low speed devices. I initially suspected power problems but have monitored the power with a scope and I am now totally confident that power is not the cause.

If I have two low speed devices plugged into an external hub I almost immediately start getting USB hub and Ethernet failure messages reported with dmesg. The hub cycles on and off repeatedly. Anything attached to the hub works for a few seconds then stops for a few seconds.

I initially saw this with just my keyboard and mouse plugged into the hub. Connecting either the keyboard or the mouse directly to the second port on the Pi and leaving the other device connected to the hub immediately stops the error messages.

Connecting two mice to the hub produces the same errors.

Connecting high speed devices doesn't seem to cause the same problems.

In conclusion it appears that having two low speed devices on an external hub is causing some sort of conflict in the software.

I am pretty sure this has been seen by many people but incorrectly attributed to power problems.

Colin...

Pitel · 2012-06-26T06:58:08Z

I can somehow confirm the previous findings.

I bought a Pyramid 7 Port USB 2.0 Hub (powered). I have my wifi dongle connected to it, which is a low-speed device. If I also try connecting my mouse, which is also low-speed, bad things start to happen.

marsman2020 · 2012-06-28T19:36:45Z

Another case here - http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=9470&p=111192#p111192
and here - http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=8473

mcphail · 2012-06-29T23:16:19Z

I'm having almost identical problems but don't need to startx to drop eth0. Simply plugging a Huawei e173 modem causes a flood of similar error messages in the logs with frequent disconnection and reconnection of the USB modem. I have a Logitech K260 wireless keyboard and mouse combo. I've tried every combination of power supply and powered or passive USB hubs I can get my hands on. I'm running debian 6 and the latest firmware.

popcornmix · 2012-07-10T17:24:33Z

Just an update. We have been looking into various USB issues, and some of the simpler ones have been fixed.
I think the comments in this thread, as usual, cover a mixture of issues, but one significant one is running out of periodic channels. E.g.
INFO:: schedule_periodic: No host channel available for periodic transfer.

Now, the hardware we have is limited to 8 host channels. That sounds like a lot, but it turns out the ethernet takes one. The ethernet hub takes one. An external hub takes one. One is reserved for non-periodic transfers.
Some devices have multiple endpoints. E.g. we've seen an Air-Mouse with 4 endpoints.

Currently the driver just has a fixed allocation of host channels, so a mostly idle keyboard permanently consumes a channel.
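To make the arithmetic above concrete, here is a rough sketch of the channel budget. The total of 8 and the three built-in consumers come from the comment above; the function name and the per-device endpoint counts are assumptions made up for illustration, not values read from the driver or from real hardware.

```python
TOTAL_HOST_CHANNELS = 8  # hardware limit quoted in the comment above

# Channels said above to be taken before any user device is attached:
# the Ethernet adapter, the on-board hub in front of it, and one channel
# reserved for non-periodic transfers.
BUILT_IN_CHANNELS = 3


def remaining_channels(device_endpoints, external_hubs=1):
    """Return the host channels left under a fixed one-per-endpoint allocation.

    device_endpoints maps a device name to the number of periodic endpoints
    it exposes. The counts used below are hypothetical; `lsusb -v` shows the
    real endpoints of an attached device.
    A negative result means the allocation has run out of channels.
    """
    used = BUILT_IN_CHANNELS + external_hubs + sum(device_endpoints.values())
    return TOTAL_HOST_CHANNELS - used


if __name__ == "__main__":
    # Keyboard and mouse with one periodic endpoint each, behind one hub.
    print(remaining_channels({"keyboard": 1, "mouse": 1}))
    # A 4-endpoint device such as the Air-Mouse mentioned above, plus a keyboard.
    print(remaining_channels({"air_mouse": 4, "keyboard": 1}))
```

Under these assumed counts, a single 4-endpoint device plus one more periodic device already exceeds the budget, which is the kind of case where the "No host channel available" message would be expected.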

Take a look at drivers/usb/host/dwc_otg/dwc_otg_hcd_queue.c#periodic_channel_available

https://github.com/raspberrypi/linux/blob/rpi-patches/drivers/usb/host/dwc_otg/dwc_otg_hcd_queue.c#L323

It requires one dedicated host channel for each 'periodic' (interrupt or isochronous, I think) transaction. This was actually fixed in the denx tree some time ago:

http://git.denx.de/?p=linux-denx.git;a=commit;h=9796e39e7a513d8a4acde759ec5d0023645143d8
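As a rough illustration of the one-channel-per-periodic-transaction rule described above, the check can be modelled as below. This is a simplified Python model written for this thread, not a copy of the driver source: the parameter names and the exact form of the comparison are assumptions, and the real function works on the driver's own state.

```python
def periodic_channel_available(total_channels, periodic_in_use, non_periodic_in_use):
    """Model of a fixed-allocation check for one more periodic transfer.

    Assumption: a new periodic transfer is accepted only if a channel is
    still free overall AND at least one channel stays reserved for
    non-periodic traffic.
    """
    has_free_channel = periodic_in_use + non_periodic_in_use < total_channels
    keeps_non_periodic_reserve = periodic_in_use < total_channels - 1
    return has_free_channel and keeps_non_periodic_reserve


if __name__ == "__main__":
    print(periodic_channel_available(8, 4, 1))  # room left
    print(periodic_channel_available(8, 7, 0))  # reserve would be consumed
```

Because an idle keyboard still counts towards `periodic_in_use` in this model, a handful of mostly idle devices is enough to make the check fail.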

This patch has been incorporated into the dwc_otg patchset that APM have been trying to get upstream:

http://lkml.indiana.edu/hypermail/linux/kernel/1205.0/01170.html

So, we can see a flaw in the USB driver. And there is a potential solution.
Naren has ported it and we have just started testing. Fingers crossed it helps.

marsman2020 · 2012-07-10T17:47:52Z

I'm willing to help, is there a place where I can get Naren's version of the patch and apply it to a custom compiled kernel, or a testing kernel image that includes the patch already?

Unfortunately my old pastebins have expired, but it's likely I have been seeing the out of channels error just before all of the many repeated kevent 4 errors and failed to read register errors.

popcornmix · 2012-07-10T18:22:22Z

Well, I wouldn't recommend this to people who want a more stable system, as it has had very limited testing.
But if you want to help with testing, then try this:
https://dl.dropbox.com/u/3669512/temp/0001-added-microframe-schedule-from-the-linux-denx-tree.dc4.patch

It will probably only help if you were getting this error:
"schedule_periodic: No host channel available for periodic transfer"

But, try the patch. Report anything that used to work and now doesn't, or anything that didn't work and now does.
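For anyone unsure whether they are hitting this particular error, a small sketch like the following can count it in saved `dmesg` output. The message text is the one quoted above; the function name is made up for illustration, and how the log text is captured (for example by redirecting `dmesg` to a file) is left to the reader.

```python
# Message text as quoted earlier in this thread.
NO_CHANNEL_MSG = "No host channel available for periodic transfer"


def count_no_channel_errors(log_text):
    """Count log lines that contain the out-of-channels message."""
    return sum(1 for line in log_text.splitlines() if NO_CHANNEL_MSG in line)


if __name__ == "__main__":
    import sys

    # Usage sketch: dmesg > dmesg.txt, then pass the file name as the argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(count_no_channel_errors(handle.read()))
```

A count of zero suggests the patch is less likely to help, in line with the caveat above.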

NickBT · 2012-07-10T21:53:30Z

I've just applied this patch to a newly cloned kernel. For the first time ever I can get wifi dongle, keyboard and mouse all working in LXDE. It has wifi network connectivity through the browser. IT WORKS - no fail r/w on index 114 or 118 or failure to enque error messages. That's the good news! The bad news is that pinging bbc.co.uk from a terminal in the GUI gives 35% packet loss. It's a definite improvement from my point of view, well done.
Update: a bit more bad news. I quit the GUI back to the command line and I can't even ping the router at 192.168.1.1 - host unreachable.







[ Upstream commit 96298f6 ]

According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state.

The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change.

=== Before ===

    ...
    > ACL Data RX: Handle 256 flags 0x02 dlen 12                #22
          L2CAP: Connection Request (0x02) ident 2 len 4
            PSM: 1 (0x0001)
            Source CID: 65
    < ACL Data TX: Handle 256 flags 0x00 dlen 16                #23
          L2CAP: Connection Response (0x03) ident 2 len 8
            Destination CID: 64
            Source CID: 65
            Result: Connection successful (0x0000)
            Status: No further information available (0x0000)
    < ACL Data TX: Handle 256 flags 0x00 dlen 12                #24
          L2CAP: Configure Request (0x04) ident 2 len 4
            Destination CID: 65
            Flags: 0x0000
    > HCI Event: Number of Completed Packets (0x13) plen 5      #25
            Num handles: 1
            Handle: 256
            Count: 1
    > HCI Event: Number of Completed Packets (0x13) plen 5      #26
            Num handles: 1
            Handle: 256
            Count: 1
    > ACL Data RX: Handle 256 flags 0x02 dlen 16                #27
          L2CAP: Configure Request (0x04) ident 3 len 8
            Destination CID: 64
            Flags: 0x0000
            Option: Unknown (0x10) [hint]
            01 00                                            ..
    < ACL Data TX: Handle 256 flags 0x00 dlen 18                #28
          L2CAP: Configure Response (0x05) ident 3 len 10
            Source CID: 65
            Flags: 0x0000
            Result: Success (0x0000)
            Option: Maximum Transmission Unit (0x01) [mandatory]
              MTU: 672
    > HCI Event: Number of Completed Packets (0x13) plen 5      #29
            Num handles: 1
            Handle: 256
            Count: 1
    > ACL Data RX: Handle 256 flags 0x02 dlen 14                #30
          L2CAP: Configure Response (0x05) ident 2 len 6
            Source CID: 64
            Flags: 0x0000
            Result: Success (0x0000)
    > ACL Data RX: Handle 256 flags 0x02 dlen 20                #31
          L2CAP: Configure Request (0x04) ident 3 len 12
            Destination CID: 64
            Flags: 0x0000
            Option: Unknown (0x10) [hint]
            01 00 91 02 11 11                                ......
    < ACL Data TX: Handle 256 flags 0x00 dlen 14                #32
          L2CAP: Command Reject (0x01) ident 3 len 6
            Reason: Invalid CID in request (0x0002)
            Destination CID: 64
            Source CID: 65
    > HCI Event: Number of Completed Packets (0x13) plen 5      #33
            Num handles: 1
            Handle: 256
            Count: 1
    ...

=== After ===

    ...
    > ACL Data RX: Handle 256 flags 0x02 dlen 12                #22
          L2CAP: Connection Request (0x02) ident 2 len 4
            PSM: 1 (0x0001)
            Source CID: 65
    < ACL Data TX: Handle 256 flags 0x00 dlen 16                #23
          L2CAP: Connection Response (0x03) ident 2 len 8
            Destination CID: 64
            Source CID: 65
            Result: Connection successful (0x0000)
            Status: No further information available (0x0000)
    < ACL Data TX: Handle 256 flags 0x00 dlen 12                #24
          L2CAP: Configure Request (0x04) ident 2 len 4
            Destination CID: 65
            Flags: 0x0000
    > HCI Event: Number of Completed Packets (0x13) plen 5      #25
            Num handles: 1
            Handle: 256
            Count: 1
    > HCI Event: Number of Completed Packets (0x13) plen 5      #26
            Num handles: 1
            Handle: 256
            Count: 1
    > ACL Data RX: Handle 256 flags 0x02 dlen 16                #27
          L2CAP: Configure Request (0x04) ident 3 len 8
            Destination CID: 64
            Flags: 0x0000
            Option: Unknown (0x10) [hint]
            01 00                                            ..
    < ACL Data TX: Handle 256 flags 0x00 dlen 18                #28
          L2CAP: Configure Response (0x05) ident 3 len 10
            Source CID: 65
            Flags: 0x0000
            Result: Success (0x0000)
            Option: Maximum Transmission Unit (0x01) [mandatory]
              MTU: 672
    > HCI Event: Number of Completed Packets (0x13) plen 5      #29
            Num handles: 1
            Handle: 256
            Count: 1
    > ACL Data RX: Handle 256 flags 0x02 dlen 14                #30
          L2CAP: Configure Response (0x05) ident 2 len 6
            Source CID: 64
            Flags: 0x0000
            Result: Success (0x0000)
    > ACL Data RX: Handle 256 flags 0x02 dlen 20                #31
          L2CAP: Configure Request (0x04) ident 3 len 12
            Destination CID: 64
            Flags: 0x0000
            Option: Unknown (0x10) [hint]
            01 00 91 02 11 11                                .....
    < ACL Data TX: Handle 256 flags 0x00 dlen 18                #32
          L2CAP: Configure Response (0x05) ident 3 len 10
            Source CID: 65
            Flags: 0x0000
            Result: Success (0x0000)
            Option: Maximum Transmission Unit (0x01) [mandatory]
              MTU: 672
    < ACL Data TX: Handle 256 flags 0x00 dlen 12                #33
          L2CAP: Configure Request (0x04) ident 3 len 4
            Destination CID: 65
            Flags: 0x0000
    > HCI Event: Number of Completed Packets (0x13) plen 5      #34
            Num handles: 1
            Handle: 256
            Count: 1
    > HCI Event: Number of Completed Packets (0x13) plen 5      #35
            Num handles: 1
            Handle: 256
            Count: 1
    ...

Signed-off-by: Howard Chung <howardchung@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

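The behavioural change in the L2CAP commit message above can be illustrated with a small state check. This is only a sketch: the state names, the `handle_config_req` function, and the set of states that accepted a Configure Request before the change are assumptions made for illustration, not the kernel's actual code.

```python
from enum import Enum, auto


class ChanState(Enum):
    # Hypothetical channel states, named for illustration only.
    CONFIG = auto()
    CONNECT2 = auto()
    OPEN = auto()
    CLOSED = auto()


def handle_config_req(state: ChanState, accept_in_open: bool) -> str:
    """Return the reply a channel would send to an incoming Configure Request."""
    # Assumption: before the change only the configuration-phase states accept it.
    allowed = {ChanState.CONFIG, ChanState.CONNECT2}
    if accept_in_open:
        # After the change, an already open channel also handles the request.
        allowed.add(ChanState.OPEN)
    if state not in allowed:
        return "Command Reject: Invalid CID in request (0x0002)"
    return "Configure Response: Success (0x0000)"
```

With `accept_in_open=False` an open channel answers with the Command Reject seen in frame #32 of the "Before" trace; with `accept_in_open=True` it answers with a Configure Response, as in the "After" trace.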
When more than a single SCMI device are present in the system, the creation of the notification workqueue with the WQ_SYSFS flag will lead to the following sysfs duplicate node warning:

    sysfs: cannot create duplicate filename '/devices/virtual/workqueue/scmi_notify'
    CPU: 0 PID: 20 Comm: kworker/0:1 Not tainted 5.9.0-gdf4dd84a3f7d #29
    Hardware name: Broadcom STB (Flattened Device Tree)
    Workqueue: events deferred_probe_work_func
    Backtrace:
      show_stack + 0x20/0x24
      dump_stack + 0xbc/0xe0
      sysfs_warn_dup + 0x70/0x80
      sysfs_create_dir_ns + 0x15c/0x1a4
      kobject_add_internal + 0x140/0x4d0
      kobject_add + 0xc8/0x138
      device_add + 0x1dc/0xc20
      device_register + 0x24/0x28
      workqueue_sysfs_register + 0xe4/0x1f0
      alloc_workqueue + 0x448/0x6ac
      scmi_notification_init + 0x78/0x1dc
      scmi_probe + 0x268/0x4fc
      platform_drv_probe + 0x70/0xc8
      really_probe + 0x184/0x728
      driver_probe_device + 0xa4/0x278
      __device_attach_driver + 0xe8/0x148
      bus_for_each_drv + 0x108/0x158
      __device_attach + 0x190/0x234
      device_initial_probe + 0x1c/0x20
      bus_probe_device + 0xdc/0xec
      deferred_probe_work_func + 0xd4/0x11c
      process_one_work + 0x420/0x8f0
      worker_thread + 0x4fc/0x91c
      kthread + 0x21c/0x22c
      ret_from_fork + 0x14/0x20
    kobject_add_internal failed for scmi_notify with -EEXIST, don't try to register things with the same name in the same directory.
    arm-scmi brcm_scmi@1: SCMI Notifications - Initialization Failed.
    arm-scmi brcm_scmi@1: SCMI Notifications NOT available.
    arm-scmi brcm_scmi@1: SCMI Protocol v1.0 'brcm-scmi:' Firmware version 0x1

Fix this by using dev_name(handle->dev) which guarantees that the name is unique and this also helps correlate which notification workqueue corresponds to which SCMI device instance.

Link: https://lore.kernel.org/r/20201014021737.287340-1-f.fainelli@gmail.com
Fixes: bd31b24 ("firmware: arm_scmi: Add notification dispatch and delivery")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
[sudeep.holla: trimmed backtrace to remove all unwanted hexcodes and timestamps]
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
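The naming collision described in this commit message can be modelled with a toy registry. This is a sketch, not kernel code: `Registry` and `create_notify_queues` are hypothetical stand-ins for a sysfs directory and the driver's probe path, and the device name `brcm_scmi@0` is assumed (only `brcm_scmi@1` appears in the log).

```python
class Registry:
    """Toy stand-in for a sysfs directory: every name must be unique."""

    def __init__(self):
        self.names = set()

    def register(self, name):
        if name in self.names:
            # Analogous to the -EEXIST failure in the warning above.
            raise FileExistsError(name)
        self.names.add(name)


def create_notify_queues(device_names, per_device_name):
    """Register one notification queue per device and return the names used."""
    registry = Registry()
    for dev in device_names:
        # A fixed name collides on the second device; a per-device name does not.
        registry.register(dev if per_device_name else "scmi_notify")
    return sorted(registry.names)


devices = ["brcm_scmi@0", "brcm_scmi@1"]  # second name from the log, first assumed
```

Calling `create_notify_queues(devices, per_device_name=False)` raises on the second device, while `per_device_name=True` registers both queues and makes each one traceable to its device.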


This patch fixes issue introduced by a previous commit where iWARP doorbell address wasn't initialized, causing call trace when any RDMA application wants to use this interface:

    Illegal doorbell address: 0000000000000000. Legal range for doorbell addresses is [0000000011431e08..00000000ec3799d3]
    WARNING: CPU: 11 PID: 11990 at drivers/net/ethernet/qlogic/qed/qed_dev.c:93 qed_db_rec_sanity.isra.12+0x48/0x70 [qed]
    ...
     hpsa scsi_transport_sas [last unloaded: crc8]
    CPU: 11 PID: 11990 Comm: rping Tainted: G S 5.10.0-rc1 #29
    Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
    RIP: 0010:qed_db_rec_sanity.isra.12+0x48/0x70 [qed]
    ...
    RSP: 0018:ffffafc28458fa88 EFLAGS: 00010286
    RAX: 0000000000000000 RBX: ffff8d0d4c620000 RCX: 0000000000000000
    RDX: ffff8d10afde7d50 RSI: ffff8d10afdd8b40 RDI: ffff8d10afdd8b40
    RBP: ffffafc28458fe38 R08: 0000000000000003 R09: 0000000000007fff
    R10: 0000000000000001 R11: ffffafc28458f888 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8d0d43ccbbd0 R15: ffff8d0d48dae9c0
    FS: 00007fbd5267e740(0000) GS:ffff8d10afdc0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fbd4f258fb8 CR3: 0000000108d96003 CR4: 00000000001706e0
    Call Trace:
     qed_db_recovery_add+0x6d/0x1f0 [qed]
     qedr_create_user_qp+0x57e/0xd30 [qedr]
     qedr_create_qp+0x5f3/0xab0 [qedr]
     ? lookup_get_idr_uobject.part.12+0x45/0x90 [ib_uverbs]
     create_qp+0x45d/0xb30 [ib_uverbs]
     ? ib_uverbs_cq_event_handler+0x30/0x30 [ib_uverbs]
     ib_uverbs_create_qp+0xb9/0xe0 [ib_uverbs]
     ib_uverbs_write+0x3f9/0x570 [ib_uverbs]
     ? security_mmap_file+0x62/0xe0
     vfs_write+0xb7/0x200
     ksys_write+0xaf/0xd0
     ? syscall_trace_enter.isra.25+0x152/0x200
     do_syscall_64+0x2d/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 06e8d1d ("RDMA/qedr: Add support for user mode XRC-SRQ's")
Link: https://lore.kernel.org/r/20201127163251.14533-1-palok@marvell.com
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

[ Upstream commit d412137 ]

The perf_buffer fails on system with offline cpus:

    # test_progs -t perf_buffer
    test_perf_buffer:PASS:nr_cpus 0 nsec
    test_perf_buffer:PASS:nr_on_cpus 0 nsec
    test_perf_buffer:PASS:skel_load 0 nsec
    test_perf_buffer:PASS:attach_kprobe 0 nsec
    test_perf_buffer:PASS:perf_buf__new 0 nsec
    test_perf_buffer:PASS:epoll_fd 0 nsec
    skipping offline CPU #24
    skipping offline CPU #25
    skipping offline CPU #26
    skipping offline CPU #27
    skipping offline CPU #28
    skipping offline CPU #29
    skipping offline CPU #30
    skipping offline CPU #31
    test_perf_buffer:PASS:perf_buffer__poll 0 nsec
    test_perf_buffer:PASS:seen_cpu_cnt 0 nsec
    test_perf_buffer:FAIL:buf_cnt got 24, expected 32
    Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED

Changing the test to check online cpus instead of possible.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20211021114132.8196-2-jolsa@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
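The difference between possible and online CPUs in the failing test can be sketched as follows. This is an illustration only: the actual selftest is written in C, `parse_cpu_list` is a hypothetical helper, and the list strings `0-31` and `0-23` are inferred from the "got 24, expected 32" output rather than quoted from the commit. The string format assumed here is the range list the kernel exposes in files such as `/sys/devices/system/cpu/online`.

```python
def parse_cpu_list(cpu_list: str) -> list:
    """Expand a kernel-style CPU list such as "0-23,32" into CPU numbers."""
    cpus = []
    for part in cpu_list.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


possible = parse_cpu_list("0-31")  # assumed: 32 possible CPUs
online = parse_cpu_list("0-23")    # assumed: CPUs 24-31 are offline
offline = sorted(set(possible) - set(online))
```

Comparing the buffer count against `len(online)` instead of `len(possible)` is what makes the expectation match the 24 buffers actually created.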

The trace_hardirqs_{on,off}() require the caller to setup frame pointer properly. This because these two functions use macro 'CALLER_ADDR1' (aka. __builtin_return_address(1)) to acquire caller info. If the $fp is used for other purpose, the code generated this macro (as below) could trigger memory access fault.

    0xffffffff8011510e <+80>: ld a1,-16(s0)
    0xffffffff80115112 <+84>: ld s2,-8(a1) # <-- paging fault here

The oops message during booting if compiled with 'irqoff' tracer enabled:

    [ 0.039615][ T0] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000f8
    [ 0.041925][ T0] Oops [#1]
    [ 0.042063][ T0] Modules linked in:
    [ 0.042864][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.17.0-rc1-00233-g9a20c48d1ed2 #29
    [ 0.043568][ T0] Hardware name: riscv-virtio,qemu (DT)
    [ 0.044343][ T0] epc : trace_hardirqs_on+0x56/0xe2
    [ 0.044601][ T0] ra : restore_all+0x12/0x6e
    [ 0.044721][ T0] epc : ffffffff80126a5c ra : ffffffff80003b94 sp : ffffffff81403db0
    [ 0.044801][ T0] gp : ffffffff8163acd8 tp : ffffffff81414880 t0 : 0000000000000020
    [ 0.044882][ T0] t1 : 0098968000000000 t2 : 0000000000000000 s0 : ffffffff81403de0
    [ 0.044967][ T0] s1 : 0000000000000000 a0 : 0000000000000001 a1 : 0000000000000100
    [ 0.045046][ T0] a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
    [ 0.045124][ T0] a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000054494d45
    [ 0.045210][ T0] s2 : ffffffff80003b94 s3 : ffffffff81a8f1b0 s4 : ffffffff80e27b50
    [ 0.045289][ T0] s5 : ffffffff81414880 s6 : ffffffff8160fa00 s7 : 00000000800120e8
    [ 0.045389][ T0] s8 : 0000000080013100 s9 : 000000000000007f s10: 0000000000000000
    [ 0.045474][ T0] s11: 0000000000000000 t3 : 7fffffffffffffff t4 : 0000000000000000
    [ 0.045548][ T0] t5 : 0000000000000000 t6 : ffffffff814aa368
    [ 0.045620][ T0] status: 0000000200000100 badaddr: 00000000000000f8 cause: 000000000000000d
    [ 0.046402][ T0] [<ffffffff80003b94>] restore_all+0x12/0x6e

This because the $fp(aka. $s0) register is not used as frame pointer in the assembly entry code.

    resume_kernel:
        REG_L s0, TASK_TI_PREEMPT_COUNT(tp)
        bnez s0, restore_all
        REG_L s0, TASK_TI_FLAGS(tp)
        andi s0, s0, _TIF_NEED_RESCHED
        beqz s0, restore_all
        call preempt_schedule_irq
        j restore_all

To fix above issue, here we add one extra level wrapper for function trace_hardirqs_{on,off}() so they can be safely called by low level entry code.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Fixes: 3c46979 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>


The following has been observed when running stressng mmap since commit b653db7 ("mm: Clear page->private when splitting or migrating a page")

    watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
    CPU: 75 PID: 9546 Comm: stress-ng Tainted: G E 6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
    Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
    RIP: 0010:xas_descend+0x28/0x80
    Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
    RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
    RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
    RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
    RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
    R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
    R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
    FS: 00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     xas_load+0x3a/0x50
     __filemap_get_folio+0x80/0x370
     ? put_swap_page+0x163/0x360
     pagecache_get_page+0x13/0x90
     __try_to_reclaim_swap+0x50/0x190
     scan_swap_map_slots+0x31e/0x670
     get_swap_pages+0x226/0x3c0
     folio_alloc_swap+0x1cc/0x240
     add_to_swap+0x14/0x70
     shrink_page_list+0x968/0xbc0
     reclaim_page_list+0x70/0xf0
     reclaim_pages+0xdd/0x120
     madvise_cold_or_pageout_pte_range+0x814/0xf30
     walk_pgd_range+0x637/0xa30
     __walk_page_range+0x142/0x170
     walk_page_range+0x146/0x170
     madvise_pageout+0xb7/0x280
     ? asm_common_interrupt+0x22/0x40
     madvise_vma_behavior+0x3b7/0xac0
     ? find_vma+0x4a/0x70
     ? find_vma+0x64/0x70
     ? madvise_vma_anon_name+0x40/0x40
     madvise_walk_vmas+0xa6/0x130
     do_madvise+0x2f4/0x360
     __x64_sys_madvise+0x26/0x30
     do_syscall_64+0x5b/0x80
     ? do_syscall_64+0x67/0x80
     ? syscall_exit_to_user_mode+0x17/0x40
     ? do_syscall_64+0x67/0x80
     ? syscall_exit_to_user_mode+0x17/0x40
     ? do_syscall_64+0x67/0x80
     ? do_syscall_64+0x67/0x80
     ? common_interrupt+0x8b/0xa0
     entry_SYSCALL_64_after_hwframe+0x63/0xcd

The problem can be reproduced with the mmtests config config-workload-stressng-mmap. It does not always happen and when it triggers is variable but it has happened on multiple machines.

The intent of commit b653db7 patch was to avoid the case where PG_private is clear but folio->private is not-NULL. However, THP tail pages uses page->private for "swp_entry_t if folio_test_swapcache()" as stated in the documentation for struct folio. This patch only clobbers page->private for tail pages if the head page was not in swapcache and warns once if page->private had an unexpected value.

Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.net
Fixes: b653db7 ("mm: Clear page->private when splitting or migrating a page")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Oleksandr Natalenko <oleksandr@natalenko.name>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


[ Upstream commit b514191 ]

The commit cited below removed the RCU read-side critical section from rtnl_fdb_dump() which means that the ndo_fdb_dump() callback is invoked without RCU protection.

This results in the following warning [1] in the VXLAN driver, which relied on the callback being invoked from an RCU read-side critical section.

Fix this by calling rcu_read_lock() in the VXLAN driver, as already done in the bridge driver.

[1]

    WARNING: suspicious RCU usage
    5.8.0-rc4-custom-01521-g481007553ce6 #29 Not tainted
    -----------------------------
    drivers/net/vxlan.c:1379 RCU-list traversed in non-reader section!!

    other info that might help us debug this:

    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by bridge/166:
     #0: ffffffff85a27850 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0xea/0x1090

    stack backtrace:
    CPU: 1 PID: 166 Comm: bridge Not tainted 5.8.0-rc4-custom-01521-g481007553ce6 #29
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
    Call Trace:
     dump_stack+0x100/0x184
     lockdep_rcu_suspicious+0x153/0x15d
     vxlan_fdb_dump+0x51e/0x6d0
     rtnl_fdb_dump+0x4dc/0xad0
     netlink_dump+0x540/0x1090
     __netlink_dump_start+0x695/0x950
     rtnetlink_rcv_msg+0x802/0xbd0
     netlink_rcv_skb+0x17a/0x480
     rtnetlink_rcv+0x22/0x30
     netlink_unicast+0x5ae/0x890
     netlink_sendmsg+0x98a/0xf40
     __sys_sendto+0x279/0x3b0
     __x64_sys_sendto+0xe6/0x1a0
     do_syscall_64+0x54/0xa0
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7fe14fa2ade0
    Code: Bad RIP value.
    RSP: 002b:00007fff75bb5b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    RAX: ffffffffffffffda RBX: 00005614b1ba0020 RCX: 00007fe14fa2ade0
    RDX: 000000000000011c RSI: 00007fff75bb5b90 RDI: 0000000000000003
    RBP: 00007fff75bb5b90 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000246 R12: 00005614b1b89160
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

Fixes: 5e6d243 ("bridge: netlink dump interface at par with brctl")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scx_pair uses the default stride value of nr_cpu_ids / 2, which matches most x86 SMT configurations. It also allows specifying a custom stride value with -S so that, for example, neighboring CPUs can be paired up. However, not all stride values work, and errors were not reported very well.

This patch improves error handling so that scx_pair fails with a clear error message if CPUs can't be paired up with the specified stride value. scx_pair now also prints out how CPUs are paired on startup.

This should address issues #28 and #29.
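Why some stride values cannot work is easier to see with a small model of the pairing. The sketch below assumes each unpaired CPU `i` is paired with CPU `(i + stride) % nr_cpus`; that rule, the `pair_cpus` name, and the error wording are assumptions for illustration, not scx_pair's actual code.

```python
def pair_cpus(nr_cpus: int, stride: int) -> list:
    """Pair CPU i with CPU (i + stride) % nr_cpus; raise if that is impossible."""
    pair = [-1] * nr_cpus
    for i in range(nr_cpus):
        if pair[i] >= 0:
            continue  # already claimed as another CPU's partner
        j = (i + stride) % nr_cpus
        if i == j:
            raise ValueError(f"stride {stride}: CPU {i} would pair with itself")
        if pair[j] >= 0:
            raise ValueError(
                f"stride {stride}: CPU {j} is already paired with CPU {pair[j]}"
            )
        pair[i], pair[j] = j, i
    return pair
```

Under this model, 8 CPUs pair cleanly with the default stride of 4 (CPU 0 with 4, 1 with 5, and so on) or with a stride of 1 (neighboring CPUs), while a stride of 3 fails because CPU 6 would need CPU 1, which is already taken.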

jstsch mentioned this issue Jun 16, 2012

USB devices causing "XactErr without NYET/NAK/ACK" #35

Closed

jonhadfield mentioned this issue Jun 30, 2020

"brcmf_sdio_readframes: RXHEADER FAILED: -84" spam fills root partition #2978

Open

alanbork mentioned this issue Apr 15, 2021

4k60p display issue (DRM VC4 driver crash / backtrace) #3842

Closed

alanbork mentioned this issue May 16, 2022

KMS: 4096x2160p60 causes hang/unresponsive pi unless overclocked #5034

Closed


marsman2020 commented May 31, 2012

XECDesign commented Jun 1, 2012

marsman2020 commented Jun 1, 2012

XECDesign commented Jun 1, 2012

marsman2020 commented Jun 1, 2012

XECDesign commented Jun 1, 2012

XECDesign commented Jun 1, 2012

marsman2020 commented Jun 3, 2012

marsman2020 commented Jun 4, 2012

NickBT commented Jun 8, 2012

jstsch commented Jun 10, 2012

jstsch commented Jun 10, 2012

marsman2020 commented Jun 11, 2012

NickBT commented Jun 11, 2012

marsman2020 commented Jun 14, 2012

marsman2020 commented Jun 16, 2012

guisacouto commented Jun 21, 2012

XECDesign commented Jun 21, 2012

guisacouto commented Jun 21, 2012

jstsch commented Jun 21, 2012

marsman2020 commented Jun 22, 2012

fbutler commented Jun 22, 2012

g4eml commented Jun 22, 2012

Pitel commented Jun 26, 2012

marsman2020 commented Jun 28, 2012

mcphail commented Jun 29, 2012

popcornmix commented Jul 10, 2012

marsman2020 commented Jul 10, 2012

popcornmix commented Jul 10, 2012

NickBT commented Jul 10, 2012
