Network driver on RPi3 B Plus causing hung tasks when working on an NFS mount #2482

graysky2 · 2018-03-30T18:28:16Z

Platform/Distro: RPi 3B+ running Arch ARM (armv7h).
Kernel version: 4.14.31 (b36f4e9)
Firmware version: latest as I write this (raspberrypi/firmware@c14a903)

Bug: Frequent hung-task warnings (blocked tasks) in dmesg when writing files to an NFS mount.

Details: When compiling on an NFS mount, dmesg fills up with hung-task warnings like the ones below. Compiling on the micro SD card is fine. I believe that the software (distro) on the micro SD card is NOT to blame: if I put the same micro SD card into an RPi3 or RPi2, I can compile without error.

Again, I am using an NFS mounted partition (/scratch) on which to compile, so I'm hypothesizing that these problems are related to the network driver.

...
[ 2455.534291] INFO: task ld:24879 blocked for more than 120 seconds.
[ 2455.538489]       Tainted: G         C      4.14.31-1-ARCH #1
[ 2455.542688] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2455.550990] ld              D    0 24879  24804 0x00000000
[ 2455.555379] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 2455.559662] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 2455.563990] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 2455.572326] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 2455.580865] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 2455.589272] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 2455.597837] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 2455.606295] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 2455.610675] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 2455.614999] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 2547.695051] nfs: server ease not responding, still trying
[ 2548.735626] nfs: server ease not responding, still trying
[ 2548.768826] nfs: server ease OK
[ 2548.796748] nfs: server ease OK
[ 2701.296329] INFO: task ld:24879 blocked for more than 120 seconds.
[ 2701.300214]       Tainted: G         C      4.14.31-1-ARCH #1
[ 2701.304061] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2701.311642] ld              D    0 24879  24804 0x00000000
[ 2701.315536] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 2701.319458] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 2701.323355] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 2701.330878] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 2701.338447] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 2701.345916] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 2701.353469] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 2701.360953] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 2701.364740] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 2701.368593] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 2772.976750] nfs: server ease not responding, still trying
[ 2774.331264] nfs: server ease OK
[ 2947.057892] INFO: task ld:24879 blocked for more than 120 seconds.
[ 2947.061907]       Tainted: G         C      4.14.31-1-ARCH #1
[ 2947.066031] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2947.074107] ld              D    0 24879  24804 0x00000000
[ 2947.078244] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 2947.081483] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 2947.084348] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 2947.090033] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 2947.095898] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 2947.101751] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 2947.107513] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 2947.113352] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 2947.116350] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 2947.119289] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 2998.258064] nfs: server ease not responding, still trying
[ 2999.352463] nfs: server ease OK
[ 3192.819075] INFO: task ld:24879 blocked for more than 120 seconds.
[ 3192.823185]       Tainted: G         C      4.14.31-1-ARCH #1
[ 3192.827330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3192.835447] ld              D    0 24879  24804 0x00000000
[ 3192.839604] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 3192.842832] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 3192.845750] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 3192.851476] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 3192.857318] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 3192.863126] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 3192.868837] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 3192.874594] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 3192.877558] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 3192.880466] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 3223.539141] nfs: server ease not responding, still trying
[ 3224.579687] nfs: server ease not responding, still trying
[ 3224.612015] nfs: server ease OK
[ 3224.626000] nfs: server ease OK
[ 3438.580109] INFO: task objcopy:24916 blocked for more than 120 seconds.
[ 3438.583905]       Tainted: G         C      4.14.31-1-ARCH #1
[ 3438.587697] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3438.595231] objcopy         D    0 24916  24912 0x00000000
[ 3438.599109] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 3438.603019] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 3438.606896] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 3438.614435] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 3438.622018] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 3438.629666] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 3438.637259] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 3438.644894] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 3438.648704] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 3438.652599] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 3448.820081] nfs: server ease not responding, still trying
[ 3450.148878] nfs: server ease OK
[ 3674.100906] nfs: server ease not responding, still trying
[ 3675.141506] nfs: server ease not responding, still trying
[ 3675.174279] nfs: server ease OK
[ 3675.202048] nfs: server ease OK
[ 3807.221430] INFO: task objcopy:24916 blocked for more than 120 seconds.
[ 3807.225253]       Tainted: G         C      4.14.31-1-ARCH #1
[ 3807.229007] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3807.236459] objcopy         D    0 24916  24912 0x00000000
[ 3807.240428] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 3807.244393] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 3807.248202] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 3807.255540] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 3807.263030] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 3807.270494] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 3807.277992] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 3807.285364] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 3807.289292] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 3807.293169] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 3899.381659] nfs: server ease not responding, still trying
[ 3900.422241] nfs: server ease not responding, still trying
[ 3900.461112] nfs: server ease OK
[ 3900.474540] nfs: server ease OK
[ 4011.372575] nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based  firewall rule not found. Use the iptables CT target to attach helpers instead.
[ 4052.982250] INFO: task as:25088 blocked for more than 120 seconds.
[ 4052.986324]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4052.990389] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4052.998504] as              D    0 25088  25086 0x00000000
[ 4053.002785] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4053.006065] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4053.008960] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4053.014564] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4053.020330] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 4053.026110] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 4053.031705] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 4053.037527] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 4053.040507] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 4053.043431] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4134.902727] nfs: server ease not responding, still trying
[ 4135.997194] nfs: server ease OK
[ 4529.145918] nfs: server ease not responding, still trying
[ 4529.145923] nfs: server ease not responding, still trying
[ 4529.145940] nfs: server ease not responding, still trying
[ 4529.145978] nfs: server ease not responding, still trying
[ 4529.146011] nfs: server ease not responding, still trying
[ 4529.146028] nfs: server ease not responding, still trying
[ 4529.146044] nfs: server ease not responding, still trying
[ 4538.105971] nfs: server ease not responding, still trying
[ 4538.109131] nfs: server ease not responding, still trying
[ 4544.506128] INFO: task gcc:2854 blocked for more than 120 seconds.
[ 4544.509193]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4544.512157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4544.517957] gcc             D    0  2854   2852 0x00000000
[ 4544.520871] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4544.523830] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4544.526762] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4544.530980] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4544.534883] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 4544.538880] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 4544.542873] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 4544.546949] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 4544.549173] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 4544.551445] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4571.406855] nfs: server ease OK
[ 4571.406996] nfs: server ease OK
[ 4571.407031] nfs: server ease OK
[ 4571.407691] nfs: server ease OK
[ 4571.407701] nfs: server ease OK
[ 4571.410844] nfs: server ease OK
[ 4571.410877] nfs: server ease OK
[ 4571.411761] nfs: server ease OK
[ 4571.411810] nfs: server ease OK
[ 4790.267644] INFO: task ld:7630 blocked for more than 120 seconds.
[ 4790.270597]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4790.273588] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4790.279563] ld              D    0  7630   7628 0x00000000
[ 4790.282558] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4790.285531] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4790.288488] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4790.294136] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4790.299855] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 4790.305556] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 4790.311366] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 4790.317112] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 4790.320380] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 4790.323699] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4790.330500] INFO: task ld:7636 blocked for more than 120 seconds.
[ 4790.334181]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4790.338097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4790.346223] ld              D    0  7636   7633 0x00000000
[ 4790.350304] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4790.354463] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4790.358593] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4790.366494] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4790.374744] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 4790.383021] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 4790.391236] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 4790.399371] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 4790.403607] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 4790.407831] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
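The log interleaves hung-task reports with `nfs: server ease not responding` / `nfs: server ease OK` pairs. To line the two up, here is a minimal Python sketch (the helper name `nfs_stall_windows` is hypothetical, not part of any tool) that scans dmesg lines and returns, for each episode, the timestamp of the first "not responding" message and of the following "OK". It assumes a single NFS server and the `[ seconds.micro]` timestamp format shown above. The span only measures the gap between the two messages; the client presumably logs "not responding" only after its retry timeout has already expired, so the real stall starts earlier.

```python
import re

# Matches lines such as:
#   "[ 2547.695051] nfs: server ease not responding, still trying"
#   "[ 2548.768826] nfs: server ease OK"
NFS_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+nfs: server \S+ (not responding|OK)")


def nfs_stall_windows(lines):
    """Return (start, end) timestamps: first 'not responding' to the next 'OK'.

    Assumes one NFS server; repeated messages inside a window are ignored,
    and all non-NFS lines (hung-task reports etc.) are skipped.
    """
    windows = []
    start = None
    for line in lines:
        match = NFS_LINE.match(line)
        if not match:
            continue
        stamp, state = float(match.group(1)), match.group(2)
        if state == "not responding" and start is None:
            start = stamp
        elif state == "OK" and start is not None:
            windows.append((start, stamp))
            start = None
    return windows
```

On the log above, most episodes close within about 1.5 s of the first message, but the one starting at 4529.1 does not end until 4571.4.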


graysky2 · 2018-03-30T20:12:44Z

An easy way to trigger this bug (if you don't want to compile the kernel package) is to use dd to write from /dev/zero to the NFS mount. For example, on my RPi3 B+:

# mount ease:/scratch /scratch-nfs
% dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=1000 status=progress
964689920 bytes (965 MB, 920 MiB) copied, 149 s, 6.5 MB/s

<<< it froze up after about 965 MB written >>>
<<< In dmesg I get another server not responding error >>>

[ 5112.824818] nfs: server ease not responding, still trying
[ 5149.707808] nfs: server ease OK
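For anyone who wants to reproduce the write test without dd, here is a minimal Python sketch of the same workload: `count` blocks of `bs` zero bytes written sequentially to one file. The helper name `fill` is hypothetical, and its defaults mirror `bs=4M count=1000`. The timing includes `close()`, because the traces in the first log show the NFS client flushing dirty pages on close (`filp_close` → `vfs_fsync` → `nfs_file_fsync`).

```python
import os
import time


def fill(path, bs=4 * 1024 * 1024, count=1000):
    """Write `count` blocks of `bs` zero bytes to `path`, like dd if=/dev/zero.

    Returns (bytes_written, elapsed_seconds). The elapsed time includes
    closing the file, which on NFS waits for dirty pages to be written back.
    """
    block = bytes(bs)  # bs zero bytes, the equivalent of reading /dev/zero
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(count):
            written += f.write(block)
    elapsed = time.monotonic() - start
    return written, elapsed


if __name__ == "__main__":
    # Harmless smoke run against the null device; point it at the NFS mount
    # (for example /scratch-nfs/fill) to reproduce the test from the thread.
    print(fill(os.devnull, bs=4096, count=3))
```

Dividing the returned byte count by the elapsed seconds and by 10**6 gives the same MB/s unit dd prints.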

Now, if I move the micro SD card to an RPi 2 I have lying around, boot it with the same network cable and power supply, and repeat the commands, everything works as expected. I think that helps rule out the NFS server, network hardware, etc. as the culprit.

# mount ease:/scratch /scratch-nfs
% dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=1000 status=progress
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 346 s, 12.1 MB/s
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 357.595 s, 11.7 MB/s
dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=1000 status=progress  0.00s user 24.47s system 5% cpu 8:03.99 total

pelwell · 2018-03-30T21:26:51Z

Does disabling Energy Efficient Ethernet make a difference? Add dtparam=eee=off to config.txt and reboot.

But before trying that you can confirm whether EEE is active using ethtool --show-eee eth0.
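Editing config.txt by hand is enough for one board; if you are scripting the change across several, here is a minimal Python sketch that appends `dtparam=eee=off` only when it is not already present. The function name `add_config_line` is hypothetical. It works on the file's text so it can be checked without touching a real boot partition, and where config.txt lives (commonly `/boot/config.txt`) is an assumption that depends on the distro.

```python
def add_config_line(text, line="dtparam=eee=off"):
    """Return config.txt text with `line` appended unless it is already present.

    Only an exact, uncommented match counts as present; a commented-out
    "#dtparam=eee=off" is left alone and the setting is appended after it.
    """
    if any(existing.strip() == line for existing in text.splitlines()):
        return text
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new setting on its own line
    return text + line + "\n"
```

To apply it, read config.txt, pass its text through `add_config_line`, write the result back, and reboot so the firmware picks up the new parameter.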

graysky2 · 2018-03-30T21:55:19Z

Great suggestion, @pelwell. I got some very encouraging results using the dd test, which floods the I/O with a steady stream of data. It "passed": no timeouts while writing and no "server not responding" messages in dmesg. I am now compiling the same package that consistently triggers the errors and will post back with those results.

Before:

# ethtool --show-eee eth0
EEE Settings for eth0:
	EEE status: enabled - active
...

After:

# ethtool --show-eee eth0
EEE Settings for eth0:
	EEE status: disabled
...
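To check the EEE state from a script rather than by eye, a minimal Python sketch can pull the `EEE status:` line out of `ethtool --show-eee eth0` output. The helper name `eee_status` is hypothetical, and because the rest of the output is elided above, it relies only on that one line, in the two forms shown (`enabled - active` and `disabled`).

```python
def eee_status(ethtool_output):
    """Return the value of the 'EEE status:' line, or None if it is missing."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "EEE status":
            return value.strip()
    return None
```

It can be fed the stdout of `subprocess.run(["ethtool", "--show-eee", "eth0"], capture_output=True, text=True)`; after adding `dtparam=eee=off` and rebooting, the expected value is `disabled`, as in the "After" output.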

The test with dd:

# mount ease:/scratch /scratch-nfs

% dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=1000 status=progress && rm fill
4169138176 bytes (4.2 GB, 3.9 GiB) copied, 97 s, 42.9 MB/s 
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB, 3.9 GiB) copied, 100.665 s, 41.7 MB/s
dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=1000 status=progress  0.00s user 13.79s system 13% cpu 1:40.68 total

% dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=2000 status=progress && rm fill
8380219392 bytes (8.4 GB, 7.8 GiB) copied, 198 s, 42.3 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 201.245 s, 41.7 MB/s
dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=2000 status=progress  0.00s user 27.98s system 13% cpu 3:21.25 total

% dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=2000 status=progress && rm fill
8380219392 bytes (8.4 GB, 7.8 GiB) copied, 198 s, 42.3 MB/s
2000+0 records in
2000+0 records out
8388608000 bytes (8.4 GB, 7.8 GiB) copied, 201.052 s, 41.7 MB/s
dd if=/dev/zero of=/scratch-nfs/fill bs=4M count=2000 status=progress  0.00s user 28.23s system 13% cpu 3:22.19 total
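dd reports rates in decimal megabytes (1 MB = 10**6 bytes) and shows sizes in MiB (2**20 bytes) in parentheses, so the runs in this thread are easy to compare by hand. A small Python check (hypothetical helpers `dd_rate` and `mib`) reproduces the figures quoted above for the stalled 3B+ run, the RPi 2 run, and the first 3B+ run with EEE disabled.

```python
def dd_rate(nbytes, seconds):
    """Throughput in MB/s as dd prints it (1 MB = 10**6 bytes)."""
    return nbytes / seconds / 10**6


def mib(nbytes):
    """Size in MiB (1 MiB = 2**20 bytes), the unit dd shows in parentheses."""
    return nbytes / 2**20


# (bytes, seconds) pairs copied from the dd output in this thread.
RUNS = {
    "3B+, EEE on, stalled": (964689920, 149),
    "RPi 2": (4194304000, 357.595),
    "3B+, EEE off": (4194304000, 100.665),
}

if __name__ == "__main__":
    for label, (nbytes, seconds) in RUNS.items():
        print(f"{label}: {mib(nbytes):.0f} MiB at {dd_rate(nbytes, seconds):.1f} MB/s")
```

With EEE off, the 3B+ writes the same 4194304000 bytes at about 41.7 MB/s, roughly 3.6 times the RPi 2 figure of 11.7 MB/s.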

Unfortunately, compiling, which as you will recognize writes data out much less frequently than dd does, still produces the same errors:

[ 3315.685473] INFO: task gzip:29769 blocked for more than 120 seconds.
[ 3315.685636]       Tainted: G         C      4.14.31-1-ARCH #1
[ 3315.685767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3315.685955] gzip            D    0 29769  29767 0x00000000
[ 3315.686127] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 3315.686299] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 3315.686473] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 3315.686663] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 3315.686875] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 3315.687121] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 3315.687349] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 3315.687529] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 3315.691478] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 3315.695540] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 3402.725251] nfs: server ease not responding, still trying
[ 3403.765783] nfs: server ease not responding, still trying
[ 3404.089089] nfs: server ease OK
[ 3404.089297] nfs: server ease OK
[ 3899.364008] nfs: server ease not responding, still trying
[ 3899.364013] nfs: server ease not responding, still trying
[ 3899.364028] nfs: server ease not responding, still trying
[ 3899.364060] nfs: server ease not responding, still trying
[ 3899.364071] nfs: server ease not responding, still trying
[ 3899.364076] nfs: server ease not responding, still trying
[ 3930.084023] INFO: task ld:13616 blocked for more than 120 seconds.
[ 3930.087086]       Tainted: G         C      4.14.31-1-ARCH #1
[ 3930.090229] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3930.096312] ld              D    0 13616  13612 0x00000000
[ 3930.099422] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 3930.102523] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 3930.105566] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 3930.111351] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 3930.117264] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 3930.123049] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 3930.129036] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 3930.135044] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 3930.138283] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 3930.141618] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 3941.625186] nfs: server ease OK
[ 3941.625295] nfs: server ease OK
[ 3941.625441] nfs: server ease OK
[ 3941.625829] nfs: server ease OK
[ 3941.635332] nfs: server ease OK
[ 3941.635549] nfs: server ease OK
[ 4170.727338] nfs: server ease not responding, still trying
[ 4170.727343] nfs: server ease not responding, still trying
[ 4170.727356] nfs: server ease not responding, still trying
[ 4170.727395] nfs: server ease not responding, still trying
[ 4170.727413] nfs: server ease not responding, still trying
[ 4170.727428] nfs: server ease not responding, still trying
[ 4170.727441] nfs: server ease not responding, still trying
[ 4170.727455] nfs: server ease not responding, still trying
[ 4170.727461] nfs: server ease not responding, still trying
[ 4170.727467] nfs: server ease not responding, still trying
[ 4175.847588] INFO: task gzip:22430 blocked for more than 120 seconds.
[ 4175.849590]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4175.851594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4175.855516] gzip            D    0 22430  22391 0x00000000
[ 4175.857549] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4175.859576] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4175.861543] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4175.865280] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4175.869533] [<80230d0c>] (__filemap_fdatawait_range) from [<80230d70>] (filemap_fdatawait_range+0x18/0x28)
[ 4175.874352] [<80230d70>] (filemap_fdatawait_range) from [<802330f4>] (filemap_write_and_wait+0x58/0x7c)
[ 4175.879764] [<802330f4>] (filemap_write_and_wait) from [<803ea028>] (nfs_wb_all+0x14/0x15c)
[ 4175.885618] [<803ea028>] (nfs_wb_all) from [<803dd96c>] (nfs_setattr+0x280/0x2a4)
[ 4175.892223] [<803dd96c>] (nfs_setattr) from [<802bf8d4>] (notify_change+0x17c/0x410)
[ 4175.899511] [<802bf8d4>] (notify_change) from [<802d62fc>] (utimes_common+0xbc/0x188)
[ 4175.907605] [<802d62fc>] (utimes_common) from [<802d64c8>] (do_utimes+0x100/0x144)
[ 4175.916359] [<802d64c8>] (do_utimes) from [<802d6548>] (SyS_utimensat+0x3c/0xb0)
[ 4175.925462] [<802d6548>] (SyS_utimensat) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4175.934598] INFO: task cp:22444 blocked for more than 120 seconds.
[ 4175.939378]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4175.944179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4175.953867] cp              D    0 22444  22422 0x00000000
[ 4175.958731] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4175.963485] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4175.968194] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4175.977469] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4175.986837] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[ 4175.996147] [<80233190>] (filemap_write_and_wait_range) from [<803db1c4>] (nfs_file_fsync+0x30/0x280)
[ 4176.005282] [<803db1c4>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[ 4176.014199] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[ 4176.018723] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[ 4176.023212] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4176.031736] INFO: task gzip:22446 blocked for more than 120 seconds.
[ 4176.036150]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4176.040488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4176.048923] gzip            D    0 22446  22413 0x00000000
[ 4176.053120] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4176.057319] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4176.061492] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4176.069486] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4176.077824] [<80230d0c>] (__filemap_fdatawait_range) from [<80230d70>] (filemap_fdatawait_range+0x18/0x28)
[ 4176.086158] [<80230d70>] (filemap_fdatawait_range) from [<802330f4>] (filemap_write_and_wait+0x58/0x7c)
[ 4176.094434] [<802330f4>] (filemap_write_and_wait) from [<803ea028>] (nfs_wb_all+0x14/0x15c)
[ 4176.102722] [<803ea028>] (nfs_wb_all) from [<803dd96c>] (nfs_setattr+0x280/0x2a4)
[ 4176.111286] [<803dd96c>] (nfs_setattr) from [<802bf8d4>] (notify_change+0x17c/0x410)
[ 4176.119906] [<802bf8d4>] (notify_change) from [<802d62fc>] (utimes_common+0xbc/0x188)
[ 4176.128677] [<802d62fc>] (utimes_common) from [<802d64c8>] (do_utimes+0x100/0x144)
[ 4176.137604] [<802d64c8>] (do_utimes) from [<802d6548>] (SyS_utimensat+0x3c/0xb0)
[ 4176.146598] [<802d6548>] (SyS_utimensat) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4176.155652] INFO: task gzip:22448 blocked for more than 120 seconds.
[ 4176.160374]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4176.165034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4176.174526] gzip            D    0 22448  22399 0x00000000
[ 4176.179330] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4176.183995] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4176.188703] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4176.197969] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4176.207204] [<80230d0c>] (__filemap_fdatawait_range) from [<80230d70>] (filemap_fdatawait_range+0x18/0x28)
[ 4176.216304] [<80230d70>] (filemap_fdatawait_range) from [<802330f4>] (filemap_write_and_wait+0x58/0x7c)
[ 4176.225376] [<802330f4>] (filemap_write_and_wait) from [<803ea028>] (nfs_wb_all+0x14/0x15c)
[ 4176.234297] [<803ea028>] (nfs_wb_all) from [<803dd96c>] (nfs_setattr+0x280/0x2a4)
[ 4176.243314] [<803dd96c>] (nfs_setattr) from [<802bf8d4>] (notify_change+0x17c/0x410)
[ 4176.252319] [<802bf8d4>] (notify_change) from [<802d62fc>] (utimes_common+0xbc/0x188)
[ 4176.261418] [<802d62fc>] (utimes_common) from [<802d64c8>] (do_utimes+0x100/0x144)
[ 4176.270386] [<802d64c8>] (do_utimes) from [<802d6548>] (SyS_utimensat+0x3c/0xb0)
[ 4176.279426] [<802d6548>] (SyS_utimensat) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4176.288484] INFO: task gzip:22449 blocked for more than 120 seconds.
[ 4176.293202]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4176.297913] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4176.307420] gzip            D    0 22449  22402 0x00000000
[ 4176.312234] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4176.316976] [<80a88018>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[ 4176.321712] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[ 4176.330930] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[ 4176.340209] [<80230d0c>] (__filemap_fdatawait_range) from [<80230d70>] (filemap_fdatawait_range+0x18/0x28)
[ 4176.349341] [<80230d70>] (filemap_fdatawait_range) from [<802330f4>] (filemap_write_and_wait+0x58/0x7c)
[ 4176.358392] [<802330f4>] (filemap_write_and_wait) from [<803ea028>] (nfs_wb_all+0x14/0x15c)
[ 4176.367284] [<803ea028>] (nfs_wb_all) from [<803dd96c>] (nfs_setattr+0x280/0x2a4)
[ 4176.376299] [<803dd96c>] (nfs_setattr) from [<802bf8d4>] (notify_change+0x17c/0x410)
[ 4176.385274] [<802bf8d4>] (notify_change) from [<802d62fc>] (utimes_common+0xbc/0x188)
[ 4176.394337] [<802d62fc>] (utimes_common) from [<802d64c8>] (do_utimes+0x100/0x144)
[ 4176.403276] [<802d64c8>] (do_utimes) from [<802d6548>] (SyS_utimensat+0x3c/0xb0)
[ 4176.412262] [<802d6548>] (SyS_utimensat) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4176.421295] INFO: task gzip:22452 blocked for more than 120 seconds.
[ 4176.425993]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4176.430693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4176.440149] gzip            D    0 22452  22418 0x00000000
[ 4176.444909] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4176.449585] [<80a88018>] (schedule) from [<80a8b3d4>] (rwsem_down_write_failed+0x12c/0x278)
[ 4176.458924] [<80a8b3d4>] (rwsem_down_write_failed) from [<80a8a6f0>] (down_write+0x58/0x60)
[ 4176.468288] [<80a8a6f0>] (down_write) from [<802afc48>] (path_openat+0x3b0/0x1150)
[ 4176.477766] [<802afc48>] (path_openat) from [<802b1954>] (do_filp_open+0x6c/0xdc)
[ 4176.487122] [<802b1954>] (do_filp_open) from [<8029edc4>] (do_sys_open+0x168/0x20c)
[ 4176.496594] [<8029edc4>] (do_sys_open) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4176.505990] INFO: task mkdir:22457 blocked for more than 120 seconds.
[ 4176.510857]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4176.515599] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4176.525022] mkdir           D    0 22457  22453 0x00000000
[ 4176.529774] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4176.534358] [<80a88018>] (schedule) from [<80a8b3d4>] (rwsem_down_write_failed+0x12c/0x278)
[ 4176.543475] [<80a8b3d4>] (rwsem_down_write_failed) from [<80a8a6f0>] (down_write+0x58/0x60)
[ 4176.552568] [<80a8a6f0>] (down_write) from [<802b1118>] (filename_create+0x70/0x14c)
[ 4176.561851] [<802b1118>] (filename_create) from [<802b1d30>] (SyS_mkdirat+0x4c/0xec)
[ 4176.571295] [<802b1d30>] (SyS_mkdirat) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4176.580876] INFO: task mkdir:22458 blocked for more than 120 seconds.
[ 4176.585741]       Tainted: G         C      4.14.31-1-ARCH #1
[ 4176.590628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4176.600118] mkdir           D    0 22458  22450 0x00000000
[ 4176.604882] [<80a87848>] (__schedule) from [<80a88018>] (schedule+0x3c/0xa0)
[ 4176.609664] [<80a88018>] (schedule) from [<80a8b3d4>] (rwsem_down_write_failed+0x12c/0x278)
[ 4176.618855] [<80a8b3d4>] (rwsem_down_write_failed) from [<80a8a6f0>] (down_write+0x58/0x60)
[ 4176.628053] [<80a8a6f0>] (down_write) from [<802b1118>] (filename_create+0x70/0x14c)
[ 4176.637238] [<802b1118>] (filename_create) from [<802b1d30>] (SyS_mkdirat+0x4c/0xec)
[ 4176.646634] [<802b1d30>] (SyS_mkdirat) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[ 4211.688544] nfs: server ease not responding, still trying
[ 4212.989190] nfs: server ease OK
[ 4212.989336] nfs: server ease OK
[ 4212.989372] nfs: server ease OK
[ 4212.992652] nfs: server ease OK
[ 4213.002084] nfs: server ease OK
[ 4213.002311] nfs: server ease OK
[ 4213.002416] nfs: server ease OK
[ 4213.017966] nfs: server ease OK
[ 4213.018012] nfs: server ease OK
[ 4213.018632] nfs: server ease OK
[ 4213.020006] nfs: server ease OK
[ 4401.133010] nfs: server ease not responding, still trying
[ 4401.133014] nfs: server ease not responding, still trying
[ 4401.133030] nfs: server ease not responding, still trying
[ 4401.133067] nfs: server ease not responding, still trying
[ 4401.133110] nfs: server ease not responding, still trying
[ 4401.133120] nfs: server ease not responding, still trying
[ 4401.133124] nfs: server ease not responding, still trying
[ 4401.133139] nfs: server ease not responding, still trying
[ 4401.133156] nfs: server ease not responding, still trying
[ 4401.133171] nfs: server ease not responding, still trying
[ 4401.133187] nfs: server ease not responding, still trying
[ 4401.133202] nfs: server ease not responding, still trying
[ 4401.133233] nfs: server ease not responding, still trying
[ 4401.133245] nfs: server ease not responding, still trying
[ 4401.133251] nfs: server ease not responding, still trying
[ 4443.397196] nfs: server ease OK
[ 4443.397213] nfs: server ease OK
[ 4443.397291] nfs: server ease OK
[ 4443.397316] nfs: server ease OK
[ 4443.397343] nfs: server ease OK
[ 4443.397410] nfs: server ease OK
[ 4443.397505] nfs: server ease OK
[ 4443.397580] nfs: server ease OK
[ 4443.397605] nfs: server ease OK
[ 4443.397714] nfs: server ease OK
[ 4443.399097] nfs: server ease OK
[ 4443.405096] nfs: server ease OK
[ 4443.405772] nfs: server ease OK
[ 4443.406117] nfs: server ease OK
[ 4443.406398] nfs: server ease OK
[ 4667.377155] nfs: server ease not responding, still trying
[ 4668.417708] nfs: server ease not responding, still trying
[ 4668.700017] nfs: server ease OK
[ 4668.700524] nfs: server ease OK
[ 4856.819062] nfs: server ease not responding, still trying
[ 4856.819067] nfs: server ease not responding, still trying
[ 4856.819082] nfs: server ease not responding, still trying
[ 4856.819130] nfs: server ease not responding, still trying
[ 4856.819135] nfs: server ease not responding, still trying
[ 4856.819142] nfs: server ease not responding, still trying
[ 4856.819154] nfs: server ease not responding, still trying
[ 4856.819174] nfs: server ease not responding, still trying
[ 4856.819188] nfs: server ease not responding, still trying
[ 4856.819209] nfs: server ease not responding, still trying
[ 4856.819216] nfs: server ease not responding, still trying
[ 4893.959982] nfs: server ease OK
[ 4893.960172] nfs: server ease OK
[ 4893.960210] nfs: server ease OK
[ 4893.960311] nfs: server ease OK
[ 4893.960640] nfs: server ease OK
[ 4893.960770] nfs: server ease OK
[ 4893.960780] nfs: server ease OK
[ 4893.961280] nfs: server ease OK
[ 4893.966452] nfs: server ease OK
[ 4893.967131] nfs: server ease OK
[ 4893.969369] nfs: server ease OK
[ 5123.060914] nfs: server ease not responding, still trying
[ 5124.101425] nfs: server ease not responding, still trying
[ 5124.376882] nfs: server ease OK
[ 5124.381100] nfs: server ease OK
[ 5353.461931] nfs: server ease not responding, still trying
[ 5354.784753] nfs: server ease OK
[ 5588.982673] nfs: server ease not responding, still trying
[ 5590.077559] nfs: server ease OK
[ 5814.263180] nfs: server ease not responding, still trying
[ 5815.303698] nfs: server ease not responding, still trying
[ 5815.334003] nfs: server ease OK
[ 5815.360538] nfs: server ease OK
[ 6044.663615] nfs: server ease not responding, still trying
[ 6045.721789] nfs: server ease OK
[ 6285.305546] nfs: server ease not responding, still trying
[ 6286.346054] nfs: server ease not responding, still trying
[ 6286.376999] nfs: server ease OK
[ 6286.403761] nfs: server ease OK
[ 6510.587277] nfs: server ease not responding, still trying
[ 6511.627865] nfs: server ease not responding, still trying
[ 6511.674761] nfs: server ease OK
[ 6511.686188] nfs: server ease OK
[ 6735.868562] nfs: server ease not responding, still trying
[ 6736.909076] nfs: server ease not responding, still trying
[ 6736.940771] nfs: server ease OK
[ 6736.967038] nfs: server ease OK
[ 6940.669438] nfs: server ease not responding, still trying
[ 6977.551872] nfs: server ease OK
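
To see how long each NFS stall actually lasted, the `not responding` / `OK` pairs in a dmesg dump like the one above can be folded into time windows. This is a minimal Python sketch, not part of the original report. The function name `nfs_stalls` is hypothetical, and the sketch assumes the exact message format shown above.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches lines such as:
#   [ 4401.133010] nfs: server ease not responding, still trying
#   [ 4443.397196] nfs: server ease OK
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nfs: server (?P<server>\S+) "
    r"(?P<state>not responding, still trying|OK)\s*$"
)


def nfs_stalls(lines: Iterable[str]) -> List[Tuple[str, float, float]]:
    """Return (server, start_ts, duration_seconds) for each stall window.

    A window opens at the first "not responding" line for a server and
    closes at the next "OK" line for that server. Repeated messages inside
    one window (one per outstanding request) are folded together.
    """
    open_since: Dict[str, float] = {}  # server -> first "not responding" ts
    stalls: List[Tuple[str, float, float]] = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip hung-task traces and unrelated messages
        ts, server = float(m.group("ts")), m.group("server")
        if m.group("state") == "OK":
            if server in open_since:
                start = open_since.pop(server)
                stalls.append((server, start, ts - start))
        else:
            open_since.setdefault(server, ts)
    return stalls
```

Run over the log above, this separates the roughly 1 s blips (for example 4667.38 to 4668.70) from the longer stalls such as 4401.13 to 4443.40, which lasted about 42 s.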

graysky2 · 2018-03-31T11:34:04Z

I combined a few replies into one (above) and tried to make it a bit more concise. TL;DR: disabling EEE does not help.

pelwell · 2018-03-31T12:14:17Z

If possible, and if it isn't already on, can you enable flow control on the switch port connected to the Pi?

graysky2 · 2018-03-31T13:08:22Z

@pelwell - All the wired connections go through an unmanaged switch. No settings to tweak :/

mkreisl · 2018-03-31T18:29:18Z

Bug: Frequent kernel oops due to blocked tasks when writing files to NFS mount.

I had similar issues on a SAMBA mount, but I cannot run tests again at the moment because I sent my Pi3B+ back.

IMO the current revision of the Pi3B+ has serious hardware issues, and I don't believe they can be solved in software (in the end, I was never able to play a video longer than 15 minutes without a Kodi crash, kernel oops, or freeze).

@pelwell and co:
Which Pi3B+ revision are you currently using: units from the 0-series, or units from the current production line that customers are using now?

I still can't believe that you guys never had such issues before.

JamesH65 · 2018-03-31T20:00:27Z

Current production, I believe, although I don't think there have been many (if any) changes since the prototypes. The issues appear to be erratic and to depend on the capabilities of the network the device is attached to. We are trying to figure out the exact circumstances. I suspect a number of different issues are being seen, as often happens when a previously working driver is suddenly used by 250k extra people over a weekend, in all sorts of new and unpredictable ways. I have high hopes there is a software solution to this; we've always been able to find one in the past.

-- James Hughes Principal Software Engineer, Raspberry Pi (Trading) Ltd

Knoppix1 · 2018-04-01T14:49:36Z

Personally, I'm back on a Pi 2B... (it boots faster with the same SD card...)

@mkreisl when you get your Pi back, if it works normally, I'll send mine back too.

zmartell · 2018-04-04T05:26:29Z

+1, I am noticing this issue as well when reading from a Samba mount. Brand new RPi 3B+.

graysky2 · 2018-04-04T23:32:41Z

@pelwell - From @popcornmix's advice in #2442, I built:

raspberrypi-firmware from raspberrypi/firmware@0dff9ec
linux-raspberrypi from b5b6bb9

I automated the dd test I described above with a simple script that writes out a 1G zero-filled file over an NFS share 32 times, and then used histogram.py to compute the stats.
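
A replicate write test like the one described could look roughly like the following. This is a hypothetical Python sketch, not the script from this thread: it writes zeros in 1 MiB blocks (similar to dd with bs=1M), calls `os.fsync` so the data actually reaches the NFS server before the clock stops, and deletes the file between runs. The file name `ddtest.bin` and the function names are illustrative only.

```python
import os
import time

CHUNK = 1024 * 1024  # write in 1 MiB blocks, similar to dd bs=1M


def timed_zero_write(path: str, size_bytes: int) -> float:
    """Write size_bytes of zeros to path, fsync, and return elapsed seconds."""
    view = memoryview(bytes(CHUNK))  # one reusable zero-filled buffer
    start = time.monotonic()
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(view[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # wait until the server has the data
    elapsed = time.monotonic() - start
    os.remove(path)  # clean up outside the timed region
    return elapsed


def run_replicates(directory: str, size_bytes: int = 1 << 30, runs: int = 32):
    """Repeat the timed write `runs` times and return the list of timings."""
    target = os.path.join(directory, "ddtest.bin")  # hypothetical file name
    return [timed_zero_write(target, size_bytes) for _ in range(runs)]
```

For example, `for t in run_replicates("/scratch"): print(f"{t:.6f}")` prints one timing per line, which can then be piped into histogram.py.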

With the dtparam=eee=off parameter set in /boot/config.txt, I got consistent results:

% histogram.py -p < results_no_eee.csv
# NumSamples = 32; Min = 25.46; Max = 25.75
# Mean = 25.687864; Variance = 0.002705; SD = 0.052009; Median 25.693114

When I removed that line (reverting to the default state, with EEE on), 1 of the 32 runs was really long:

% histogram.py -p < results.csv
# NumSamples = 36; Min = 25.34; Max = 139.44
# Mean = 28.763650; Variance = 350.005030; SD = 18.708421; Median 25.599488
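
The header lines printed by histogram.py can be reproduced with Python's `statistics` module, which is handy for sanity-checking replicate runs like these. This is a minimal sketch; the function name `summarize` is hypothetical, and it uses the population variance (divide by n). That this matches histogram.py's own convention is an assumption.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Summary statistics in the shape of histogram.py's header lines.

    Variance is the population variance (divide by n); whether
    histogram.py uses the same convention is assumed, not verified.
    """
    var = statistics.pvariance(samples)
    return {
        "NumSamples": len(samples),
        "Min": min(samples),
        "Max": max(samples),
        "Mean": statistics.mean(samples),
        "Variance": var,
        "SD": math.sqrt(var),
        "Median": statistics.median(samples),
    }
```

A single long run is enough to pull the mean up and inflate the variance while leaving the median almost unchanged, which is the pattern in the EEE-on numbers above (mean 28.76 vs median 25.60).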

Since dd is going to max out the bus, I will next try compiling the kernel, which is much gentler on network I/O and, in my experience, much more prone to errors. Thoughts?

graysky2 · 2018-04-05T19:46:51Z

OK... I'm still experiencing the timeouts when compiling to the NFS share with eee enabled, despite the successful dd replicates above. I am currently building c2eb306 and will test it by compiling the kernel to NFS with eee enabled and with it disabled.

For reference, here is the script to automate the replicate compile jobs.

graysky2 · 2018-04-06T00:14:32Z

@pelwell - I am still getting network timeouts. Below is the output with dtparam=eee=off set, booted into the latest kernel.

[11786.758187] nfs: server ease not responding, still trying
[11786.758192] nfs: server ease not responding, still trying
[11786.758206] nfs: server ease not responding, still trying
[11786.758225] nfs: server ease not responding, still trying
[11794.438353] INFO: task ld:25967 blocked for more than 120 seconds.
[11794.441599]       Tainted: G         C      4.14.32-2-ARCH #1
[11794.444867] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11794.451496] ld              D    0 25967  25966 0x00000000
[11794.454918] [<80a87c48>] (__schedule) from [<80a88418>] (schedule+0x3c/0xa0)
[11794.458408] [<80a88418>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[11794.461670] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[11794.468043] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[11794.474533] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[11794.481165] [<80233190>] (filemap_write_and_wait_range) from [<803db254>] (nfs_file_fsync+0x30/0x280)
[11794.487571] [<803db254>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[11794.494001] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[11794.497339] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[11794.500698] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[11831.579322] nfs: server ease OK
[11831.579326] nfs: server ease OK
[11831.583067] nfs: server ease OK
[11831.583118] nfs: server ease OK
[12040.199240] INFO: task ld:27693 blocked for more than 120 seconds.
[12040.202836]       Tainted: G         C      4.14.32-2-ARCH #1
[12040.206449] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12040.213627] ld              D    0 27693  27692 0x00000000
[12040.217311] [<80a87c48>] (__schedule) from [<80a88418>] (schedule+0x3c/0xa0)
[12040.220971] [<80a88418>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[12040.223568] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[12040.228677] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[12040.233740] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[12040.238903] [<80233190>] (filemap_write_and_wait_range) from [<803db254>] (nfs_file_fsync+0x30/0x280)
[12040.244189] [<803db254>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[12040.249445] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[12040.252070] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[12040.254713] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)
[12101.639302] nfs: server ease not responding, still trying
[12101.639311] nfs: server ease not responding, still trying
[12101.639328] nfs: server ease not responding, still trying
[12142.599536] nfs: server ease not responding, still trying
[12143.639966] nfs: server ease not responding, still trying
[12143.900616] nfs: server ease OK
[12143.900633] nfs: server ease OK
[12143.909707] nfs: server ease OK
[12143.917548] nfs: server ease OK
[12143.917848] nfs: server ease OK
[12408.840196] nfs: server ease not responding, still trying
[12408.840200] nfs: server ease not responding, still trying
[12408.840228] nfs: server ease not responding, still trying
[12408.840248] nfs: server ease not responding, still trying
[12408.840274] nfs: server ease not responding, still trying
[12408.840412] INFO: task ld:29538 blocked for more than 120 seconds.
[12408.840421]       Tainted: G         C      4.14.32-2-ARCH #1
[12408.840424] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12408.840430] ld              D    0 29538  29537 0x00000000
[12408.840493] [<80a87c48>] (__schedule) from [<80a88418>] (schedule+0x3c/0xa0)
[12408.840514] [<80a88418>] (schedule) from [<8015c138>] (io_schedule+0x14/0x3c)
[12408.840541] [<8015c138>] (io_schedule) from [<80230be4>] (wait_on_page_bit+0x110/0x15c)
[12408.840559] [<80230be4>] (wait_on_page_bit) from [<80230d0c>] (__filemap_fdatawait_range+0xdc/0x128)
[12408.840574] [<80230d0c>] (__filemap_fdatawait_range) from [<80233190>] (filemap_write_and_wait_range+0x54/0x88)
[12408.840596] [<80233190>] (filemap_write_and_wait_range) from [<803db254>] (nfs_file_fsync+0x30/0x280)
[12408.840618] [<803db254>] (nfs_file_fsync) from [<802d5dcc>] (vfs_fsync+0x24/0x2c)
[12408.840635] [<802d5dcc>] (vfs_fsync) from [<8029d758>] (filp_close+0x2c/0x80)
[12408.840648] [<8029d758>] (filp_close) from [<8029d7cc>] (SyS_close+0x20/0x48)
[12408.840665] [<8029d7cc>] (SyS_close) from [<80107ce0>] (ret_fast_syscall+0x0/0x4c)

mkreisl · 2018-04-06T11:11:48Z

Same here, nothing has changed. The Pi3B+ is still absolutely unstable, unreliable, and completely unusable.

JamesH65 · 2018-04-06T11:47:29Z

Odd, I've got one on my desk that is working fine. I think you forgot to add "in the circumstances I am using it".

Anyway, the issues are still being looked at both here and at Microchip. A patch was posted on the Linux netdev list today for this chip's driver (lan78xx) covering EEE, which may well help; it will need to be tried. It's not like we are just sitting here twiddling our thumbs.

mkreisl · 2018-04-06T12:02:03Z

Anyway, the issues are still being looked at both here and at Microchip. A patch was posted on the Linux netdev list today for this chip's driver (lan78xx) covering EEE, which may well help; it will need to be tried. It's not like we are just sitting here twiddling our thumbs.

Seems you're getting fire under your a.. now 😄

IMO you're looking in the wrong place. The LAN issues are only the tip of the iceberg.

I already reported that the system is still unstable after that dumb Microchip chip is powered off and all traffic goes over the WLAN device. The system still freezes randomly. So, until I'm better informed, I would say the whole Pi3B+ design is a huge issue.

pelwell · 2018-04-06T13:46:50Z

Some users who reported problems (and there honestly haven't been that many, but they are shouting loudly) have had success with adding sdram_freq=450 to config.txt. I would recommend anybody with stability problems (anything not obviously network related) to do the same.

mkreisl · 2018-04-06T14:00:07Z

Some users who reported problems (and there honestly haven't been that many, but they are shouting loudly) have had success with adding sdram_freq=450 to config.txt. I would recommend anybody with stability problems (anything not obviously network related) to do the same.

What's the default for the Pi3B+? I can't find it here.

pelwell · 2018-04-06T14:07:43Z

500 turbo, 400 normal

graysky2 · 2018-04-06T18:51:29Z

For reference, here is the script to automate the replicate compile jobs.

@pelwell - I have some hard data now. I ran the make benchmark writing out to the NFS share under 2 conditions, once with eee disabled and once with it enabled. There is a clear trend: eee is causing problems.

Running `make zImage`

Here are 9 or 10 replicates running make zImage with all times reported in minutes.

% histogram.py < eee_on_zimage 
# NumSamples = 9; Min = 9.77; Max = 29.07
# Mean = 18.573025; Variance = 86.102905; SD = 9.279165; Median 10.764777

vs

% histogram.py < eee_off_zimage
# NumSamples = 10; Min = 9.91; Max = 10.87
# Mean = 10.178291; Variance = 0.067048; SD = 0.258936; Median 10.166035

Several trends from these data:

Average time to compile is nearly double with eee enabled.
Standard deviation and variance are much worse with eee enabled (more unpredictable compile times).
Of all the replicates, the longest compile time was observed with eee enabled, and it was nearly three times as long.

Running `make modules`

Here are 9 or 10 replicates running make modules with all times reported in minutes.

% histogram.py < eee_on_modules 
# NumSamples = 9; Min = 25.21; Max = 67.19
# Mean = 51.765753; Variance = 218.212739; SD = 14.772026; Median 46.494882

vs

% histogram.py < eee_off_modules
# NumSamples = 9; Min = 26.33; Max = 49.60
# Mean = 33.328529; Variance = 42.429103; SD = 6.513763; Median 32.126122

The same trends from these data:

Average time to compile is about 1.5x as long with eee enabled.
Standard deviation and variance are much worse with eee enabled (more unpredictable compile times).
Of all the replicates, the longest compile time was observed with eee enabled, and it was about 33% longer.
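
The trends listed for both benchmarks can be checked directly against the reported summary numbers. This short Python sketch only reuses the Mean, SD, and Max values printed above; comparing the maxima and using the coefficient of variation (SD / mean) as the measure of unpredictability are choices made here, not by the original poster. The names `RESULTS` and `compare` are illustrative.

```python
# Summary numbers copied from the histogram.py output above (minutes).
RESULTS = {
    "zImage": {
        "on": {"mean": 18.573025, "sd": 9.279165, "max": 29.07},
        "off": {"mean": 10.178291, "sd": 0.258936, "max": 10.87},
    },
    "modules": {
        "on": {"mean": 51.765753, "sd": 14.772026, "max": 67.19},
        "off": {"mean": 33.328529, "sd": 6.513763, "max": 49.60},
    },
}


def compare(on, off):
    """Ratios of eee-on to eee-off, plus each side's coefficient of variation."""
    return {
        "mean_ratio": on["mean"] / off["mean"],
        "max_ratio": on["max"] / off["max"],
        "cv_on": on["sd"] / on["mean"],
        "cv_off": off["sd"] / off["mean"],
    }


for name, r in RESULTS.items():
    c = compare(r["on"], r["off"])
    print(f"{name}: mean x{c['mean_ratio']:.2f}, max x{c['max_ratio']:.2f}, "
          f"CV on {c['cv_on']:.1%} vs off {c['cv_off']:.1%}")
```

With these numbers the eee-on mean is about 1.82x the eee-off mean for zImage and about 1.55x for modules, and the eee-on maxima are about 2.67x and 1.35x the eee-off maxima.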

I am happy to test future patches/firmware, whatever to help optimize this. I think the make zImage benchmark will be sufficient for this since it's way faster than make modules and gives similar results. Just let me know.

EDIT: I see @popcornmix pushed raspberrypi/firmware@3aa8060 a few hours ago... time to retest?

graysky2 · 2018-04-06T19:00:05Z

@mkreisl - Please keep this issue on topic... it's scoped to network writes, not general stability. Open a new issue for that.

mkreisl · 2018-04-06T19:36:01Z

@graysky2 Oops, sorry for tainting your thread

graysky2 · 2018-04-06T21:08:41Z

A potential workaround: rather than disabling EEE entirely, set dtparam=tx_lpi_timer=10000 in /boot/config.txt. I did this and got nearly identical results in the make zImage benchmark to having EEE fully disabled.

Again, values reported are compile times in minutes.

`dtparam=tx_lpi_timer=10000`

# NumSamples = 12; Min = 9.90; Max = 10.19
# Mean = 10.089245; Variance = 0.007412; SD = 0.086094; Median 10.119596

`dtparam=eee=off`

# NumSamples = 10; Min = 9.91; Max = 10.87
# Mean = 10.178291; Variance = 0.067048; SD = 0.258936; Median 10.166035

EDIT: see #2482 (comment), which demonstrates that the problem is still present.
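
To confirm which of these settings is actually in effect, the `dtparam` lines in /boot/config.txt can be read with a few lines of Python. This is a hypothetical helper (`dtparams` is not an existing tool). It assumes that `#` starts a comment, that later lines override earlier ones, that one `dtparam=` line may set several comma-separated parameters, and that a parameter given without a value means `on`. Conditional sections such as `[pi3+]` are not handled.

```python
from typing import Dict


def dtparams(config_text: str) -> Dict[str, str]:
    """Collect dtparam settings from the text of a config.txt file.

    Assumptions: '#' starts a comment, later settings override earlier
    ones, a bare parameter (no '=value') is recorded as 'on', and
    [section] filters are ignored.
    """
    params: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line.startswith("dtparam="):
            continue
        # dtparam=a=1,b=2 sets several parameters at once
        for item in line[len("dtparam="):].split(","):
            item = item.strip()
            if not item:
                continue
            name, sep, value = item.partition("=")
            params[name.strip()] = value.strip() if sep else "on"
    return params
```

For a config containing `dtparam=tx_lpi_timer=10000`, `dtparams(open("/boot/config.txt").read()).get("tx_lpi_timer")` returns `"10000"`.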

mkreisl · 2018-04-06T21:18:26Z

None of those EEE settings help in my case, because my router/switch does not support EEE (most routers with an integrated switch don't), and I'm still getting NFS timeouts even with EEE completely disabled, or I get

Apr  6 16:11:14 kmxbilr2 kernel: [  837.345227] CIFS VFS: sends on sock aa2921c0 stuck for 15 seconds
Apr  6 16:11:14 kmxbilr2 kernel: [  837.345261] CIFS VFS: Error -11 sending data on socket to server
Apr  6 16:11:30 kmxbilr2 kernel: [  852.705497] CIFS VFS: sends on sock aa2921c0 stuck for 15 seconds
Apr  6 16:11:30 kmxbilr2 kernel: [  852.705532] CIFS VFS: Error -11 sending data on socket to server
Apr  6 16:11:30 kmxbilr2 kernel: [  852.833704] CIFS VFS: Free previous auth_key.response = 99685c00
Apr  6 16:11:55 kmxbilr2 kernel: [  878.305932] CIFS VFS: sends on sock aa29c380 stuck for 15 seconds
Apr  6 16:11:55 kmxbilr2 kernel: [  878.305972] CIFS VFS: Error -11 sending data on socket to server
Apr  6 16:12:11 kmxbilr2 kernel: [  893.666123] CIFS VFS: sends on sock aa29c380 stuck for 15 seconds
Apr  6 16:12:11 kmxbilr2 kernel: [  893.666156] CIFS VFS: Error -11 sending data on socket to server
Apr  6 16:12:26 kmxbilr2 kernel: [  909.026351] CIFS VFS: sends on sock aa29c380 stuck for 15 seconds
Apr  6 16:12:26 kmxbilr2 kernel: [  909.026382] CIFS VFS: Error -11 sending data on socket to server
Apr  6 16:12:41 kmxbilr2 kernel: [  924.386541] CIFS VFS: sends on sock aa29c380 stuck for 15 seconds
Apr  6 16:12:41 kmxbilr2 kernel: [  924.386573] CIFS VFS: Error -11 sending data on socket to server
Apr  6 16:12:41 kmxbilr2 kernel: [  924.484318] CIFS VFS: Free previous auth_key.response = a7910f00

when using a SAMBA mount instead of an NFS mount. After some time, the process writing to the share gets stuck in the uninterruptible 'D' state forever.

graysky2 · 2018-04-06T21:42:23Z

@mkreisl - Are you booted into the same kernel and are you using the same firmware commit that I am?
Kernel: c2eb306
Firmware: raspberrypi/firmware@0dff9ec

mkreisl · 2018-04-06T21:52:45Z

@graysky2
Kernel: yes (XBian built based on bcm2709_defconfig)
Firmware: yes, exactly the same version

graysky2 · 2018-04-06T22:31:47Z

@mkreisl - Not sure what to say then... perhaps you have a different issue. As a control, have you tried the same thing with an older RPi, like a 2 or 3?

mkreisl · 2018-04-07T12:03:01Z

@graysky2 Sure, I've been running the same procedure on the Pi1, 2, and 3 (without +) for years without any problems.

mkreisl · 2018-04-07T12:26:49Z

@graysky2 In short, here is what it does:

1. mount a network share (sshfs, nfs or samba)
2. create an image on this share, big enough to back up the data from the root/boot fs
3. create partitions in the image (vfat for boot, btrfs for root)
4. copy the boot partition into the mounted image (loop device)
5. copy all subvolumes into the mounted image (using btrfs send/receive or tar, both tested)
6. close everything and umount the share

Steps 1 to 4 always work, and step 5 always gets stuck, but not on the same subvolume.
It also does not matter whether the source fs is on SD, a USB disk, or an iSCSI target.
And, it does not matter if source fs is on sd, usb disk or iSCSI target

TSO seems to be having issues when packets are dropped and the remote end uses Selective Acknowledge (SACK) to denote that data is missing. The missing data is never resent, so the connection eventually stalls. There is a module parameter of enable_tso added to allow further debugging without forcing a rebuild of the kernel. #2449 #2482 Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
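
Loaded module parameters normally show up under /sys/module/<module>/parameters/, and bool parameters read back as `Y` or `N`. Assuming the `enable_tso` parameter from this commit is exposed that way by the `lan78xx` module (not verified here), a quick check of the current state might look like this sketch. The function name `tso_enabled` is hypothetical; setting the parameter at boot would presumably use the usual `lan78xx.enable_tso=1` kernel command-line syntax.

```python
from pathlib import Path

# Standard sysfs location for a loaded module's parameters; that lan78xx
# exposes enable_tso here is an assumption based on the commit message.
DEFAULT_PARAM = Path("/sys/module/lan78xx/parameters/enable_tso")


def tso_enabled(param_path: Path = DEFAULT_PARAM) -> bool:
    """Return True if the bool module parameter reads 'Y' (or '1')."""
    value = param_path.read_text().strip()
    return value in ("Y", "y", "1")
```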

graysky2 changed the title ~~Network driver on RPi3 BPlus causing hung tasks when working on an NFS mount~~ Network driver on RPi3 B Plus causing hung tasks when working on an NFS mount Mar 30, 2018
