rtd1296 stability issue #275

Open
bb-qq opened this issue Dec 4, 2022 · 53 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@bb-qq
Owner

bb-qq commented Dec 4, 2022

This issue summarizes the topic of the driver not working on the rtd1296 platform.

There are many reports of unstable operation in products using the rtd1296. The typical symptoms reported are as follows:

  • Driver installation succeeds without problems.
  • NAS works stably when receiving traffic
  • Connection is dropped when the NAS is sending traffic
  • May operate stably when linked at 1 Gbps
    • ethtool -s ethX speed 1000 duplex full 
  • The exact same symptoms occur with both r8152 and aqc111 drivers

There are also no reports of stable operation.

When the connection drops, something appears to go wrong at the USB level. This may indicate that the rtd1296 SoC has a software or hardware issue with its xHCI host controller.

I am looking for a workaround for this problem, but so far have not found it. (I am also considering providing a standard usb-cdc driver separately.)
I will report here if any progress is made in the investigation.

Affected Products

  • DS420j, DS220j
  • RS819
  • DS418(no plus), DS218(no plus), DS218play, DS118
@andrus2049

andrus2049 commented Dec 4, 2022

In my case the problem occurs only when transferring large files from NAS to PC.
Conversely, I can navigate the NAS directory structure for hours without the problem occurring.

One side note: after installation I immediately changed the MTU size to 9000 in the NAS LAN configuration.
After I found the problem, I tried to reset MTU to 1500 (default), also unchecking the manual setting, but after saving this setting it still remains enabled with the value of 9000, at least as shown in the GUI.
No way to reset. But it may be only a GUI issue.

Anyway, because of the evidence that only large transfers cause the problem, it might have something to do with the MTU?

@bb-qq
Owner Author

bb-qq commented Dec 10, 2022

Anyway, because of the evidence that only large transfers cause the problem, it might have something to do with the MTU?

MTU may have something to do with this stability issue, but there are reports of problems occurring even with MTU values of 1500.

It might possibly relate to the hardware-assisted functions of the transmission on the NIC side.

I would like to know if disabling those features by the following command will make a difference in stability.

ethtool -K eth2 tso off
ethtool -K eth2 gso off
ethtool -K eth2 sg off

@andrus2049

ethtool -K eth2 tso off
ethtool -K eth2 gso off
ethtool -K eth2 sg off

Thanks, going to try.

How can I check the current values before issuing these commands?

And are these new values reverted upon NAS restart or are they persistent?

@NikitaOsotsky

NikitaOsotsky commented Dec 12, 2022

I tested the connection with the suggested changes. It's a pity, but nothing has changed.
DS218 & rtl8156 2.5

@NikitaOsotsky

I also tried ipv6 access https://[fe80::XXXX:XXXX:XXXX:XXXX]:5001/
I tried to download the file and it didn't help either

@cqwangding

DS920+ 2.16.3-3 DSM7.x (reuploaded) lan rtd1296 (ks-is ks-714) https://ks-is.com/usb-3-1-ethernet-adapter-ks-is-ks-714?tag=2.5G

There are no problems with data transfer. Especially for a couple of hours I drove chia 100gb plots at a speed of 2.5. But there is another problem! When you pull out and put back the adapter, the driver turns off. The same goes for rebooting. Must be manually enabled in the web interface.

rtd1296 is the cpu for entry level synology, but not for DS920+.

@jebug29

jebug29 commented Dec 24, 2022

ethtool -K eth2 tso off
ethtool -K eth2 gso off
ethtool -K eth2 sg off

Using a DS418 with a TRENDnet TUC-ET2G and the r8152 driver. This managed to get me 2.5Gb speeds briefly (and for longer than it would previously hold a connection at all), but the connection ultimately shut down. It does seem like I was getting 2500 Mbps upload and only about 1000 Mbps download.

@dlbomber1974

dlbomber1974 commented Dec 30, 2022

Have there been any updates for the DS418 with the TRENDnet 2.5G USB-C to RJ-45 adapter? I bought it before looking here, expecting it to work, yet I am seeing the driver issues described above. I ran the SSH command after it failed and then saw the connection under Network. It was connected, but with a 169.x address. I am also using a TRENDnet 5-port unmanaged 2.5G PoE+ switch with its own AC adapter. After a reboot it's completely gone. I had to run the RT app when I rebooted as it did not auto-restart.

After assigning a static IP, I am now showing:
2500mbps Full Duplex 1500 MTU

I will put it to test with a few file transfers small and large tomorrow when I get up.

@bb-qq
Owner Author

bb-qq commented Dec 31, 2022

Thanks all, it looks like disabling GSO and TSO didn't make much difference in stability.

These settings will revert after reboot. If they have any effect, please register them in the task scheduler or something so that they are configured at startup.
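
As a rough sketch of such a startup task (not tested on an rtd1296 unit), the function below re-applies the three settings from this thread. The interface name `eth2` comes from the commands above; the function name `disable_offloads` and the `RUN` dry-run switch are made up for illustration. The current values can be inspected beforehand with `ethtool -k eth2` (lowercase `-k`).

```shell
#!/bin/sh
# Hypothetical boot-up script for the DSM Task Scheduler (run as root).
# RUN is an illustrative dry-run switch: by default the commands are only
# printed; start the script with RUN= (empty) to really execute them.
RUN="${RUN-echo}"

# Turn off TSO, GSO and scatter-gather on the given interface.
disable_offloads() {
    iface="$1"
    for feature in tso gso sg; do
        $RUN ethtool -K "$iface" "$feature" off
    done
}
```

Calling `disable_offloads eth2` at the end of the script prints the three `ethtool -K` commands in dry-run mode; with `RUN=` set on the task's command line they are executed instead.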

@Dayofwonder

Dayofwonder commented Jan 6, 2023

Same here with my 218play. I tried two different adapters with 8152 chipset (none of the recommended adapters yet). After some research, I found on the internet that the error is very common. It can be seen well in the /var/log/kern.log.
Unfortunately, I could not find a solution to the problem. The error occurs with large amounts of data. The connection is interrupted for about 45 seconds, the NAS is then also not accessible via ping.

This is what the kern.log looks like:
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.442279] r8152 3-1:1.0 eth1: Tx timeout
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.448949] r8152 3-1:1.0 eth1: Tx status -2
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.453431] r8152 3-1:1.0 eth1: Tx status -2
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.457911] r8152 3-1:1.0 eth1: Tx status -2
2023-01-06T20:03:46+01:00 diskstation kernel: [470548.462397] r8152 3-1:1.0 eth1: Tx status -2
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.434430] r8152 3-1:1.0 eth1: get_registers -108
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.439501] r8152 3-1:1.0 eth1: get_registers -71
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.444479] r8152 3-1:1.0 eth1: get_registers -71
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.449441] r8152 3-1:1.0 eth1: get_registers -71
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.454439] r8152 3-1:1.0 eth1: get_registers -71
2023-01-06T20:03:48+01:00 diskstation kernel: [470550.459401] r8152 3-1:1.0 eth1: get_registers -71
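
For anyone reading logs like the one above: the negative numbers are Linux errno values. Below is a small sketch for decoding and counting them. The function names are made up, and the three values are hard-coded from the generic Linux errno table rather than looked up on the host. As far as I understand the kernel's USB error-code documentation, -ENOENT on a URB means it was unlinked, -ESHUTDOWN means the device or host controller has been disabled, and -EPROTO covers low-level protocol errors such as no response from the device.

```shell
#!/bin/sh
# Map the errno values seen in the log to their Linux names.
# Only the three codes that appear in this thread are listed.
errno_name() {
    case "${1#-}" in
        2)   echo "ENOENT" ;;
        71)  echo "EPROTO" ;;
        108) echo "ESHUTDOWN" ;;
        *)   echo "unknown" ;;
    esac
}

# Read kernel log text on stdin and print "<code> <count>" for every
# "Tx status" / "get_registers" line (the code is the last field).
summarise_usb_errors() {
    awk '/Tx status|get_registers/ { count[$NF]++ }
         END { for (code in count) print code, count[code] }'
}
```

For example, `grep r8152 /var/log/kern.log | summarise_usb_errors` gives a quick count per error code, and `errno_name -71` prints `EPROTO`.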

I operate it at 1 Gbps, not at 2.5 Gbps. So your workaround ("May operate stably when linked at 1 Gbps") has no effect here.

Next week I will get the club 3D USB adapter with 8156 chipset. I will test it and report if the error also occurs.

As written before, there are some articles and forums about this topic, here are some of them, don't know if it could help in our environment:

https://portal.cloudunboxed.net/knowledgebase/55/How-to-fix-Realtek-USB-NIC-TX-timeout-issues.html

https://forum.odroid.com/viewtopic.php?f=212&t=45857

https://bugzilla.kernel.org/show_bug.cgi?id=198931

And by the way: I tested another adapter with 8169 chipset (together with your 8152 driver). It worked, but with poor performance (about 30 MB/s).

@dlbomber1974

dlbomber1974 commented Jan 8, 2023

Coming back to test my DS418 with my TRENDnet 2.5GbE setup, I saw that my connection still showed as connected, but I had no ping and the port was non-responsive. I saw the MAC address but no IP, even after several reboots, uninstall, reinstall etc... I read in some other places where this has occurred, so I had to dust off my old Linux hat and found a short remedy for this. I did notice that regardless of me setting the MTU to 9000 in the GUI, it still shows up as MTU 1500.

sudo /etc/rc.network restart

My connection is back up, the IP is now showing and pingable.
eth2 Link encap:Ethernet HWaddr 3C:8C:F8:60:0A:94
inet addr:192.168.1.201 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:59735 errors:0 dropped:0 overruns:0 frame:0
TX packets:39 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3237395 (3.0 MiB) TX bytes:9563 (9.3 KiB)

Now I will go forward with my testing.

@Dayofwonder

Dayofwonder commented Jan 8, 2023

Coming back to test my DS418 with my TRENDnet 2.5GbE setup, I saw that my connection still showed as connected, but I had no ping and the port was non-responsive. I saw the MAC address but no IP, even after several reboots, uninstall, reinstall etc... I read in some other places where this has occurred, so I had to dust off my old Linux hat and found a short remedy for this. I did notice that regardless of me setting the MTU to 9000 in the GUI, it still shows up as MTU 1500.

sudo /etc/rc.network restart

Yes, this is ONE way. For me it works to stop and restart the installed driver in the package center ... But this isn't a workaround as long as I won't be able to download any file from the NAS.

@Dayofwonder

Dayofwonder commented Jan 10, 2023

Just tested with Club 3D USB adapter. Test failed. Upload speed is a disaster (worst of all devices).

image

And downloads don't start at all.
2023-01-10T17:21:49+01:00 diskstation kernel: [806423.856275] r8152 3-1:1.0 eth1: Tx timeout
2023-01-10T17:21:49+01:00 diskstation kernel: [806423.862963] r8152 3-1:1.0 eth1: Tx status -2
2023-01-10T17:21:51+01:00 diskstation kernel: [806425.820037] r8152 3-1:1.0 eth1: get_registers -108
2023-01-10T17:21:51+01:00 diskstation kernel: [806425.825106] r8152 3-1:1.0 eth1: get_registers -71
2023-01-10T17:21:51+01:00 diskstation kernel: [806425.870979] xhci-hcd xhci-hcd.2.auto: URB transfer length is wrong, xHC issue? req. len = 4, act. len = 4294967292
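
A side note on the last log line: the "act. len" value is just below 2^32, which is what a small negative number looks like when printed as an unsigned 32-bit integer. The sketch below does the conversion; the function name is made up, and it assumes the shell does 64-bit arithmetic (true for the common shells on 64-bit Linux).

```shell
#!/bin/sh
# Reinterpret an unsigned 32-bit value as a signed 32-bit integer.
# Values of 2^31 and above wrap around to negative numbers.
to_signed32() {
    value="$1"
    if [ "$value" -ge 2147483648 ]; then
        echo $(( value - 4294967296 ))
    else
        echo "$value"
    fi
}
```

`to_signed32 4294967292` prints `-4`, so the host controller apparently reported a negative length for a 4-byte request rather than a real 4 GiB transfer, which fits the "xHC issue?" hint in the message itself.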

So: None of my 3 different adapters do the trick.

@javitoalon

I just tried using a powered Dell USB dock and got the same result: uploads are fine, downloads break.
The weird thing is that internally you can still ping the 2.5GbE interface, but not from outside.
After taking eth1 down and up again with ifconfig, it returns to normal.
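
A minimal sketch of automating that down/up recovery follows. It is only a recovery aid, not a fix, and it is untested on a NAS. The function name, the `RUN` dry-run switch, the `PING` override and the idea of pinging an outside host (for example the router) are all assumptions for illustration.

```shell
#!/bin/sh
# RUN: dry-run switch, commands are printed unless the script is started
# with RUN= (empty). PING: command used for the reachability check.
RUN="${RUN-echo}"
PING="${PING-ping}"

# Ping a host outside the NAS; if it does not answer, take the
# interface down and bring it back up.
check_link() {
    iface="$1"
    target="$2"
    if $PING -c 3 "$target" >/dev/null 2>&1; then
        echo "link ok"
    else
        echo "bouncing $iface"
        $RUN ifconfig "$iface" down
        $RUN ifconfig "$iface" up
    fi
}
```

Something like `check_link eth1 192.168.1.1` could be run every few minutes from the Task Scheduler; the address here is a placeholder for a host that is reachable only through the USB adapter.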

@Romeo1984

Confirmed - I too tried a powered USB dock with the same result.

@javitoalon

I have seen people reporting DS218j is working fine with an Asus dongle, both for upload and downloads.
Is DS220j so different from it?
I know the DS218j is Armada38x, but still, it is kind of weird that it works so badly on our DS220j with regular RTL8156B USB adapters.

@Romeo1984

Yes. The Armada chipsets are reported to be working fine. The DS220J uses the Realtek RTD1296 chipset. Completely different from some of the other "J" models.
Affected Models using this chipset:
DS420j
DS220j
RS819
DS418
DS218
DS218play
DS118

Reference: https://kb.synology.com/en-global/DSM/tutorial/What_kind_of_CPU_does_my_NAS_have

@Romeo1984

@bb-qq How can I "Help" with this? I have Linux experience, and two other Synology models: DS720+ and a DS214Play.

@sounds2k

sounds2k commented Mar 4, 2023

I've got an ioSafe 218 (essentially a more rugged DS218). It's got 2GB of RAM - the same as many of the units which are stable, but have Intel CPUs. The driver crashes when trying to download (even fairly small ... 520MB), this is with it connected to the rear ports where the link comes up at 2.5GbE. However, if I plug it into the front port the link comes up at 1GbE and appears to be stable ... although of course that's no faster than the built-in NIC. I was able to download an 11GB file - but speed was poor (circa 35MB/s). Downloading the same file over the internal NIC (also at 1GbE) does over 100MB/s. An upload over the 2.5GbE adapter (connected to the front port) topped out at just under 40MB/s, with it in one of the rear ports I saw a peak of 95MB/s. So it would appear to be CPU/driver related, rather than RAM ... ?

@dlyubimov

ethtool -K eth2 tso off
ethtool -K eth2 gso off
ethtool -K eth2 sg off

Actually this did make it stable for me, except that the speed dropped below 1 Mbps. (For me, it is eth1, with sudo of course.)

It does show as a 2.5G link, dhcp doesn't work still. (DS218 on rear usb 3.0). Perhaps it may be useful to try these settings one by one?

image
image

@dlyubimov

Hm... I take it back. Switching scatter-gather off seems to also switch tso and gso off automatically. With scatter-gather off, the speed drops, which seems to improve stability, but if it is run long enough, it eventually still gets stuck.

With scatter-gather on, speed is high, and switching tso or gso off does not change anything: speed is still high, but the crash is much easier to reproduce. (I have almost convinced myself to just drop $300 on a new DS220+ shell and move on).

@dlyubimov

Played a little bit more with this on the DS218. No parameters in the usbcore module made any difference (except for changing autosuspend, which causes the interface to go defunct right away if set to anything other than -1). Changing the ring size made no difference either.

Same symptoms in the logs as in this thread: everything starts with a Tx timeout. The driver tries to send a USB reset call in response to that, and everything goes downhill from there: errors in the bulk Tx callback, and eventually not being able to read the registers. The TX timeout value is set to 5*HZ, and I wonder if that is materially different on this chipset; maybe it makes sense to bump it up for bulk frames.

Also noticed that the driver file has a slightly different line count from the 2.16.3 i had downloaded from the realtek site. I assume all changes from the source are benign, or I am mistaken and there are no changes.

@GorgiGR

GorgiGR commented Apr 25, 2023

@bb-qq How can I "Help" with this? I have Linux experience, and two other Synology models: DS720+ and a DS214Play.

I don't know whether this helps...?

https://bugzilla.kernel.org/show_bug.cgi?id=198931#c96

@bb-qq
Owner Author

bb-qq commented Apr 30, 2023

Thank you all for the information you have provided.
Unfortunately, I have not yet found a way to improve stability. The problem seems to be occurring at the lower layers and I have no idea where to start looking.

However, I noticed that the recently released DSM7.2beta has a new Linux kernel version.

$ head -6 ds.rtd1296-7.1/usr/local/sysroot/usr/include/linux/syno_autoconf.h
/*
 *
 * Automatically generated file; DO NOT EDIT.
 * Linux/arm64 4.4.180 Kernel Configuration
 *
 */

$ head -6 ds.rtd1296-7.2/usr/local/sysroot/usr/include/linux/syno_autoconf.h
/*
 *
 * Automatically generated file; DO NOT EDIT.
 * Linux/arm64 4.4.302 Kernel Configuration
 *
 */

The kernel update is unlikely to improve anything, but if anyone has tried it, please let me know.
Packages compatible with DSM 7.2 are available here.
https://github.com/bb-qq/r8152/releases/tag/2.16.3-4

@GorgiGR

GorgiGR commented Apr 30, 2023

The kernel update is unlikely to improve anything, but if anyone has tried it, please let me know. Packages compatible with DSM 7.2 are available here. https://github.com/bb-qq/r8152/releases/tag/2.16.3-4

As mentioned in Comment #96 of the bugzilla link I also posted 5 days ago, a similar issue has been resolved when the comment author updated to Kernel 5.16 in Debian. It may be completely unrelated, but it is evidence that kernel updates sometimes may indeed solve issues like this. Unfortunately the DSM is still a long way from kernel 5.xx.

@dlyubimov

dlyubimov commented May 2, 2023

I tried with DSM7.2 RC and the new 7.2 release of the driver. Unfortunately I must report that it was stable for copying about 3Gb before it failed in the same manner as before. As before, it is a double failure, as the USB reset command does not recover the driver state.

To be specific, I was running DSM 7.2-64551.

@dlbomber1974

So I had to uninstall and reinstall the driver and now it says it failed to start. I have the latest update 7.2 which I am suspecting may have some changes that interferes with the install now. Can someone confirm? Now it keeps asking for me to repair it.

@Dayofwonder

Dayofwonder commented May 22, 2023

So I had to uninstall and reinstall the driver and now it says it failed to start. I have the latest update 7.2 which I am suspecting may have some changes that interferes with the install now. Can someone confirm? Now it keeps asking for me to repair it.

Same here. I just installed the DSM 7.2 final version, had to uninstall the old driver and tried to install [2.16.3-4], and now it asks me to repair the driver. Do we have to execute the SSH command once more to get the new driver working?
Update: The driver works now, after applying this again:
sudo install -m 4755 -o root -D /var/packages/r8152/target/r8152/spk_su /opt/sbin/spk_su

@dlbomber1974

dlbomber1974 commented May 22, 2023 via email

@Dayofwonder

I tried the command once it failed as the instructions called for as well. It appears something changed in the new release. We need an updated script. Hopefully the dev sees our conversation.

As added in my post above, it works for me now. I installed the driver with an error, executed the sudo command and tried to install the driver again successfully. Connection is up now, I will test it later on.
By the way: My DSM 7.2 is the final version, not a beta or RC version.

@dlyubimov

On 7.2-64561, unfortunately, it is still broken the same way. I did download the record 5+Gb before it crashed though, but alas.

@bb-qq
Owner Author

bb-qq commented Jul 15, 2023

The Realtek driver on which this package is based has been updated to 2.17.1.
It does not seem to contain any changes that might improve the stability with the rtd1296, but you can try it if you like.
https://github.com/bb-qq/r8152/releases/tag/2.17.1

@andrus2049

andrus2049 commented Jul 19, 2023

Synology DS218
DSM 7.2-64570 Update 1

r8152-rtd1296-2.17.1-1_7.2.spk

Asus USB-C2500 USB Type-A 2.5G Base-T Ethernet Adapter
https://www.asus.com/networking-iot-servers/wired-networking/wired-adapters/usb-c2500/

Installation as suggested
NAS rebooted after installation
Fixed IP assigned to the USB dongle
1500 MTU

Internal 1 Gb Ethernet also connected using a different static IP.

Tests (samba)
UPLOAD (virtual machines)

  1. 15 GB VM (1 15 GB file + small files)
  2. 15 GB VM (1 15 GB file + small files)
  3. 87 GB VM (1 55 GB file + 1 27 GB file + small files)
    all OK, upload speed around 160-170 MB/s

DOWNLOAD
Tested download of many files of various sizes. I was successful when downloading files up to 22 MB, but larger files caused a network interface lock with no further file access, which could be resolved by stopping and restarting the driver in the Package Center.

@NikitaOsotsky

I'm not sure if it could be the cause, but recently a new update of the SMB service was released.

Version: 4.15.13-0871
(2023-09-26)
Fixed Issues

Fixed an issue where certain clients could cause continuous increase in SMB memory usage.

Perhaps this could have indirectly influenced it.

@perseus177

Hi
Any update ?
I want to add 2.5Gbit to my DS218play, but it seems this is still not working?

@AndreasArvidsson

Does anyone know if this also affects the RTD1619B? If I can't get it working on my current DS220j, I might have better luck with a newer DS223j?

@Aurelien771

Hi,
With my DS218 Play and a Cable Matters 2.5 Gb Ethernet Adapter, I have the same issues as you: upload is okay, with good performance, but when I try to open a movie or download something from the Syno, I lose the connection... DSM 7.2.1-69057 Update 4, latest RTD1296 driver: 2.17.1-1.
Any news?

@tsponzie

tsponzie commented Mar 7, 2024

I have a DS418, just wanted to say, I tried the latest version with a Cable Matters 2.5Gb USB3 and same outcome! Unfortunately, I really need 2.5G so I might have to sell it - but came hoping for a fix (:

@newtrim

newtrim commented Apr 25, 2024

Wondered if you could release a package that works on DSM 6.2, as I'm currently keeping that on my DS220j because it still supports WiFi USB dongles. I wanted to use a 1Gb/s r8153 to up the NIC speed. My DS220j has only 512MB RAM and Synology does not recommend DSM 7 on it because of performance issues. DSM 6.2 might help with the reliability issue?
