Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Silicon Labs Multiprotocol addon failing intermittently causing HAP, Thread and Zigbee issues #3192

Open
evakq8r opened this issue Aug 26, 2023 · 114 comments
Labels

Comments

@evakq8r
Copy link

evakq8r commented Aug 26, 2023

Describe the issue you are experiencing

Every few days to a week, the Silicon Labs Multiprotocol addon will stop communicating and indicate 'resource temporarily unavailable'. yet does not make mention of which resource this is. When this addon stops, it breaks my light integration (Nanoleaf via Thread using HAP) and Zigbee sensors via ZHA.

Restarting the addon doesn't fix the issue, nor does restarting home assistant. Usually requires a complete reboot of the host, and even then sometimes it will repeatedly indicate 'resource temporarily unavailable'.

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

Silicon Labs Multiprotocol

What is the version of the add-on?

2.3.2

Steps to reproduce the issue

Wish I knew, as it fails whenever it wants to (sometimes 3am in the morning, sometimes 5pm in the afternoon).

System Health information

System Information

version core-2023.8.4
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.4
os_name Linux
os_version 6.1.37
arch x86_64
timezone Australia/Adelaide
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.32.1
Stage running
Available Repositories 1335
Downloaded Repositories 37
AccuWeather
can_reach_server ok
remaining_requests 14
Home Assistant Cloud
logged_in false
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud failed to load: unreachable
Home Assistant Supervisor
host_os Home Assistant OS 11.0.dev20230705
update_channel beta
supervisor_version supervisor-2023.08.3
agent_version 1.5.1
docker_version 23.0.6
disk_total 30.8 GB
disk_used 23.3 GB
healthy true
supported true
board ova
supervisor_api ok
version_api ok
installed_addons Samba share (10.0.2), Network UPS Tools (0.12.0), Matter Server (4.9.0), Silicon Labs Multiprotocol (2.3.2), Mosquitto broker (6.2.1), Zigbee2MQTT (1.32.2-1), Custom deps deployment (1.3.3), Home Assistant Google Drive Backup (0.111.1), ESPHome (2023.8.2), Terminal & SSH (9.7.1), Studio Code Server (5.10.1), PS5 MQTT (1.3.1), Whisper (1.0.0), Piper (1.3.2), Advanced SSH & Web Terminal (15.0.7), Silicon Labs Flasher (0.2.0)
Dashboards
dashboards 3
resources 24
views 11
mode storage
Recorder
oldest_recorder_run 16 August 2023 at 08:32
current_recorder_run 25 August 2023 at 20:34
estimated_db_size 933.92 MiB
database_engine sqlite
database_version 3.41.2

Anything in the Supervisor logs that might be useful for us?

Nothing relating to these issues.

Anything in the add-on logs that might be useful for us?

Logger: bellows.zigbee.application
Source: /usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py:643
First occurred: 10:49:25 (4040 occurrences)
Last logged: 17:00:02

ControllerApplication reset unsuccessful: ConnectionRefusedError(111, "Connect call failed ('172.30.32.1', 9999)")
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 640, in _reset_controller_loop
    await self._reset_controller()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 661, in _reset_controller
    await self.connect()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 133, in connect
    self._ezsp = await bellows.ezsp.EZSP.initialize(self.config)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 164, in initialize
    await ezsp.connect(use_thread=zigpy_config[conf.CONF_USE_THREAD])
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 181, in connect
    self._gw = await bellows.uart.connect(self._config, self, use_thread=use_thread)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 414, in connect
    protocol, _ = await _connect(config, application)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 385, in _connect
    transport, protocol = await zigpy.serial.create_serial_connection(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/serial.py", line 31, in create_serial_connection
    transport, protocol = await loop.create_connection(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1085, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1069, in create_connection
    sock = await self._connect_sock(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 973, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 628, in sock_connect
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 668, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.30.32.1', 9999)

Additional information

HAOS is hosted in a VM on my Unraid server. My Unraid server is still able to see and interact with the USB device(s) when HAOS fails to. For context:

image

I have the SkyConnect USB as well as a Sonoff Zigbee 3.0 USB Dongle Plus V2” (model “ZBDongle-E”). I mainly use the Skyconnect for everything, and the Sonoff is a recent purchase. Both are flashed with the latest version of the MultiPAN firmware.

I have tried both the stable and beta version of HAOS, no change to the outcome. Sometimes it works for several days, sometimes it fails > 5 times a day.

@evakq8r
Copy link
Author

evakq8r commented Aug 26, 2023

Update:

Only way to restore Home Assistant to a functional state was to unplug the SkyConnect from its USB port, replug, then reboot Unraid. Simply turning the VM for HAOS off and on nor just rebooting the host without the replug was not enough.

FWIW, I have a PCIe USB Hub card in the server, however the SkyConnect USB is in the motherboards USB2.0 port, rather than the hub, to try and limit the hub being a cause.

@spartandrew18
Copy link

spartandrew18 commented Sep 1, 2023

@agners I would like to report some similar experience that may not be entirely related to the same error you are receiving. However, I am having similar symptoms. Recently I added my HA SkyConnect to the Apple Border Router thread network via using the TLC Dataset provided from Nanoleaf. I have my devices all on one mesh network and things work great when they are up. For some reason my SkyConnect will just shut off. I believe it is being overloaded somehow. The only solution to the problem is to unplug and replug the device. Below is what Silicon Labs Multiprotocol addon logs look like.

core_silabs_multiprotocol_2023-09-01T00-59-20.370Z.log

BEFORE CRASH OF SKYCONNECT
image

AFTER CRASH OF SKYCONNECT
image

@agners
Copy link
Member

agners commented Sep 1, 2023

@spartandrew18 it seems that the otbr-agent is continuing to restart in a loop?

Do you maybe have the logs from when things first failed? You can get more logs through the CLI:

ha host logs --boot 0 --lines 100000 -t addon_core_silabs_multiprotocol

You can also redirect into a location where you have access to so you can upload the logs:

ha host logs --boot 0 --lines 100000 -t addon_core_silabs_multiprotocol > /config/multiprotocol.log

@evakq8r
Copy link
Author

evakq8r commented Sep 2, 2023

New Update:

Silicon Labs Multiprotocol addon just crashed:

WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #176 : Property-get::PROP_ENDPOINT_STATE aborted
WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #210 : Could not read endpoint state on the secondary
WARNING in function 'system_send_open_endpoint_ack' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #122 : Failed to acknowledge the open request for endpoint #12. Bad file descriptor

Repeats itself every 100ms, which caused my HA VM to hang in responsiveness. Had to force the VM to reboot from Unraid. The addon attempts to startup after the VM/HAOS reboot, but same thing occurs:

[23:30:56:630555] Info : New client connection using library v4.3.1.0
otbr-agent[212293]: 53d.04:20:11.933 [C] Platform------: CPC endpoint open failed
otbr-agent[212293]: 53d.04:20:11.933 [C] Platform------: mCpcBusSpeed = 115200
otbr-agent[212293]: 53d.04:20:11.933 [C] Platform------: Init() at radio.cpp:121: Failure
close cpc endpoint: Resource temporarily unavailable
[23:30:57] INFO: otbr-agent ended with exit code 1 (signal 0)...
OTBR_FORWARD_INGRESS  all opt    in * out wpan0  ::/0  -> ::/0  
Chain OTBR_FORWARD_INGRESS (0 references)
target     prot opt source               destination         
DROP       all      anywhere             anywhere             PKTTYPE = unicast
DROP       all      anywhere             anywhere             match-set otbr-ingress-deny-src src
ACCEPT     all      anywhere             anywhere             match-set otbr-ingress-allow-dst dst
DROP       all      anywhere             anywhere             PKTTYPE = unicast
ACCEPT     all      anywhere             anywhere            
otbr-ingress-deny-src
otbr-ingress-deny-src-swap
otbr-ingress-allow-dst
otbr-ingress-allow-dst-swap
OTBR_FORWARD_EGRESS  all opt    in wpan0 out *  ::/0  -> ::/0  
Chain OTBR_FORWARD_EGRESS (0 references)
target     prot opt source               destination         
ACCEPT     all      anywhere             anywhere            
[23:30:57] INFO: OTBR firewall teardown completed.
[23:30:57:782940] Info : Client disconnected

Thread and Zigbee (as expected) have failed:-

ZHA:

Logger: zigpy.application
Source: /usr/local/lib/python3.11/site-packages/zigpy/application.py:198
First occurred: 23:25:23 (8 occurrences)
Last logged: 23:30:49

Couldn't start application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 195, in startup
    await self.connect()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 138, in connect
    await ezsp.connect(use_thread=self.config[CONF_USE_THREAD])
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 179, in connect
    self._gw = await bellows.uart.connect(self._config, self, use_thread=use_thread)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 414, in connect
    protocol, _ = await _connect(config, application)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/bellows/uart.py", line 385, in _connect
    transport, protocol = await zigpy.serial.create_serial_connection(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/zigpy/serial.py", line 36, in create_serial_connection
    transport, protocol = await loop.create_connection(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1085, in create_connection
    raise exceptions[0]
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1069, in create_connection
    sock = await self._connect_sock(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 973, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 628, in sock_connect
    return await fut
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 668, in _sock_connect_cb
    raise OSError(err, f'Connect call failed {address}')
ConnectionRefusedError: [Errno 111] Connect call failed ('172.30.32.1', 9999)

HomeKit Bridge (for the Thread lights via HAP) do not show any useful logs, however all 30 integrations are in constant reload mode.

Rebooting the VM did not resolve the issue. Unraid can still see the NabuCasa Skyconnect plugged in and responding:

image

lsusb -d 10c4:ea60 -v

Bus 005 Device 002: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x10c4 Silicon Labs
  idProduct          0xea60 CP210x UART Bridge
  bcdDevice            1.00
  iManufacturer           1 Nabu Casa
  iProduct                2 SkyConnect v1.0
  iSerial                 3 <redacted>
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0020
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              100mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
Device Status:     0x0000
  (Bus Powered)

The fix for this at the moment (this time around anyway) was a full server reboot, but it took two attempts and several manual reboots of the addon to make it work and talk with the USB, which is less than ideal.

For what it's worth, the SkyConnect USB is plugged directly into the motherboards USB port (ASRock Rack X570-2L2T) using the included extension cable that come with the unit, to avoid interference.

Any help with this would be appreciated. Thanks.

@spartandrew18
Copy link

@spartandrew18 it seems that the otbr-agent is continuing to restart in a loop?

Do you maybe have the logs from when things first failed? You can get more logs through the CLI:

ha host logs --boot 0 --lines 100000 -t addon_core_silabs_multiprotocol

You can also redirect into a location where you have access to so you can upload the logs:

ha host logs --boot 0 --lines 100000 -t addon_core_silabs_multiprotocol > /config/multiprotocol.log

multiprotocol.log

@agners
Copy link
Member

agners commented Sep 4, 2023

Thanks for the logs. Some interesting snippets:

Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: radio tx timeout
Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: RCP failure detected
Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: Trying to recover (1/100)
Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: Wait for response timeout
Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: RCP failure detected
Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: Trying to recover (2/100)
Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: Wait for response timeout
Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: RCP failure detected
Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: Trying to recover (3/100)
Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: Wait for response timeout
Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: RCP failure detected
Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: Trying to recover (4/100)
Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: Wait for response timeout
Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: RCP failure detected
Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: Trying to recover (5/100)
Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: Wait for response timeout
Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: RCP failure detected
Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: Trying to recover (6/100)
Aug 30 04:22:55 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12
Aug 30 04:22:55 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'core_set_endpoint_in_error' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1125 : Setting ep#12 in error, new state: SL_CPC_STATE_ERROR_DESTINATION_UNREACHABLE
Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:394752] WARNING : In function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12
Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:394768] WARNING : In function 'core_set_endpoint_in_error' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1125 : Setting ep#12 in error, new state: SL_CPC_STATE_ERROR_DESTINATION_UNREACHABLE
Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:495051] Info : Client disconnected

It seems the OTBR detects that the radio isn't communicating anymore (it can't receive frames any longer). The CPC daemon similarly detects a "retransmit timeout"

WARNING in function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12

The OTBR then ends up in a restart loop, presumably for the same reason (the radio stopped responding):

Aug 30 04:32:58 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:58] INFO: Starting otbr-agent...
...
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:58:732529] Info : New client connection using library v4.3.1.0
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #176 : Property-get::PROP_ENDPOINT_STATE aborted
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #210 : Could not read endpoint state on the secondary
...
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: close cpc endpoint: Success
Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:59] INFO: otbr-agent ended with exit code 1 (signal 0)...

The second error in the log looks very similar.

@spartandrew18
Copy link

spartandrew18 commented Sep 4, 2023

Thanks for the logs. Some interesting snippets:


Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: radio tx timeout

Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: RCP failure detected

Aug 30 04:22:33 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:39.085 [W] Platform------: Trying to recover (1/100)

Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: Wait for response timeout

Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: RCP failure detected

Aug 30 04:22:37 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:43.186 [W] Platform------: Trying to recover (2/100)

Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: Wait for response timeout

Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: RCP failure detected

Aug 30 04:22:41 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:47.282 [W] Platform------: Trying to recover (3/100)

Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: Wait for response timeout

Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: RCP failure detected

Aug 30 04:22:45 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:51.378 [W] Platform------: Trying to recover (4/100)

Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: Wait for response timeout

Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: RCP failure detected

Aug 30 04:22:49 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:55.474 [W] Platform------: Trying to recover (5/100)

Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: Wait for response timeout

Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: RCP failure detected

Aug 30 04:22:53 homeassistant addon_core_silabs_multiprotocol[474]: otbr-agent[306]: 06:19:59.570 [W] Platform------: Trying to recover (6/100)

Aug 30 04:22:55 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12

Aug 30 04:22:55 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'core_set_endpoint_in_error' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1125 : Setting ep#12 in error, new state: SL_CPC_STATE_ERROR_DESTINATION_UNREACHABLE

Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:394752] WARNING : In function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12

Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:394768] WARNING : In function 'core_set_endpoint_in_error' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1125 : Setting ep#12 in error, new state: SL_CPC_STATE_ERROR_DESTINATION_UNREACHABLE

Aug 30 04:22:56 homeassistant addon_core_silabs_multiprotocol[474]: [00:22:55:495051] Info : Client disconnected

It seems the OTBR detects that the radio isn't communicating anymore (it can't receive frames any longer). The CPC daemon similarly detects a "retransmit timeout"


WARNING in function 're_transmit_timeout' in file /usr/src/cpc-daemon/server_core/core/core.c at line #1801 : Retransmit limit reached on endpoint #12

The OTBR then ends up in a restart loop, presumably for the same reason (the radio stopped responding):


Aug 30 04:32:58 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:58] INFO: Starting otbr-agent...

...

Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:58:732529] Info : New client connection using library v4.3.1.0

Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #176 : Property-get::PROP_ENDPOINT_STATE aborted

Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: WARNING in function 'property_get_single_endpoint_state_and_reply_to_pending_open_callback' in file /usr/src/cpc-daemon/server_core/system_endpoint/system_callbacks.c at line #210 : Could not read endpoint state on the secondary

...

Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: close cpc endpoint: Success

Aug 30 04:32:59 homeassistant addon_core_silabs_multiprotocol[474]: [00:32:59] INFO: otbr-agent ended with exit code 1 (signal 0)...

The second error in the log looks very similar.

Is it something to do with my hardware setup? Or is it possible it can fixed via software?

@evakq8r
Copy link
Author

evakq8r commented Sep 7, 2023

Further update; new crash, same outcome but slight variation in the logs:

ZHA:-

Logger: bellows.zigbee.application
Source: runner.py:186
First occurred: 10:25:19 (5 occurrences)
Last logged: 10:26:19

Watchdog heartbeat timeout: TimeoutError()

Silicon Labs Multiprotocol:-

[10:54:39] INFO: Starting zigbeed...
[10:54:39] INFO: Setup OTBR firewall...
[10:54:39] INFO: Starting otbr-agent...
otbr-agent[95918]: [NOTE]-AGENT---: Running 0.3.0
otbr-agent[95918]: [NOTE]-AGENT---: Thread version: 1.3.0
otbr-agent[95918]: [NOTE]-AGENT---: Thread interface: wpan0
otbr-agent[95918]: [NOTE]-AGENT---: Radio URL: spinel+cpc://cpcd_0?iid=2&iid-list=0
otbr-agent[95918]: [NOTE]-ILS-----: Infra link selected: enp5s0
otbr-agent[95918]: [INFO]-NCP-----: OpenThread log level changed to 5
otbr-agent[95918]: 50d.15:48:26.792 [C] Platform------: CPC init failed. Ensure radio-url argument has the form 'spinel+cpc://cpcd_0?iid=<1..3>'
otbr-agent[95918]: 50d.15:48:26.792 [C] Platform------: Init() at cpc_interface.cpp:96: Failure
close cpc endpoint: Resource temporarily unavailable
[10:54:41] INFO: otbr-agent ended with exit code 1 (signal 0)...

Specifically the following line:

otbr-agent[95918]: 50d.15:48:26.792 [C] Platform------: CPC init failed. Ensure radio-url argument has the form 'spinel+cpc://cpcd_0?iid=<1..3>

I don't believe have seen this specific error, but everything else is the same.

Restarting the SLM addon shows the following logs:

s6-rc: info: service mdns: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service mdns successfully started
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/check-cpcd-shm.sh
[11:20:14] INFO: Starting mDNS Responder...
Default: mDNSResponder (Engineering Build) (Aug 23 2023 18:27:14) starting
cont-init: info: /etc/cont-init.d/check-cpcd-shm.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service banner: starting
-----------------------------------------------------------
 Add-on: Silicon Labs Multiprotocol
 Zigbee and OpenThread multiprotocol add-on
-----------------------------------------------------------
 Add-on version: 2.3.2
 You are running the latest version of this add-on.
 System: Home Assistant OS 11.0.dev20230705  (amd64 / qemux86-64)
 Home Assistant Core: 2023.9.0
 Home Assistant Supervisor: 2023.08.3
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[11:20:15] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[11:20:15] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2023-09-07 11:20:15 homeassistant universal_silabs_flasher.flash[185] INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.3.1', ezsp_version=None, ot_rcp_version=None, fw_type=<FirmwareImageType.RCP_UART_802154: 'rcp-uart-802154'>, baudrate=460800)
2023-09-07 11:20:15 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2023-09-07 11:20:17 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.CPC at 460800 baud
2023-09-07 11:20:21 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.CPC at 115200 baud
2023-09-07 11:20:26 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.CPC at 230400 baud
2023-09-07 11:20:30 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.EZSP at 115200 baud
2023-09-07 11:20:35 homeassistant universal_silabs_flasher.flasher[185] INFO Probing ApplicationType.SPINEL at 460800 baud
Error: Failed to probe running application type
s6-rc: warning: unable to start service universal-silabs-flasher: command exited 1
/run/s6/basedir/scripts/rc.init: warning: s6-rc failed to properly bring all the services up! Check your logs (in /run/uncaught-logs/current if you have in-container logging) for more information.
/run/s6/basedir/scripts/rc.init: fatal: stopping the container.
s6-rc: info: service mdns: stopping
s6-rc: info: service banner: stopping
Default: mDNSResponder (Engineering Build) (Aug 23 2023 18:27:14) stopping
s6-rc: info: service banner successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
[11:20:40] INFO: mDNS ended with exit code 4 (signal 0)...
s6-rc: info: service mdns successfully stopped

The fix now is a physical unplug and replug of the USB into the server, then a full host restart. Simple reboot of the VM did not change the outcome.

It also looks like others are reporting similar issues (#3193) so it is not an isolated incident.

Devs, please advise what further information you require to resolve/investigate this. Physical host restarts to resolve a VM issue are far from ideal (in my case anyway) and we'd like to be able to assist to resolve this problem as soon as practical.

@MattWestb
Copy link

Have some with this problem one ESP module and can try using it as serial port over Ethernet ?
It was working before but have not testing latest versions but it shall being fixed if having one separate RCP module connected to the ESP.

@puddly
Copy link
Collaborator

puddly commented Sep 14, 2023

If you would like to try experimental firmware for the SkyConnect that should eliminate the need for physical resets (source NabuCasa/silabs-firmware-builder#33), I've attached a pre-compiled copy here: skyconnect_rcp-uart-802154_4.3.1_wd.gbl.zip (zipped so it can be uploaded to GitHub, unzip it first).

If you have any step-by-step information for how to reproduce this issue, it would be very useful!

(CC @kylev, from #3193)

@kylev
Copy link

kylev commented Sep 14, 2023

Not much interesting about my setup (city setting, very busy 2.4Ghz band). I'm on 2023.9.2 with a mix of Wifi (Sengled, Cync, generic), zigbee (misc mfg), and 3 Nanoleaf bulbs.

Flashing with the web tool seemed to work, but the probed GBL metadata seems identical during boot. Should your build appear differently?

NabuCasaMetadata(metadata_version=1, sdk_version='4.3.1', ezsp_version=None, ot_rcp_version=None, fw_type=<FirmwareImageType.RCP_UART_802154: 'rcp-uart-802154'>, baudrate=460800)

simulti_firmware_3192.log

@puddly
Copy link
Collaborator

puddly commented Sep 14, 2023

Thanks!

The metadata will be identical, especially when probing. This also makes sure the addon doesn't re-flash the bundled firmware.

@evakq8r
Copy link
Author

evakq8r commented Sep 15, 2023

If you would like to try experimental firmware for the SkyConnect that should eliminate the need for physical resets (source NabuCasa/silabs-firmware-builder#33), I've attached a pre-compiled copy here: skyconnect_rcp-uart-802154_4.3.1_wd.gbl.zip (zipped so it can be uploaded to GitHub, unzip it first).

If you have any step-by-step information for how to reproduce this issue, it would be very useful!

(CC @kylev, from #3193)

I've installed the experimental firmware and so far so good. Unfortunately I haven't had the failures again since my last batch of logs, but nothing has changed... so the inconsistencies aren't helping resolve the issue.

As for steps on replicating the issue, there-in lies the problem; I don't actually do anything to trigger the problems, they just happen. I've setup additional logging from the unRAID host in the event the registered USB ID disconnects or does something weird from its perspective, but HA doesn't give any more indication of what caused the failure other than what I provided on original report.

@kylev
Copy link

kylev commented Sep 15, 2023

Overnight, the container went into the cpcdcrash-loop with Property-get::PROP_ENDPOINT_STATE aborted and Property-get::PROP_ENDPOINT_STATE aborted errors. I restarted the container, and it entered the probing instead of detection stage and failed.

The new firmware didn't seem to have an effect.

@puddly
Copy link
Collaborator

puddly commented Sep 15, 2023

Perfect, thank you for the feedback. The watchdog isn't integrated tightly into CPC so it looks like just the CPC part is crashing, not the whole firmware. I'll post an updated one later next week.

@kylev
Copy link

kylev commented Sep 15, 2023

Just to ensure clarity: I had to un-plug and re-plug the SkyConnect. The new firmware didn't seem to have an effect.

Whatever initially breaks with the dongle triggers the cpcd crash loop but leaves the container up. Subsequently restarting the whole container results in the firmware flasher attempting to probe the device at multiple baudrates, failing, and crashing the container (and triggering watchdog restart).

After re-plugging the dongle, the container comes up on the next watchdog restart.

I suspect if you tweak the cpcd finish script to crash the container we'd see 1 cpcd crash followed by many probe failures.

@kylev
Copy link

kylev commented Sep 16, 2023

Seems reproducible: same symptoms this morning (crash loop cpcd, restarting container results in firmware flasher probe failures, replug SkyConnect fixes it).

I've just re-flashed the SkyConnect with your build again, just to ensure I didn't make an error last time. Will comment if anything changes.

@spartandrew18
Copy link

Any progress made on the fix?

@theblackhole
Copy link

theblackhole commented Oct 10, 2023

To everyone having this problem, are you all using Nanoleaf devices? Because in my case, it's been a while since I had to restart the addon + unplug-replug my skyconnect.
I don't know if it's pure luck because it's random, if it's the latest Nanoleaf firmware update (3.5.41) or the various HA updates I installed but it's stable a the moment.

@DunklerPhoenix
Copy link

Nope. Not a single nanoleaf. Only hue, aqara, ikea and some thread devices.

@evakq8r
Copy link
Author

evakq8r commented Oct 11, 2023

To everyone having this problem, are you all using Nanoleaf devices? Because in my case, it's been a while since I had to restart the addon + unplug-replug my skyconnect. I don't know if it's pure luck because it's random, if it's the latest Natoleaf firmware update (3.5.41) or the various HA updates I installed but it's stable a the moment.

Unfortunately that firmware only applies to the Matter over Thread Nanoleaf bulbs. The HomeKit over Thread (non-Matter) bulbs latest firmware is 1.6.49.

I've started to add Matter over Thread Nanoleaf bulbs to my mix of things, and out of the box they are 3.2.0, which the Android Nanoleaf indicates has a 'critical firmware upgrade', pointing to 3.5.37 as the FW in question. They all default to 3.5.41 once upgraded though.

@theblackhole
Copy link

theblackhole commented Oct 12, 2023

it's been a while since I had to restart the addon + unplug-replug my skyconnect.

I spoke too soon, I had to restart/unplug-replug today.

@evakq8r
Copy link
Author

evakq8r commented Oct 13, 2023

it's been a while since I had to restart the addon + unplug-replug my skyconnect.

I spoke too soon, I had to restart/unplug-replug today.

Mine also failed yesterday as well. I've now split my Skyconnect to do just OTBR work (Sonoff ZBDongle-E to do the Zigbee work), as well as put both USB devices on extension cables to minimise interference.

Skyconnect crashed again about 6 hours after the split duties, and needed a replug to a different USB port to come back to life.

@puddly
Copy link
Collaborator

puddly commented Oct 13, 2023

Can you describe the Zigbee and Thread devices you have on your networks? Any Zigbee Green Power?

I've been able to replicate a crash and will try to get a firmware out that possibly mitigates it but there may be multiple concurrent bugs here causing issues.

@DunklerPhoenix
Copy link

DunklerPhoenix commented Oct 13, 2023

I have the same crashes and have:

Zigbee (values from ZHA):
1x FYRTUR block-out roller blind (IKEA of Sweden)
1x LCT012 (Hue)
3x LCT015
1x lumi.airmonitor.acn01 (aqara)
2x lumi.curtain.acn002 (aqara)
5x lumi.magnet.acn001 (aqara)
1x lumi.remote.b1acn01 (aqara)
1x lumi.sen_ill.mgl01 (xiaomi)
1x lumi.sensor_cube (aqara)
3x lumi.sensor_motion.aq2 (aqara)
2x lumi.vibration.aq1 (aqara)
6x lumi.weather (aqara)
1x Remote Control N2 (IKEA of Sweden)
1x RWL021 (hue)
4x TRADFRI bulb GU10 CWS 345lm (IKEA of Sweden)
2x TRADFRI on/off switch (IKEA of Sweden)
1x TRADFRI open/close remote (IKEA of Sweden)
1x TRADFRI remote control (IKEA of Sweden)
4x TRADFRI Signal Repeater (IKEA of Sweden)
12x TRADFRIbulbE14WWclear250lm (since 10 months not at power, but paired) (IKEA of Sweden)
1x TRADFRIbulbE27WSglobeopal1055lm (IKEA of Sweden)
1x TRADFRIbulbE27WWclear250lm (IKEA of Sweden)

Thread (via Homekit Controller):
8x Eve Thermo 20EBP1701 (eve systems)
-> Takes sometimes hours to reconnect after restart of sky connect

Wifi (via Homekit Controller):
1x Netatmo Main Module
1x Indoor Module
1x Outdoor Module

@evakq8r
Copy link
Author

evakq8r commented Oct 13, 2023

Think I'll need to close this issue out as my network topology has changed substantially since first logging. I've now split OTBR and Zigbee duties into two devices (OTBR being Skyconnect, Zigbee being a Sonoff ZBDongle-E until my second Skyconnect turns up). Skyconnect is now OpenThreadRCP only, Sonoff is now Zigbee only.

To answer your question though;

Zigbee:
18x Aqara motion sensors
1x Door lock
2x Zigbee magnetic door sensors
5x Zigbee USB Repeaters

Thread:
5x Nanoleaf Matter over Thread bulbs
26x Nanoleaf Homekit over Thread bulbs

I'm in the midst of replacing the HK bulbs with MoT, for various reasons.

I've removed the multiprotocol addon and reconfigured my Zigbee channel to be elsewhere (multiprotocol required the channels to be the same), and reinstalled the standalone OTBR addon in HACS.

@MattWestb
Copy link

@DunklerPhoenix Id the Eve Thermo 20EBP1701 one sleeper / reduced thread device that can only being one child or one full future thread device that can being router / leader ?

@DunklerPhoenix
Copy link

I'm not sure if I understand you correctly.

The Eve Thermo is just a child and cant be used as a router or leader.
It's battery powered and so I think a reduced thread device.
But I'm not sure. sry

https://www.evehome.com/de/eve-thermo

@puddly
Copy link
Collaborator

puddly commented Jan 6, 2024

All current versions are affected, unfortunately, so there's no version that you can roll back to. You can try out the firmware I have here but it doesn't include all of the changes in the addon so it would be easiest to wait for a few days.

@Wardy125
Copy link

Wardy125 commented Jan 6, 2024

No problem, thank you :)

@puddly
Copy link
Collaborator

puddly commented Jan 30, 2024

I believe the latest version of the addon (2.4.4) fixes most of the problems people are facing. Let me know how it works for you.

@davidanthoff
Copy link

I've started using the custom one-off version that was posted here a while ago, and then updated to the newer versions of the add-on, and ever since that I've not had any bugs or problems. Before that things regularly broke. Well done and thanks :)

@Wardy125
Copy link

I've started using the custom one-off version that was posted here a while ago, and then updated to the newer versions of the add-on, and ever since that I've not had any bugs or problems. Before that things regularly broke. Well done and thanks :)

My exact experience, thanks for the fix!

@satmandu
Copy link

I've started using the custom one-off version that was posted here a while ago, and then updated to the newer versions of the add-on, and ever since that I've not had any bugs or problems. Before that things regularly broke. Well done and thanks :)

My exact experience, thanks for the fix!

Does this mean you're using the default firmware the plugin provides now?

@Wardy125
Copy link

I've started using the custom one-off version that was posted here a while ago, and then updated to the newer versions of the add-on, and ever since that I've not had any bugs or problems. Before that things regularly broke. Well done and thanks :)

My exact experience, thanks for the fix!

Does this mean you're using the default firmware the plugin provides now?

Well, I’ve updated the add-on to the latest official (2.4.4), and have auto-flash enabled.

I haven’t manually flashed anything since the custom one that puddly linked to in the earlier comments.

Is there a way I can query the firmware to be sure?

@satmandu
Copy link

In theory with auto flash enabled it should enable the current firmware, and it looks like 4.4.0 is the current firmware from a week or so ago?

@puddly
Copy link
Collaborator

puddly commented Jan 31, 2024

If you have automatic flashing enabled, the exact firmware in the addon will be installed. The firmware version string incorporates the Git commit and the tree hash.

@davidanthoff
Copy link

Yes, I have auto flashing on, so I'm no longer on the custom firmware but just the add-on firmware. Things work great.

@satmandu
Copy link

If you have automatic flashing enabled, the exact firmware in the addon will be installed. The firmware version string incorporates the Git commit and the tree hash.

Sadly, autoflashing doesn't work for my skyconnect device:

 Add-on version: 2.4.4
 You are running the latest version of this add-on.
 System: Home Assistant OS 11.5.rc2  (amd64 / generic-x86-64)
 Home Assistant Core: 2024.1.5
 Home Assistant Supervisor: 2024.01.0
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[19:55:43] INFO: Checking /dev/ttyUSB0 identifying CP2102N USB to UART Bridge Controller from Silicon Labs.
[19:55:43] WARNING: No firmware found for the selected device, assuming firmware is installed.
s6-rc: info: service universal-silabs-flasher successfully started
s6-rc: info: service cpcd-config: starting

Is there a specific firmware I should install manually using the web flasher at https://skyconnect.home-assistant.io/firmware-update/ ? The one in this thread? Something from https://github.com/NabuCasa/silabs-firmware/tree/main/RCPMultiPAN/beta ?

@puddly
Copy link
Collaborator

puddly commented Jan 31, 2024

@satmandu You have one of the very rare batch of SkyConnects that don't identify as a SkyConnect.

You can fix it by installing the SkyConnect CP2102N Programmer addon from the development repo (https://github.com/home-assistant/addons-development/) and running it. The SkyConnect should then be identified properly after you unplug it and plug it back in.

@satmandu
Copy link

@puddly Thanks! That appears to have changed the USB ID....

s6-rc: info: service universal-silabs-flasher: starting
[20:20:18] INFO: Checking /dev/ttyUSB0 identifying SkyConnect v1.0 from Nabu Casa.
[20:20:18] INFO: Starting universal-silabs-flasher with /dev/ttyUSB0
2024-01-30 20:20:19.071 macmini71 universal_silabs_flasher.flash INFO Extracted GBL metadata: NabuCasaMetadata(metadata_version=1, sdk_version='4.3.1', ezsp_version='7.3.1.0', ot_rcp_version='SL-OPENTHREAD/2.3.1.0_GitHub-e6df00dd6' (2.3.1.0), cpc_version='4.3.1-4f7f9e99-dirty-de58d93e' (4.3.1), fw_type=<FirmwareImageType.RCP_UART_802154: 'rcp-uart-802154'>, baudrate=460800)
2024-01-30 20:20:19.071 macmini71 universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-01-30 20:20:21.077 macmini71 universal_silabs_flasher.flasher INFO Probing ApplicationType.CPC at 460800 baud
2024-01-30 20:20:21.092 macmini71 universal_silabs_flasher.flasher INFO Detected ApplicationType.CPC, version '4.3.2' at 460800 baudrate (bootloader baudrate None)
2024-01-30 20:20:21.092 macmini71 universal_silabs_flasher.flash INFO Firmware version '4.3.1-4f7f9e99-dirty-de58d93e' (4.3.1) does not match expected version '4.3.2'
2024-01-30 20:20:21.608 macmini71 universal_silabs_flasher.flasher INFO Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud
2024-01-30 20:20:22.626 macmini71 universal_silabs_flasher.flasher INFO Detected bootloader version '2.1.1'
2024-01-30 20:20:22.626 macmini71 universal_silabs_flasher.flasher INFO Detected ApplicationType.GECKO_BOOTLOADER, version '2.1.1' at 115200 baudrate (bootloader baudrate 115200)
NabuCasa_SkyConnect_RCP_v4.3.1_rcp-uart-hw-802154_460800.gbl
s6-rc: info: service universal-silabs-flasher successfully started
s6-rc: info: service cpcd-config: starting
[20:20:43] INFO: Using known baudrate of 460800 for cpcd!

Is this what I should be seeing?

@HFeenstra
Copy link

As the topic mentions the Multiprotocol is causing issues here as well on my SkyConnect. I have been stopping this Multiprotocol and went back to just Zigbee this morning (Before the protocol crashes); had to reconnect/re-add most of the Zigbee devices. Luckily some did come up.

So as for now it is best not to use the Multiprotocol on the SkyConnect??

Before I went back to just the Zigbee on the stick:
I also restored a backup of 2 weeks ago and do not know if I can keep on upgrading to newer HA versions and other addons. What is the current status, can we in general upgrade to latest versions??

@theblackhole
Copy link

@puddly

I believe the latest version of the addon (2.4.4) fixes most of the problems people are facing. Let me know how it works for you.

In my case, while the 2.4.4 seems to solve the 2.4.3 fatal crashes, it also seems to make my Thread network worse than any previous versions (I only have Matter over Thread devices ATM but I want to believe in multiprotocol hahaha 😄)

2.4.3 Thread instability logs
otbr-agent[314]: 21:32:45.900 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1070, chksum:8710, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-67.625
otbr-agent[314]: 21:32:45.900 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:ca79:d165:6707:82bf]:5540
otbr-agent[314]: 21:32:45.901 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:59461
otbr-agent[314]: 21:33:58.048 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[314]: 21:33:58.184 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[314]: 21:33:58.705 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[314]: 21:33:58.760 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[314]: 21:33:59.395 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[314]: 21:33:59.453 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[314]: 21:34:03.907 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1063, chksum:b9a5, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-74.0
otbr-agent[314]: 21:34:03.911 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[314]: 21:34:03.911 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:41729
otbr-agent[314]: 21:34:06.904 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1071, chksum:088b, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-66.0
otbr-agent[314]: 21:34:06.908 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:ca79:d165:6707:82bf]:5540
otbr-agent[314]: 21:34:06.908 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:41729
otbr-agent[314]: 21:34:06.908 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1072, chksum:85ad, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-73.25
otbr-agent[314]: 21:34:06.908 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[314]: 21:34:06.908 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:41729
otbr-agent[314]: 21:34:08.906 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1071, chksum:c35e, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-73.625
otbr-agent[314]: 21:34:08.910 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[314]: 21:34:08.911 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:41729
otbr-agent[314]: 21:34:12.909 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1080, chksum:b857, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-74.0
otbr-agent[314]: 21:34:12.913 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[314]: 21:34:12.914 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:41729
otbr-agent[314]: 21:34:15.899 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1087, chksum:6bca, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-74.25
otbr-agent[314]: 21:34:15.924 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[314]: 21:34:15.925 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:41729
otbr-agent[314]: 23:09:11.373 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:fb9d, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[314]: 23:09:11.431 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[314]: 23:09:11.431 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788
otbr-agent[314]: 1d.00:05:53.198 [W] Platform------: Handle transmit done failed: ChannelAccessFailure
otbr-agent[314]: 1d.00:05:53.198 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:83, chksum:5473, ecn:no, to:0xffff, sec:yes, error:ChannelAccessFailure, prio:net
otbr-agent[314]: 1d.00:05:53.198 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[314]: 1d.00:05:53.198 [N] MeshForwarder-:     dst:[ff02:0:0:0:0:0:0:1]:19788
otbr-agent[314]: 1d.00:31:19.290 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:e65f, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[314]: 1d.00:31:19.291 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[314]: 1d.00:31:19.291 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788
otbr-agent[314]: 1d.00:31:20.290 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:da37, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[314]: 1d.00:31:20.290 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[314]: 1d.00:31:20.291 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788

I don't know if this information can help but I also saw the ChannelAccessFailure I have in #3408 logs.

FYI I restarted the addon (w/ SkyConnect unplug-replug), restarted HA (2024.1.5), HAOS and even the VM host, same result: it works then it becomes unstable (several cycles of few seconds/minutes of downtime where devices are unavailable followed by a recovery) then it breaks completly after a while (and "a while" here is faster than previous versions. Like minutes/hours vs days/weeks in older versions like the one from this summer)

Since a picture is worth a thousand words, here's my entities history.
image
You can see lots of tiny black bars where devices are unavailable and the stable => unstable => fully broken cycle I described

@puddly
Copy link
Collaborator

puddly commented Jan 31, 2024

ChannelAccessFailure is the radio firmware reporting that it is refusing to send a packet: the channel is congested. This means that your environment is too noisy. Make sure your SkyConnect is away from interference sources such as SSDs, 2.4GHz WiFi, USB 3.0 ports, and so on.

@theblackhole
Copy link

Thanks.
Strange because I haven't changed anything in my setup since its installation and I don't remember seeing this message in the previous logs (which I looked at each time there was a connection problem).
I'll scan and change my 2.4Ghz WiFi channels and/or skyconnect + find a longer USB extension cable than the one provided with the dongle to see if that helps.

@MattWestb
Copy link

Before I went back to just the Zigbee on the stick: I also restored a backup of 2 weeks ago and do not know if I can keep on upgrading to newer HA versions and other addons. What is the current status, can we in general upgrade to latest versions??

If rolling back one backup you is also reversing the current state of the frame counter so all devices is throwing the frames as they is thinking its one reapply attack so its expected.
If rolling back one backup you shall increasing the frame-counter by the instruction. Some times is enough rebooting one router and the network and the coordinator is coming online OK.

@HFeenstra
Copy link

HFeenstra commented Feb 1, 2024

@MattWestb Sorry I don’t fully understand what you’re saying.

@sergeykad
Copy link

sergeykad commented Feb 1, 2024

I updated everything to the last versions and still getting an error. Restart + reconfiguring ZHA fixes its, but it always returns after a day or two.

Logger: homeassistant.config_entries
Source: config_entries.py:406
First occurred: 8:27:57 PM (1 occurrences)
Last logged: 8:27:57 PM

Error setting up entry SONOFF Zigbee 3.0 USB Dongle Plus V2 for zha: Incorrect firmware installed
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/__init__.py", line 163, in async_setup_entry
    zha_gateway = await ZHAGateway.async_from_config(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 196, in async_from_config
    await instance.async_initialize()
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 214, in async_initialize
    await app.startup(auto_form=True)
  File "/usr/local/lib/python3.11/site-packages/zigpy/application.py", line 225, in startup
    await self.connect()
  File "/usr/local/lib/python3.11/site-packages/bellows/zigbee/application.py", line 148, in connect
    await ezsp.startup_reset()
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 125, in startup_reset
    await self.reset()
  File "/usr/local/lib/python3.11/site-packages/bellows/ezsp/__init__.py", line 151, in reset
    await self._gw.reset()
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 406, in async_setup
    result = await component.async_setup_entry(hass, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/__init__.py", line 191, in async_setup_entry
    raise ConfigEntryError("Incorrect firmware installed") from exc
homeassistant.exceptions.ConfigEntryError: Incorrect firmware installed

@theblackhole
Copy link

I'll scan and change my 2.4Ghz WiFi channels and/or skyconnect + find a longer USB extension cable than the one provided with the dongle to see if that helps.

@puddly Unfortunately it didn't help. I changed the channel to an better/empty one (with the help from a scanner app), successfully got rid of the ChannelAccessFailure errors but this "stable => unstable => fully broken" cycle is still here

2.4.3 Thread instability logs (after channel switch)
otbr-agent[308]: 1d.09:33:59.127 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:69c8:2d12:af49:5ec2]:5540
otbr-agent[308]: 1d.09:33:59.127 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:35203
otbr-agent[308]: 1d.09:34:08.870 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:35:20.451 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:35:20.531 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:35:21.067 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:35:21.118 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:35:21.721 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:35:21.764 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:35:31.164 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1087, chksum:9a50, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-66.0
otbr-agent[308]: 1d.09:35:31.168 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:69c8:2d12:af49:5ec2]:5540
otbr-agent[308]: 1d.09:35:31.168 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:44717
otbr-agent[308]: 1d.09:43:43.950 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:44.073 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:44.569 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:44.667 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:45.800 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:46.569 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:47.076 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:47.521 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:47.617 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:47.690 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:47.716 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:47.828 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:48.101 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:48.118 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:48.135 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:48.169 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:48.232 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:43:48.256 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:45:45.175 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:45:45.248 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:45:45.274 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:45:45.828 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:45:45.926 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:45:46.495 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:45:46.534 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:45:54.181 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1079, chksum:cae0, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-67.375
otbr-agent[308]: 1d.09:45:54.185 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[308]: 1d.09:45:54.185 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:58433
otbr-agent[308]: 1d.09:45:56.178 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1072, chksum:9845, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-67.0
otbr-agent[308]: 1d.09:45:56.178 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[308]: 1d.09:45:56.178 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:58433
otbr-agent[308]: 1d.09:47:07.400 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:47:07.899 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:47:09.448 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:47:09.592 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:47:10.120 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:47:10.160 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.09:47:17.184 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1064, chksum:f31d, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-66.0
otbr-agent[308]: 1d.09:47:17.190 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:69c8:2d12:af49:5ec2]:5540
otbr-agent[308]: 1d.09:47:17.190 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:49287
otbr-agent[308]: 1d.09:47:19.184 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1071, chksum:0915, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-68.25
otbr-agent[308]: 1d.09:47:19.190 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[308]: 1d.09:47:19.190 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:49287
otbr-agent[308]: 1d.09:47:25.179 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1058, chksum:dd74, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-68.125
otbr-agent[308]: 1d.09:47:25.184 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[308]: 1d.09:47:25.184 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:49287
otbr-agent[308]: 1d.10:01:21.027 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:01:21.137 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:01:21.663 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:01:21.711 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:01:22.305 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:01:22.360 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:01:27.295 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1070, chksum:ada5, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-65.75
otbr-agent[308]: 1d.10:01:27.304 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:69c8:2d12:af49:5ec2]:5540
otbr-agent[308]: 1d.10:01:27.318 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:54051
otbr-agent[308]: 1d.10:01:27.319 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1070, chksum:ada5, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-66.0
otbr-agent[308]: 1d.10:01:27.319 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:69c8:2d12:af49:5ec2]:5540
otbr-agent[308]: 1d.10:01:27.320 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:54051
otbr-agent[308]: 1d.10:04:27.198 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:04:27.290 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:04:27.845 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:04:27.912 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:04:28.506 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:04:28.544 [W] Platform------: [netif] Failed to transmit, error:Parse
otbr-agent[308]: 1d.10:04:39.321 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1087, chksum:6951, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-66.625
otbr-agent[308]: 1d.10:04:39.340 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:1344:7bda:1c41:cadf]:5540
otbr-agent[308]: 1d.10:04:39.341 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:44049
otbr-agent[308]: 1d.10:04:42.321 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:1065, chksum:ab74, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-66.25
otbr-agent[308]: 1d.10:04:42.327 [N] MeshForwarder-:     src:[fd35:68f1:e42d:1:69c8:2d12:af49:5ec2]:5540
otbr-agent[308]: 1d.10:04:42.327 [N] MeshForwarder-:     dst:[fd35:68f1:e42d:1:f6f7:a48:7162:eefc]:44049
otbr-agent[308]: 1d.16:18:15.203 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:a449, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[308]: 1d.16:18:15.208 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[308]: 1d.16:18:15.208 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788
otbr-agent[308]: 1d.16:18:16.239 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:19af, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[308]: 1d.16:18:16.239 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[308]: 1d.16:18:16.240 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788
otbr-agent[308]: 1d.16:18:17.208 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:4485, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[308]: 1d.16:18:17.208 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[308]: 1d.16:18:17.208 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788
otbr-agent[308]: 1d.16:26:06.209 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:58cb, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[308]: 1d.16:26:06.209 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[308]: 1d.16:26:06.209 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788
otbr-agent[308]: 1d.16:26:07.207 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:6217, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[308]: 1d.16:26:07.207 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[308]: 1d.16:26:07.207 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788
otbr-agent[308]: 1d.16:26:08.219 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:96, chksum:8597, ecn:no, to:829cf2c19ab787e5, sec:no, error:NoAck, prio:net
otbr-agent[308]: 1d.16:26:08.219 [N] MeshForwarder-:     src:[fe80:0:0:0:18c0:51bb:ce16:e378]:19788
otbr-agent[308]: 1d.16:26:08.220 [N] MeshForwarder-:     dst:[fe80:0:0:0:809c:f2c1:9ab7:87e5]:19788

Timeline after channel change :
image

Dont hesitate if I can do something to help you pinpoint the issue :)

@agners
Copy link
Member

agners commented Feb 2, 2024

@theblackhole

I only have Matter over Thread devices ATM but I want to believe in multiprotocol hahaha 😄

TBH, personally, I've kinda lost my faith 🙈 😉

This is my Matter devices running on a Thread network with dedicated/pure OpenThread Border Router add-on. Black bars are reboots 😆 😉 .

image

(that one black bar was probably just that particular Nanoleaf bulb crashing 🙈 )

Especially since you don't have any Zigbee devices, I can really recommend switching to the pure/dedicated firmware.

From a Multiprotocol stand point, it would also be interesting to see if you have the same errors on the pure OTBR firmware (if so, then it is probably more related to you RF environment).

Switching to the dedicated firmware is rather easy: Just disable multiprotocol and install the OTBR add-on (see also this guide). In the Thread configuration page you'll be able to reconfigure the newly added OTBR to use your old (the preferred) network.

image

With that your devices should talk to your new OTBR and be reachable soon after you've reconfigured the OTBR.

Btw, the pure OTBR has TREL enabled as well. TREL allows Thread border routers to pass Thread frames through WiFi/Ethernet, and hence lower the network load on the mesh. One of the main selling points of Thread IMHO 🤩

@MattWestb
Copy link

@agners Your screenshot is private so cant look on it :-(

@theblackhole
Copy link

@agners

Especially since you don't have any Zigbee devices, I can really recommend switching to the pure/dedicated firmware.

From a Multiprotocol stand point, it would also be interesting to see if you have the same errors on the pure OTBR firmware (if so, then it is probably more related to you RF environment).

Ok I'll try switching to pure OTBR in few days and see how it goes.

Btw, the pure OTBR has TREL enabled as well. TREL allows Thread border routers to pass Thread frames through WiFi/Ethernet, and hence lower the network load on the mesh. One of the main selling points of Thread IMHO 🤩

Hum interesting. Isn't that how the Nanoleaf Desktop app can control my Thread+Matter bulbs or this have nothing to do with it? Because I'm not pure OTBR and yet the application works (not all the time but sometimes yes)

@MattWestb

@agners Your screenshot is private so cant look on it :-(

I can see them. Could be a temporary connectivity/github issue or an ad blocker maybe ?

@MattWestb
Copy link

Looks like one bug is fixed in the thread communication that can being the problem we have seen (at least for HomeKit devices but i dont knowing if its also helping matter connected ones).
Jc2k/aiohomekit#358

@agners
Copy link
Member

agners commented Feb 3, 2024

Hum interesting. Isn't that how the Nanoleaf Desktop app can control my Thread+Matter bulbs or this have nothing to do with it? Because I'm not pure OTBR and yet the application works (not all the time but sometimes yes)

This is not related. Afaik, the Nanoleaf bulb talk their own IP based protocol with the bulbs. And that works with both add-ons.

By default, any Thread border router which gets a frame routes it through the (RF-only) mesh. Imagine you have two border routers, and one is close and one is far away from a particular bulb. Without TREL, the packet will travel through the RF mesh, even though it is far away. With TREL, the frame will get forwarded to the closer router via Ethernet, and only then goes through the RF network. If you have a single border router, and for smaller mesh networks it doesn't' really matter. But it can make ea difference for large network.

@rzr
Copy link

rzr commented Feb 13, 2024

May you forward the issue to:

https://community.silabs.com/

And crosslink to it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests