NetworkManager connectivity check ignores interfaces with no outgoing route #3053

Open
alexgg opened this issue Mar 3, 2023 · 3 comments

alexgg commented Mar 3, 2023

On a device with both Ethernet and LTE, where Ethernet is the primary interface, the connection should switch to LTE if Ethernet goes down.

This usually works fine, but if the wired uplink has no route to the internet at boot, NetworkManager's connectivity checks time out instead of determining that the wired uplink is unavailable and that LTE is usable.

Disconnecting the wired interface, either physically or via `nmcli device disconnect $IFNAME`, and then reconnecting it results in NetworkManager correctly identifying that the LTE modem is preferable.
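
For anyone who needs to script the workaround above, here is a minimal sketch that disconnects and then reconnects an interface through `nmcli`. The helper names `bounce_commands` and `bounce_interface` and the 5-second pause are assumptions made for illustration; only the `nmcli device disconnect` and `nmcli device connect` subcommands come from NetworkManager itself.

```python
import subprocess
import time


def bounce_commands(ifname):
    """Return the nmcli invocations that disconnect and then reconnect ifname."""
    return [
        ["nmcli", "device", "disconnect", ifname],
        ["nmcli", "device", "connect", ifname],
    ]


def bounce_interface(ifname, run=subprocess.run, pause=5.0, sleep=time.sleep):
    """Disconnect ifname, wait `pause` seconds, then reconnect it.

    `run` and `sleep` are injectable so the sequence can be exercised
    without touching a real NetworkManager instance.
    """
    disconnect, connect = bounce_commands(ifname)
    run(disconnect, check=True)  # raises CalledProcessError if nmcli fails
    sleep(pause)                 # hypothetical settle time, tune as needed
    run(connect, check=True)


if __name__ == "__main__":
    bounce_interface("enp1s0")  # interface name taken from this thread
```

This only automates the manual workaround; it does not address why the connectivity check times out at boot.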

Could be related to https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/350

nilstoedtmann commented Mar 9, 2023

Hi. I think we are the ones who discovered this issue and reported it via Balena's chat support.

It's quite a devastating bug that has left two (out of 11) of our devices offline and required us to arrange site visits.

To reiterate the issue:

Assume you have two Internet uplinks that usually both work:

  • enp1s0 wired (dhcp)
  • wwan0 LTE

Now at some point, the wired uplink on enp1s0 fails somewhere on its path (but enp1s0 still has link, it's not getting unplugged!). As expected, BalenaOS detects the uplink failure and successfully falls back onto wwan0, all is good.

But now reboot. The device comes back up, but this time fails to realise that enp1s0 has no working Internet uplink, and does not fall back onto wwan0. The device stays offline - and not even a reboot helps! On the contrary, it was a reboot that triggered this bug.

What does resolve the issue is pulling the plug on enp1s0 so that it no longer has link; NetworkManager then downs enp1s0 and finally falls back onto wwan0.
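
To make the expected behaviour concrete, here is a small sketch of the selection rule described above: prefer the uplink with the lowest route metric, but only if it actually reaches the Internet. This is not NetworkManager's implementation; the `Uplink` type, the `pick_uplink` function, the `has_internet` probe callback, and the metric values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Uplink:
    name: str
    metric: int  # lower metric means more preferred


def pick_uplink(
    uplinks: Iterable[Uplink],
    has_internet: Callable[[str], bool],
) -> Optional[Uplink]:
    """Return the most preferred uplink whose connectivity probe succeeds.

    `has_internet` stands in for a per-interface connectivity check;
    having link is not enough for an uplink to be selected.
    """
    for uplink in sorted(uplinks, key=lambda u: u.metric):
        if has_internet(uplink.name):
            return uplink
    return None  # no uplink currently reaches the Internet


# Illustrative metrics only: the wired uplink is preferred over LTE.
uplinks = [Uplink("wwan0", 700), Uplink("enp1s0", 100)]

# enp1s0 has link but no working path, so the LTE uplink is expected.
chosen = pick_uplink(uplinks, lambda name: name == "wwan0")
```

In the failure reported here, the device behaves as if the probe for enp1s0 never returned a negative result after boot, so the fallback to wwan0 never happens.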

Affected versions: BalenaOS 2.113.4 (genericx86-64-ext) and earlier

@nilstoedtmann

Note that, unlike in the mentioned NM issue #350, in our case the WWAN connection was never stopped.

It might still be related, because at boot time, it takes the WWAN interface significantly longer than the wired one to come up.

alexgg commented May 4, 2023

Just a note that adding dedicated DNS servers for specific interfaces addresses this. For example, balenaOS by default adds a Google public DNS resolver that is used by the system resolver. Another resolver can be added in config.json for a different interface, like `dnsServers: 1.1.1.1@wlan0`. That binds the Cloudflare public DNS resolver to the wlan0 wireless interface.
With such a configuration, failover between a primary Ethernet interface and a secondary wireless interface works fine, and each interface uses its own DNS resolver.
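
As a rough illustration of that configuration, the sketch below builds a `dnsServers` value that combines unbound resolvers with per-interface ones in the `server@interface` form mentioned above. It assumes that `dnsServers` in config.json is a single space-separated string; the helper names `dns_servers_value` and `with_dns_servers` are hypothetical, and 8.8.8.8 is used only as an example of a Google public resolver.

```python
import json


def dns_servers_value(default_servers, per_interface):
    """Join unbound resolvers and interface-bound resolvers into one string.

    `per_interface` maps an interface name to the resolver bound to it,
    rendered as "server@interface".
    """
    entries = list(default_servers)
    entries += [f"{server}@{ifname}" for ifname, server in per_interface.items()]
    return " ".join(entries)


def with_dns_servers(config, default_servers, per_interface):
    """Return a copy of a config.json dict with dnsServers set."""
    updated = dict(config)
    updated["dnsServers"] = dns_servers_value(default_servers, per_interface)
    return updated


if __name__ == "__main__":
    # Assumption: an otherwise empty config, for demonstration only.
    config = with_dns_servers({}, ["8.8.8.8"], {"wlan0": "1.1.1.1"})
    print(json.dumps(config, indent=2))
```

Check the resulting value against the balenaOS documentation for your OS version before deploying it to a fleet.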
