NetworkManager connectivity check also changes cellular connectivity state to limited when ethernet connection is broken #3016

Open
everhardt opened this issue Feb 9, 2023 · 5 comments

Comments

@everhardt

everhardt commented Feb 9, 2023

I have a Compulab IOT-gate-imx8 device and tested balenaOS 2.108.29, as it includes the fix for #2964. The good news is that connectivity checks are performed again; the bad news is that they don't work properly.

The device is running with both eth0 and wwan0 (also called cdc-wdm0) connected to the internet (both with state "FULL" in NetworkManager terms). If I now break the eth0 connection, I would expect the next NetworkManager connectivity check to detect that eth0 has no internet connection ("LIMITED" in NetworkManager terms), increase the route metric of eth0 by 20000, and switch routing to wwan0.
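
To make the expected behavior concrete, here is a minimal sketch of the route selection described above. It is a simplified model, not NetworkManager's code: the base metrics (100 for eth0, 700 for wwan0) are assumptions rather than values read from the device, and it assumes any state other than FULL incurs the 20000 penalty.

```python
from dataclasses import dataclass

# Penalty added to the route metric of a device that failed the check
# (the value quoted in this report).
CONNECTIVITY_PENALTY = 20000


@dataclass
class Device:
    name: str
    base_metric: int   # assumed base metric, not taken from the device
    connectivity: str  # "FULL", "LIMITED", "NONE", ...

    def effective_metric(self) -> int:
        # Simplification: every state other than FULL is penalised.
        penalty = 0 if self.connectivity == "FULL" else CONNECTIVITY_PENALTY
        return self.base_metric + penalty


def default_route(devices: list[Device]) -> str:
    """Return the name of the device with the lowest effective metric."""
    return min(devices, key=lambda d: d.effective_metric()).name


# Expected case: only eth0 fails the check, so traffic moves to wwan0.
expected = [Device("eth0", 100, "LIMITED"), Device("wwan0", 700, "FULL")]
# Reported case: both devices end up LIMITED, so eth0 still wins.
observed = [Device("eth0", 100, "LIMITED"), Device("wwan0", 700, "LIMITED")]

print(default_route(expected))  # wwan0
print(default_route(observed))  # eth0
```

Under these assumptions, marking both devices LIMITED leaves the relative order of the metrics unchanged, which is why traffic would stay on the broken eth0 link.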

I tried this, and it correctly detects that the eth0 connection is broken, but it also thinks wwan0 is broken. See these (filtered) journalctl logs:

Feb 09 09:52:04 27eb737 NetworkManager[1358]: <debug> [1675936324.3566] connectivity: (eth0,IPv4,146) start request to 'https://api.balena-cloud.com/connectivity-check' (try resolving 'api.balena-cloud.com' using system resolver)
Feb 09 09:52:04 27eb737 NetworkManager[1358]: <debug> [1675936324.3569] connectivity: (eth0,IPv6,147) start request to 'https://api.balena-cloud.com/connectivity-check' (try resolving 'api.balena-cloud.com' using system resolver)
Feb 09 09:52:04 27eb737 NetworkManager[1358]: <debug> [1675936324.3580] connectivity: (wwan0,IPv4,160) start request to 'https://api.balena-cloud.com/connectivity-check' (try resolving 'api.balena-cloud.com' using system resolver)
Feb 09 09:52:04 27eb737 NetworkManager[1358]: <debug> [1675936324.3584] connectivity: (wwan0,IPv6,161) skip connectivity check due to no IP address configured
Feb 09 09:52:04 27eb737 NetworkManager[1358]: <debug> [1675936324.3592] connectivity: (wwan0,IPv6,161) check completed: NONE; no IP address configured
Feb 09 09:52:34 27eb737 NetworkManager[1358]: <debug> [1675936354.3837] connectivity: (eth0,IPv4,146) failure to resolve name: Error resolving “api.balena-cloud.com”: Name or service not known
Feb 09 09:52:34 27eb737 NetworkManager[1358]: <debug> [1675936354.3838] connectivity: (eth0,IPv4,146) check completed: LIMITED; resolve-error
Feb 09 09:52:34 27eb737 NetworkManager[1358]: <debug> [1675936354.3839] device[c87841ab5c6d5ef0] (eth0): connectivity state changed from FULL to LIMITED
Feb 09 09:52:34 27eb737 NetworkManager[1358]: <debug> [1675936354.4331] connectivity: (wwan0,IPv4,160) failure to resolve name: Error resolving “api.balena-cloud.com”: Name or service not known
Feb 09 09:52:34 27eb737 NetworkManager[1358]: <debug> [1675936354.4332] connectivity: (wwan0,IPv4,160) check completed: LIMITED; resolve-error
Feb 09 09:52:34 27eb737 NetworkManager[1358]: <debug> [1675936354.4333] device[5c1c4171581cdefa] (cdc-wdm0): connectivity state changed from FULL to LIMITED
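
For anyone comparing their own journal output, the "check completed" lines can be pulled out with a small script. This is only a sketch: the regular expression is written against the log format shown above and may need adjusting for other NetworkManager versions.

```python
import re

# Matches the "check completed" lines of the NetworkManager debug log above.
COMPLETED = re.compile(
    r"connectivity: \((?P<iface>[^,]+),(?P<family>IPv[46]),\d+\) "
    r"check completed: (?P<state>\w+); (?P<reason>.+)$"
)


def completed_checks(lines):
    """Yield (iface, family, state, reason) for each finished check."""
    for line in lines:
        match = COMPLETED.search(line)
        if match:
            yield (match["iface"], match["family"], match["state"], match["reason"])


# Lines copied from the log above; the first one is deliberately not a result line.
sample = [
    "Feb 09 09:52:04 27eb737 NetworkManager[1358]: <debug> [1675936324.3580] connectivity: (wwan0,IPv4,160) start request to 'https://api.balena-cloud.com/connectivity-check' (try resolving 'api.balena-cloud.com' using system resolver)",
    "Feb 09 09:52:04 27eb737 NetworkManager[1358]: <debug> [1675936324.3592] connectivity: (wwan0,IPv6,161) check completed: NONE; no IP address configured",
    "Feb 09 09:52:34 27eb737 NetworkManager[1358]: <debug> [1675936354.3838] connectivity: (eth0,IPv4,146) check completed: LIMITED; resolve-error",
    "Feb 09 09:52:34 27eb737 NetworkManager[1358]: <debug> [1675936354.4332] connectivity: (wwan0,IPv4,160) check completed: LIMITED; resolve-error",
]

for result in completed_checks(sample):
    print(result)
```

Both IPv4 checks finish 30 seconds after they start and fail with the same resolve-error, which points at name resolution rather than at the HTTP request itself.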
@everhardt
Author

By the way, these are the settings NetworkManager reports when it starts:

Feb 09 09:36:19 27eb737 NetworkManager[1358]: <info>  [1675935379.4474] dns-mgr: init: dns=default,systemd-resolved rc-manager=resolvconf

IIUC balenaOS uses dnsmasq rather than systemd-resolved?

@everhardt
Author

The contents of resolv.dnsmasq -> /var/run/resolvconf/interface/NetworkManager in my case are:

nameserver 192.168.1.1
nameserver 8.8.8.8
nameserver 8.8.4.4

The first line comes from DHCP on eth0, the other two from wwan0.

I might be off on a tangent here, but I suspect that the DNS resolver uses the eth0 interface for all three name servers, even if eth0 cannot reach them. I think this could be solved if resolvconf instead wrote

nameserver 192.168.1.1@eth0
nameserver 8.8.8.8@wwan0
nameserver 8.8.4.4@wwan0

so that the DNS resolver would know which interface to use.
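
One possible way to express that per-interface binding is dnsmasq's own `server=` option, which accepts an `@<interface>` suffix. The sketch below only generates such directives from a mapping of interfaces to name servers. Whether the resolvconf and dnsmasq integration in balenaOS can be made to produce or consume them is not verified here, and the function name is hypothetical.

```python
def dnsmasq_servers(servers_by_iface: dict[str, list[str]]) -> list[str]:
    """Build dnsmasq 'server=<ip>@<interface>' directives.

    The mapping is supplied by the caller; nothing here is read from
    NetworkManager or resolvconf.
    """
    return [
        f"server={ip}@{iface}"
        for iface, ips in servers_by_iface.items()
        for ip in ips
    ]


# Name servers from the resolv.dnsmasq contents quoted above.
directives = dnsmasq_servers({
    "eth0": ["192.168.1.1"],
    "wwan0": ["8.8.8.8", "8.8.4.4"],
})
print("\n".join(directives))
```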

@jellyfish-bot

[mpous] This has attached https://jel.ly.fish/a235fd1d-49fb-44bb-aa4f-496d04dbb20b

@majorz
Contributor

majorz commented Feb 24, 2023

Summarizing here some of the discussion we had on the other support thread. We found out that this happens when NetworkManager's CheckConnectivity is called explicitly from a container after the connection to some remote server is lost, with the aim of regaining connectivity sooner rather than relying only on the check interval. With the default behavior, which relies only on the connectivity check interval, everything works well, although we still saw similar DNS errors.
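
For reference, an explicit check of this kind can be sketched as follows. This is not the code from the support thread: it assumes `nmcli` is installed in the container and can reach the host's NetworkManager over D-Bus, and it uses `nmcli networking connectivity check` in place of a direct D-Bus call to CheckConnectivity.

```python
import subprocess

# States printed by `nmcli networking connectivity`.
STATES = {"none", "portal", "limited", "full", "unknown"}


def parse_connectivity(output: str) -> str:
    """Normalise nmcli's output and reject anything unexpected."""
    state = output.strip().lower()
    if state not in STATES:
        raise ValueError(f"unexpected connectivity state: {output!r}")
    return state


def check_connectivity() -> str:
    """Ask NetworkManager to re-run the connectivity check now."""
    result = subprocess.run(
        ["nmcli", "networking", "connectivity", "check"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_connectivity(result.stdout)
```

Calling `check_connectivity()` returns the overall state as a lowercase string such as `"limited"`; it does not report per-device results.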

Those are hinted at here, and reportedly the connectivity check works better when systemd-resolved is present:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/57d226d3f08d8a904a554367e799c9c367032b0d

Although the per-interface connectivity check was removed later on:
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/e6dac4f0b67e5abd10e0f8a82e040d8374f607a8

We do not currently use systemd-resolved; instead we use dnsmasq as a DNS forwarder and cache. If we switch to systemd-resolved the connectivity check may work better, but this still has to be investigated.

Workarounds can also be implemented on the container side, such as adjusting metrics on the secondary interface from a container if both interfaces fail. Another possibility is to disable the NM connectivity check completely and build a custom container solution that fits the specific use case.
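
As a rough illustration of the first workaround, the sketch below builds the `nmcli` commands that would change the IPv4 route metric of a connection profile and re-activate it. The profile name `cellular` and the metric `50` are hypothetical, and the commands are only constructed, not executed.

```python
def route_metric_commands(connection: str, metric: int) -> list[list[str]]:
    """Commands that set a profile's IPv4 route metric and re-activate it."""
    if metric < 0:
        raise ValueError("metric must be non-negative")
    return [
        ["nmcli", "connection", "modify", connection,
         "ipv4.route-metric", str(metric)],
        # Re-activating the profile applies the new metric, at the cost of
        # briefly taking the connection down.
        ["nmcli", "connection", "up", connection],
    ]


# Hypothetical profile name and metric for the secondary (cellular) link.
for command in route_metric_commands("cellular", 50):
    print(" ".join(command))
```

A container would pass each command to something like `subprocess.run` once it has decided that both interfaces are failing. That decision logic is use-case specific and is left out here.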

@alexgg
Contributor

alexgg commented May 4, 2023

@everhardt I think you're right, and the workaround here is to set specific DNS resolvers for the extra interfaces using the dnsServers entry in config.json.
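
A minimal sketch of setting that entry, assuming `dnsServers` takes a space-separated string of addresses. The sample config dict is a stand-in for the device's real config.json, and the helper name is hypothetical.

```python
import json


def with_dns_servers(config: dict, servers: list[str]) -> dict:
    """Return a copy of a config.json dict with `dnsServers` set."""
    updated = dict(config)
    # Assumption: the value is a single space-separated string.
    updated["dnsServers"] = " ".join(servers)
    return updated


config = {"hostname": "example-device"}  # stand-in for the real config.json
print(json.dumps(with_dns_servers(config, ["8.8.8.8", "8.8.4.4"]), indent=2))
```

The original dict is left untouched, so the result can be reviewed before it is written back to the device.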
