Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

siproxd: init requires that interface_inbound and interface_outbound … #801

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Azureit
Copy link

@Azureit Azureit commented Jan 13, 2023

…be set

siproxd will not work without if_inbound and if_outbound defined in the config file. Tell meaningful error about interface_inbound and interface_outbound

Maintainer: @micmac1

Signed-off-by: Azureit azurite@mailfence.com

…be set

siproxd will not work without if_inbound and if_outbound defined in the config file.
Tell meaningful error about interface_inbound and interface_outbound

Signed-off-by: Azureit <1774476+Azureit@users.noreply.github.com>
Signed-off-by: Azureit <1774476+Azureit@users.noreply.github.com>
Function 'network_get_physdev' doesn't work with wireguard interfaces, fix this by changing the function to 'network_get_device'

Signed-off-by: Azureit <1774476+Azureit@users.noreply.github.com>
@micmac1
Copy link
Collaborator

micmac1 commented Jan 14, 2023

Hallo @Azureit

Thanks for bringing this up. I'll look into it in a little while more thoroughly. What caught my attention so far:

a) the "exit 1" in "die()". It's a function so it should return but not exit. It would exit the whole init process and that would mean procd can't keep track. See also [1].
b) Commit subjects should follow convention "package: short description"
c) Signed-off-by needs a valid mail address and full name.

I'd also like to ask @guidosarducci to take a look at this as he kindly provided the procd support some time ago.

[1] #790

Kind regards,
Seb

@micmac1
Copy link
Collaborator

micmac1 commented Jan 25, 2023

I added the changes to my siproxd init script and ran it with default config. It ran the ubus calls and each time waited around 30 seconds.

ubus -t 30 wait_for network.interface network.lan
<30 seconds waiting around>
ubus -t 30 wait_for network.interface network.wan
<30 seconds waiting around>

I suppose that is unexpected.


ifstatus lan
{
	"up": true,
<snip>
ifstatus wan
{
	"up": true,

Please review this again.

Kind regards,
Seb

@Azureit
Copy link
Author

Azureit commented Jan 26, 2023

I added the changes to my siproxd init script and ran it with default config. It ran the ubus calls and each time waited around 30 seconds.

ubus -t 30 wait_for network.interface network.lan
<30 seconds waiting around>
ubus -t 30 wait_for network.interface network.wan
<30 seconds waiting around>

I suppose that is unexpected.

I adapted that ubus command from here, but I try to run that command via ssh and it timeout 30 seconds too.
With ubus list command I notice network.lan doesn't exist, it exists network.interface.lan and ubus -t 30 wait_for network.interface.lan works as expected with no wait because lan is active.
We need to use ubus -t 30 wait_for network.interface.lan to check if interface is valid. But the problem is ubus wait_for command waits for the interface to be up or is just confirms the existence of the interface?
I guess we could loop command ubus call network.interface.lan status until up=true
The ubus wait_for command would be best if support wait for status variable changes, in our case wait up=true

Signed-off-by: Azureit <1774476+Azureit@users.noreply.github.com>
Signed-off-by: Azureit <1774476+Azureit@users.noreply.github.com>
@Azureit
Copy link
Author

Azureit commented Jan 29, 2023

@micmac1
Finally I rewrite setup_networks() to handle wireguard interface correctly.
The problem was the wireguard interface is being initialized two times(don't know why), so the ubus -t 30 wait_for network.interface.wg0 would wait for the first initialization and before the next command run, the the wireguard interface "Interface 'wg0' has lost the connection", this would make the command to find the .l3_device device name fail, now it will retry to find the .l3_device device name.
What do you think?

@Azureit
Copy link
Author

Azureit commented Feb 1, 2023

@micmac1 what is the proper way to tell procd there was a problem at siproxd init and abort the siproxd init.
A simple replace exit 1 for a return doesn't work, I tested and it will still run siproxd.

@guidosarducci
Copy link
Contributor

@micmac1 @Azureit I missed this the first time around so just having a look but having some trouble following along...

How to reproduce the issue you see? I see #800 which suggests some kind of race problem, but it lacks a log file showing any details of the network startup and siproxd initialization. I also don't see the relevant siproxd or network config details. Could you explain #801 (comment) a bit more clearly?

I tried reproducing with a simple setup but I don't see the siproxd error from #800 . Using current OpenWrt 23.05.2, I created a wg interface from Luci called wg_voip, put option interface_outbound wg_voip into the default siproxd config, and enabled some siproxd debugging. After rebooting I see:

Mon Nov 20 16:57:02 2023 daemon.notice netifd: Interface 'lan' is enabled
Mon Nov 20 16:57:02 2023 daemon.notice netifd: Interface 'lan' is setting up now
Mon Nov 20 16:57:02 2023 daemon.notice netifd: Interface 'lan' is now up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'loopback' is enabled
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'loopback' is setting up now
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'loopback' is now up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'wan' is enabled
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'wan6' is enabled
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'wg_voip' is setting up now
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Network device 'eth0' link is up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Network device 'lo' link is up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'loopback' has link connectivity
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'wg_voip' is now up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Network device 'wg_voip' link is up
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: siproxd.c:241 INFO:siproxd-0.8.4dev-none 2023-11-11T17:18:29 arm-openwrt-linux-gnu starting up
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:293 running w/uid=0, euid=0, gid=0, egid=0
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:329 changing uid/gid to nobody
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:332 changed gid to 65534 - Ok
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:336 changed egid to 65534 - Ok
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:340 changed euid to 65534 - Ok
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: rtpproxy_relay.c:119 INFO:Current thread stacksize is 128 kB
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: sock.c:139 INFO:bound to port 5060
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: register.c:119 WARNING:registration file may be corrupt or URLMAP_SIZE has been resized
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: siproxd.c:351 INFO:siproxd-0.8.4dev-none 2023-11-11T17:18:29 arm-openwrt-linux-gnu started
Mon Nov 20 16:57:06 2023 daemon.notice netifd: Network device 'lan1' link is up
Mon Nov 20 16:57:06 2023 daemon.notice netifd: bridge 'br-lan' link is up
Mon Nov 20 16:57:06 2023 daemon.notice netifd: Interface 'lan' has link connectivity

So no error like daemon.err siproxd[1]: utils.c:401 ERROR:Don't know what interface to look for - configuration error?

@guidosarducci
Copy link
Contributor

Hmm, I did notice a strange behaviour with wg interfaces (and also our default wan!). See for example:

root@OpenWrt:~# (. /lib/functions/network.sh; for i in wan lan wg_voip; do for f in network_get_device network_get_physdev; do eval $f _dev $i; echo "$f($i),ret="$_dev,$?; done; done;)
network_get_device(wan),ret=,1
network_get_physdev(wan),ret=wan,0
network_get_device(lan),ret=br-lan,0
network_get_physdev(lan),ret=br-lan,0
network_get_device(wg_voip),ret=wg_voip,0
network_get_physdev(wg_voip),ret=,1

So ..._physdev works with wan but not wg_voip, and ...device work with wg_voip but not wan, and br-lan work with both. It seems like every time I look the behaviour changes.

This commit address the problem I saw, but not sure if it applies to your issue. Worth a try though...
guidosarducci@8a0e178

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants