No new subflow when removing an endpoint and adding a new one #416

Open
ChrisChoke opened this issue Jun 21, 2023 · 12 comments
@ChrisChoke

Hey Team,

Thanks for this great project and for the work to get it into the Linux kernel. Nice job.
I have been using MPTCPv0 for a while and want to migrate my setup to MPTCPv1, but I see some strange behavior when my interfaces reconnect.

My setup has a client with 2 or more cellular interfaces, while my server has just one wired interface. Both sides run Debian 12 with kernel 6.1.
I also tried kernel 6.3 from sid and your mptcp-next/export kernel with a patch from #391.

On the server side, OpenVPN is up and running via an mptcpize-enabled openvpn@server.service; ss shows the port listening as mptcp.
mptcpd.service is stopped and masked.

On the client side I run OpenVPN via mptcpize run openvpn --some-options. mptcpd is stopped and masked; I set up the endpoints manually via ip mptcp endpoint.
The limits are subflow 8 and add_addr_accepted 8.
When all the providers I use support MPTCP, it looks great. But one of my providers seems to have been blocking MPTCP since the beginning of 2021, if I remember correctly.
In that case I create a UDP OpenVPN tunnel to another server and use the resulting tap interface as a subflow. Sometimes I get it up and running:

root@mptcp-v1-client:/home/user1# ss -M4tn
Netid      State      Recv-Q      Send-Q              Local Address:Port                Peer Address:Port       Process
tcp        ESTAB      0           0                  <cellular1-address>:54628             <external-address>:30194
tcp        ESTAB      0           0                 10.220.0.2%tap2:49913             <external-address>:30194
tcp        ESTAB      0           0                 10.30.0.2%bond0:46789             <external-address>:30194
mptcp      ESTAB      0           0                  <cellular1-address>:54628             <external-address>:30194

But if I reconnect the wwan interface that carries the UDP connection, so that the tap interface is closed and created again, the subflow does not come back.
Another scenario: when I reconnect wwan1, which carries the main flow, wwan1 does not come back to the MPTCP connection either. ip mptcp monitor shows me nothing about SF_CLOSED or similar, just this:

initial connection:

[         CREATED] token=ade516d1 remid=0 locid=0 saddr4=<cellular1-address> daddr4=<external-address> sport=54628 dport=30194
[     ESTABLISHED] token=ade516d1 remid=0 locid=0 saddr4=<cellular1-address> daddr4=<external-address> sport=54628 dport=30194
[       SF_CLOSED] token=ade516d1 remid=0 locid=1 saddr4=10.30.0.2 daddr4=<external-address> sport=49657 dport=0 backup=1 error=104 ifindex=9
[  SF_ESTABLISHED] token=ade516d1 remid=0 locid=1 saddr4=10.30.0.2 daddr4=<external-address> sport=46789 dport=30194 backup=1 ifindex=9
[  SF_ESTABLISHED] token=ade516d1 remid=0 locid=3 saddr4=10.220.0.2 daddr4=<external-address> sport=49913 dport=30194 backup=0 ifindex=8

after reconnect wwan1:

[       SF_CLOSED] token=ade516d1 remid=0 locid=0 saddr4=0.0.0.0 daddr4=<external-address> sport=54628 dport=0 backup=0
[       SF_CLOSED] token=ade516d1 remid=0 locid=2 saddr4=<cellular1-address> daddr4=<external-address> sport=43183 dport=0 backup=0 ifindex=4

The second scenario is when the tap interface carries the main MPTCP flow, like so:

root@mptcp-v1-client:/home/user1# ss -M4tn
Netid      State      Recv-Q      Send-Q                  Local Address:Port              Peer Address:Port      Process
tcp        ESTAB      0           0                <cellular1-address>%wwan1:34899           <external-address>:30194
tcp        ESTAB      0           0                          10.220.0.2:60974           <external-address>:30194
tcp        ESTAB      0           0                     10.30.0.2%bond0:56391           <external-address>:30194
mptcp      ESTAB      0           0                          10.220.0.2:60974           <external-address>:30194

If I reconnect wwan1, all is fine and the interface comes back to the MPTCP connection. But if I reconnect wwan2, which carries the UDP tap interface, this interface does not come back to the MPTCP connection. In this case ip mptcp monitor shows this:

initial connection:

[         CREATED] token=2f578878 remid=0 locid=0 saddr4=<cellular1-address> daddr4=<external-address> sport=60234 dport=30194
[     ESTABLISHED] token=2f578878 remid=0 locid=0 saddr4=<cellular1-address> daddr4=<external-address> sport=60234 dport=30194
[          CLOSED] token=2f578878
[         CREATED] token=66fe4d59 remid=0 locid=0 saddr4=10.220.0.2 daddr4=<external-address> sport=60974 dport=30194
[     ESTABLISHED] token=66fe4d59 remid=0 locid=0 saddr4=10.220.0.2 daddr4=<external-address> sport=60974 dport=30194
[  SF_ESTABLISHED] token=66fe4d59 remid=0 locid=2 saddr4=<cellular1-address> daddr4=<external-address> sport=34899 dport=30194 backup=0 ifindex=4
[       SF_CLOSED] token=66fe4d59 remid=0 locid=1 saddr4=10.30.0.2 daddr4=<external-address> sport=52003 dport=0 backup=1 error=104 ifindex=9
[  SF_ESTABLISHED] token=66fe4d59 remid=0 locid=1 saddr4=10.30.0.2 daddr4=<external-address> sport=56391 dport=30194 backup=1 ifindex=9

reconnecting interface of mptcp mainflow:

root@mptcp-v1-client:/home/user1# ip mptcp monitor
[       SF_CLOSED] token=66fe4d59 remid=0 locid=0 saddr4=10.220.0.2 daddr4=<external-address> sport=60974 dport=30194 backup=0
[          CLOSED] token=d94ab374
[       SF_CLOSED] token=66fe4d59 remid=0 locid=0 saddr4=0.0.0.0 daddr4=<external-address> sport=60974 dport=0 backup=0
[       SF_CLOSED] token=66fe4d59 remid=0 locid=3 saddr4=<cellular2-address> daddr4=<external-address> sport=50551 dport=0 backup=0 error=104 ifindex=6
[       SF_CLOSED] token=66fe4d59 remid=0 locid=0 saddr4=0.0.0.0 daddr4=<external-address> sport=60974 dport=0 backup=0
[       SF_CLOSED] token=66fe4d59 remid=0 locid=3 saddr4=10.220.0.2 daddr4=<external-address> sport=40877 dport=0 backup=0 ifindex=10

The destination port is printed as 0.
When the tap interface is recreated by OpenVPN, its ifindex increases, so I always get a new ifindex. Can this be a problem?

This setup works very well and is stable with MPTCPv0. Most of it runs with MPTCPv1 as well, but I run into this trouble when a cellular interface reconnects.
My endpoints on the client side are configured with subflow fullmesh, and the same on the server side.

ip rules and routing tables for the interfaces are set up.

I hope we can explain this behavior and find a working solution. :-)
If you need further information, feel free to ask; I will help where I can. I am not familiar with gdb or similar tools, though, so I would need some explanation beforehand.

greets
Chris

@matttbe
Member

matttbe commented Jul 5, 2023

Hi Chris,

Thank you for this bug report! (and sorry for the delay)

(Note that MPTCPv0 and MPTCPv1 refer to the protocol, not the implementation)

> if i reconnect wwan1, all is fine and the interface come back to the mptcp connection. but if i reconnect the wwan2 for the udp tap interface, this interface dont come back to the mptcp connection.

When wwan2 is reconnected, I guess you re-configure the routing rules (ip rule from <IP WWAN2> table <TABLE> and ip route default via <GATEWAY WWAN2> dev wwan2 table <TABLE>) and the endpoints (ip mptcp endpoint add <IP WWAN2> dev wwan2 <signal or subflow [fullmesh]>) that have been removed when the interface has been put down, right?

> my endpoints on client site are configured with subflow fullmesh just like on the server side, too.

On the server side, I guess you use signal instead of subflow, right?

@ChrisChoke
Author

ChrisChoke commented Jul 6, 2023

Good Morning Mat,

> When wwan2 is reconnected, I guess you re-configure the routing rules (ip rule from <IP WWAN2> table <TABLE> and ip route default via <GATEWAY WWAN2> dev wwan2 table <TABLE>) and the endpoints (ip mptcp endpoint add <IP WWAN2> dev wwan2 <signal or subflow [fullmesh]>) that have been removed when the interface has been put down, right?

Yes, that's right. I flush all tables (ip rule flush table <table>, ip route flush table <table>) and the ip mptcp endpoints, too. The endpoint list shows <ip address> dev if9 subflow, where if9 is the index of the interface I put down. So I have to reconfigure here, because the index increases when I create a new one.
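
For readers following along, the reconfiguration described above could be sketched roughly as follows. This is a minimal sketch and not Chris's actual script: the function name `readd_endpoint`, the `DRY_RUN` switch, and every value passed in (interface, address, gateway, table number, endpoint id) are assumptions for illustration. It uses `ip rule del from ... table ...` instead of a table-wide rule flush so that only the rule for the old address is removed.

```shell
#!/bin/sh
# Hypothetical sketch: rebuild policy routing and the MPTCP endpoint after an
# interface has come back. All names and values are placeholders.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them (needs root)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# readd_endpoint IFACE ADDR GATEWAY TABLE ENDPOINT_ID
readd_endpoint() {
    iface=$1; addr=$2; gw=$3; table=$4; id=$5

    # Remove the stale endpoint; it still points at the old ifindex.
    run ip mptcp endpoint delete id "$id" || true

    # Clear the old rule and routes for this interface's table.
    run ip rule del from "$addr" table "$table" || true
    run ip route flush table "$table" || true

    # Recreate source-based routing for the (possibly new) address.
    run ip rule add from "$addr" table "$table"
    run ip route add default via "$gw" dev "$iface" table "$table"

    # Re-add the endpoint so the path manager can create subflows from it.
    run ip mptcp endpoint add "$addr" dev "$iface" id "$id" subflow fullmesh
}
```

With the default `DRY_RUN=1`, calling `readd_endpoint wwan2 192.0.2.10 192.0.2.1 102 3` only prints the six commands, which makes it easy to compare them against what a real setup script does before running anything as root.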

> On the server side, I guess you use signal instead of subflow, right?

Well, on the server I tried both but couldn't observe any difference in behavior. It's a bit unclear to me what I need to configure. The shadowsocks (Tessares) and Red Hat tutorials describe signal on the server side, but I did not understand why I have to configure signal.

Chris

@matttbe changed the title from "reestablishment on reconnecting mainflow ifaces and tap ifaces" to "No new subflow when removing an endpoint and adding a new one" on Jul 6, 2023
@matttbe added the bug label and removed the question label on Jul 6, 2023
@matttbe
Member

matttbe commented Jul 6, 2023

> And the ip mptcp endpoint, too. In endpoint will show you <ip address> dev if9 subflow which is the index of the iFace which I put down. So here I have to reconfigure because the index number will increase when I create new one.

OK, thank you. Maybe we don't cover this case well, where a new endpoint is added later on. It would be good to try to reproduce it in a simpler setup.

I just changed the title of the ticket, I hope it represents well the issue you have. If not, feel free to modify it.

> On the server side, I guess you use signal instead of subflow, right?

> Well, on the server I tried both but couldn't observe an other behavior. It's a bit unclear for me what I need to configure. On the shadowsocks (tessares) or RedHat tutorial it is described as signal on the server site but did not understand why I have to configure with signal.

You are not the only one to be confused by that; we should improve something there, but not sure what :)
If you have multiple interfaces, you need to add an MPTCP endpoint for each additional IP address you want to use in the MPTCP connection. You can use the flag subflow, typically on the client side (to create additional subflows), or signal, typically on the server side (to announce additional IP addresses).

If your server doesn't have additional IP addresses, no need to configure additional endpoints.
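
As a rough illustration of the distinction above, the sketch below prints the `ip mptcp` commands one might use for each role. The helper names `endpoint_cmds` and `limits_cmd`, the interface names, and the documentation-range addresses are all assumptions; only the `subflow` and `signal` flags and the `limits set` keywords come from the discussion. The helpers print commands rather than running them, so nothing is changed on the host.

```shell
#!/bin/sh
# Hypothetical helpers that only PRINT the commands for each role.
# Review the output, then run the lines as root to apply them.

# endpoint_cmds ROLE ADDR DEV
endpoint_cmds() {
    role=$1; addr=$2; dev=$3
    case "$role" in
        client) flag=subflow ;;  # create additional subflows from this address
        server) flag=signal ;;   # announce this additional address to the peer
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    echo "ip mptcp endpoint add $addr dev $dev $flag"
}

# limits_cmd SUBFLOWS ADD_ADDR_ACCEPTED
limits_cmd() {
    echo "ip mptcp limits set subflow $1 add_addr_accepted $2"
}

# Example: a client with an extra cellular address, a server with a second
# address, and the limits mentioned in the original report.
endpoint_cmds client 192.0.2.10 wwan2
endpoint_cmds server 198.51.100.7 eth1
limits_cmd 8 8
```

A server with a single address would not need the `server` line at all, matching the remark that no additional endpoints are required in that case.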

@pabeni

pabeni commented Jul 17, 2023

> When wwan2 is reconnected, I guess you re-configure the routing rules (ip rule from <IP WWAN2> table <TABLE> and ip route default via <GATEWAY WWAN2> dev wwan2 table <TABLE>) and the endpoints (ip mptcp endpoint add <IP WWAN2> dev wwan2 <signal or subflow [fullmesh]>) that have been removed when the interface has been put down, right?

> Yes that's right, I flush all tables ip rule flush table <table>, ip route flush table <table> And the ip mptcp endpoint, too. In endpoint will show you <ip address> dev if9 subflow which is the index of the iFace which I put down. So here I have to reconfigure because the index number will increase when I create new one.

Could you please list the full configuration steps you use, including devices and endpoints removal and recreation? Just to avoid natural language ambiguity. A shell script including all the relevant commands would be ideal.

Thanks!

@ChrisChoke
Author

good morning team,

Sorry for my big delay. I have been on vacation since 17 Jul. and will be back in the office on 14 Aug. (yeah, that's a long vacation 😃).
Then I will try to share how I set up my devices.

Chris

@ChrisChoke
Author

hey hey,

Sorry for the delay; daily business and a cold delayed me a little bit.
I attached a zip with the steps I do on interface setup and recreation. It's not pretty, but I think you can understand what I do.
Hope it can help us 👍

Chris
mptcpv1_git.zip

@pabeni

pabeni commented Nov 13, 2023

> i attached a zip with the steps i do on interface setup and recreation. its not pretty but i think you can understand what i do. hope it can help us 👍

The scripts leave several questions open/gray areas: how many VPN tunnels are you using in your test? Do they use a tun or a tap interface? Are the defined endpoints all 'fullmesh'? It looks like there are also 'backup' ones.

It would be clearer if you could provide, after recreating your setup, the output of the following commands:
ip mptcp endpoint
ip route
ip -4 addr
ss -MteimO
nstat -az Tcp* MPTcp*

And additionally the paired "ip mptcp monitor" output and finally the output of the above commands just after the failures.
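
One possible way to gather the requested output before and after a failure is sketched below. The helper names `collect` and `collect_all` and the output file names are assumptions; the commands themselves are the ones listed above, with the `nstat` patterns quoted so the shell does not expand them against local file names.

```shell
#!/bin/sh
# Hypothetical helper: save each diagnostic command's output to its own file.

# collect OUTDIR NAME COMMAND [ARGS...]
collect() {
    outdir=$1; name=$2; shift 2
    mkdir -p "$outdir" || return 1
    # Record the command line, then stdout and stderr; note failures and go on.
    {
        printf '%s\n' "\$ $*"
        "$@" 2>&1 || echo "(exit status $?)"
    } > "$outdir/$name.txt"
}

# collect_all OUTDIR: run the commands requested in this comment.
collect_all() {
    collect "$1" endpoints ip mptcp endpoint
    collect "$1" routes    ip route
    collect "$1" addrs     ip -4 addr
    collect "$1" sockets   ss -MteimO
    collect "$1" counters  nstat -az 'Tcp*' 'MPTcp*'
}
```

Running `collect_all before` once the setup is up and `collect_all after` right after the failure leaves two directories that can be diffed or zipped and attached, alongside a saved `ip mptcp monitor` log.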

> [       SF_CLOSED] token=66fe4d59 remid=0 locid=1 saddr4=10.30.0.2 daddr4=<external-address> sport=52003 dport=0 backup=1 error=104 ifindex=9

Note that this subflow closed due to a connection reset (errno=104) and the 0 dport value is really unexpected here.
Possibly the NL PM is trying to create new subflow towards the peer of the first subflow, just after such subflow has been closed (and thus disconnected, zeroing the dport).
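
To spot events like the one quoted above in a longer log, a small filter over the `ip mptcp monitor` output can help. The sketch below is only an illustration: the function name `flag_suspicious_closes` and its output format are assumptions, and it relies on the `key=value` layout seen in the logs in this thread. On Linux, error 104 is ECONNRESET.

```shell
#!/bin/sh
# Hypothetical filter: print SF_CLOSED events from `ip mptcp monitor` output
# that carry an error= field or report dport=0.

flag_suspicious_closes() {
    awk '
        /SF_CLOSED/ {
            token = ""; err = ""; dport = ""
            # Each event is a list of key=value fields separated by spaces.
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "token") token = kv[2]
                else if (kv[1] == "error") err = kv[2]
                else if (kv[1] == "dport") dport = kv[2]
            }
            if (err != "" || dport == "0")
                printf "token=%s dport=%s error=%s\n", token, dport, (err == "" ? "-" : err)
        }'
}
```

It can be fed live with `ip mptcp monitor | flag_suspicious_closes` or from a saved log file on standard input.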

I suspect something like the attached patch below could help, @ChrisChoke: could you please give it a shot in your testbed?
diffs.txt

@ChrisChoke
Author

hey paolo,

Thank you very much for the reply. I am compiling the kernel at the moment and will test once it has finished.

graph TD;
    A[client]-->| wwan1 tun tcp | B[server];
    A[client]-->| wwan2 tap udp| C[compat server];
    C[compat server]-->| eth0 | B[server];

I hope this diagram helps a bit. I use 2 tunnels: one tun TCP tunnel for MPTCP, in case the provider supports MPTCP, and one tap UDP tunnel to a compat server without an MPTCP kernel, in case the provider blocks native MPTCP.
The created tap interface becomes an endpoint for MPTCP.

The backup interface is the created tun interface.

Do I need to install the patched kernel on both client and server, or just on the client?

Chris

@pabeni

pabeni commented Nov 14, 2023

> thank you very much for reply. i am compiling kernel at the moment at will test after finished.
> i hope this diagram will help a bit. i use 2 tunnels. one tun tcp tunnel for mptcp in case that the provider supports mptcp and one tap udp tunnel to a compat server without mptcp kernel in case the provider block native mptcp. the created tap interface will be an endpoint for mptcp.

To be sure I'm on the same page: do you want to use mptcp as transport for the openvpn tunnel? And not as the application level protocol?

Out of sheer ignorance on my side, it is not clear to me what the differences are between the tun and tap tunnels WRT the protocol headers stacking. Could you please list the expected headers for egress packets on each interface?

Please, additionally share the other info as per my previous comment.

> do i need to install the patched kernel on client and server site?! or just on client?!

Just on the side actually creating the additional subflows, that is the client.

@ChrisChoke
Author

> To be sure I'm on the same page: do you want to use mptcp as transport for the openvpn tunnel? And not as the application level protocol?

No, I don't think so; it's application level. I use mptcpize run openvpn <commands>, so that OpenVPN uses MPTCP for the tunnel.
But for my case, where Vodafone does not support MPTCP and blocks it, I create a tap interface without mptcpize or anything similar.
That tap interface becomes an endpoint for the mptcpize'd OpenVPN tunnel, so it should be a subflow and be listed in ss -M4tn.

> Out of sheer ignorance on my side is not clear what are the different between the tun and tap tunnels WRT the protocol headers stacking. Could you please list the expected headers for egress packets on each interface?

Okay, now I don't understand what I should do or describe :-) I am sorry.

I am recreating my setup with the newly patched kernel at the moment.
I hope I can share the details you requested in your previous comment later today.

Chris

@ChrisChoke
Author

hey paolo,

i attached your requested output.
some explanation for the tests:

test1:
Here the tap interface carries the main MPTCP flow with its own initial subflow, and wwan1 is a subflow.
I reconnected the wwan2 interface, which is the parent interface of the tap interface. After the reconnect it did not come back to the MPTCP connection.

test2:
Here the tap interface carries the main MPTCP flow with its own initial subflow, and wwan1 is a subflow.
I reconnected the wwan1 interface. wwan1 comes back to the MPTCP connection, so there is no problem here.

test3:
Here the wwan1 interface carries the main MPTCP flow with its own initial subflow. The tap interface does not join the MPTCP connection. I reconnected wwan2, but it still does not join the MPTCP connection.

test4:
The same as test3, but the monitor shows different results.
In test3 you can see an SF_CLOSED for the tap interface, but not in test4.
In test4 you can see an SF_CLOSED from the native wwan2 IP address, but not in test3.
It's a bit strange.

Chris
patched_tests.zip

@ChrisChoke
Author

Hey guys, how are you? It's been a long time since I heard anything.
But I'm back with some fresh news about my case.

Since February/March it looks like Vodafone updated their setup in the field; they no longer block MPTCP. My MPTCP-capable tests are successful now, so my fallback for this cellular connection (creating tap VPN interfaces to use as subflows) will fade more and more into the background.

But one behavior leaves me with some questions.

Here are my established connections, for example. Both are natively MPTCP capable.

root@mptcp-v1-client:/home/user1# ss -M4tn
Netid      State      Recv-Q      Send-Q              Local Address:Port                Peer Address:Port       Process
tcp        ESTAB      0           0                  <cellular1-address>:54628             <external-address>:30194  #### initial subflow from mptcp mainflow connection, right?
tcp        ESTAB      0           0                 <cellular2-address>%wwan2:49913        <external-address>:30194
tcp        ESTAB      0           52                       10.0.1.1:22                     10.0.1.80:55127
mptcp      ESTAB      0           0                  <cellular1-address>:54628             <external-address>:30194  #### mainflow connection

If I restart or lose the connection on wwan1/cellular1 (the interface whose address is shown with Netid mptcp), which behavior should I expect?

Currently the subflow (the tcp entry) changes its state to FIN_WAIT and disappears after a while.
The main flow (the mptcp entry) stays ESTAB.
When the interface is back up and running, I don't get a new subflow on it. I would expect ss to show something like this:

Netid      State      Recv-Q      Send-Q              Local Address:Port                Peer Address:Port       Process
tcp        ESTAB      0           0                 <cellular1-address>%wwan1:12345             <external-address>:30194

The second question: what happens if the MPTCP interface goes down and comes back with a new IP address? What should I expect in this case?
Because the interfaces are cellular devices, I may get a new IP address via DHCP from my ISP.

Looking forward to resolving my issues.

Chris
