
intermittent long stalling when using UDP EDIT: TCP likely affected too. Problem resolvable by Cloak client restart #228

Open
LindaFerum opened this issue Aug 2, 2023 · 4 comments


LindaFerum commented Aug 2, 2023

It appears UDP has a problem: after serving a very active shadowsocks UDP forwarding instance for a few hours, the connection just drops and permanently stalls on the shadowsocks side (shadowsocks-rust's log at this point shows that the services behind it are trying to send more packets, but nothing arrives from Cloak's side), while the Cloak client appears reasonably okay.

I believe it is a Cloak problem and not a shadowsocks, network, or downstream software problem, because so far I have been able to resolve the issue immediately by restarting the Cloak client (whereas restarting shadowsocks-rust or the program actually generating the UDP packets does nothing).

It may be pretty hard to reproduce because it requires a very active UDP connection. Just web browsing doesn't cut it (EDIT: watching a YouTube video seems to do the trick; browsing was done using OpenVPN [UDP + SOCKS5 to the shadowsocks client]).
Configuring shadowsocks UDP to use more than one worker thread (the shadowsocks-rust port allows that) seems to trigger the issue faster.

@LindaFerum LindaFerum changed the title More love for UDP connections please intermittent failure / stalling when using UDP (shadowsocks(UDP)->Cloak->Internet) Aug 2, 2023
@LindaFerum LindaFerum changed the title intermittent failure / stalling when using UDP (shadowsocks(UDP)->Cloak->Internet) intermittent long stalling when using UDP (shadowsocks(UDP)->Cloak->Internet), resolvable by Cloak client restart Aug 2, 2023

LindaFerum commented Aug 3, 2023

Additional observation:
I tried to obtain a log using "verbosity trace".

While it is quite obvious that something is not all right (the log scrolls extremely fast in stdout at first, then suddenly slows to a crawl), no error is ever displayed. The "TRAC[2023-08-03T08:08:28+03:00] 135 read from stream 1 with err" style messages just slow to a crawl as the problem asserts itself.

Restarting the Cloak client (as I mentioned before) resolves the problem for a while.
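Since the trace log shows no error, only the read messages thinning out, a stall can only be recognized by the absence of data. A minimal sketch of an inactivity watchdog, assuming a hypothetical StallWatchdog class that a relay (or a wrapper around one) notifies after every read; the class, its methods, and the 10-second threshold are illustrative and are not Cloak code or Cloak settings:

```python
import time
from typing import Callable


class StallWatchdog:
    """Flag a stall when no data has been read for longer than `threshold` seconds.

    Hypothetical diagnostic helper, not part of Cloak. It only makes sense when
    traffic is expected to be continuous (e.g. a constant ping); an idle link
    would look "stalled" too.
    """

    def __init__(self, threshold: float, clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_read = clock()

    def mark_read(self, nbytes: int) -> None:
        # Call after every read; only reads that delivered data reset the timer.
        if nbytes > 0:
            self.last_read = self.clock()

    def idle_for(self) -> float:
        return self.clock() - self.last_read

    def stalled(self) -> bool:
        return self.idle_for() > self.threshold


# Usage with a fake clock (seconds); 10 s is an arbitrary threshold.
now = [0.0]
dog = StallWatchdog(threshold=10.0, clock=lambda: now[0])
now[0] = 2.0
dog.mark_read(135)       # e.g. "135 read from stream 1"
now[0] = 5.0
print(dog.stalled())     # False: last data 3 s ago
now[0] = 20.0
print(dog.stalled())     # True: no data for 18 s
```

A watchdog like this could timestamp the start of each stall (to line it up with the trace log) or trigger the client restart that currently clears the problem; both are suggestions, not something tested in this thread.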

EDITED TO ADD

This also happens when using TCP mode, albeit far less frequently

TCP configurations tested:
OpenVPN(TCP)<->Cloak
OpenVPN(TCP)<->shadowsocks-rust<->Cloak

(OpenVPN used to convert UDP traffic to TCP)

Under consistently high load, the connection would eventually just stall (OpenVPN losing its connection).

TCP connections tend to recover from a stall eventually (after seconds to minutes), so TCP does work better. But there's definitely something weird going on on Cloak's side: during a stall, restarting OpenVPN or shadowsocks does not help, but restarting the Cloak client does, which suggests it's the same problem I initially ran into with UDP.

@LindaFerum LindaFerum changed the title intermittent long stalling when using UDP (shadowsocks(UDP)->Cloak->Internet), resolvable by Cloak client restart intermittent long stalling when using UDP EDIT: TCP likely affected too. Problem resolvable by Cloak client restart Aug 3, 2023

LindaFerum commented Aug 4, 2023

I can consistently reproduce the "TCP variant" of the hiccup problem with the following procedure:

VM1 (runs a browser with a YouTube video and a terminal with ping constantly pinging 8.8.8.8)
|
VM2 (OpenVPN in TCP mode with the SOCKS proxy option (TCP) enabled; the config is ProtonVPN's free-tier TCP server with a socks-proxy directive added)
|
VM3 (runs Cloak configured to carry the TCP connection to the SOCKS proxy)
|
internet
|
VPS, with the Cloak server and the SOCKS proxy to which the TCP connection is delivered via Cloak
|
more internet :)
|
ProtonVPN's free VPN (TCP of course)
|
more internet :)

The connection starts great and works reliably for 4-15 minutes.
Then ping suddenly stalls for multiple seconds.
Sometimes it recovers by itself quickly;
sometimes it takes a while.

Usually it does not break the connection.

Nothing in Cloak's log.
Nothing in the OpenVPN log (unless the connection breaks, in which case it does the usual TCP OpenVPN reconnect dance).
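The ping in VM1 is the stall indicator in this setup. To put numbers on "stalls for multiple seconds" and compare runs, the ping output can be scanned for gaps in icmp_seq and for very late replies. A minimal sketch, assuming Linux iputils ping output and its default 1-second interval; find_stalls and the 1000 ms cutoff are illustrative choices, not part of any tool used here:

```python
import re
from typing import Iterable, List, Tuple

# Matches Linux iputils ping replies such as
# "64 bytes from 8.8.8.8: icmp_seq=12 ttl=117 time=23.4 ms"
REPLY = re.compile(r"icmp_seq=(\d+) .*?time=([\d.]+) ms")


def find_stalls(lines: Iterable[str], interval: float = 1.0,
                rtt_limit_ms: float = 1000.0) -> List[Tuple[int, float]]:
    """Return (icmp_seq, seconds) for each suspected stall in ping output.

    Two symptoms are counted:
    - a gap in sequence numbers before a reply: probes that were never answered,
      estimated as (missing probes) * interval seconds;
    - a reply slower than rtt_limit_ms: a probe that was answered late, which is
      what traffic queued inside a TCP tunnel tends to look like.
    Sequence-number wraparound and out-of-order replies are ignored.
    """
    stalls: List[Tuple[int, float]] = []
    prev_seq = None
    for line in lines:
        m = REPLY.search(line)
        if m is None:
            continue  # headers, timeouts, summary lines
        seq, rtt_ms = int(m.group(1)), float(m.group(2))
        if prev_seq is not None and seq > prev_seq + 1:
            stalls.append((seq, (seq - prev_seq - 1) * interval))
        if rtt_ms > rtt_limit_ms:
            stalls.append((seq, rtt_ms / 1000.0))
        prev_seq = seq
    return stalls
```

Running it over ping logs captured through the Cloak chain and through the Dante chain would give directly comparable stall counts for the A/B switch.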

Evidence that it is a Cloak issue and not, say, networking:

Replacing Cloak in VM3 with Dante in a chaining config completely resolves the situation: no more hiccups.

When running Dante and Cloak in parallel on VM3 (on different ports), just switching between two OpenVPN configs (identical, except that one points to Dante's port on VM3 and the other to Cloak's port) immediately switches between "hiccups present" and "no hiccups".

EDIT: I will keep running this VM periodically in the "lab" (a rich term for my rickety setup) and see how it goes in terms of "TCP hiccuping". I will also set up a roughly similar VM testbed for UDP, but it's a bit trickier to get a good comparator there (UDP support in SOCKS kinda sucks).

EDIT:
Running those two chains (the "through TCP Cloak" one and the "raw TCP SOCKS" one) on the same uplink (good country, no filtering/blocking), I'm finding that:

  1. the hiccups with Cloak are actually very intermittent and "luck based", so maybe something external (network conditions?) is triggering them;
  2. they never happen on the SOCKS-TCP variant, so it's not entirely reducible to network problems;
  3. playing around with the number of connections (and, for some reason, StreamTimeout, though this may be placebo :) ) seems to have some effect. On my particular connection, 5 is the happy connection number (Cloak TCP almost never "hiccups"), while 3, 4, and 6 all perform worse.
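Cloak carries all client streams over NumConn underlying TCP connections (the client setting behind "number of connections"), so one hypothesis for why that number matters is how many streams a single stalled connection can hold up. A toy model with two hypothetical mapping strategies; blocked_streams, the "fixed"/"spread" names, and the mapping rules are illustrative assumptions, and which behaviour Cloak actually has for TCP and UDP streams is not established in this thread:

```python
from typing import Iterable, Set


def blocked_streams(stream_ids: Iterable[int], num_conn: int,
                    stalled_conn: int, strategy: str) -> Set[int]:
    """Return the stream IDs blocked when connection `stalled_conn` stops moving data.

    Hypothetical model, not Cloak's code:
    - "fixed": each stream is pinned to one connection (stream_id % num_conn),
      so only streams pinned to the stalled connection are blocked.
    - "spread": each stream's frames are spread across all connections and
      reassembled in order, so one stalled connection blocks every stream.
    """
    if not 0 <= stalled_conn < num_conn:
        raise ValueError("stalled_conn must be in range(num_conn)")
    if strategy == "fixed":
        return {s for s in stream_ids if s % num_conn == stalled_conn}
    if strategy == "spread":
        return set(stream_ids)
    raise ValueError(f"unknown strategy: {strategy!r}")


# Compare the connection counts tried above, with 60 active streams.
for n in (3, 4, 5, 6):
    fixed = len(blocked_streams(range(60), n, 0, "fixed"))
    spread = len(blocked_streams(range(60), n, 0, "spread"))
    print(f"NumConn={n}: fixed blocks {fixed}/60, spread blocks {spread}/60")
# fixed: 20, 15, 12, 10 of 60; spread: always 60
```

Under either toy strategy, more connections never make things worse, so neither predicts a sweet spot at 5; if that effect holds up, it would need a different explanation.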


notsure2 commented Aug 7, 2023

Cloak tunnels UDP packets inside TCP, and since it's now TCP there is no more UDP packet loss, so protocols that rely on sensing UDP packet loss to adapt their rate get confused. UDP is also affected by the same issue you described for TCP.
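What notsure2 describes is head-of-line blocking: once datagrams travel inside a TCP stream, a lost segment stops being a lost packet and becomes a delay for everything queued behind it. A toy model in integer milliseconds, with a hypothetical delivery_times helper and made-up timings (it is not a real TCP stack), shows the difference:

```python
from typing import List, Optional, Sequence


def delivery_times(n: int, interval: int, delay: int, lost: int,
                   rto: int, over_tcp: bool) -> Sequence[Optional[int]]:
    """Arrival time (ms) of each of n datagrams sent every `interval` ms.

    Simplified model: datagram `lost` is dropped once on the path and every
    other datagram takes `delay` ms one way.
    """
    sent = [i * interval for i in range(n)]
    if not over_tcp:
        # Plain UDP: the lost datagram is simply gone; the others are unaffected.
        return [None if i == lost else t + delay for i, t in enumerate(sent)]

    # Inside TCP: the lost segment is retransmitted `rto` ms after it was first sent ...
    arrivals: List[int] = [t + delay for t in sent]
    arrivals[lost] = sent[lost] + rto + delay
    # ... and in-order delivery means datagram i cannot be handed over before i-1.
    for i in range(1, n):
        arrivals[i] = max(arrivals[i], arrivals[i - 1])
    return arrivals


udp = delivery_times(5, interval=20, delay=30, lost=1, rto=200, over_tcp=False)
tcp = delivery_times(5, interval=20, delay=30, lost=1, rto=200, over_tcp=True)
print(udp)  # [30, None, 70, 90, 110]
print(tcp)  # [30, 250, 250, 250, 250]
```

In the UDP case the sending protocol sees one missing datagram and can react to it; in the TCP case it sees no loss at all, only 140-180 ms of extra delay on the three datagrams behind the lost one. Real TCP can often recover sooner through fast retransmit; the 200 ms retransmission delay here is an arbitrary figure.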

@LindaFerum (Author)

Hm, I think it has something to do with how Cloak handles its "outer layer" TCP connection (possibly some small intermittent connectivity issue, unavoidable at some point, triggers it to manifest), and UDP just gets hit harder because it is encapsulated inside the affected TCP connection (so in some weird way you get "two problems" instead of one).
