Connection between Peers Fails under higher Loads #2216
Comments
Interesting. Some questions: (1) How many streams is Veeam creating and using during these congestion incidents? Try stopping the job when it appears congested and try to send some data between the two ZT nodes by some other method.
I'll try to reproduce a similar behaviour without Veeam later. I'm not sure if Veeam is doing anything special, but it's definitely taking down a ZT connection completely 😁
OK, thanks for the info. When the backup job seems to be failing, the only way we can tell whether the ZT link is actually down is to stop the job and then try to use the link with something else. Otherwise a ping is just another packet competing for a scarce resource that it probably won't get. Can you try that? I'll try to do some saturation testing on my end.
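To test the link with "something else" while the job is stopped, any independent traffic works. A minimal sketch of an application-level round-trip probe (generic code, not part of ZeroTier; the hostnames and port are placeholders — run the responder on the far ZT peer and point the probe at its ZT address):

```python
import socket
import threading
import time

def echo_once(host, port, ready):
    """Tiny echo responder; run this on the far ZT peer."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    ready.set()
    conn, _addr = srv.accept()
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    while True:
        data = conn.recv(1024)
        if not data:
            break
        conn.sendall(data)  # echo each byte straight back
    conn.close()
    srv.close()

def probe_rtt(host, port, n=5):
    """Measure n application-level round-trip times over a fresh TCP connection."""
    c = socket.create_connection((host, port), timeout=5)
    c.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    rtts = []
    for _ in range(n):
        t0 = time.monotonic()
        c.sendall(b"x")
        c.recv(1)
        rtts.append(time.monotonic() - t0)
    c.close()
    return rtts
```

If the probe stalls or times out while the backup job is stopped, the ZT link itself is down rather than merely congested.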
As soon as I stop the backup job, it takes about 10 seconds for the link to start working again, so it is only failing temporarily. I'll report some test results later.
We've seen something similar. If you create two small, single-CPU VMs and then run iperf between them, they fall over. Capping the bandwidth on the zt interface to some lower number seemed to avoid it; you need to experiment a little to find the best limit. I'm not familiar with how to do it on Windows, and it might be a little more tricky if it's incoming traffic. Maybe Veeam has configuration options for that. You also might be able to run multiple instances of ZeroTier on your server, one per CPU core? Sorry to interrupt, just adding a little more context.
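Where interface-level shaping isn't available (e.g. on Windows, as noted above), capping throughput on the sender side is another option if the sending application can be wrapped. A minimal token-bucket sketch, not specific to ZeroTier or Veeam; the 75 MB/s figure just mirrors the ~600 Mbit/s limit mentioned in this thread:

```python
import time

class TokenBucket:
    """Simple token bucket: allows roughly `rate` bytes/sec on average,
    with bursts up to `burst` bytes."""
    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def consume(self, nbytes):
        # Block until enough tokens have accumulated to send `nbytes`.
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# Example: cap a sender at ~600 Mbit/s (75 MB/s).
bucket = TokenBucket(rate_bytes_per_sec=75_000_000, burst_bytes=1_000_000)
# In a real sender, call bucket.consume(len(chunk)) before each socket send.
```

This only shapes egress from the wrapped sender, so it won't help with incoming traffic, matching the caveat above.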
I'm not able to reproduce the issue between the same two hosts with iperf and multiple connections alone. Even in a test environment with 2 vCPUs and 50 iperf connections I'm getting 1.38 Gbit/s without big congestion problems. Somehow Veeam is triggering something that ZT doesn't like; it seems related to simultaneous streams, but not only that. WireGuard and Tailscale are not affected in a similar way. Could it be related to ZT's MTU of 2800, jumbo frames, or Layer 2? It sounds stupid, but to me it seems as if the ZT connection is telling Veeam: bring it on, I have a lot more bandwidth, I'll handle it — and then drops everything because the actual link is extremely congested and queuing is no longer possible, as if Veeam can't detect how congested the link already is. Any idea how to test or debug it further?
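For anyone else trying to reproduce this without Veeam, here is a rough stand-in for Veeam's behaviour: many simultaneous TCP streams all pushing data at once. This is generic code with hypothetical ports — run `run_sink` on one ZT peer and point `saturate` at its ZT address (the example below uses localhost only to show the mechanics):

```python
import socket
import threading

def drain(conn, totals):
    """Count bytes received on one connection until the sender closes it."""
    n = 0
    while True:
        data = conn.recv(65536)
        if not data:
            break
        n += len(data)
    conn.close()
    totals.append(n)  # list.append is thread-safe under CPython

def run_sink(host, port, n_streams, totals, ready):
    """Accept n_streams connections, mimicking a backup target."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(n_streams)
    ready.set()
    workers = []
    for _ in range(n_streams):
        conn, _addr = srv.accept()
        t = threading.Thread(target=drain, args=(conn, totals))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()
    srv.close()

def send_stream(host, port, nbytes):
    """One sender stream pushing nbytes of zeros, like one Veeam task slot."""
    c = socket.create_connection((host, port))
    chunk = b"\0" * 65536
    sent = 0
    while sent < nbytes:
        take = min(65536, nbytes - sent)
        c.sendall(chunk[:take])
        sent += take
    c.close()

def saturate(host, port, n_streams=16, nbytes=1_000_000):
    """Launch n_streams concurrent senders; returns bytes received per stream."""
    totals, ready = [], threading.Event()
    srv = threading.Thread(target=run_sink,
                           args=(host, port, n_streams, totals, ready))
    srv.start()
    ready.wait()
    senders = [threading.Thread(target=send_stream, args=(host, port, nbytes))
               for _ in range(n_streams)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()
    srv.join()
    return totals
```

Running this across the ZT link while watching ping times from a third process would show whether plain concurrent bulk streams are enough to trigger the drop, or whether Veeam is doing something beyond that.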
We're seeing the network connection between peers on a ZT network fail under higher loads with many connections.
In our case we're using ZT to connect a Windows server with a backup storage host, and we're using Veeam for backups. In its default config, Veeam uses many simultaneous connections to exhaust the maximum bandwidth. The available bandwidth between the two peers is around 900 Mbit/s. When we run the job, the ZT connection gets really bad (roughly 8x ping times, packet loss), and after a few minutes it drops completely, recovering only after the backup job finally fails.
If we configure Veeam to use only a single connection and limit the bandwidth to around 600 Mbit/s, everything works and the ZT network stays stable.
I'd expect ZT to handle high traffic loads correctly, but currently it seems that high loads, and especially many connections, are a problem for ZT.