Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frequent "failed to exchange" errors #396

Open
timkgh opened this issue May 6, 2024 · 11 comments
Open

Frequent "failed to exchange" errors #396

timkgh opened this issue May 6, 2024 · 11 comments

Comments

@timkgh
Copy link

timkgh commented May 6, 2024

dnsproxy[1303399]: 2024/05/06 00:40:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exchange

dnsproxy v0.71.1

Requests also take a long time to time out when this happens, around 6s.

It happens for both DoH (h3://) and DoT (tls://) upstreams, on Linux both amd64 and arm64. Observed for both Quad9 and Cloudflare upstreams.

@ameshkov
Copy link
Member

ameshkov commented May 6, 2024

We'll need more information about this, could you please record a full log?

@timkgh
Copy link
Author

timkgh commented May 7, 2024

Does this help?

2024/05/05 22:01:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exc
hange ;play.google.com.        IN         A in 26.008209ms: requesting https://1.1.1.2:443/dns-query: Get_0rtt "https://1.1.1.2:443/dns-query?dns=AAABAAA...": use of closed network connection

2024/05/06 00:40:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exc
hange ;addons-pa.clients6.google.com.        IN         A in 5.979076ms: requesting https://1.1.1.2:443/dns-query: Get_0rtt "https://1.1.1.2:443/dns-query?dns=AAAB...": H3_NO_ERROR

2024/05/06 11:10:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exchange ;waa-pa.clients6.google.com.        IN         A in 31.069075ms: requesting https://1.1.1.2:443/dns-query: Get_0rtt "https://1.1.1.2:443/dns-query?dns=AAABAAA...": use of closed network connection

2024/05/06 13:05:59 [error] dnsproxy: upstream https://1.1.1.2:443/dns-query failed to exchange ;addons-pa.clients6.google.com.        IN         A in 26.284076ms: requesting https://1.1.1.2:443/dns-query: Get_0rtt "https://1.1.1.2:443/dns-query?dns=AAABAAA...": use of closed network connection

@ameshkov
Copy link
Member

ameshkov commented May 8, 2024

Not really:(

We'll need to see the whole lifetime of a connection to figure out what's going on with them.

Could you please share the log in private via devteam@adguard.com?

@timkgh
Copy link
Author

timkgh commented May 8, 2024

H3_NO_ERROR looks strange, doesn't it?

Can this not be reproduced/debugged on your side by using --all-servers with h3:// and tls:// for both 9.9.9.11 and 1.1.1.2? I'm sure you'll get to see these errors after a while.

@ameshkov
Copy link
Member

ameshkov commented May 9, 2024

All those related to H3/Quic may be caused by different quic-go library issues, I am tbh more interested in what you had with DoT.

Errors in the log may be reproduced, at some point the server closes the connection and we handle this gracefully, just repeating the request. I don't yet understand where the 6 seconds timeout is getting from. If it happens with DoT than this is very strange. If it happens with H3/DoQ then it's understandable since quic-go does not handle GOAWAY frames from the server properly at the moment.

@timkgh
Copy link
Author

timkgh commented May 9, 2024

The timeouts (or errors that take a long time) happen with TLS too:

2024/05/09 10:38:24 [error] dnsproxy: upstream tls://149.112.112.11:853 failed to exchange ;fireoscaptiveportal.com.        IN         A in 6.112033ms: reading response from tls://149.112.112.11:853: EOF

2024/05/09 14:01:15 [error] dnsproxy: upstream tls://[2620:fe::fe:11]:853 failed to exchange ;vscode-sync.trafficmanager.net.        IN         A in 5.283909ms: reading response from tls://[2620:fe::fe:11]:853: EOF

@ameshkov
Copy link
Member

I inspected the DoT upstream code and tbh I believe this is some network issue and not a bug in dnsproxy.

DoT retries establishing a new connection when there's any error with the cached one but according to your log even though the new connection was established it's getting closed right away. I don't see how dnsproxy itself could do that in the code.

@iJorgen
Copy link

iJorgen commented May 17, 2024

The timeouts (or errors that take a long time) happen with TLS too:

I was suddenly seeing timeouts too with DoT using v0.71.1, so removed all DoT from upstream-servers last week and only using Quic/H3 now without any issues. Strange...

@ameshkov
Copy link
Member

@iJorgen the same DNS upstream?

@iJorgen
Copy link

iJorgen commented May 17, 2024

@iJorgen the same DNS upstream?

Sorry @ameshkov, only keeping logs for a few days and not using DoT upstreams at the moment.
Smells like a new release of DNSProxy soon, so might try DoT again. 😃

@timkgh
Copy link
Author

timkgh commented May 17, 2024

I inspected the DoT upstream code and tbh I believe this is some network issue and not a bug in dnsproxy.

DoT retries establishing a new connection when there's any error with the cached one but according to your log even though the new connection was established it's getting closed right away. I don't see how dnsproxy itself could do that in the code.

I observe the same issues on 2 different cloud provider networks, both IPv4 and IPv6, 2 different architectures (Linux amd64 and arm64), 2 different upstreams (Quad9, Cloudflare) and for both DoT and DoH. Maybe there are no actual issues and it's just log noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants