Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in log: can't resolve request via upstream server; context deadline exceeded #1430

Open
gspannu opened this issue Apr 4, 2024 · 7 comments
Labels
🐞 bug Something isn't working
Milestone

Comments

@gspannu
Copy link

gspannu commented Apr 4, 2024

Blocky log is reporting errors related to context deadline exceeded

Here is brief blocky.log

ERROR can't prefetch 'security.familybuildingsociety.co.uk' resolution failed: resolver: "upstream 'https://cloudflare-dns.com/dns-query'" error: upstream 'https://cloudflare-dns.com/dns-query': can't resolve request via upstream server https://cloudflare-dns.com/dns-query (https://104.16.248.249:443/dns-query): can't perform https request: Post "https://104.16.248.249:443/dns-query": context deadline exceeded

resolver: "upstream 'https://dns.google/dns-query'" error: upstream 'https://dns.google/dns-query': can't resolve request via upstream server https://dns.google/dns-query (https://8.8.8.8:443/dns-query): can't perform https request: Post "https://8.8.8.8:443/dns-query": context deadline exceeded

[2024-04-04 06:25:30] ERROR can't prefetch 'online.familybuildingsociety.co.uk' resolution failed: resolver: "upstream 'https://dns.google/dns-query'" error: upstream 'https://dns.google/dns-query': can't resolve request via upstream server https://dns.google/dns-query (https://8.8.8.8:443/dns-query): can't perform https request: Post "https://8.8.8.8:443/dns-query": context deadline exceeded

resolver: "upstream 'tcp-tls:dns.google'" error: upstream 'tcp-tls:dns.google': can't resolve request via upstream server tcp-tls:dns.google (8.8.8.8:853): read tcp 81.106.200.72:45735->8.8.8.8:853: i/o timeout

The main error is context deadline exceeded

My blocky configuration

upstreams:
  init:
    strategy: fast
  groups:
    default:
      - https://dns.google/dns-query
      - https://cloudflare-dns.com/dns-query
      - tcp-tls:1dot1dot1dot1.cloudflare-dns.com
      - tcp-tls:dns.google
  strategy: parallel_best
  timeout: 2s
  # userAgent: "custom UA"

Any idea why this happens? Or how to resolve?


As an additional note: I do not get these errors when I use my local Unbound instead of upstream Google/Cloudflare.
Blocky configuration

upstreams:
  init:
    strategy: fast
  groups:
    default:
      - "[::1]:1:53535"
  strategy: strict
  timeout: 2s
  # userAgent: "custom UA"

@talmuth
Copy link

talmuth commented Apr 6, 2024

Two of my docker containers stopped to respond to DNS queries, when I checked console - both of them were reporting same problem. They were unable to reach any upstream, including local BIND that I use to manage internal DNS. Both worked find after simply restarting containers

@gspannu
Copy link
Author

gspannu commented Apr 6, 2024

Yes, a restart usually does fix the issue.... but I would like to understand what is causing the context deadline exceeded error.

@kwitsch
Copy link
Collaborator

kwitsch commented Apr 7, 2024

Some questions about this behavior:

  • is an UDP+TCP resolver configured in the default upstream group?
  • is a bootstrap resolver configured?
  • does blocky run native or containerized?
  • are conditional and custom requests also affected?

@kwitsch
Copy link
Collaborator

kwitsch commented Apr 7, 2024

For the first config in the first post it is advised to configure a bootstrap resolver.

@gspannu
Copy link
Author

gspannu commented Apr 7, 2024

Some questions about this behavior:

  • is an UDP+TCP resolver configured in the default upstream group?
  • is a bootstrap resolver configured?
  • does blocky run native or containerized?
  • are conditional and custom requests also affected?
  1. No. Only DoH and DoT upstream resolvers are configured.
  2. Bootstrap resolver is configured.
  3. Running Native (on OPNsense router, i.e. FreeBSD)
  4. Have not logged or analysed in detail, so not sure.

@kwitsch
Copy link
Collaborator

kwitsch commented Apr 11, 2024

After reading some about the cause it seems most likely that some resources aren't correctly freed in a previous request.

Could you set the log level to debug and provide the entries of the 2-3 queries before it fails again?

@kwitsch kwitsch added the 🐞 bug Something isn't working label Apr 11, 2024
@kwitsch kwitsch added this to the v0.24 milestone Apr 11, 2024
@kwitsch
Copy link
Collaborator

kwitsch commented Apr 11, 2024

My current guess: TCP pool exhaustion in the HTTP client since the defer body closure is missing a drain method

        defer func() {
                // drain body in case of error 
                _,_ := io.ReadAll(httpResponse.Body)
                // close drained body 
                util.LogOnError(ctx, "can't close response body ", httpResponse.Body.Close())
        }()

If my guess is correct you should have log entries starting with http return code should be or http return content type should be prior to the mentioned error in your first post.

@0xERR0R 0xERR0R modified the milestones: v0.24, v0.25 May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants