Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: retry on DNS queries. #1180

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

ldez
Copy link
Member

@ldez ldez commented Jun 3, 2020

Fixes #1008

@MichaelMure
Copy link

Tested today, it does improve the situation. However, I would advice to also fail the process early here if a DNS error still happen despite the retry.

@ldez ldez requested review from wyattjoh and dmke June 8, 2020 16:08
@ldez
Copy link
Member Author

ldez commented Jun 8, 2020

However, I would advice to also fail the process early here if a DNS error still happen despite the retry.

This not needed because the DNSError don't produce a retry and by example a timeout can have several reasons.

Copy link
Member

@dmke dmke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what the expected behaviour for --dns-timeout ought to be. If I provide --dns-timeout=30, I'd expect the SOA query to finish within 30s (give or take a few seconds). Please correct me if I'm wrong, but I believe the query now might take up to 2 minutes.

Comment on lines +239 to +241
bo.InitialInterval = dnsTimeout
bo.MaxInterval = 2 * bo.InitialInterval
bo.MaxElapsedTime = 4 * bo.InitialInterval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sets the max. wait time to 2 minutes if I invoke lego with --dns-timeout=30, doesn't it? Should bo.InitialInterval be dnsTimeout/4 instead?

Copy link
Member Author

@ldez ldez Jun 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial interval must be the same as the timeout use by the DNS client.

The dnsTimeout cannot be used as MaxElapsedTime because it can be too short with the current value.

Currently dnsTimeout is the timeout for one DNS query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial interval must be the same as the timeout use by the DNS client

I see. Then the documentation of the --dns-timeout flag needs to be modified to something like "this is the minimum timeout given to resolve SOA queries; on network errors the actual timeout might be considerably longer".

Alternatively, we can reduce the default timeout down to something like 2 or 3s to approximate the previous behaviour (on that note, this should work in most cases, because DNS usually resolves fast enough). To accommodate #1008, we could also bump bo.MaxInterval and bo.MaxElapsedTime to 10*initialInterval and 20*initialInterval respectively.

Copy link
Member Author

@ldez ldez Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can reduce the default timeout.

But I think that multiply by 10 or 20 is too big: the dnsQuery is used by several functions in a loop (by example a loop on ns) and this can become very very long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I've missed the context; I thought this was part of the fetchSoaByFqdn methods. Sorry for the confusion on my part.

challenge/dns01/nameserver.go Outdated Show resolved Hide resolved
Comment on lines +239 to +241
bo.InitialInterval = dnsTimeout
bo.MaxInterval = 2 * bo.InitialInterval
bo.MaxElapsedTime = 4 * bo.InitialInterval
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I've missed the context; I thought this was part of the fetchSoaByFqdn methods. Sorry for the confusion on my part.

@ldez ldez removed this from the v3.8 milestone Jul 2, 2020
@ldez
Copy link
Member Author

ldez commented Jul 2, 2020

I am wondering about the impact and side effects of this PR. I need to take a little more time to think.

@ldez ldez added this to the v3.9 milestone Jul 2, 2020
@ldez ldez removed this from the v3.9 milestone Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Issues creating certificates for subdomain with route53
3 participants