Implement RFC8555 (ACME spec) § 8.2 #168

dcow · 2020-01-29T06:20:28Z

Section 8.2 of the ACME spec details exactly how client and server retry should be handled during a challenge validation. We should implement this part of the spec. Namely, retry state needs to be included in the challenge resource, client requests should be throttled, and we should set appropriate retry-after headers on the challenge resource response and include error information about the result of each attempted validation.

More details:
#162 (comment)

wesgraham · 2020-02-04T02:56:05Z

Hey all :)
Quick question on the desired approach:
I'm about to modify both validate() calls (on http01 and dns01 challenges) in acme/challenge.go to account for retries. The simple approach I'm thinking of would be to make a blocking call to time.sleep() in between attempts. I recognize this could also be made non-blocking (using timer tasks/channels) - however it would likely require additional changes to the module. Any thoughts here?

dcow · 2020-02-04T05:20:46Z

We need to be able to cancel future retries if a client request to validate comes in while we'd otherwise be sleeping. As far as I can tell, that pushes things toward the timer approach so we can cancel pending retries and start over when new requests come in. If that's overly complicated, I guess sleeping routines could check whether they've been canceled when they wake up or something, but after 3 minutes of thought, I think the timer approach is still preferable. cc @maraino or @dopey for a second opinion.

wesgraham · 2020-02-05T04:35:07Z

Makes sense - in that case I'll utilize the timer approach. Will also modify ValidateChallenge() in acme/authority.go to actively read retry messages generated by the timer task. Will submit a PR by end of week.

dcow · 2020-04-30T11:57:38Z

I'm picking this up in #242.

tomasfreund · 2021-09-06T12:01:05Z

Hi, any updates on this? Some of our ACME challenges occasionally get stuck in invalid state because of (probably) transient DNS errors or propagation delays. If this was implemented, it would probably fix a lot of our problems. Thanks

dopey · 2021-09-07T23:48:30Z

Hey @tomasfreund we ended up tabling this without merging. Then the implementer moved on to a different project, and we've pretty much let it sit since then because there hasn't been much interest from the community.

To my recollection, my main issue with this PR (in the state that we left it) was that it was "complicated". It required state to be saved between executions of step-ca. The way it was being stored, and then refreshed on start up, wound up being leaked into the ca.json, which I wasn't happy with. Basically, the PR needs more work before we could merge it.

I'll bring this up at our weekly triage meeting, but my guess is that we don't have bandwidth to prioritize this issue at this time, given that the workaround is probably just to retry the request.

dcow · 2021-09-08T00:30:32Z

If you're okay with this only being guaranteed to work in non "HA" contexts (single instance of step-ca) then you should be able to take out the ordinal (implication being that any updates to the retry state would need to be transactional and handled by the backing store and the transaction would need to span the length of the validation attempt) or just leave it in and pin it to 0 in the code and hide it from the config json for the time being.

Beyond that, I believe the remaining work is just one more pass to address outstanding comments on the PR (there are some minor but blocking logic updates needed) and then a pass on the tests to be more comprehensive. I believe I updated the existing tests which cover the changes made in the PR and they generally pass. However, the PR also involves converting some cases where the server would have returned a 4xx into a 2xx + error. I discovered there was no existing coverage for those specific scenarios and recall we wanted to add some coverage as well as run a smoke test using existing clients to make sure they properly handle the updated behavior.

That's about all I remember after a quick skim of #242.

dopey · 2021-09-08T00:37:05Z

Agree w/ everything @dcow said above.

I think we weren't sure at the time, and still aren't, if we wanted folks trying to use an HA setup to need extra configuration. We're no further along in answering that question.

dopey · 2021-09-15T18:00:20Z

Hey @tomasfreund we had the opportunity to discuss this issue a bit this morning and came away with a few questions / comments.

There are some ACME clients that will wait to finalize (or request verification) until they've first verified that the DNS probe is successful. Which ACME client are you using, and is it possible that switching clients would be an easy workaround?
We're trying to gauge how many users are affected by similar issues. For those affected, please drop a thumbs up on the issue (or this comment) or leave a comment. I know it's silly, but it helps us with prioritization. Thanks!

tomasfreund · 2021-09-16T10:01:31Z

Hi, we are using cert-manager which tries to verify the DNS but our dns-01 challenges still get stuck sometimes.
We get There was a problem with a DNS query during identifier validation in the cert-manager challenge and "error":{"type":"urn:ietf:params:acme:error:dns","detail":"There was a problem with a DNS query during identifier validation"} on the CA side.
We have found that creating a new challenge or restarting cert-manager helps most of the time.
I suspect that this might have something to do with DNS propagation - even if cert-manager checks the existence of the DNS record with a lookup, it does not guarantee that is has already propagated to the DNS server the CA is using. If there is no retry there, I can understand why it would fail.
This is a really big issue for us because having to monitor and manually retry challenges defeats the purpose of automatic certificate management.

sourishkrout changed the title ~~Implement RFC8555 § 8.2~~ Implement RFC8555 (ACME spec) § 8.2 Jan 29, 2020

sourishkrout added the good first issue label Jan 29, 2020

wesgraham mentioned this issue Feb 7, 2020

[RFC] Implement ACME RFC 8555, challenge retries #181

Closed

dcow self-assigned this Apr 30, 2020

dcow added area/acme ACME enhancement and removed good first issue labels Apr 30, 2020

dcow added this to the v0.15.0 milestone Apr 30, 2020

sourishkrout linked a pull request May 12, 2020 that will close this issue

ACME (RFC 8555) § 8.2 Challenge Retries #242

Open

sourishkrout mentioned this issue Jun 4, 2020

ACME http01Challenge randomly not works with k8s after pull new docker image #149

Closed

dopey added the needs triage Waiting for discussion / prioritization by team label Sep 7, 2021

dopey removed the needs triage Waiting for discussion / prioritization by team label Oct 13, 2021

maraino removed this from the v0.15.0 milestone Jul 14, 2022

jacobgreenleaf mentioned this issue Dec 10, 2023

[Bug]: DNS-01 ACME validation failures do not mark challenge as invalid #1365

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement RFC8555 (ACME spec) § 8.2 #168

Implement RFC8555 (ACME spec) § 8.2 #168

dcow commented Jan 29, 2020 •

edited

wesgraham commented Feb 4, 2020 •

edited

dcow commented Feb 4, 2020

wesgraham commented Feb 5, 2020

dcow commented Apr 30, 2020

tomasfreund commented Sep 6, 2021

dopey commented Sep 7, 2021

dcow commented Sep 8, 2021 •

edited

dopey commented Sep 8, 2021

dopey commented Sep 15, 2021

tomasfreund commented Sep 16, 2021

Implement RFC8555 (ACME spec) § 8.2 #168

Implement RFC8555 (ACME spec) § 8.2 #168

Comments

dcow commented Jan 29, 2020 • edited

wesgraham commented Feb 4, 2020 • edited

dcow commented Feb 4, 2020

wesgraham commented Feb 5, 2020

dcow commented Apr 30, 2020

tomasfreund commented Sep 6, 2021

dopey commented Sep 7, 2021

dcow commented Sep 8, 2021 • edited

dopey commented Sep 8, 2021

dopey commented Sep 15, 2021

tomasfreund commented Sep 16, 2021

dcow commented Jan 29, 2020 •

edited

wesgraham commented Feb 4, 2020 •

edited

dcow commented Sep 8, 2021 •

edited