New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle multiple concurrent Azure DNS01 challenges for the same FQDN #6351
Handle multiple concurrent Azure DNS01 challenges for the same FQDN #6351
Conversation
/ok-to-test |
/retest |
1 similar comment
/retest |
d2e2b24
to
d93a816
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Alright rebasing seems to have fixed the bot and tests not passing |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Issues go stale after 90d of inactivity. |
/remove-lifecycle stale I'll rebase this eventually ... |
@eplightning @irbekrm @maelvls Sorry for tagging you, but is there any update on this whatsoever? The PR was created quite a long time ago and similar fixes have been merged across other DNS providers. There has been quite a lot of mentions on this e.g.:
I can confirm this also happens on a single cluster attempting to issue either multiple certificates or multiple dnsNames using the same less-priviliged domain in Azure. I'm not a GO expert, but I have Azure infrastructure available to help you test, if you instruct me how. Currently I'm installing cert-manager through the official helm chart, but I imagine I'll have to do something else in this case. Furthermore, is there any known workaround as long as this issue persists? // Oliver |
Signed-off-by: Bartosz Slawianowski <bartosz.slawianowski@natzka.com>
Signed-off-by: Bartosz Slawianowski <bartosz.slawianowski@natzka.com>
d93a816
to
8bb7273
Compare
Signed-off-by: Bartosz Slawianowski <bartosz.slawianowski@natzka.com>
8bb7273
to
747d88c
Compare
Signed-off-by: Bartosz Slawianowski <bartosz.slawianowski@natzka.com>
@oliverplann I rebased the PR, so whatever happens next is up to this project's maintainers. I'm not aware of any workarounds, whenever this issue happens either a manual DNS update is required or the certificate issuance needs to be somehow attempted again - probably by removing CertificateRequest? (I don't really remember at this point honestly). |
d643cb8
to
0f6eaa9
Compare
/retest |
Thank you for acting quick. I hope the maintainers will prioritize this... |
TLDR Original Similarly to a multi-cluster setup, if you are using a delegation domain (source) to issue multiple certificates or As long as this bug exists, it's facilitated through the default value of I've found a suitable workaround until this PR has been adressed. Sharing it here in case anyone else needs a working cert-manager without manually interacting with cert-manager resources or Azure DNS: Using Helm: Set the value Keep in mind that It will take significantly longer for cert-manager to validate challenges and issue certificates this way, depending on the amount of certs and Here's an example of what
|
Hello @eplightning I've tested your PR using AKS & Azure DNS and it works perfectly! Thank you very much for your contribution. My findings below. Challenges are now adding separate entries/values to the
Secrets & certificates are created:
Please let me know if anything is missing before we can merge this PR :-) // Oliver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
We're not really able to easily test Azure changes so big thanks to @oliverplann for testing this! And of course, thank you to everyone else involved for your patience and your efforts here. This should make it into 1.15 for sure.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: SgtCoDFish The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Pull Request Motivation
This PR fixes a race condition in Azure DNS that happens when issuing certificates concurrently in different clusters as described in #6229.
Similar issues were already fixed for AWS/GCP/Cloudflare in other PRs:
To make this work in Azure DNS, this PR makes use of optimistic concurrency control supported by Azure DNS API via standard If-Match, If-None-Match, Etag headers: https://learn.microsoft.com/en-us/rest/api/dns/record-sets/create-or-update?tabs=HTTP
Current version simply unconditionally overwrites any changes. New version introduced in this PR does the following for all updates:
Try to fetch an already existing matching record. If found, Azure DNS will return the record set and its version (Etag).
Modify record set as necessary: Append if doesn't exist on
Present
, remove if exists onCleanUp
Call appropriate Azure API:
If after modifications no records remain, call
Delete
API. Etag will ensure we will only delete record set if no modifications happened concurrently. Call is skipped if step 1 didn't find a matching record set.If there are records and a matching record set was not found: call
CreateOrUpdate
API withIf-None-Match: *
. This will ensure we only create and not update (otherwise we would overwrite other changes).If there are records and we found a matching record set: call
CreateOrUpdate
API withIf-Match: $Etag
. Call will only succeed if no other updates happened concurrently.Kind
/kind bug
Release Note