certbot-dns-ovh: old DNS entries are not removed, leading to a renewal failure #9888

Open
przemub opened this issue Feb 6, 2024 · 3 comments

przemub commented Feb 6, 2024

My operating system is (include version):

Ubuntu 22.04.3 + Docker 25.0.2

I installed Certbot with (snap, OS package manager, pip, certbot-auto, etc):

certbot/dns-ovh Docker image, tag 666c3768e2e5, certbot version 2.8.0

I ran this command and it produced this output:

certbot renew

certbot_1            | Renewing an existing certificate for prem.moe and 5 more domains
certbot_1            | Failed to renew certificate prem.moe-0001 with error: Error adding TXT record: 400 Client Error: Bad Request for url: https://eu.api.ovh.com/1.0/domain/zone/prem.moe/record

Certbot's behavior differed from what I expected because:

It fails while verifying the second of the six zones because the API returns a 400 error on an update. I looked into it and found about 300 _acme-challenge entries across all six DNS zones; the first zone alone had over 200. Removing the old entries made the verification pass. I suspect that a timestamp or token expires at some point during processing, and the API then returns 400.

I'd expect the records to always be cleaned up, even when Certbot crashes. The problem was made worse because each attempt created new records in the first zone. Maybe deleting the existing entries at the start of verification would be a good idea?
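
For anyone hitting the same problem, the manual cleanup can be scripted. The sketch below is a rough illustration, not something Certbot ships. The function name purge_challenge_records is hypothetical, and the client is assumed to behave like python-ovh's ovh.Client (get/delete/post methods that take an API path). The record path is taken from the URL in the error above; the fieldType and subDomain filters and the /refresh call are assumptions that should be checked against the OVH API console. It defaults to a dry run because it would remove every TXT record with that name, including any a user created on purpose.

```python
from typing import Any, List


def purge_challenge_records(client: Any, zone: str,
                            subdomain: str = "_acme-challenge",
                            dry_run: bool = True) -> List[int]:
    """List (and optionally delete) all TXT records named `subdomain` in `zone`.

    `client` is assumed to expose get/delete/post like python-ovh's Client.
    Returns the record IDs that were found.
    """
    # Assumed filters: only TXT records whose subdomain matches exactly.
    record_ids = list(client.get(f"/domain/zone/{zone}/record",
                                 fieldType="TXT", subDomain=subdomain))
    if dry_run:
        # Report what would be deleted without touching the zone.
        return record_ids

    for record_id in record_ids:
        client.delete(f"/domain/zone/{zone}/record/{record_id}")

    if record_ids:
        # Assumed step: ask the API to apply the pending zone changes.
        client.post(f"/domain/zone/{zone}/refresh")
    return record_ids
```

Challenge records for subdomains carry longer names (for example _acme-challenge.www), so the function would need to be called once per name. Running it with dry_run=True first shows how many entries have piled up before anything is deleted.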

Here is a Certbot log showing the issue (if available):

Logs are stored in /var/log/letsencrypt by default. Feel free to redact domains, e-mail and IP addresses as you see fit.

letsencrypt.log

Here is the relevant nginx server block or Apache virtualhost for the domain I am configuring:

Irrelevant.


avocadio commented Feb 8, 2024

Same issue observed.

bmw (Member) commented Feb 9, 2024

@adferrand, if and only if you have the time and interest to take a look at this, I'd appreciate you doing so. If not, I can probably start poking at it late next week.

I don't think we necessarily have to fix this now, but I think it'd be good to at least understand the problem so we know how to prioritize it. I believe our OVH plugin code tries to clean up after itself.

bmw self-assigned this Feb 9, 2024
bmw added the area: dns label Feb 9, 2024

bmw (Member) commented Feb 16, 2024

After looking at the attached log and https://github.com/AnalogJ/lexicon/blob/v3.17.0/src/lexicon/_private/providers/ovh.py, it doesn't seem to me that Certbot is leaking any records, at least on that run. That Lexicon file contains a number of helpful logging messages. My read, after searching the log for create_record and delete_record, is that Certbot failed to create the first record and then tried to delete all records that would have been created on that run, which is presumably a no-op.

I think it'd be helpful to see a log, ideally from the latest version of Certbot, that doesn't have this pattern. Certbot does not attempt to clean up records from prior runs (and I'm hesitant to make it try, as I think it could delete something the user didn't want us to), but it does at least try to clean up all records it creates on a given run, even if it crashes. Seeing a log where that process fails would be helpful.
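
To make the per-run cleanup described above concrete, here is a minimal sketch of the general pattern. It is not Certbot's or Lexicon's actual code: run_with_cleanup and the create, delete, and validate callables are hypothetical stand-ins. Every record the run attempts is remembered before the create call, and a finally block tries to delete each of them whether or not the run succeeded.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)

Record = Tuple[str, str]  # (record name, TXT value)


def run_with_cleanup(records: Iterable[Record],
                     create: Callable[[Record], None],
                     delete: Callable[[Record], None],
                     validate: Callable[[], None]) -> None:
    """Create challenge records, validate, and always attempt cleanup."""
    attempted: List[Record] = []
    try:
        for record in records:
            # Remember the record first, so a create call that fails
            # partway through is still covered by the cleanup below.
            attempted.append(record)
            create(record)
        validate()
    finally:
        for record in attempted:
            try:
                delete(record)
            except Exception as exc:  # one failed delete must not stop the rest
                logger.warning("Could not delete %s: %s", record[0], exc)
```

With this shape, a run that fails on its first create call only attempts to delete that one record, which would match a log that shows a failed create followed by a delete that removes nothing. Records left behind by earlier runs are never touched, since only this run's records are tracked.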
