Random DNSSEC errors after 12.1 update #890
Thanks for the feedback. During some testing, I have observed that on rare occasions Cloudflare DNS returns a response without the RRSIG records needed to validate it. Retrying the query then returns the needed RRSIG records, so this can cause a one-off failure like the one you observed. In the DNS server, such a response causes a validation failure (RRSIG missing), which gets cached as a failure (negative) answer for a few seconds. If the same domain is queried again after that cached failure expires, the DNS server retries upstream; until then, it answers from the cache. Also, if the cache holds a previously valid answer that has expired, it won't be overwritten by the failure answer. This is why in your tests you get Since the domain's RRSIG records in this case are valid for only 2 days, the issue becomes noticeable. Had they configured the RRSIGs with, say, a 7-day expiry, you would not have noticed this issue at all, since the stale answer in the cache would still have been valid. If the issue is frequent and bothering you, try removing Cloudflare from your forwarders, keep just Quad9, and observe whether the issue repeats.
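The caching rules described above can be modelled roughly as follows. This is a minimal Python sketch under stated assumptions, not the DNS server's actual code: the `Cache` and `Entry` classes, their method names, the string return values, and the 5-second `FAILURE_TTL` (standing in for "a few seconds") are all hypothetical. It illustrates why the RRSIG lifetime matters: an expired answer is protected from being overwritten, and served stale, only while its signature is still valid.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical value standing in for "a few seconds" of negative caching.
FAILURE_TTL = 5.0


@dataclass
class Entry:
    answer: Optional[str]    # None marks a cached validation failure
    expires_at: float        # when the record's TTL runs out
    rrsig_expiration: float  # RRSIG signature expiration; 0 for failures


class Cache:
    """Illustrative model of the caching rules described above (assumed, not real server code)."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def put_answer(self, name: str, answer: str, ttl: float,
                   rrsig_expiration: float, now: float) -> None:
        # A validated answer is cached for its TTL.
        self.entries[name] = Entry(answer, now + ttl, rrsig_expiration)

    def put_failure(self, name: str, now: float) -> None:
        old = self.entries.get(name)
        # An expired answer whose RRSIG is still valid is not overwritten by a failure.
        if old is not None and old.answer is not None and now < old.rrsig_expiration:
            return
        self.entries[name] = Entry(None, now + FAILURE_TTL, 0.0)

    def get(self, name: str, now: float) -> str:
        e = self.entries.get(name)
        if e is None:
            return "MISS"  # nothing cached: query the forwarder
        if e.answer is None:
            # Cached failure: answer from cache until it expires, then retry upstream.
            return "SERVFAIL" if now < e.expires_at else "MISS"
        if now < e.expires_at:
            return e.answer
        if now < e.rrsig_expiration:
            return e.answer + " (stale)"  # TTL expired, but the signature still validates
        return "MISS"  # stale copy can no longer validate: query the forwarder
```

In this model, a 2-day RRSIG window means the stale copy stops shielding clients from an upstream response without RRSIGs much sooner than a 7-day window would, at which point the failure is cached and returned for `FAILURE_TTL` seconds.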
Thank you for the comprehensive response. I have now removed Cloudflare DNS from my forwarders list and will follow up in a few days to report whether this resolves the issue. If the issue lies with Cloudflare's DNS, it would be wise to report it on the Cloudflare forums. Should I post it, or would you prefer to do so yourself? If I post it, is there any extra detail I should include in the forum post?
This issue of a missing RRSIG record is not reproducible. It happens very rarely, and retrying makes it go away, so I'm not sure how it could be demonstrated to them convincingly enough to be taken seriously.
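Since a retry usually brings the RRSIG records back, one possible mitigation is to re-send the upstream query a few times before caching a validation failure. This is a minimal sketch under stated assumptions, not the DNS server's actual behavior: `resolve_with_retry`, the `Reply` shape, and the default of three attempts are hypothetical, and it assumes the zone is already known to be signed (an unsigned zone legitimately has no RRSIGs).

```python
from typing import Callable, List, Optional, Tuple

# Assumed shape of an upstream reply: (answer records, RRSIG records).
Reply = Tuple[List[str], List[str]]


def resolve_with_retry(send_query: Callable[[], Reply],
                       max_attempts: int = 3) -> Optional[Reply]:
    """Re-send the query while the reply lacks RRSIGs (hypothetical mitigation).

    Returns the first reply that carries signatures, or None if every attempt
    came back unsigned, in which case the caller would treat it as a
    validation failure.
    """
    for _ in range(max_attempts):
        answers, rrsigs = send_query()
        if rrsigs:
            return answers, rrsigs  # signatures present: hand off to validation
    return None  # still no RRSIGs after all attempts
```

A retry only helps when the missing signatures are transient, as described for Cloudflare above; a reply that consistently lacks RRSIGs still ends up as a failure after `max_attempts` queries.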
After a few days of running the DNS Server without CF in the forwarder list, I must regrettably confirm that the issue persists at the same frequency as before, approximately 25 times per day. On closer examination of the logs, I've observed that a domain which encounters this issue once is likely to encounter it again, whereas most other domains do not. The distinguishing factor is unclear, but something about those particular domains appears to trip up the DNS Server. Below is a list of domains identified so far that have triggered this issue:
I am willing to share my full server logs if they would be of any help to you, although I would prefer to do this through a private channel, as I am not too keen on the idea of them being public on the Internet.
I see, nothing to be done on that front then.
Thanks for the analysis and details. I will look into how this issue can be mitigated.
First of all, thanks for the amazing work! This is the first issue I've run into; other than this, everything has been flawless. I observed the same issue with www.paypal.com and www.portainer.io over the last few days, and I'm able to reproduce it with at least some of the domains listed above by claudio4.
Thanks for the feedback. I will add mitigations to fix this issue.
I also have issues with DNSSEC in 12.1. With DNSSEC enabled and Quad9 as the forwarder (I also tried Cloudflare), I frequently cannot resolve paypal.com, so I've had to disable DNSSEC with 12.1.
Another Quad9 user here experiencing this issue. I use Quad9 DNS-over-TLS (secured) as the resolver. It fails to resolve
Thanks for the feedback. Can you share the output from the DNS Client so we can tell whether it's the same issue or something different? Also, were you using an encrypted DNS protocol with Quad9?
Thanks for the feedback. Can you share the output from the DNS Client so we can tell whether it's the same issue or something different?
Sure, here is one for another domain, but the log is full of these exceptions:
After upgrading my server to version 12.1 using the Docker image, I've observed that resolution of certain domains occasionally fails at random. When this happens, attempts to resolve the domain keep failing for a while, and then, after about a minute, the issue resolves itself without any manual intervention. This only occurs with certain domains, and while I have not seen it affect two different domains simultaneously, I cannot rule out the possibility. Meanwhile, the resolver works perfectly for other domains, even while the affected ones are failing.
I checked the logs, and when this issue occurs, this exception is printed in the log:
If I use the DNS Client built into the web UI, with the server set to "This Server" and "Enable DNSSEC Validation" checked, while the issue is occurring, I get this response:
But just waiting a bit and pressing the resolve button again gets me this successful response: