
[Feature Request] Add serve-expired option #918

Open
notherealmarco opened this issue May 10, 2024 · 11 comments

Comments

@notherealmarco

From the Unbound documentation:

> serve-expired: If enabled, Unbound attempts to serve old responses from cache with a TTL of serve-expired-reply-ttl in the response without waiting for the actual resolution to finish. The actual resolution answer ends up in the cache later on. Default is "no".

When enabled, if a user queries "example.com" and the server has the answer in its cache but the record has expired, the server answers immediately with the expired record and a short TTL (like 1 second) while refreshing the cached value at the same time.
In the vast majority of cases the IP address stays the same even after the TTL expires, and in the rare event that it changes it's not a big deal: the server answered with a very short TTL, so the client will ask again soon.
The TTL used in the answer should be configurable by the user.
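The behavior described above can be sketched as a small cache. This is a minimal, hypothetical Python model, not Unbound's or AdGuard Home's actual code: `resolve` stands in for an upstream query, `schedule` for whatever runs the background refresh, and `reply_ttl` for the user-configurable TTL sent with expired answers (Unbound's counterpart is `serve-expired-reply-ttl`). It deliberately omits things a real resolver needs, such as deduplicating concurrent refreshes and capping how long expired data may be served.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical types: a resolver returns (address, ttl_seconds), and a
# scheduler runs a refresh job "later" (e.g. on a background worker).
Resolver = Callable[[str], Tuple[str, int]]
Scheduler = Callable[[Callable[[], object]], None]


@dataclass
class CacheEntry:
    address: str
    expires_at: float  # absolute time on the cache's clock


class OptimisticCache:
    """Sketch of serve-expired / optimistic caching (illustrative only)."""

    def __init__(self, resolve: Resolver, schedule: Scheduler,
                 reply_ttl: int = 1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.resolve = resolve
        self.schedule = schedule
        self.reply_ttl = reply_ttl  # TTL sent with expired answers
        self.clock = clock
        self.entries: Dict[str, CacheEntry] = {}

    def lookup(self, name: str) -> Tuple[str, int]:
        now = self.clock()
        entry = self.entries.get(name)
        if entry is None:
            # Cache miss: nothing to serve, so the client has to wait.
            return self._refresh(name)
        remaining = entry.expires_at - now
        if remaining > 0:
            # Normal cache hit: serve the record with its remaining TTL.
            return entry.address, max(1, int(remaining))
        # Expired: answer right away with a short TTL and queue a refresh
        # so that the next query gets the fresh answer.
        self.schedule(lambda: self._refresh(name))
        return entry.address, self.reply_ttl

    def _refresh(self, name: str) -> Tuple[str, int]:
        address, ttl = self.resolve(name)
        self.entries[name] = CacheEntry(address, self.clock() + ttl)
        return address, ttl
```

In a real server `schedule` would hand the job to a worker; for experimenting, passing `list.append` and running the queued job by hand makes the sequence (stale answer first, fresh answer after the refresh) easy to observe.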

This improves performance a lot and is totally fine for home usage. I have used this feature in the past with other DNS resolvers such as Unbound or AdGuard Home.

Would be awesome to see this feature in Technitium!

@ShreyasZare
Member

ShreyasZare commented May 10, 2024 via email

@notherealmarco
Author

notherealmarco commented May 10, 2024

> Thanks for the request. This feature is already available as Serve Stale option in Settings > Cache section. The feature implements RFC 8767 and is enabled by default.

I think "Serve Stale" only serves the expired record when the server is unable to reach the authoritative name servers, or am I wrong?

What I mean instead is an option that makes resolution faster.

@ShreyasZare
Member

> I think "Serve Stale" only serves the expired record when the server is unable to reach the authoritative name servers, or am I wrong?

> What I mean instead is an option that makes resolution faster.

It's exactly the same feature. The DNS server will respond with the expired record from cache after waiting for at most 1800ms (i.e. just before the client timeout of 2000ms) if no upstream/authoritative server has responded by that time. So, if there is data in the cache, it will always be returned, even if the data failed to refresh.

I will update the description in the GUI so that it becomes a bit clearer.
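The bounded wait described in this comment can be sketched with asyncio. This is a hypothetical model, not Technitium's implementation: `fetch` stands in for the upstream query, `stale` for the expired cached answer, and `max_wait` defaults to the 1.8 s mentioned above. The refresh is shielded so it keeps running after the timeout, which is one way the cache could still be updated for the next query.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def answer_with_serve_stale(
    fetch: Callable[[], Awaitable[str]],
    stale: Optional[str],
    max_wait: float = 1.8,
) -> str:
    """Return the fresh answer if it arrives within max_wait seconds, else the stale one."""
    # Run the upstream refresh as its own task so it can keep going
    # (and later update the cache) even after we stop waiting for it.
    refresh = asyncio.ensure_future(fetch())
    try:
        # shield() stops wait_for() from cancelling the refresh on timeout.
        return await asyncio.wait_for(asyncio.shield(refresh), timeout=max_wait)
    except (asyncio.TimeoutError, OSError):
        # Upstream was too slow or failed: fall back to cached data, if any.
        if stale is None:
            raise
        return stale
```

Note that `max_wait` is only an upper bound: when the upstream answers in a few milliseconds, the fresh answer is returned after those few milliseconds.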

@notherealmarco
Author

> It's exactly the same feature.

It's not. What I meant was an option that makes the DNS server respond immediately with the expired record and a very low TTL, without waiting 1800ms. It's a completely different thing from RFC 8767 (they should coexist).
This is implemented in AdGuard Home (cache_optimistic option) and in Unbound (serve-expired option), and it makes a huge difference.
Sorry if I wasn't clear.

@ShreyasZare
Member

> It's not

Point taken. The intention of saying it's the same feature was that, for the end user, it does not make a difference. This is because, in the vast majority of cases, the resolution occurs within a couple of hundred milliseconds, so the client waits about as long as for a normal resolution, not 1800ms. The 1800ms is the timeout value (max wait time): the point at which waiting stops and stale data from the cache is served as a response, before the client times out.

The serve stale feature will also respond with stale data immediately without waiting if it had recently tried and failed to refresh the answer similar to how Unbound does by default.
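A sketch of the "recently tried and failed" shortcut: a per-name record of the last failed refresh decides whether to wait for the upstream again or to serve stale data right away. RFC 8767 describes a similar "failure recheck timer"; the class name and the 30-second window below are assumptions for illustration, not Technitium's actual names or values.

```python
import time
from typing import Callable, Dict


class RefreshFailureTracker:
    """Hypothetical helper: remembers recent refresh failures per name."""

    def __init__(self, recheck_window: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.recheck_window = recheck_window  # seconds; assumed default
        self.clock = clock
        self.last_failure: Dict[str, float] = {}

    def record_failure(self, name: str) -> None:
        self.last_failure[name] = self.clock()

    def record_success(self, name: str) -> None:
        self.last_failure.pop(name, None)

    def should_wait_for_upstream(self, name: str) -> bool:
        """False means: serve the stale answer immediately, without waiting."""
        failed_at = self.last_failure.get(name)
        if failed_at is None:
            return True  # no recent failure: give the upstream a chance first
        return self.clock() - failed_at >= self.recheck_window
```

The design trade-off is that a name whose upstream is broken stops costing clients a full wait on every query, while a name that refreshes normally still gets the chance to return fresh data.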

The Unbound implementation predates RFC 8767, which is mentioned in section 8 of the RFC. The serve stale waiting behavior is slightly better than answering with expired data immediately. Take for example a domain that was resolved on Friday and whose response was cached. It is resolved again on Monday (2 days later). If the expired data is served immediately, there is a rare chance that the IP address may have been updated and the user sees an error. Waiting a few milliseconds gives the updated IP address, preventing such a scenario.

@raphielscape

> The serve stale feature will also respond with stale data immediately without waiting if it had recently tried and failed to refresh the answer similar to how Unbound does by default.

Unbound doesn't depend on whether the refresh fails or not. Quoting their documentation:

> Before trying to resolve, Unbound will also consider expired cached records as possible answers. If such a record is found it is immediately returned to the client (cache response speed!). But contrary to normal cache replies, Unbound continues resolving and hopefully updating the cached record.
> With prefetch, Unbound tries to update a cached record (after first replying to the client) when the current TTL is within 10% of the original TTL value. The logic is similar to serve-expired: if a cached record is found and the record is within 10% of the TTL, it is returned to the client but Unbound continues resolving in order to update the record. Although prefetching comes with a small penalty of ~10% in traffic and load from the extra upstream queries, the cache is kept up-to-date, at least for popular queries.

Serving stale data increases cache hits and makes DNS resolution a lot faster for the client; expired data is not served immediately if the cached TTL is less than 10% of the upstream TTL.
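The prefetch rule in the quoted Unbound documentation ("when the current TTL is within 10% of the original TTL value") can be sketched as a single check. This illustrates the documented rule only; it is not Unbound's source, and the function and parameter names are hypothetical.

```python
def should_prefetch(remaining_ttl: float, original_ttl: float,
                    threshold: float = 0.10) -> bool:
    """Prefetch check modeled on the quoted Unbound docs (sketch).

    A still-valid cache hit is answered normally, and if its remaining TTL
    is within `threshold` (10%) of the original TTL, a background refresh
    is also started. Expired records (remaining_ttl <= 0) belong to the
    serve-expired path instead, so they return False here.
    """
    if original_ttl <= 0 or remaining_ttl <= 0:
        return False
    return remaining_ttl <= threshold * original_ttl


# A record cached with a TTL of 3600 s is prefetched in its last 360 s.
for remaining in (3000, 400, 359, 10, 0):
    print(remaining, should_prefetch(remaining, 3600))
```

Because popular names are refreshed while still valid, they rarely reach the expired state at all, which is why prefetch and serve-expired complement each other.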

@ShreyasZare
Member

> The serve stale feature will also respond with stale data immediately without waiting if it had recently tried and failed to refresh the answer similar to how Unbound does by default.

> Unbound doesn't depend on whether the refresh fails or not

I did not say Unbound does that. I said that Technitium DNS server's serve stale responds immediately only if it had recently tried and failed to refresh. Otherwise, it waits to see if the data can be refreshed before serving the stale response.

> Before trying to resolve, Unbound will also consider expired cached records as possible answers. If such a record is found it is immediately returned to the client (cache response speed!). But contrary to normal cache replies, Unbound continues resolving and hopefully updating the cached record.
> With prefetch, Unbound tries to update a cached record (after first replying to the client) when the current TTL is within 10% of the original TTL value. The logic is similar to serve-expired: if a cached record is found and the record is within 10% of the TTL, it is returned to the client but Unbound continues resolving in order to update the record. Although prefetching comes with a small penalty of ~10% in traffic and load from the extra upstream queries, the cache is kept up-to-date, at least for popular queries.

> Serving stale data increases cache hits and makes DNS resolution a lot faster for the client; expired data is not served immediately if the cached TTL is less than 10% of the upstream TTL.

Technitium DNS Server also has prefetch and auto prefetch features which work in background to update the data in cache before it expires.

@raphielscape

raphielscape commented May 15, 2024

Yes, but the current implementation optimistically assumes that every query is fast (that it resolves in less than ~1.8 seconds), rather than assuming that any query has real potential to be slow. Even a 1-second delay in resolving itself is pretty noticeable. Some ISPs throttle DNS requests that don't go to their own servers to the extreme, or have strict queues because of oversubscription, which delays DNS requests a lot. Satellite networks also have pretty high latency, which sometimes makes DNS queries miss the target (Starlink at peak times also sometimes has high DNS query times). In these cases, waiting for the data to be successfully refreshed can also delay resolution by a lot.

@notherealmarco
Author

notherealmarco commented May 15, 2024

> 1-second delay in resolving itself is pretty noticeable

> the resolution occurs within a couple of hundred milliseconds

Considering that nowadays a lot of people (including me) have gigabit internet service, 100ms is a lot for a DNS lookup, especially since loading a website correctly requires dozens of DNS lookups.
I tried using AdGuard Home with the optimistic cache enabled, and after the cache was populated, most websites loaded instantly!

> there is a rare chance that the IP address may have been updated and the user sees an error

It certainly could happen, but it's really rare, and since the DNS server should answer with a low TTL, a page refresh will fix it (as in the meantime the server will have updated the cache).
I used the optimistic cache feature in AdGuard Home for months and never encountered such an issue.
Anyway, this should be a feature that is disabled by default to prevent unwanted scenarios, but it would be awesome to see it in Technitium DNS.

@ShreyasZare
Member

> Considering that nowadays a lot of people (including me) have gigabit internet service, 100ms is a lot for a DNS lookup, especially since loading a website correctly requires dozens of DNS lookups.
> Let's imagine a scenario where a webpage has 9 external resources (JavaScript files, images from CDNs...). This will require 10 DNS lookups, and assuming an average of 100ms per lookup, the browser will spend 1s on DNS lookups alone, which is a lot nowadays. For this reason, even a 100ms delay is pretty noticeable.

I think there is some confusion about how the serve stale feature works. It's not going to wait for 1800ms or 100ms for all queries. If the resolution happens in 1ms, it will respond instantly and not wait for anything. The max wait of 1800ms is only a timeout value, which is hit only when the upstream server does not respond sooner than that.

Another thing is that these DNS lookups happen concurrently, so you cannot add up the time taken by 10 lookups to get the total time. In that imaginary scenario where the upstream took 100ms to respond to each request, 10 lookups would take ~100ms, not 1 second.
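The difference between adding up lookup times and running lookups concurrently can be checked with a small simulation. This sketch uses `asyncio.sleep` as a stand-in for a 100 ms upstream query (no real DNS traffic is sent) and assumes the client issues all ten lookups at once; real browsers may cap or stagger parallel lookups, so actual numbers will vary. The hostnames are hypothetical.

```python
import asyncio
import time
from typing import List


async def fake_lookup(name: str, delay: float) -> str:
    # Stand-in for an upstream DNS query that takes `delay` seconds.
    await asyncio.sleep(delay)
    return name


async def lookup_sequentially(names: List[str], delay: float) -> None:
    for name in names:
        await fake_lookup(name, delay)  # each lookup waits for the previous one


async def lookup_concurrently(names: List[str], delay: float) -> None:
    await asyncio.gather(*(fake_lookup(name, delay) for name in names))


def elapsed(names: List[str], delay: float, concurrent: bool) -> float:
    runner = lookup_concurrently if concurrent else lookup_sequentially
    start = time.perf_counter()
    asyncio.run(runner(names, delay))
    return time.perf_counter() - start


names = [f"cdn{i}.example.com" for i in range(10)]  # hypothetical hostnames
seq_time = elapsed(names, 0.1, concurrent=False)
conc_time = elapsed(names, 0.1, concurrent=True)
print(f"sequential: {seq_time:.2f} s, concurrent: {conc_time:.2f} s")
```

On a typical machine the sequential run takes about 1 s and the concurrent run about 0.1 s, matching the arithmetic in the comment above.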

> It certainly could happen, but it's really rare, and since the DNS server should answer with a low TTL, a page refresh will fix it (as in the meantime the server will have updated the cache).

Web browsers also cache DNS responses for a minimum period so refreshing does not fix this immediately.

> I tried using AdGuard Home with the optimistic cache enabled, and after the cache was populated, most websites loaded instantly!

You will get a similar experience with this serve stale feature too.

> Anyway, this should be a feature that is disabled by default to prevent unwanted scenarios, but it would be awesome to see it in Technitium DNS.

> Satellite networks also have pretty high latency, which sometimes makes DNS queries miss the target

@raphielscape Agreed that on some types of networks, it would cause a longer wait. The serve stale standard does not specify anything for such scenarios.

I am exploring how such an option could be added with the current serve stale implementation. Will update here once a solution is found.

@ShreyasZare
Member

I am adding a few options in the next update that will allow changing the serve stale feature's internal parameters. One of those options will allow updating the wait time, which can then be set to 0 for Unbound-like behavior.
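A rough model of what a configurable wait time changes for a client querying an expired name. This is an illustration only: `stale_wait_ms` is a hypothetical name, not the actual Technitium setting, and the model ignores client-to-resolver network time, the "recently failed" shortcut, and the background refresh itself.

```python
def perceived_latency_ms(upstream_ms: float, stale_wait_ms: float,
                         has_stale_copy: bool) -> float:
    """Rough model of how long a client waits for an expired name (sketch)."""
    if not has_stale_copy:
        # Nothing cached to fall back on: the client waits for upstream.
        return upstream_ms
    # Fresh data wins if it arrives within the wait window; otherwise the
    # stale copy is served as soon as the window closes.
    return min(upstream_ms, stale_wait_ms)


# Compare a 1800 ms wait (RFC 8767-style) with a wait of 0 (Unbound-style).
for upstream_ms in (5, 100, 2500):
    for wait_ms in (1800, 0):
        latency = perceived_latency_ms(upstream_ms, wait_ms, has_stale_copy=True)
        print(f"upstream {upstream_ms:>4} ms, wait {wait_ms:>4} ms -> {latency:.0f} ms")
```

The trade-off discussed in the thread shows up directly: a wait of 0 answers expired names instantly but always with possibly outdated data, while an 1800 ms wait returns fresh data whenever the upstream is fast and only costs the full wait on slow or throttled links.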

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants