Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New plugin "reverse lookup cache" #6459

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

czeumer
Copy link

@czeumer czeumer commented Jan 11, 2024

1. Why is this pull request needed and what does it do?

It adds a new plugin that allows IP addresses returned by name lookups (A/AAA) to be cached and used as results for later reverse DNS requests (PTR).

It is useful in scenarios where monitoring or security solutions need to correctly resolve the target DNS name of an IP address. Since many services hosted on content delivery networks or in the cloud do not have matching PTR records, this plugin can greatly improve the information value of IP addresses used in a certain context. For example, the console.aws.amazon.com's IP resolves to a751973eac2731385.awsglobalaccelerator.com in a reverse lookup, and the IP addresses of login.microsoft.com does not resolve at all.

The plugin uses groupcache in a clustered environment. On a Kubernetes cluster, the cache uses service discovery.

2. Which issues (if any) are related?

3. Which documentation changes (if any) need to be made?

The plugin could be added to the list of plugins available.

4. Does this introduce a backward incompatible change or deprecation?

no

@Tantalor93
Copy link
Collaborator

As a start, maybe we should consider making this plugin an external plugin?

.vscode/local_dnstap.sh Outdated Show resolved Hide resolved
Copy link
Collaborator

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why this isn't part of the regular cache plugin?

Dockerfile Outdated Show resolved Hide resolved
@SuperQ
Copy link
Collaborator

SuperQ commented Jan 11, 2024

FYI, github.com/golang/groupcache is a bit of an unmaintained package. I haven't fully reviewed this, but are you sure groupcache is appropriate here? The groupcache data model is such that there is no TTL concept. So results in the cache may never evicted or refreshed. This doesn't match well with DNS results which are explicitly TTL'd.

@czeumer
Copy link
Author

czeumer commented Jan 11, 2024

I'm wondering why this isn't part of the regular cache plugin?

The PTR is generated synthetically. So for the original cache plugin there is actually nothing to cache.

@czeumer
Copy link
Author

czeumer commented Jan 11, 2024

We chose groupcache because it is already in the dependencies of coredns, and I tried to introduce as few new dependencies as possible. The actual caching is done in memory on each node. groupcache is used to query peer nodes if they hold information for local cache misses (cache filling). Regarding the TTL, we chose not to use the TTL set on the original replies for cache expiration but a globally configured value. Many replies have very short TTLs set. The main use-case for the plugin is looking up names when enriching logging and monitoring data. This may happen quite a while after the official expiration of the original answer. For a use case where the results must be aligned, could we introduce a special cache TTL configuration value of original?

@chrisohaver
Copy link
Member

chrisohaver commented Jan 11, 2024

This sounds useful to me, and IMO the use case is compelling enough for consideration as an internal plugin.

One possible issue is that the answers this plugin produces will be slightly different than "real" answers, and which one we respond with will depend on order and timing of prior lookups. This will need to be addressed in the README. A potential problematic permutation being that if the upstream reverse lookup is a negative answer with a long TTL, it could effectively block the response that this plugin produces until the TTL expires. Other less problematic differences could include the number of records with the PTR name (e.g. as a result of n:n PTR:A relationships).

@chrisohaver
Copy link
Member

Another issue: This doesn't work with multiple instance of CoreDNS, since they each cache responses independently. From a client perspective this would result in flip-flopping of answers between real answers and the synthesized ones, depending on which instance handles the PTR request.

@czeumer
Copy link
Author

czeumer commented Jan 11, 2024

Another issue: This doesn't work with multiple instance of CoreDNS, since they each cache responses independently. From a client perspective this would result in flip-flopping of answers between real answers and the synthesized ones, depending on which instance handles the PTR request.

This is only true, if the groupcache feature is not enabled. With groupcache enabled locally missing items will be queried from the groups peers. The flipping you mentioned was the actual trigger to implement a distributed caching mechanism.

@chrisohaver
Copy link
Member

We chose groupcache because it is already in the dependencies of coredns

It's was indirect before, but with this proposed change, a direct dependency.

@czeumer
Copy link
Author

czeumer commented Jan 11, 2024

This sounds useful to me, and IMO the use case is compelling enough to include as an internal plugin.

One possible issue is that the answers this plugin produces will be slightly different than "real" answers, and which one we respond with will depend on order and timing of prior lookups. This will need to be addressed in the README. A potential problematic permutation being that if the upstream reverse lookup is a negative answer with a long TTL, it could effectively block the response that this plugin produces until the TTL expires. Other less problematic differences could include the number of records with the PTR name (e.g. as a result of n:n PTR:A relationships).

The possible inconsistency is a very good point. I´ve updated the readme.

The 1:n problem is partly tackled by sorting the results by last "active" query. But this is only usable for clients beeing actually aware of this sorting. The basic assumption is, that most clients will simply select the first returned answer an thus the most probable one.

@chrisohaver
Copy link
Member

I'm wondering why this isn't part of the regular cache plugin?

The PTR is generated synthetically. So for the original cache plugin there is actually nothing to cache.

Original cache plugin caches any non-error responses. That a record is synthesized would not matter.

Order of plugins matters here though - in directives you've put it before cache plugin, in which case it would not get cached. In plugin.cfg though, you've put it after cache, in which case the responses it synthesizes would be cached. (this needs to be made consistent BTW, so that the order does not change when someone compiles in new external plugins)

Signed-off-by: Carsten Zeumer <carsten.zeumer@autonubil.de>
@czeumer czeumer force-pushed the reverse-dns-cache branch 2 times, most recently from b9128d9 to 45d2700 Compare February 7, 2024 16:09
Signed-off-by: Carsten Zeumer <carsten.zeumer@autonubil.de>
@chrisohaver
Copy link
Member

Looking back on this, there are several of caveats (most are unavoidable).

  • The most prominent/obvious one is that a reverse response cannot be synthesized until a lookup on the name is done. IOW sometimes it won't provide an answer.
  • There is also the issue of multiple hostnames with the same IP address, in which case arbitrary subset of names would be synthesized depending on which names had been looked up. IOW when it does provide an answer, it can be inconsistent.
  • It also creates answers that are potentially different than what the upstream (down chain) would respond with, which adds another layer of inconsistency since this plugin trumps the upstream response when it can answer (this could be avoided - but I think it's done intentionally per the use case described).

These make the behavior somewhat inconsistent. For that reason I'm leaning toward making this an external plugin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants