Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: resolve domains in enode strings #8188

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Rjected
Copy link
Member

@Rjected Rjected commented May 9, 2024

Introduces a type that should properly deserialize bootnode enode strings with domains, since the current NodeRecord type does not do this.

fixes #8187

@Rjected Rjected added A-devp2p Related to the Ethereum P2P protocol A-cli Related to the reth CLI A-discv4 Related to discv4 discovery A-discv5 Related to discv5 discovery labels May 9, 2024
Copy link
Member

@emhane emhane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice to have the new type DNSNodeRecord live in reth-net-common

Comment on lines +436 to +435
pub fn from_unsigned(node_record: DNSNodeRecord) -> Result<Self, secp256k1::Error> {
let DNSNodeRecord { host, udp_port, id, .. } = node_record;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this change before discv5? it doesn't really make much sense to break the public API here, which I know for example flashbots is using already. better to read the boot nodes from file into DNSNodeRecord, and then resolve them to NodeRecord once before adding to either discv4, discv5 or rlpx.

f.write_char(':')?;
self.tcp_port.fmt(f)?;
if self.tcp_port != self.udp_port {
f.write_str("?discport=")?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should open an issue to spec this behaviour with discport


// resolve trusted peers if they use a domain instead of dns
for peer in &config.network.trusted_peers {
let resolved = resolve_dns_node_record(peer.clone())
Copy link
Contributor

@sergerad sergerad May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want to resolve DNS at this point?

When deploying geth on k8s, we have found its behaviour to eagerly resolve DNS of hostnames in enode strings to be a blocker for trusted peer setups (e.g. trying to resolve hostname for a clusterip of a statefulset that is being deployed in parallel, preventing peers from starting up further).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make it work for this use case, what would be the best way to resolve?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hostname for a statefulset pod on k8s should stabilize within a few seconds of the pod being up and running. The issue with geth was that the pods would terminate almost instantly on start up due to the hostname resolution failure.

To give us the best shot of not having the same issue, we could consider some or all of these options:

  1. Build in a configurable number of retries (and maybe backoff strategy) for resolving peer hostnames;
  2. Make the type for accessing peer records handle hostname resolution just in time (e.g. peer.ip_addr() would perform resolution if it had not been done before).

Option 1 would allow operators to ensure start up does not fail on the basis of hostnames that are in the process of being available on cluster.

Option 2 is not guaranteed to solve the issue, but might mitigate the number of retries required.

Let me know if you think we should address this in a separate issue and PR. And if you want me to help finish off this PR at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build in a configurable number of retries (and maybe backoff strategy) for resolving peer hostnames;

yeah, to me this seems like the most likely to work, and option 2 seems more difficult anyways. It would be great if you could help implement it! Feel free to submit a PR to this branch!

@sergerad
Copy link
Contributor

@Rjected I looked into this unit test failure

thread 'dns_node_record::tests::test_url_parse' panicked at crates/net/types/src/dns_node_record.rs:130:9:
assertion `left == right` failed
  left: DNSNodeRecord { host: Domain("10.3.58.6"), tcp_port: 30303, udp_port: 30301, id: 0x6f8a80d14311c39f35f516fa664deaaaa13e85b2f7493f37f6144d86991ec012937307647bd3b9a82abe2974e1407241d54947bbb39763a4cac9f77166ad92a0 }
 right: DNSNodeRecord { host: Ipv4(10.3.58.6), tcp_port: 30303, udp_port: 30301, id: 0x6f8a80d14311c39f35f516fa664deaaaa13e85b2f7493f37f6144d86991ec012937307647bd3b9a82abe2974e1407241d54947bbb39763a4cac9f77166ad92a0 }

The enode prefix in the URL leads the parser to use parse_opaque() on the host string which always returns a Host::Domain whereas we expect a Host::Ipv4 here.

From what I can see our only option is to post-process the url.host after that parsing has occurred. Would be better if we could control the url parsing logic but haven't seen a way to do that in our favour here.

Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not super obvious to me why we need this.

this basically replaces the Noderecord usage, why?

what's the progress here?

/// cannot be resolved, an error is returned.
///
/// If the host is already an IP address, the [NodeRecord] is returned immediately.
pub async fn resolve_dns_node_record(node_record: DNSNodeRecord) -> Result<NodeRecord, Error> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a standalone function and not a fn of dnsrecord?

@sergerad
Copy link
Contributor

not super obvious to me why we need this.

this basically replaces the Noderecord usage, why?

what's the progress here?

I have only just started contributing to reth so please take what I say with a grain of salt.

My understanding is that the current impl of NodeRecord does not support domain names, only IPv4 and IPv6 addrs:

pub struct NodeRecord {
    /// The Address of a node.
    pub address: IpAddr

This PR is adding support for domain names. @mattsse are you suggesting we should be adding this support with the existing NodeRecord struct instead of DNSNodeRecord? Given that domain name support will be a requirement for many L2 networks (like the ones I work with).

Apologies if I have misinterpreted anything 🙏

@mattsse
Copy link
Collaborator

mattsse commented May 22, 2024

gotcha,

@Rjected please document all of this because having like 4 different node variants is super confusing

@Rjected Rjected force-pushed the dan/dns-resolution-on-enodes branch from 3510f0f to cdfdaff Compare May 23, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cli Related to the reth CLI A-devp2p Related to the Ethereum P2P protocol A-discv4 Related to discv4 discovery A-discv5 Related to discv5 discovery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support DNS resolution for bootnode URL
4 participants