Peering - disconnects refactor #6968

Open: wants to merge 15 commits into main


macfarla (Contributor) commented Apr 19, 2024

PR description

Before this PR, once we reach max peers we refuse ALL incoming connections (with disconnect reason UNKNOWN), regardless of any properties of the incoming peer. Peers were also being disconnected in several other places, for reasons including repeated timeouts and "useless" responses. This PR consolidates those disconnects and only disconnects an established peer when there is a better peer to replace it with.

  • Move the disconnection decision from PeerReputation/Peer to EthPeers, so that the full set of current peers can be considered.
  • Use the reputation score to sort peers within EthPeers (bestPeerComparator).
  • Note: in a few tests I've explicitly set the comparator to the one used before (which uses the chain height estimate), to avoid having to update a heap of tests that depend on the decision the comparator makes.
  • Only disconnect the "worst" peer if we are at max peers. I think this will help on Holesky with "useless" disconnects, but not with socket-closed and similar cases, where the connection is already gone.
  • On an incoming connection, if we are at max peers, compare the incoming peer to the current peers and disconnect whichever compares least favourably. In effect, this will be the incoming peer if all our current peers are giving us good responses (reputation score), or an existing peer if any are not.
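The incoming-connection decision in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the EthPeers code in this PR: the Peer record, BEST_PEER_COMPARATOR, peerToDisconnect and the "other-peer" id are hypothetical, the chain height estimate is only an assumed tie-breaker, and the sketch assumes that on a tie the established peer is kept. The usage in main reuses the two reputation scores from the example debug log in this description (87 for the worst current peer, 100 for the connecting peer).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class PeerSelectionSketch {

  // Hypothetical stand-in for a connected peer (not Besu's EthPeer); only the fields the sketch needs.
  record Peer(String id, int reputationScore, long chainHeightEstimate) {}

  // Hypothetical "best peer" ordering: a higher reputation score compares greater;
  // the chain height estimate is an assumed tie-breaker.
  static final Comparator<Peer> BEST_PEER_COMPARATOR =
      Comparator.comparingInt(Peer::reputationScore)
          .thenComparingLong(Peer::chainHeightEstimate);

  // Returns the peer to disconnect when `incoming` connects, or empty if there is room.
  // At max peers, whichever of {worst current peer, incoming peer} compares least
  // favourably is chosen; on a tie the established peer is kept (an assumption).
  static Optional<Peer> peerToDisconnect(List<Peer> current, Peer incoming, int maxPeers) {
    if (current.size() < maxPeers) {
      return Optional.empty(); // below the limit: accept without disconnecting anyone
    }
    Optional<Peer> worstCurrent = current.stream().min(BEST_PEER_COMPARATOR);
    if (worstCurrent.isEmpty()
        || BEST_PEER_COMPARATOR.compare(incoming, worstCurrent.get()) <= 0) {
      return Optional.of(incoming); // incoming peer is no better: refuse it
    }
    return worstCurrent; // incoming peer is better: replace the worst current peer
  }

  public static void main(String[] args) {
    // Scores from the example debug log: worst current peer 87, connecting peer 100.
    List<Peer> current =
        List.of(new Peer("0x024e106a70572288", 87, 0), new Peer("other-peer", 100, 0));
    Peer incoming = new Peer("0x1ec3a5e247e616a3", 100, 0);
    // prints 0x024e106a70572288
    System.out.println(peerToDisconnect(current, incoming, 2).map(Peer::id).orElse("none"));
  }
}
```

Keeping the established peer on a tie means a fully healthy peer set behaves like the old "refuse incoming at max peers" rule, while a degraded peer set gets refreshed.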

Example debug log (yes, there is a TODO to remove this log):
{"@timestamp":"2024-04-19T00:53:31,297","level":"DEBUG","thread":"nioEventLoopGroup-3-9","class":"EthPeers","message":"comparing worstCurrentPeer PeerId: 0x024e106a70572288... PeerReputation score: 87, timeouts: {3=5}, useless: 0, validated? true, disconnected? false, client: erigon/v2.58.2-125509e4/linux-amd64/go1.21.8, [Connection with hashCode 1276176835 inboundInitiated? true initAt 1713481827797], enode://024e106a70572288701e97724610d120a682f69f24881eeee0ab1f6379646780d93daf11d9226593faf71deb699f24020dbecf801eb3d0da779ef2be641590fa@3.38.172.157:30304 with connectingPeer PeerId: 0x1ec3a5e247e616a3... PeerReputation score: 100, timeouts: {}, useless: 0, validated? true, disconnected? false, client: Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2, [Connection with hashCode 1951644955 inboundInitiated? false initAt 1713488011205], enode://1ec3a5e247e616a347038b9c35a1328529bdf354408fe3c968433df542eb1b2a7c7d1b7b7d481a5b6465baac64877ee53e1396ddf43a3ad0f5f0adbaed659145@188.40.67.160:30303","throwable":""}

I have seen decent results on Holesky and mainnet; see the screenshots.

Fixed Issue(s)

Refs #6805 and #6842

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

Signed-off-by: Sally MacFarlane <macfarla.github@gmail.com>
macfarla (Contributor, Author) commented:

3 x mainnet nodes (one is a bit flat)
[Screenshot 2024-04-19 at 12 53 51 PM]

macfarla (Contributor, Author) commented:

3 x Holesky nodes
[Screenshot 2024-04-19 at 12 55 44 PM]

Beanow commented May 5, 2024

As mentioned in #6945, I do see far fewer UNKNOWN disconnects once we hit our peer limit.

[image]
This is on Holesky.
