Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyse Inbound Disconnect Reasons Per Client #6945

Open
siladu opened this issue Apr 15, 2024 · 14 comments
Open

Analyse Inbound Disconnect Reasons Per Client #6945

siladu opened this issue Apr 15, 2024 · 14 comments
Assignees
Labels

Comments

@siladu
Copy link
Contributor

siladu commented Apr 15, 2024

Related to #6805

We log enode with disconnect reason and client details with enode.
Cross reference these logs and see if there's a pattern

@macfarla
Copy link
Contributor

we get Inbound UNKNOWN disconnects from besu and OpenEthereum eg OpenEthereum/v3.3.5-hbbft-0.9.3-unstable-6b34c53cb-20231028/x86_64-linux-gnu/rustc1.68.2

@macfarla
Copy link
Contributor

running a modified version of #6947 on a holesky node
sample of disconnects (it's been running for under an hr)
excluding MISMATCHED_NETWORK_ID disconnects since on holesky that's the vast majority

note this holesky node only has 2 peers (the 2 bootnodes)

sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ cat /var/log/besu/besu.log | grep -v MISMATCHED_NET | grep bound | awk '{print $7,$9,$10,$13}' | sort
Inbound 0x UNKNOWN besu/v24.4-develop-9ae52a9/linux-x86_64/openjdk-java-21
Inbound 0x04 TOO_MANY_PEERS Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2
Inbound 0x04 TOO_MANY_PEERS Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2
Inbound 0x04 TOO_MANY_PEERS Nethermind/v1.25.4+20b10b35/linux-x64/dotnet8.0.2
Inbound 0x04 TOO_MANY_PEERS erigon/v2.58.0-9731f716/linux-amd64/go1.21.5
Inbound 0x04 TOO_MANY_PEERS erigon/v2.58.1/linux-amd64/go1.21.6
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Inbound 0x10 SUBPROTOCOL_TRIGGERED bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14
Outbound 0x01 TCP_SUBSYSTEM_ERROR bor/v1.2.7/linux-amd64/go1.20.14

@macfarla macfarla self-assigned this Apr 16, 2024
@macfarla
Copy link
Contributor

updated #6947 to make the analysis easier ^^

@macfarla
Copy link
Contributor

disconnect reasons matched with client type - this is all disconnects (excluding mismatched network ID) on 6609-9x nodes (3 nodes) over about an hour https://docs.google.com/spreadsheets/d/1Bbu7YWwuwRJJ3KByfpnZsw_S1GNo4O9itM7a14GhACo/edit#gid=1710195989
this may prompt more questions than it answers - what else do we want to know?
interesting that besu sends UNKNOWN disconnects more than any other peer - it's because of some code in RlpxAgent where we're checking "should we connect" from a number of sources (callbacks!) but I think it either means too many peers or node isn't ready, or already connected (it's kind of annoying that the real reason is hidden and probably is not a huge refactor but I didn't make a ticket yet)

@macfarla
Copy link
Contributor

Looking at 6609-90 which did not gain any peers via discovery over the last 9 hours

  • all inbound and outbound disconnects where client info is visible (since we disconnected the 2 geth bootnodes with useless responses) are mostly bor, a couple of Nethermind and erigon (inbound) and OpenEthereum (outbound) - but plenty of other connection attempts that time out before it gets to that point
  • literally no Geth in the logs after we disconnect the bootnodes

Inbound

{"@timestamp":"2024-04-16T07:19:19,451","level":"INFO","thread":"nioEventLoopGroup-3-9","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0xd9daa011e1f46a4c... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:19:35,559","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x03124cf39fc39a1c... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:19:54,968","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x8e39133d460b26b5... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:20:20,211","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0xd53a11a901865a4f... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:20:25,312","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x74cc266ba22852bd... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:20:26,486","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x1bd8a0c41ab6af59... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:20:58,096","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x9e6e852e5eee598c... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:21:03,854","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0xfdc3755978ddbfc4... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:21:19,252","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0x288ed364dc9aa0ee... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:22:06,769","level":"INFO","thread":"nioEventLoopGroup-3-9","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x10 SUBPROTOCOL_TRIGGERED - 0xa023f72df1fd1661... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T07:28:01,295","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Inbound - 0x04 TOO_MANY_PEERS - 0xf64105f40f5533e4... erigon/v2.58.1/linux-amd64/go1.21.6 - 2 peers left","throwable":""}

Outbound

{"@timestamp":"2024-04-16T14:13:33,406","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x80ce9c14482cc882... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:14:06,808","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xe20ff77a66de45d2... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:19:01,066","level":"INFO","thread":"nioEventLoopGroup-3-7","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x6a453b0d5fda056a... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:20:31,079","level":"INFO","thread":"nioEventLoopGroup-3-6","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xa8eb15f64140580c... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:25:12,120","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x4ac190b9050f56b3... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:25:56,615","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xabde70c2be3120ed... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:31:59,692","level":"INFO","thread":"nioEventLoopGroup-3-6","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xbe6d6cc9093ea947... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:33:39,664","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x2e8ee850bd0ba49a... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:38:24,388","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xc99cfe0d5d31d492... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:40:53,038","level":"INFO","thread":"nioEventLoopGroup-3-7","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x3a67985c41ba86f1... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:41:24,548","level":"INFO","thread":"nioEventLoopGroup-3-2","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xdfd74a6cad8d2a78... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:46:54,078","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x7acebc19661a700c... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:47:30,784","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xa54c46e100cdfbf7... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:50:04,999","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xcd5ccc6392bce530... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:50:14,074","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x9632ba25514af69e... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:51:35,766","level":"INFO","thread":"nioEventLoopGroup-3-7","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x885d0f84f53d7639... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:53:33,645","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x8fdfb020b8f6a4c1... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T14:58:31,269","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x36f89cd4e8e22776... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T15:02:31,288","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x0b32358d064c5aa2... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T15:11:50,271","level":"INFO","thread":"nioEventLoopGroup-3-6","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xd700e8ab20a92f3a... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T15:17:46,784","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0x008ed9ab41b53538... bor/v1.2.7/linux-amd64/go1.20.14 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T15:19:36,294","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x92f073faa1b7c747... OpenEthereum/v3.3.3-stable/x86_64-linux-musl/rustc1.47.0 - 0 peers left","throwable":""}

@macfarla
Copy link
Contributor

macfarla commented Apr 16, 2024

some interesting stats (again off 6609-90 which seems to be stuck with zero peers)

sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Timed out" /var/log/besu/besu.log
27883
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Failed to connect to peer" /var/log/besu/besu.log
86300
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Received PONG  packet" /var/log/besu/besu.log
91040
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "New PeerConn" /var/log/besu/besu.log
264
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep Geth /var/log/besu/besu.log
{"@timestamp":"2024-04-16T08:39:34,743","level":"INFO","thread":"nioEventLoopGroup-3-2","class":"EthProtocolManager","message":"Disconnect - active Connection? true - Outbound - 0x03 USELESS_PEER_USELESS_RESPONSES - 0xa3435a0155a3e837... Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.7 - 1 peers left","throwable":""}
{"@timestamp":"2024-04-16T08:39:35,817","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? true - Outbound - 0x03 USELESS_PEER_USELESS_RESPONSES - 0xac906289e4b7f12d... Geth/v1.13.14-stable-2bd6bd01/linux-amd64/go1.21.7 - 0 peers left","throwable":""}
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$

Fail to connect reasons 87432

  • time out
  • socket exception - connection closed
  • potential peer has too many peers
  • connection refused
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Failed to connect to peer" /var/log/besu/besu.log
87432
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep "Failed to connect to peer" /var/log/besu/besu.log | grep -c "Timed out"
28319
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep "Failed to connect to peer" /var/log/besu/besu.log | grep -c "SocketExc"
57689
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep "Failed to connect to peer" /var/log/besu/besu.log | grep -c "TOO_MANY_PEERS"
1644
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep "Failed to connect to peer" /var/log/besu/besu.log | grep -c "Connection refused"
11
sallymacfarlane@dev-elc-bu-tk-holesky-sally-6609-90:~$ grep -c "Failed to connect to peer" /var/log/besu/besu.log
87907

@macfarla
Copy link
Contributor

I've restarted besu on 6609-90 and it has connected to the 2 bootnodes and discovery looks a bit more varied (normal?)

{"@timestamp":"2024-04-16T22:17:23,738","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x21f5d4f11abc64cd... bor/v1.2.8/linux-amd64/go1.20.5 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:23,738","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x3cde5a3e8b257b48... bor/v1.2.8/linux-amd64/go1.20.5 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:23,741","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x49ce49cf45f5fe39... bor/v1.2.8/linux-amd64/go1.20.5 - 0 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:35,840","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x315e23e64a99f688... Geth/v3.0.0-pulse-stable-7975e02e/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:35,841","level":"INFO","thread":"nioEventLoopGroup-3-3","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x7ab83377a659202d... Geth/v3.0.1-pulse-stable-db2c45dc/linux-amd64/go1.20.8 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:36,063","level":"INFO","thread":"nioEventLoopGroup-3-2","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x00105127dbf79041... Geth/v3.0.0-pulse-stable-7975e02e/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:36,116","level":"INFO","thread":"nioEventLoopGroup-3-5","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x67dd6cc1cfdd993b... Geth/v3.0.0-pulse-stable-7975e02e/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:17:36,118","level":"INFO","thread":"nioEventLoopGroup-3-6","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0xdc33523c001de018... Geth/v3.0.1-pulse-stable-db2c45dc/linux-amd64/go1.20.8 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:20:28,923","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x01 TCP_SUBSYSTEM_ERROR - 0xfd0f8208b189e548... bor/v1.2.7/linux-amd64/go1.20.14 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:20:43,746","level":"INFO","thread":"nioEventLoopGroup-3-8","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x92f073faa1b7c747... OpenEthereum/v3.3.3-stable/x86_64-linux-musl/rustc1.47.0 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:21:19,800","level":"INFO","thread":"nioEventLoopGroup-3-4","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x99502e898abbd65b... bor/v1.2.8/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:21:19,804","level":"INFO","thread":"nioEventLoopGroup-3-2","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x5c2ad585a7eb2244... bor/v1.2.8/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:21:25,107","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0xf783d7c28cd1b5dd... bor/v1.2.8/linux-amd64/go1.20.5 - 2 peers left","throwable":""}
{"@timestamp":"2024-04-16T22:21:30,578","level":"INFO","thread":"nioEventLoopGroup-3-1","class":"EthProtocolManager","message":"Disconnect - active Connection? false - Outbound - 0x10 SUBPROTOCOL_TRIGGERED_MISMATCHED_NETWORK - 0x44c394b274b0581c... bor/v1.2.8/linux-amd64/go1.20.5 - 2 peers left","throwable":""}

@macfarla
Copy link
Contributor

also noticed on 6609-00 and 6609-02, where I forcibly added the 2 bootnodes as peers, those nodes haven't added any other peers via discovery (over last 9hours) - whereas other holesky nodes started at the same time are now up to 25 peers
(note 6609-90 just restarted, has 2 peers, still syncing)

Screenshot 2024-04-17 at 8 27 46 AM

@macfarla
Copy link
Contributor

with no further intervention, 6609-90 is now up to 25 peers
Screenshot 2024-04-18 at 10 59 28 AM

@Beanow
Copy link

Beanow commented Apr 18, 2024

I would also suggest changing the function signature on:

boolean shouldConnect(Peer peer, final boolean incoming);

For the ShouldConnectCallback and ProtocolManager interfaces.

The result of this is that any false returned here becomes an UNKNOWN disconnect reason.
Which is typically 80-100k per day once the max peers is saturated. For me on Holesky.
image

The disconnect reason would be rather interesting because:

  1. That makes these metrics more useful.
  2. We communicate that to our peer, making us a better citizen on the network.

So perhaps it should be a DisconnectReason or null return type. Or a dedicated type for this.

@macfarla
Copy link
Contributor

agree @Beanow - I'm actually working on a related fix right now that will help a lot to get rid of UNKOWN disconnects

@macfarla
Copy link
Contributor

@Beanow see #6968 which gets rid of majority of UNKNOWN disconnects. can then do a follow on refactor that changes the method signature as you suggest

@Beanow
Copy link

Beanow commented Apr 28, 2024

@macfarla I also wonder out loud about 2 things:

  • Would there be a substantial difference in quality of peers we receive through DNS vs Peer discovery?
  • If there's research/literature on good heuristics for maintaining peers.

#6968 mainly refers to the PeerReputation, though I'm not well versed in the matter enough to know if - in aggregate - this could lead to cliques or local maximum kinds of issues.

@macfarla
Copy link
Contributor

macfarla commented May 7, 2024

useful command to collect data on disconnect reasons by client name
cat /var/log/besu/besu.log | grep -v MISMATCHED_NET | grep "bound - 0x" | awk '{print $16,",",$19,$22}' | sort

example output from running local mainnet node (latest main) with teku for ~45 min https://docs.google.com/spreadsheets/d/1ZqsoD-wIlxZjOW1RokS8Op50ZxjuyPYkuF4Yp3RDB4g/edit?usp=sharing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants