Sync: Rapidly find and track peer canonical heads #811
First component of new sync approach.

This module fetches and tracks the canonical chain head of each connected peer. (Or in future, each peer we care about; we won't poll them all so often.)

This is for when we aren't sure of the block number of a peer's canonical chain head. Most of the time, after finding which block, it quietly polls to track small updates to the "best" block number and hash of each peer.

But sometimes that can get out of step. If there has been a deeper reorg than our tracking window, or a burst of more than a few new blocks, network delays, downtime, or the peer is itself syncing. Perhaps we stopped Nimbus and restarted a while later, e.g. suspending a laptop or Control-Z. Then this will catch up. It is even possible that the best hash the peer gave us in the `Status` handshake has disappeared by the time we query for the corresponding block number, so we start at zero.

The steps here perform a robust and efficient O(log N) search to rapidly converge on the new best block if it's moved out of the polling window no matter where it starts, confirm the peer's canonical chain head boundary, then track the peer's chain head in real-time by polling. The method is robust to peer state changes at any time.

The purpose is to:

- Help with finding a peer common chain prefix ("fast sync pivot") in a consistent, fast and explicit way.
- Catch up quickly after any long pauses of network downtime, program not running, or deep chain reorgs.
- Be able to display real-time peer states, so they are less mysterious.
- Tell the beam/snap/trie sync processes when to start and what blocks to fetch, and keep those fetchers in the head-adjacent window of the ever-changing chain.
- Help the sync process bootstrap usefully when we only have one peer, speculatively fetching and validating what data we can before we have more peers to corroborate the consensus.
- Help detect consensus failures in the network.
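To make the O(log N) claim concrete, here is a minimal sketch of an exponential hunt followed by a binary search. It is not the code from this PR: `PeerView`, `hasBlock` and `findHead` are hypothetical names, the peer is simulated as a single fixed head number rather than queried with real header requests, and block 0 is assumed to always be present. The doubling step mirrors the expansion factor of 2 mentioned in the `SyncHuntBackward` comment.

```nim
# Hypothetical sketch; none of these names come from the PR itself.
type
  PeerView = object
    head: uint64   # the peer's canonical head, unknown to the searcher
    queries: int   # how many simulated queries were issued

proc hasBlock(peer: var PeerView, number: uint64): bool =
  ## Simulates asking the peer whether `number` is on its canonical chain.
  inc peer.queries
  number <= peer.head

proc findHead(peer: var PeerView, start: uint64): uint64 =
  ## Hunts away from `start` with a doubling step, then bisects the bracket.
  var low, high: uint64   # once bracketed: `low` is present, `high` is absent
  var step = 1'u64
  if peer.hasBlock(start):
    # Hunt forward until a block number is missing.
    low = start
    high = start + step
    while peer.hasBlock(high):
      low = high
      step = step * 2
      high = low + step
  else:
    # Hunt backward until a block number is present (block 0 assumed present).
    high = start
    low = if high > step: high - step else: 0'u64
    while low > 0 and not peer.hasBlock(low):
      high = low
      step = step * 2
      low = if high > step: high - step else: 0'u64
  # Binary search between the last present and first absent block numbers.
  while high - low > 1:
    let mid = low + (high - low) div 2
    if peer.hasBlock(mid):
      low = mid
    else:
      high = mid
  result = low
```

A real implementation also has to cope with the peer's chain changing between queries, which this sketch ignores; after a deep reorg the bracket may no longer be valid and the hunt has to restart.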
We cannot assume a peer's canonical chain stays the same or only gains new blocks from one query to the next. There can be reorgs, including deep reorgs. When a reorg happens, the best block number can decrease if the new canonical chain is shorter than the old one, and the best block hash we previously knew can become unavailable on the peer. So we must detect when the current best block disappears and be able to reduce block number.

Signed-off-by: Jamie Lokier <jamie@shareable.org>
This option enables new blockchain sync and real-time consensus algorithms that will eventually replace the old, very limited sync.

New sync is work in progress. It's included as an option rather than a code branch, because it's more useful for testing this way, and must not conflict anyway. It's off by default. Eventually this will become enabled by default and the option will be removed.

Signed-off-by: Jamie Lokier <jamie@shareable.org>
stint, stew/byteutils,
eth/[common/eth_types, p2p]

const
Probably we can use `{.booldefine.}` for these `tracexxx` constants, and then modify them with `-d:tracexxx=true/false`. It will reduce accidental commits compared to editing the value of these constants.
But we can add it later, when the next component arrives. For now we can merge it.
And instead of using `if` where the constants are used, we can use `when`.
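For reference, a small sketch of what that suggestion could look like. `traceExample` and `handleReply` are placeholder names standing in for the `tracexxx` constants and their call sites, not identifiers from this PR.

```nim
# `traceExample` is a placeholder for one of the `tracexxx` constants.
# Its default can be overridden at build time with -d:traceExample=true
const traceExample {.booldefine.} = false

proc handleReply(blockNumber: int): string =
  when traceExample:
    # This branch is compiled in only when the define is true, so a
    # disabled trace costs nothing at run time.
    echo "trace: reply for block ", blockNumber
  result = "block " & $blockNumber
```

With `when` instead of `if`, the disabled branch is dropped by the compiler, and the constant's value in the source never has to be edited to turn tracing on.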
Just curious, is this polling of each peer really necessary? Usually, each peer after successful proof of work validation propagates
@@ -138,6 +138,7 @@ type
verifyFromOk*: bool ## activate `verifyFrom` setting
verifyFrom*: uint64 ## verification start block, 0 for disable
engineSigner*: EthAddress ## Miner account
newSync*: bool ## --newsync experimental option
this seems over the top, just remove the old broken one and be done with it - it doesn't have value
## Expansion factor during `SyncHuntBackward` exponential search.
## 2 is chosen for better convergence when tracking a chain reorg.

doAssert syncLockedMinimumReply >= 2
doAssert syncLockedMinimumReply >= 2
static: doAssert syncLockedMinimumReply >= 2
No longer relevant
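For context on the suggested change above, a minimal sketch of the difference between the two forms. The value `8` is an assumption made only so the example compiles, and `minimumReply` is a hypothetical helper; the constant's real value is not shown in this thread.

```nim
# Assumed value, chosen only for illustration.
const syncLockedMinimumReply = 8

# Top-level `doAssert`: evaluated when the module is initialised at run time.
doAssert syncLockedMinimumReply >= 2

# `static:` block: evaluated by the compiler, so a bad constant fails the
# build instead of failing when the program starts.
static:
  doAssert syncLockedMinimumReply >= 2, "syncLockedMinimumReply must be >= 2"

proc minimumReply(): int =
  ## Hypothetical accessor, present only so the example has something to call.
  syncLockedMinimumReply
```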
This PR also adds a `--newsync` option and uses it. This option enables new blockchain sync and real-time consensus algorithms that will eventually replace the old, very limited sync.
New sync is work in progress. It's included as an option rather than a code branch, because it's more useful for testing this way, and must not conflict anyway. It's off by default. Eventually this will become enabled by default and the option will be removed.