Sync: Rapidly find and track peer canonical heads #811

Closed
wants to merge 2 commits into from

@jlokier (Contributor) commented Aug 24, 2021

First component of new sync approach.

This module fetches and tracks the canonical chain head of each connected peer. (Or in future, each peer we care about; we won't poll them all so often.)

This is for when we aren't sure of the block number of a peer's canonical chain head. Most of the time, after finding which block, it quietly polls to track small updates to the "best" block number and hash of each peer.

But sometimes that can get out of step: there may have been a deeper reorg than our tracking window, a burst of more than a few new blocks, network delays or downtime, or the peer may itself be syncing. Perhaps we stopped Nimbus and restarted a while later, e.g. by suspending a laptop or pressing Control-Z. Then this will catch up. It is even possible that the best hash the peer gave us in the Status handshake has disappeared by the time we query for the corresponding block number, so we start at zero.

The steps here perform a robust and efficient O(log N) search to rapidly converge on the new best block if it's moved out of the polling window no matter where it starts, confirm the peer's canonical chain head boundary, then track the peer's chain head in real-time by polling. The method is robust to peer state changes at any time.
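As a rough illustration of the search idea only (not the code in this PR), the sketch below hunts with exponentially growing steps and then bisects. The names `HasBlock`, `findHead`, `expansionFactor` and `peerAt1000` are hypothetical, and `hasBlock` stands in for a network query; it assumes block 0 always exists and that the peer's answers do not change during the search, which the real module cannot assume.

```nim
# Hypothetical sketch, not the PR's implementation.
type HasBlock = proc(n: uint64): bool

const expansionFactor = 2'u64   # assumed step growth factor

proc findHead(hasBlock: HasBlock, start: uint64): uint64 =
  ## Returns the highest `n` for which `hasBlock(n)` is true.
  ## Assumes block 0 exists and answers stay stable while searching.
  var lo, hi: uint64            # goal: hasBlock(lo) true, hasBlock(hi) false
  var step = 1'u64
  if hasBlock(start):
    # Hunt forward with growing steps until we overshoot the head.
    lo = start
    hi = start + step
    while hasBlock(hi):
      lo = hi
      step *= expansionFactor
      hi = lo + step
  else:
    # Hunt backward with growing steps until we find an existing block.
    hi = start
    lo = if start > step: start - step else: 0'u64
    while lo > 0 and not hasBlock(lo):
      hi = lo
      step *= expansionFactor
      lo = if hi > step: hi - step else: 0'u64
  # Bisect the remaining interval to pin down the boundary.
  while hi - lo > 1:
    let mid = lo + (hi - lo) div 2
    if hasBlock(mid):
      lo = mid
    else:
      hi = mid
  lo

# Usage: a simulated peer whose canonical head is block 1000.
var queries = 0
proc peerAt1000(n: uint64): bool =
  inc queries
  n <= 1000

doAssert findHead(peerAt1000, 0) == 1000
doAssert queries < 64           # logarithmic, not a thousand probes
```

Both phases take a number of queries logarithmic in the distance between the starting guess and the head, which is where the O(log N) behaviour comes from in this simplified model.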

The purpose is to:

  • Help with finding a peer common chain prefix ("fast sync pivot") in a consistent, fast and explicit way.

  • Catch up quickly after long pauses caused by network downtime, the program not running, or deep chain reorgs.

  • Be able to display real-time peer states, so they are less mysterious.

  • Tell the beam/snap/trie sync processes when to start and what blocks to fetch, and keep those fetchers in the head-adjacent window of the ever-changing chain.

  • Help the sync process bootstrap usefully when we only have one peer, speculatively fetching and validating what data we can before we have more peers to corroborate the consensus.

  • Help detect consensus failures in the network.

We cannot assume a peer's canonical chain stays the same or only gains new blocks from one query to the next. There can be reorgs, including deep reorgs. When a reorg happens, the best block number can decrease if the new canonical chain is shorter than the old one, and the best block hash we previously knew can become unavailable on the peer. So we must detect when the current best block disappears and be able to reduce the block number.
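To make the cases concrete, here is a minimal sketch of how one poll reply could be classified. It is an illustration under assumptions, not this PR's code: `Head`, `Observation`, `classify` and `mustStepBack` are hypothetical names, and a string stands in for a real block hash.

```nim
import std/options

type
  Head = object
    number: uint64
    blockHash: string        # stand-in for a real 32-byte block hash

  Observation = enum
    obStillCanonical   # tracked block is still on the peer's canonical chain
    obReplaced         # same number, different hash: a reorg replaced it
    obGone             # peer has no block at that number: chain got shorter

proc classify(tracked: Head, reply: Option[Head]): Observation =
  ## `reply` is the peer's canonical header at `tracked.number`, if any.
  if reply.isNone:
    obGone
  elif reply.get().blockHash != tracked.blockHash:
    obReplaced
  else:
    obStillCanonical

proc mustStepBack(obs: Observation): bool =
  ## True when tracking has to reduce the block number and search again.
  obs != obStillCanonical

let tracked = Head(number: 100, blockHash: "aa")
doAssert classify(tracked, some(tracked)) == obStillCanonical
doAssert classify(tracked, some(Head(number: 100, blockHash: "bb"))) == obReplaced
doAssert classify(tracked, none(Head)) == obGone
```

In this model, both `obReplaced` and `obGone` mean the previously known best block can no longer be trusted, so the tracker would have to search downward again.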

Also:

Add --newsync option and use it. This option enables new blockchain sync and real-time consensus algorithms that will eventually replace the old, very limited sync.

New sync is work in progress. It's included as an option rather than a code branch, because it's more useful for testing this way, and must not conflict anyway. It's off by default. Eventually this will become enabled by default and the option will be removed.

Signed-off-by: Jamie Lokier <jamie@shareable.org>
stint, stew/byteutils,
eth/[common/eth_types, p2p]

const
@jangko (Contributor) Aug 25, 2021

We can probably use {.booldefine.} for these tracexxx constants and then set them with -d:tracexxx=true/false. That reduces accidental commits compared with editing the constants' values.

But we can add that later, when the next component arrives. For now we can merge it.

And where the constants are used, we can use when instead of if.
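A minimal sketch of that suggestion follows. The constant name `tracePackets` and the proc `handlePacket` are made up for illustration and do not appear in the PR.

```nim
# Hypothetical constant; override at build time with
#   nim c -d:tracePackets=true yourfile.nim
const tracePackets {.booldefine.} = false

proc handlePacket(n: int): string =
  # `when` is resolved at compile time, so the trace line is not even
  # compiled in unless the define is true.
  when tracePackets:
    echo "trace: handling packet ", n
  "handled " & $n

doAssert handlePacket(3) == "handled 3"
```

With `if`, the tracing branch would still be compiled and checked at run time; with `when`, the default build carries no tracing code at all.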

@KonradStaniec (Contributor)

This module fetches and tracks the canonical chain head of each connected peer. (Or in future, each peer we care about; we won't poll them all so often.)

This is for when we aren't sure of the block number of a peer's canonical chain head. Most of the time, after finding which block, it quietly polls to track small updates to the "best" block number and hash of each peer.

Just curious, is this polling of each peer really necessary? Usually, after successful proof-of-work validation, each peer propagates a NewBlock message to the square root of its peers, and after each import it sends a NewBlockHashes message to all its peers. Wouldn't tracking the peer head based on those messages be enough?

@@ -138,6 +138,7 @@ type
verifyFromOk*: bool ## activate `verifyFrom` setting
verifyFrom*: uint64 ## verification start block, 0 for disable
engineSigner*: EthAddress ## Miner account
newSync*: bool ## --newsync experimental option
Member

This seems over the top: just remove the old broken one and be done with it. It doesn't have value.

## Expansion factor during `SyncHuntBackward` exponential search.
## 2 is chosen for better convergence when tracking a chain reorg.

doAssert syncLockedMinimumReply >= 2
Member

Suggested change:
- doAssert syncLockedMinimumReply >= 2
+ static: doAssert syncLockedMinimumReply >= 2
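For context, a small sketch of what the `static:` form changes. The value 8 below is a made-up placeholder (the real value of `syncLockedMinimumReply` is not shown here), and `minimumReply` is a hypothetical helper.

```nim
const syncLockedMinimumReply = 8   # placeholder value, for illustration only

# Evaluated by the compiler: a bad constant fails the build instead of
# failing when the module is first loaded at run time.
static: doAssert syncLockedMinimumReply >= 2

proc minimumReply(): int =
  ## Hypothetical accessor so the example has something to run.
  syncLockedMinimumReply

doAssert minimumReply() == 8
```

Since the check only involves a `const`, moving it to compile time costs nothing at run time.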

@arnetheduck (Member)

No longer relevant
