Client Full Sync Support #1000

Closed · 9 tasks done

holgerd77 opened this issue Nov 12, 2020 · 7 comments · May be fixed by ethereumjs/ethereumjs-client#183

Comments

@holgerd77
Member

holgerd77 commented Nov 12, 2020

As agreed upon within the team we want to start with implementing a full sync mechanism before moving on to something more sophisticated (e.g. beam sync, see #992), since full sync is the simplest and purest way of syncing and getting to a recent network state.

For a full sync overview see e.g. EthHub. We will likely want to take a differentiated look at the various properties and will probably not implement everything in a first round (or, possibly, ever), but rather concentrate on the properties we want to prioritize.

Note that this won't be a mainnet-ready implementation - at least not in a first round - but is rather targeted at some more lightweight testnets as well as serving simulation, test and development purposes.

Here is a checklist of what the tasks for a first round might be (it's always really helpful to have a look at our diagram to see how things work together):

1. Syncing the blockchain

  • Subclass of sync/sync.ts. I wonder if we already have anything fast-sync related in the code (needs some investigation or evidence) and if we should simply repurpose (or eventually just rename) e.g. FastSynchronizer -> FullSynchronizer. Alternative ways of integration and inheritance hierarchies are TBD, depending on the sync method properties; this needs a closer look at the implemented methods:
    • Synchronizer -> FullSynchronizer
    • Synchronizer -> FastSynchronizer -> FullSynchronizer
    • Synchronizer -> FullSynchronizer -> FastSynchronizer

2. Storing received blocks

3. Block validation

  • Consistency: Turn on validateBlocks (always? triggered by sync mode choice?) on chain creation?
  • Consensus: the validateConsensus (old: validatePow) option - leave it off for PoW for now for performance reasons? Prioritization question on PoA: is a PoA implementation needed right now (what happens if it is left out on a relatively intimate network like the Yolo network?)

4. Block verification

I'll take this as: verify the block hash by running the VM (correct me if this is the wrong semantics). On some thought this might actually be relatively simple for now, my suggestion:

  • Add a vm option to Config, initialized with a default v5 VM if no VM is passed in (I think it's attractive to do this exposure early on, it's not much extra cost and people can then directly start playing with their own customized/modified VM instances), see the sketch after this checklist
  • Add VM execution either in the BlockFetcher (e.g. store()) or - maybe better - along the fetched event in the synchronizer
  • Question: does this need an extra vm.ts or BlockVerifier.ts wrapper (I didn't get the client.ts suggestion TBH) or is directly executing on the VM enough for now?
  • If block verification fails (so this for sure means there is a consensus bug, right?), I think a respective error should be propagated as an event or channeled through the existing event pipeline, block synchronization should halt and an error message should be displayed?

5. Serving the network

Serving already synced data to other peers to be a good network citizen 😄

  • Handled in FastEthereumService, not sure on the state there (what is already supported? what needs to be supported minimally?)
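
To make points 3 and 4 a bit more concrete, here is a rough sketch in TypeScript (assuming the v5 APIs of @ethereumjs/blockchain and @ethereumjs/vm; the vm option on Config and the createClientContext helper are made up for illustration, not existing client code):

```ts
import Common from '@ethereumjs/common'
import Blockchain from '@ethereumjs/blockchain'
import VM from '@ethereumjs/vm'

// Hypothetical config shape: allow a user-provided VM, fall back to a default v5 VM
interface ClientConfigOptions {
  common: Common
  vm?: VM
}

// Hypothetical helper wiring chain creation (point 3) and VM exposure (point 4) together
async function createClientContext(opts: ClientConfigOptions) {
  // Point 3: turn on block (consistency) validation on chain creation,
  // leave consensus validation (old: validatePow) off for now
  const blockchain = await Blockchain.create({
    common: opts.common,
    validateBlocks: true,
    validateConsensus: false,
  })

  // Point 4: expose the VM early so customized/modified VM instances can be injected
  const vm = opts.vm ?? new VM({ common: opts.common, blockchain })

  return { blockchain, vm }
}
```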

Phew, also not un-complex, I would say, for a first round. But doable. 😄

That's a summary of my current view on this as some basis for further reflection, discussion and implementation. We might want to open dedicated issues on single items if things get too complex and keep this as a tracking issue. But let's see how things evolve.

@holgerd77
Member Author

Have made various updates to the initial comment, please read on site.

@jochem-brouwer
Member

Hi @holgerd77, we should indeed focus on full sync first. A loose definition here would be to "fetch blocks" and run these on the chain. This is how you get your MPT to the tip of the chain (as opposed to fast sync where you download the entire MPT, or beam sync which selectively downloads nodes of the MPT). For now I think we should not bother with things like PoA and focus on a PoW chain, although PoA can of course be implemented in parallel in the VM and should have semi-high priority.

I think we should keep FastSynchronizer and FullSynchronizer segregated from each other. The only thing they have in common, I'd say, is that they both use BlockFetcher to download the blocks. I find it weird to see that we are using validateBlocks = false and validatePow = false, any reason why this is? Is this because they might depend on parent blocks which we might not have in the DB yet?

Block verification
So the extra part in block verification (besides calling validate to validate the data/header of the block) is that we run the VM and check that the VM's reported stateRoot equals what is reported in the block. This throws by default in runBlock of the VM (it also throws if logsBloom/gasUsed is incorrect). We can directly use runBlockchain for this: since BlockFetcher puts all blocks in the DB using our blockchain package, this also keeps track of the latest executed head in a specific context (this is the _heads in the blockchain package, which is also saved in the DB), which defaults to the genesis block. When we start putting blocks we can call runBlockchain, which iterates over these blocks. If an error is encountered (which should be a consensus error) then runBlockchain calls into blockchain and deletes this block: we might want to catch when this happens, but it seems that this needs modifications to either the VM or blockchain.

So just to be clear here: runBlockchain seems to have all logic which we need here. This runs multiple blocks. It seems to me that every time the blockchain of the client is updated, we call into runBlockchain of the VM which either does nothing if there are no new canonical blocks added, or it will possibly run multiple blocks here (even if there was a reorg this should be handled just fine!). I am not yet sure what the most logical place is for the VM, but yes, we should definitely use our new V5 😄
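
To illustrate the above, a minimal sketch (assuming the v5 runBlockchain API; onNewCanonicalBlocks is just a made-up hook name for wherever the client reacts to newly stored blocks, and whether the error actually surfaces here may, as noted, need VM/blockchain changes):

```ts
import VM from '@ethereumjs/vm'
import Blockchain from '@ethereumjs/blockchain'

// Hypothetical hook: called whenever new blocks have been put into the blockchain,
// e.g. along the synchronizer's fetched event
async function onNewCanonicalBlocks(vm: VM, blockchain: Blockchain) {
  try {
    // Iterates from the VM's execution head (the _heads entry, genesis by default)
    // up to the current canonical head, running each block and checking
    // stateRoot / logsBloom / gasUsed against the header via runBlock
    await vm.runBlockchain(blockchain)
  } catch (error) {
    // If the error surfaces here it points to a consensus-level problem:
    // report it instead of silently deleting and re-fetching the block
    console.error('Block execution failed (possible consensus error):', error)
    throw error
  }
}
```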

Serving the network
I think that Fast/Full both need to implement the entire ETH protocol, while Light only needs to implement LES.

@holgerd77
Member Author

holgerd77 commented Nov 13, 2020

Hi @jochem-brouwer, thanks for the answer on this, here are some additional notes from my side.

FastSynchronizer/FullSynchronizer

We might be coming from different angles here. I was looking at the code from FastSynchronizer and after skimming through the different functions like best() or syncWithPeer() I had the impression that close to all (or possibly just: all) of the functionality can be applied in a full sync context, so quite the opposite of your statement.

The following conclusions are possible:

a) This is actually a full sync implementation and was unintentionally named "fast sync" by Vinay. This would speak for the "we just rename everything fastsync to fullsync" solution (it wouldn't make sense to have a falsely implemented fast sync lying around)
b) Fast sync and full sync actually have more in common than you are anticipating, this is just the base implementation and it was intended to do the more complex parts of the fast sync logic later. This would speak very much for the inheritance solution Synchronizer -> FullSynchronizer -> FastSynchronizer (illustrated in the sketch below), since it would mean that we could reuse a lot of the functionality in this scenario
c) My perception/interpretation of the existing methods is wrong and I misread the code. This would speak for a separate solution

All I am saying is that we should have a closer look here to see what is actually the case, in order to come to a structurally clean and well-grounded solution.

The basis for my analysis is the ethereum/go-ethereum#1889 write-up on the fast sync mechanism.
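
Purely to illustrate the shape of option b) (these are not the actual classes from the client codebase, just a sketch):

```ts
// Illustrative only - not the real client classes
abstract class Synchronizer {
  abstract sync(): Promise<boolean>
}

// Option b): full sync as the base implementation
class FullSynchronizer extends Synchronizer {
  async sync(): Promise<boolean> {
    // fetch blocks via the BlockFetcher and process them starting from genesis
    return true
  }
}

// Fast sync would extend it and only add its specific parts later
class FastSynchronizer extends FullSynchronizer {
  async sync(): Promise<boolean> {
    // first download the state trie of a recent block (the fast-sync specific part),
    // then reuse the shared block fetching/processing logic
    return super.sync()
  }
}
```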

PoA / PoW

The Yolo testnets are actually PoA networks. Therefore my question:

what happens if left out on a relatively intimate network like the Yolo network?

Will this reliably sync (since no-one is likely interfering) or will this get on the wrong track?

Block Verification

Just a small note here: I would assume that if there is - e.g. on the Yolo testnet - really a consensus issue and the block is just deleted with the client not coming to a halt, this will just go into an eternal loop trying to process the same block over and over again (so we should rather catch here)? 🤔

@jochem-brouwer
Member

Some notes here:

Regarding syncing, I don't think in (a) that the naming was a mistake due to this comment. In full sync you don't fetch the state trie; you "calculate"/populate it by running the VM. The problem with (b) is that if you make FastSync depend upon FullSync this implies that FastSync also starts processing blocks from the root (which you don't want to do). You also can't just start running it at the head since you don't have the complete state trie at that point. The thing they have in common is that Fast/Full both need BlockFetcher, in Full you start running blocks from the genesis block, while in Fast you first download the entire state trie of a recent block and then start to run blocks from that recent block (not genesis). So it does not seem to me that FastSync inherits FullSync although it does have things in common. This corresponds to the analysis in geth. Note here that it implies fast sync at some point switches to "classical sync" (full sync).

If we connect to Yolo but don't have PoA support then the client can easily get attacked, as we cannot verify whether the blocks are canonical or not. The question is whether that would happen on a rather specialized/semi-hidden testnet, but it is far from optimal.

If we process a block using runBlockchain and this deletes the block in the blockchain when it finds an error, then this gets deleted from blockchain's DB. If we then call runBlockchain again, it will not re-process this block as it does not exist in blockchain. However, if the client connects to a rogue node which spams us these bogus blocks then indeed it will keep processing this block, until it finds a valid block (which should have a higher total difficulty than the bogus block), upon which it starts processing the valid canonical blocks and will reject the bogus block (VM will not process this).

@holgerd77
Member Author

In case we come to the conclusion that there is nothing fast-sync specific in the code yet, we might nevertheless want to consider just renaming - we don't want to do a fast sync implementation for quite some time, right? And it likely doesn't make that much sense to have a half-ready fast sync implementation exposed if we won't continue the work there.

Have given the PoA spec a closer look over the weekend. I think I might just give this a try during the next 2-3 weeks; it seems very much doable to me and I would personally be interested in taking this on.

For this catch case I am not talking about a bogus block but the case where the network distributes a valid one and we have a consensus issue in our code. Wouldn't that lead to the situation where the same block is requested over and over again? Not an urgent question to solve though; we will just experience stuff like this in "live runs" and it should be a somewhat easy fix then.

@jochem-brouwer
Member

I think this is a very good proposal; fast sync has all of the code which we need for full sync, and any specific fast-sync code does not appear to be present yet (have not found any). So just renaming this sounds like a good idea here.

Yes, in case there is a valid block where we have a consensus issue, the blockchain package would keep reporting that the canonical chain is the actual chain. Since this block would get re-added into our blockchain each time we request it, runBlockchain would keep executing it and thus erroring (and deleting it again).

Note that if it was actually a bogus block (with valid PoW, otherwise it is not considered), then at some point the actual canonical chain would take over, since most of the hashrate is (or should be) mining on valid blocks and not on bogus blocks.

@holgerd77 holgerd77 transferred this issue from ethereumjs/ethereumjs-client Dec 4, 2020
@holgerd77 holgerd77 changed the title Full Sync Support Client Full Sync Support Dec 4, 2020
@holgerd77
Member Author

I would regard this as implemented, will close. 😄
