
feat(bootstrap): More efficient snarked ledger synchronization #292

Merged: 4 commits merged into develop on Mar 29, 2024

Conversation

@tizoc (Collaborator) commented Mar 21, 2024

  • Starts with a num_accounts query to fetch the number of accounts in the ledger
  • Uses the result of the num_accounts query to skip all the empty tree levels and go straight to the first node containing child accounts (sketched below)
  • Uses the previously available ledger (genesis -> staking -> next epoch -> transition frontier root) as a base and queries only the nodes that changed
  • Fetches accounts in groups of 64 (up from 2) to reduce the number of queries
  • Validates the num_accounts response, hashes and accounts
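
The level-skipping step from the second bullet can be illustrated with a minimal sketch (not the PR's actual code; `LEDGER_DEPTH` and the helper name are assumptions): given the claimed number of accounts, find the height of the smallest subtree that can hold them all, and start the address queries at that subtree's root instead of the ledger root.

```rust
// Hypothetical sketch, not the PR's actual code.
const LEDGER_DEPTH: usize = 35; // assumed tree depth

/// Height of the smallest subtree that can hold `num_accounts` leaves,
/// i.e. the smallest h with 2^h >= num_accounts. All levels above it
/// hash to known "empty" values and don't need to be queried.
fn first_non_empty_subtree_height(num_accounts: u64) -> usize {
    let mut height = 0;
    while (1u64 << height) < num_accounts && height < LEDGER_DEPTH {
        height += 1;
    }
    height
}

fn main() {
    // With ~300k accounts only a 19-level subtree needs to be synced,
    // rather than the full 35-level tree.
    assert_eq!(first_non_empty_subtree_height(300_000), 19);
}
```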

@tizoc marked this pull request as ready for review on March 22, 2024 17:43
@tizoc requested a review from binier on March 22, 2024 17:49

@tizoc (Collaborator, Author) commented Mar 22, 2024

@binier I still need to clean it up a bit more, and maybe finish a few small things that are still missing, but if you have time please take a look at the general approach (and action naming, etc.) and let me know if there is anything you would change.

let content_hash = content_hash.0.to_field();

let computed_hash = (subtree_height..LEDGER_DEPTH).fold(content_hash, |prev_hash, height| {
let empty_right = ledger::V2::empty_hash_at_height(height);

@sebastiencs (Contributor) commented:
ledger::V2::empty_hash_at_height is quite slow, you can use HASH_EMPTIES instead:

static HASH_EMPTIES: Lazy<Mutex<Vec<Fp>>> = Lazy::new(|| {

If you could also change it to Lazy<Arc<[Fp; 36]>>, that would be good; not sure why I've used a mutex there.

@tizoc (Collaborator, Author) replied:
@sebastiencs good catch. Just pushed the version that uses the cached values
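
A sketch of what the suggested cache could look like with the Mutex dropped, reusing the `Fp` type and `ledger::V2::empty_hash_at_height` helper from the quoted code (the exact initialization in the repository may differ):

```rust
// Sketch only: precompute the 36 empty-subtree hashes once and share them
// read-only, so lookups replace the slow empty_hash_at_height computation.
use once_cell::sync::Lazy;
use std::sync::Arc;

static HASH_EMPTIES: Lazy<Arc<[Fp; 36]>> = Lazy::new(|| {
    Arc::new(std::array::from_fn(|height| {
        ledger::V2::empty_hash_at_height(height)
    }))
});

// Inside the fold shown above, the per-level hash becomes a cheap lookup:
// let empty_right = HASH_EMPTIES[height];
```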

@tizoc force-pushed the feat/optimized-bootstrap branch 4 times, most recently from c17fac8 to 7e038f6 on March 25, 2024 17:34

@binier (Collaborator) left a comment:
Looks good.
The one must-change item is about storing the pending mask in self.sync.snarked_ledgers (unless I'm wrong there); the rest can be ignored.

return Ok(false);
}

let origin = self

@binier (Collaborator) commented:
Maybe we should use self.mask(hash) here instead, so that we copy the mask either from self.snarked_ledgers or from self.sync.snarked_ledgers? Otherwise we won't be able to reuse mid-sync data (not sure if that falls within the scope of this PR yet, though).

@tizoc (Collaborator, Author) replied:
@binier it falls within the scope of this PR, yes; I just don't handle that case yet.

@tizoc (Collaborator, Author) added:
Regarding the use of mask, I made the calls more strict (new pending_sync_snarked_ledger_mask method) so that they only check for snarked ledgers, and only for ones that already exist (instead of creating a new empty mask).

For this case in particular I check both origins (already synced first, then syncing).
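
A rough sketch of the stricter lookup described here, with stand-in types (the real method signatures and collections in the PR may differ): check the already-synced ledgers first, then the mid-sync ones, and fail instead of silently creating an empty mask.

```rust
use std::collections::BTreeMap;

// Minimal stand-in types for illustration only.
type LedgerHash = String;
struct Mask;
impl Mask {
    fn copy(&self) -> Mask {
        Mask
    }
}

struct SyncState {
    snarked_ledgers: BTreeMap<LedgerHash, Mask>,
}

struct LedgerCtx {
    snarked_ledgers: BTreeMap<LedgerHash, Mask>,
    sync: SyncState,
}

impl LedgerCtx {
    // Return an existing snarked-ledger mask, preferring already-synced
    // ledgers over mid-sync ones; error out instead of creating a new
    // empty mask.
    fn pending_sync_snarked_ledger_mask(&self, hash: &LedgerHash) -> Result<Mask, String> {
        self.snarked_ledgers
            .get(hash)
            .or_else(|| self.sync.snarked_ledgers.get(hash))
            .map(Mask::copy)
            .ok_or_else(|| "snarked ledger mask not found".to_string())
    }
}
```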

))?;

let target = origin.copy();
self.snarked_ledgers

@binier (Collaborator) commented:
We should insert that snarked ledger into self.sync.snarked_ledgers instead, right? Because until we sync the mismatched parts, target_snarked_ledger_hash != target.merkle_root().

@tizoc (Collaborator, Author) replied:
@binier ufff, good catch. When cleaning up some code at some point I got the impression that I was starting to see less reuse than I was expecting; this would explain it.

@tizoc (Collaborator, Author) added:
FYI this small change reduced the 3 snarked ledgers sync time from 60s to 20s, so it was indeed skipping all the reuse because of that mistake.
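
A minimal sketch of what the fix amounts to, reusing the naming from the quoted diff (illustrative only, wrapped in a helper for completeness): the copied mask is stored in the sync collection under the target hash rather than among the synced ledgers.

```rust
// Illustrative sketch; real field and method names may differ. The copied
// mask is still being synchronized, so it belongs in `sync.snarked_ledgers`
// keyed by the *target* hash: until the mismatched parts are fetched,
// target.merkle_root() != target_snarked_ledger_hash.
use std::collections::BTreeMap;

type LedgerHash = String;
struct Mask;
impl Mask {
    fn copy(&self) -> Mask {
        Mask
    }
}

struct SyncState {
    snarked_ledgers: BTreeMap<LedgerHash, Mask>,
}

fn start_snarked_ledger_sync(
    sync: &mut SyncState,
    origin: &Mask,
    target_snarked_ledger_hash: LedgerHash,
) {
    let target = origin.copy();
    sync.snarked_ledgers.insert(target_snarked_ledger_hash, target);
}
```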

snarked_ledger_hash: LedgerHash,
parent: &LedgerAddress,
) -> Result<(LedgerHash, LedgerHash), String> {
let mask = self.ctx_mut().sync.snarked_ledger_mut(snarked_ledger_hash);

@binier (Collaborator) commented:
I think we should use .mask() here as well, because the name of the fn doesn't imply it will only get hashes if the ledger is mid-sync.

let accounts = Vec::<ledger::Account>::binprot_read(&mut bytes)?;

let (mut mask, total_currency) = Self::build_ledger_from_accounts(accounts);
let (mut mask, total_currency) =
Self::build_ledger_from_accounts_and_hashes(accounts, hashes);
let ledger_hash = ledger_hash(&mut mask);
if let Some(expected_hash) = expected_hash.filter(|h| h != &ledger_hash) {

@binier (Collaborator) commented:
Now that we will be using cached hashes, the hash mismatch check seems useless.

How slow is recomputing those hashes? Is it worth storing them?

@tizoc (Collaborator, Author) replied:
@binier for the berkeleynet ledger it took about 10 secs here, on mainnet it would take much longer.

@binier (Collaborator) replied:
Yeah, fairly slow then... What if we cache the "merge" hashes, but recompute and compare the account hashes? We could also parallelize hashing.

Any idea how the OCaml node handles this? Is it slow there as well?
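
A sketch of that trade-off, assuming rayon for the parallelism and an `Account::hash()` leaf-hash method (both are assumptions, not necessarily the crate's API; `Account` and `Fp` are the types from the code quoted above): keep the cached inner ("merge") hashes, but recompute the account leaf hashes in parallel and compare them against the values received from the peer.

```rust
// Illustrative sketch: trust cached "merge" (inner node) hashes, but
// recompute each account's leaf hash in parallel and compare it to the
// hash received from the peer.
use rayon::prelude::*;

fn verify_account_hashes(accounts: &[Account], expected: &[Fp]) -> Result<(), String> {
    if accounts.len() != expected.len() {
        return Err("account/hash count mismatch".to_string());
    }
    accounts
        .par_iter()
        .zip(expected.par_iter())
        .try_for_each(|(account, expected_hash)| {
            // `Account::hash()` is assumed to produce the leaf hash.
            if &account.hash() == expected_hash {
                Ok(())
            } else {
                Err("account hash mismatch".to_string())
            }
        })
}
```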

@tizoc (Collaborator, Author) replied:
In OCaml it is just slow (remember they load from the JSON directly), but it's only an issue the first time, because after the initial load they keep a persisted version of all the important ledgers, which includes all the relevant data. We are not doing that yet, so this binprot loading we are doing is just a workaround.

end_addr: LedgerAddress,
pending_addresses: BTreeMap<LedgerAddress, LedgerAddressQueryPending>,
/// Pending num account query attempts
pending_num_accounts: Option<LedgerNumAccountsQueryPending>,

@binier (Collaborator) commented:
Can we combine num_accounts and pending_num_accounts to avoid impossible states? So that num_accounts can't exist unless pending_num_accounts is in the accepted state.

@binier (Collaborator) added:
Also I'd split this TransitionFrontierSyncLedgerSnarkedState::Pending state into more states.

Like:

NumAccountsFetchPending { ... },
NumAccountsFetchSuccess { ... },
TreeSyncPending { .. },
TreeSyncSuccess{ .. },

Of course this assumes that fetching of addresses and num accounts doesn't happen in parallel.
If it's sequential, that should be clear from the state.

@binier (Collaborator) added:
Also the naming is confusing between num_accounts and num_accounts_accepted. At first glance I saw num_accounts_accepted and thought it was the response received from Num_accounts that we've verified; of course the comment made it clear that wasn't the case.
Maybe total_accounts and accounts_synced or something like that would be better.

@tizoc (Collaborator, Author) replied:
num_accounts_expected + num_accounts_added? The validation of the number of accounts claimed by the peer is kind of weak, because at most we validate that the depth of the subtree is correct, but any value for num_accounts that matches that depth will be accepted (so it should just be considered an estimate).

@tizoc (Collaborator, Author) added:
> Of course assuming that fetching of addresses and num accounts doesn't happen in parallel. If it's sequential, that should be clear based on state.

It is fully sequential (only after we know the number of accounts do we know the root node containing all the accounts), so yes, that split makes total sense; will do that.

@tizoc (Collaborator, Author) added:
I split the Pending state in two (one for the num accounts query and another for the merkle tree sync). In your comment above you added some Success states, which I didn't add. Can you elaborate on that? Should I add them too?

@tizoc (Collaborator, Author) added:
Actually, nevermind, it is just cleaner that way, I will add them.
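
A rough sketch of the resulting state split (variant and field names are illustrative, built from the types quoted in the snippets above; the enum actually committed in the PR may differ): num-accounts fetching and merkle-tree sync become sequential states, each with its own success state, so the per-account sync counters can only exist once the count has been accepted.

```rust
// Illustrative shape only, using the field types from the snippets above.
#[derive(Serialize, Deserialize, Debug, Clone)]
pub enum TransitionFrontierSyncLedgerSnarkedState {
    /// Waiting for (and validating) the peer's Num_accounts response.
    NumAccountsPending {
        target: SyncLedgerTarget,
        pending_num_accounts: Option<LedgerNumAccountsQueryPending>,
    },
    /// Accepted count; determines the non-empty subtree to sync.
    NumAccountsSuccess {
        target: SyncLedgerTarget,
        num_accounts_expected: u64,
    },
    /// Fetching hashes and accounts for the non-empty subtree.
    MerkleTreeSyncPending {
        target: SyncLedgerTarget,
        num_accounts_expected: u64,
        num_accounts_added: u64,
        pending_addresses: BTreeMap<LedgerAddress, LedgerAddressQueryPending>,
    },
    MerkleTreeSyncSuccess {
        target: SyncLedgerTarget,
    },
    /// Whole snarked ledger synced and verified.
    Success {
        target: SyncLedgerTarget,
    },
}
```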

@@ -35,7 +42,23 @@ pub enum TransitionFrontierSyncLedgerSnarkedState {
}

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct LedgerQueryPending {
pub enum LedgerQueryQueued {
NumAccounts,

@binier (Collaborator) commented:
If the state gets split into parts as suggested, this can be a struct instead of an enum:

NumAccountsFetchPending { ... },
NumAccountsFetchSuccess { ... },
TreeSyncPending { .. },
TreeSyncSuccess{ .. },

@tizoc (Collaborator, Author) replied:
I got rid of this one, now the queue is just for the address queries.

// we use it as a base to save work during synchronization.
let best_tip = store.state().transition_frontier.sync.best_tip().unwrap();
let target = super::ledger::SyncLedgerTarget::staking_epoch(best_tip);
let origin = best_tip.genesis_ledger_hash().clone();

@binier (Collaborator) commented:
For a very unlikely edge case: if we have any ledgers for store.state().transition_frontier.best_tip() (the already synced best tip), use that instead of genesis.

@tizoc (Collaborator, Author) replied:
@binier can you expand on this one? Not sure I follow.

@tizoc force-pushed the feat/optimized-bootstrap branch 4 times, most recently from f9f83da to 28d1114 on March 28, 2024 18:52
@tizoc requested a review from binier on March 28, 2024 19:32

@tizoc (Collaborator, Author) commented Mar 28, 2024

@binier I will probably split the genesis ledger format changes and loading into a separate PR when I start rebasing and squashing these commits before merging. The async staged ledger reconstruction is at #308.

@tizoc force-pushed the feat/optimized-bootstrap branch 2 times, most recently from e689423 to b0fda52 on March 29, 2024 00:34
If this is not done, the cloned mask will need to recompute its internal hashes, which is expensive.
@tizoc force-pushed the feat/optimized-bootstrap branch 2 times, most recently from ee63a83 to b8b830e on March 29, 2024 15:32
fixup(bootstrap): adjust enabling conditions

review: fix to where mid-sync ledgers are stored and retrieved from

review: Split num accounts fetch vs merkle tree sync states in transition frontier

review: rename some sync status fields to make them less ambiguous

fix(transition-frontier): Check that the ledger doesn't exist in the sync snarked ledgers

feat(transition-frontier): Make copying of ledgers for sync more strict

fix(transition-frontier): Fix ledger copying handling for current state machine that produces the genesis block and syncs immediately

refactor(transition-frontier): Remove some code duplication

refactor(transition-frontier): Add success states for NumAccounts and MerkleTreeSync (WIP)

feat(transition-frontier): Implement missing enabling conditions
@tizoc merged commit d88cfa2 into develop on Mar 29, 2024 (3 of 4 checks passed)
@tizoc deleted the feat/optimized-bootstrap branch on March 29, 2024 15:37