Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce zero-padding and cache zero-hashes in k-ary MerkleTree #2407

Merged
merged 9 commits into from Apr 12, 2024

Conversation

ljedrz
Copy link
Contributor

@ljedrz ljedrz commented Mar 23, 2024

When creating k-ary Merkle trees, we currently pad all the "missing" leaves, which can result in a large number of empty hashes that all end up being allocated, and then repeatedly included in the calculation of further hashes. In order to improve performance for all such scenarios, we can omit some of the padding, and cache the hashes that derive from the zero hash, which is what this PR proposes.

While there are still ways in which the performance can be improved further, these changes aren't exactly trivial already, and it seems like a good moment to open them to discussion. Below are some benchmarks that I'll be referring to when outlining the potential for future improvement.

creation of a KaryMerkleTree with ARITY = 8, DEPTH = 8, NUM_LEAVES = ARITY^(DEPTH-1)+1, and SHA3 hashers

before:

alloc count: 48,061,051
heap use:    5.12 GiB
max heap:    10.17 GiB

after:

alloc count: 21,084,574 (-56%)
heap use:    1.62 GiB (-68%)
max heap:    3.17 GiB (-69%)

These are all wins, with a significantly lower both maximum and current (once the tree is created) heap use.

console/collections/benches/kary_merkle_tree:

KaryMerkleTree/new/0 (0% full)
                        time:   [21.247 µs 21.308 µs 21.383 µs]
                        change: [+93.568% +95.333% +96.860%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

Benchmarking KaryMerkleTree/new/4194304 (25% full): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 20.4s.
KaryMerkleTree/new/4194304 (25% full)
                        time:   [2.0185 s 2.0246 s 2.0308 s]
                        change: [-51.205% -51.016% -50.847%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking KaryMerkleTree/new/8388608 (50% full): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 38.2s.
KaryMerkleTree/new/8388608 (50% full)
                        time:   [3.8503 s 3.8668 s 3.8835 s]
                        change: [-25.467% -25.025% -24.593%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking KaryMerkleTree/new/12582912 (75% full): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 57.7s.
KaryMerkleTree/new/12582912 (75% full)
                        time:   [5.7279 s 5.7525 s 5.7726 s]
                        change: [-8.8066% -8.3742% -7.9833%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) low mild

Benchmarking KaryMerkleTree/new/16777216 (100% full): Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 75.1s.
KaryMerkleTree/new/16777216 (100% full)
                        time:   [7.5454 s 7.5707 s 7.5963 s]
                        change: [+3.6674% +4.2182% +4.7320%] (p = 0.00 < 0.05)
                        Performance has regressed.

The creation of an empty and a full tree see regressions, as those scenarios don't benefit from the caching of empty hashes. I have an idea on how to improve them, which will be posted as an additional commit if the diff isn't too large.

@ljedrz ljedrz requested a review from d0cd March 23, 2024 16:18
Signed-off-by: ljedrz <ljedrz@gmail.com>
@ljedrz
Copy link
Contributor Author

ljedrz commented Mar 23, 2024

The regressions turned out to be simpler to iron out than expected:

KaryMerkleTree/new/0 (0% full)
                        time:   [12.495 µs 12.524 µs 12.564 µs]
                        change: [+12.772% +13.612% +14.468%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

KaryMerkleTree/new/16777216 (100% full)
                        time:   [7.4351 s 7.4754 s 7.5136 s]
                        change: [-4.6663% -3.8930% -3.0814%] (p = 0.00 < 0.05)
                        Performance has improved.

The empty tree case is still a bit worse than before, but it's a tiny difference measured in µs and not the typical scenario. Also, that one seems likely to fluctuate - I just re-ran it (without any changes), and saw a -6% difference.

@ljedrz ljedrz marked this pull request as ready for review March 23, 2024 17:40
@@ -170,7 +170,7 @@ fn check_merkle_tree_depth_3_arity_3_padded<LH: LeafHash<Hash = PH::Hash>, PH: P

// Construct the Merkle tree for the given leaves.
let merkle_tree = KaryMerkleTree::<LH, PH, 3, ARITY>::new(leaf_hasher, path_hasher, &leaves)?;
assert_eq!(40, merkle_tree.tree.len());
assert_eq!(25, merkle_tree.tree.len());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change?

Copy link
Contributor Author

@ljedrz ljedrz Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this PR, the nodes that would only contain empty hashes (padding) no longer need to allocate them at the leaf level, which means that the tree ends up being smaller; you can see a calculation of the difference in the current number of leaves - padded for the entire leaf level - and the one this PR proposes (only padded up to the arity) here. In case of this particular test case (arity = 3, depth = 3, leaves = 10), the number of tree members (at the leaf level) that are just padding is currently 17, which this PR takes down to 2, which is a reduction of 15, leading to the corresponding reduction in tree size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this change apply to the merkle-tree too? (non-k-ary one) If so we shoudl optimize and profile that one too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already done that; the wins aren't as great, but I'm happy to file a PR soon if this direction is desirable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardwu
Copy link
Member

howardwu commented Apr 2, 2024

@ljedrz is this a breaking PR change? Does the Merkle tree's expected outputs and roots change in any way?

@ljedrz
Copy link
Contributor Author

ljedrz commented Apr 2, 2024

I don't think so - hashes for the nodes containing empty hashes are calculated as if the empty hashes were there, they are only not physically present in the tree; the number of leaves field is not affected either; are there any test vectors I could double-check against?

@ljedrz
Copy link
Contributor Author

ljedrz commented Apr 3, 2024

@howardwu I've been running equivalence tests (both KaryMerkleTree and MerkleTree) by @d0cd for a large number of iterations and haven't encountered any failures.

@d0cd
Copy link
Contributor

d0cd commented Apr 3, 2024

For those interested, the equivalence tests can be found in this repo. (currently private but the AleoHQ/snarkVM team has access)

A sample equivalence test for the Puzzle is below.
Note that snarkvm_old points to mainnet-staging, and snarkvm_new points to ljedrz:perf/kary_merkle_tree_padding

use snarkvm_old::ledger::puzzle::Puzzle as OldPuzzle;
use snarkvm_old::ledger::store::helpers::memory::ConsensusMemory as OldMemory;
use snarkvm_old::prelude::{
    Address as OldAddress, MainnetV0 as OldNetwork, Network as OldNetworkTrait, Uniform,
};
use snarkvm_old::synthesizer::vm::VM as OldVM;

use snarkvm_new::ledger::puzzle::Puzzle as NewPuzzle;
use snarkvm_new::ledger::store::helpers::memory::ConsensusMemory as NewMemory;
use snarkvm_new::prelude::{
    Address as NewAddress, MainnetV0 as NewNetwork, Network as NewNetworkTrait,
};
use snarkvm_new::synthesizer::vm::VM as NewVM;

use anyhow::Result;
use std::str::FromStr;

const ITERATIONS: u128 = 100;

fn check_equivalency(
    old_puzzle: &OldPuzzle<OldNetwork>,
    new_puzzle: &NewPuzzle<NewNetwork>,
) -> Result<()> {
    // Initialize an RNG.
    let rng = &mut rand::thread_rng();
    // Check that the puzzles produce equivalent solutions.
    for _ in 0..ITERATIONS {
        // Generate a random epoch hash.
        let old_epoch_hash = <OldNetwork as OldNetworkTrait>::BlockHash::rand(rng);
        let new_epoch_hash =
            <NewNetwork as NewNetworkTrait>::BlockHash::from_str(&format!("{old_epoch_hash}"))?;

        // Generate a random address and counter.
        let old_address = OldAddress::<OldNetwork>::rand(rng);
        let new_address = NewAddress::<NewNetwork>::from_str(&format!("{old_address}"))?;

        let old_counter = u64::rand(rng);
        let new_counter = u64::from_str(&format!("{old_counter}"))?;

        let old_solution = old_puzzle.prove(old_epoch_hash, old_address, old_counter, None)?;
        let new_solution = new_puzzle.prove(new_epoch_hash, new_address, new_counter, None)?;

        let old_proof_target = old_puzzle.get_proof_target(&old_solution)?;
        let new_proof_target = new_puzzle.get_proof_target(&new_solution)?;

        assert_eq!(old_proof_target, new_proof_target);
    }
    Ok(())
}

#[test]
fn test_puzzles_are_equivalent() -> Result<()> {
    // Construct the old puzzle.
    let old_puzzle = OldVM::<OldNetwork, OldMemory<OldNetwork>>::new_puzzle()?;

    // Construct the new puzzle.
    let new_puzzle = NewVM::<NewNetwork, NewMemory<NewNetwork>>::new_puzzle()?;

    // Check that the old and new puzzles produce equivalent solutions.
    check_equivalency(&old_puzzle, &new_puzzle)?;

    Ok(())
}

@howardwu howardwu requested review from vicsn and mdelle1 April 3, 2024 21:03
@howardwu
Copy link
Member

howardwu commented Apr 3, 2024

@vicsn @mdelle1 can each of you independently review this design change for correctness/completeness and soundness?

Signed-off-by: ljedrz <ljedrz@gmail.com>
Signed-off-by: ljedrz <ljedrz@gmail.com>
Signed-off-by: ljedrz <ljedrz@gmail.com>
Signed-off-by: ljedrz <ljedrz@gmail.com>
Signed-off-by: ljedrz <ljedrz@gmail.com>
@ljedrz
Copy link
Contributor Author

ljedrz commented Apr 4, 2024

I'm going to check if any of these review comments apply to #2415 as well.

Edit: it seems that way, but let's finish this PR first so as to make sure that everything is in good shape.

Copy link
Contributor

@vicsn vicsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell: this PR does not impact completeness/soundness of the merkle tree opening proof, because it produces merkle tree nodes which are identical to before.

Note that due to the particular imlementation, we can't prove membership anymore of empty leaves outside of range of the originally supplied leaves, which seems like a desirable bonus.

Signed-off-by: ljedrz <ljedrz@gmail.com>
Signed-off-by: ljedrz <ljedrz@gmail.com>
@ljedrz ljedrz requested a review from vicsn April 8, 2024 15:15
@ljedrz
Copy link
Contributor Author

ljedrz commented Apr 8, 2024

I'll rerun the benchmarks for this and the other PR shortly.

@ljedrz
Copy link
Contributor Author

ljedrz commented Apr 8, 2024

The updated benchmark results (skipping the 0% full case, as it's inconsistent) are:

Benchmarking KaryMerkleTree/new/4194304 (25% full): Warming up for 3.0000 s
KaryMerkleTree/new/4194304 (25% full)
                        time:   [1.0458 s 1.0503 s 1.0551 s]
                        change: [-73.821% -73.661% -73.488%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking KaryMerkleTree/new/8388608 (50% full): Warming up for 3.0000 s
KaryMerkleTree/new/8388608 (50% full)
                        time:   [1.9405 s 1.9510 s 1.9611 s]
                        change: [-58.942% -58.687% -58.449%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking KaryMerkleTree/new/12582912 (75% full): Warming up for 3.0000 s
KaryMerkleTree/new/12582912 (75% full)
                        time:   [2.8457 s 2.8569 s 2.8664 s]
                        change: [-47.101% -46.725% -46.350%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) low mild

Benchmarking KaryMerkleTree/new/16777216 (100% full): Warming up for 3.0000 s
KaryMerkleTree/new/16777216 (100% full)
                        time:   [3.7669 s 3.8062 s 3.8520 s]
                        change: [-39.205% -38.257% -37.313%] (p = 0.00 < 0.05)
                        Performance has improved.

KaryMerkleTree/verify   time:   [14.508 µs 14.608 µs 14.737 µs]
                        change: [-7.9402% -6.7762% -5.4263%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  10 (10.00%) high severe

Which is a further improvement across the board.

Copy link
Contributor

@vicsn vicsn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK on the new commits

@howardwu howardwu merged commit bbdb745 into AleoHQ:mainnet-staging Apr 12, 2024
80 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants