feat: de-duplicate payloads from persisted beacon blocks #6029

Draft
wants to merge 49 commits into
base: unstable

matthewkeil
Member

@matthewkeil matthewkeil commented Oct 9, 2023

NOTE: The Sim Merge Test is not going to pass. The container that it runs one test in needs to be updated. @g11tech is going to look for the Dockerfile and I will help get it updated and published so it will pass. The image is based on a pre-shanghai image that does not have engine_getPayloadBodiesByHashV1 available. This is the image:
https://hub.docker.com/r/g11tech/mergemock

The following items still need to be completed before moving to ready:

  • double check that getBlock works as expected
  • get a valid deneb block (ask @g11tech how to generate one with valid data; perhaps it can be pulled from devnet 9?)
  • turn on deneb block unit tests; need to add a value for the fork epoch and spoof valid slots in the mocks
  • convert the fixtures to .ssz format to reduce the diff
  • prove with a benchmark the need for serialized conversion in packages/beacon-node/src/util/fullOrBlindedBlock.ts
  • convert the generator-based serialized conversion to a promise and re-test perf as a promise
  • remove the excess codepath based on the results above
  • fix the sim-test eth1 engine mock to support engine_getPayloadBodiesByHashV1
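
For reference, here is a minimal sketch of what a mock needs to answer for engine_getPayloadBodiesByHashV1. It is not the mergemock or sim-test implementation: `createMockEngine` and the in-memory map are hypothetical, and hex strings stand in for the Engine API's DATA and QUANTITY values. The method takes one array of block hashes and returns one entry per hash in request order, with null for unknown blocks.

```typescript
// Hypothetical, simplified shapes modelled on the Engine API JSON encoding.
type WithdrawalV1 = {index: string; validatorIndex: string; address: string; amount: string};
type ExecutionPayloadBodyV1 = {transactions: string[]; withdrawals: WithdrawalV1[] | null};

type JsonRpcRequest = {jsonrpc: "2.0"; id: number; method: string; params: unknown[]};
type JsonRpcResponse =
  | {jsonrpc: "2.0"; id: number; result: unknown}
  | {jsonrpc: "2.0"; id: number; error: {code: number; message: string}};

/** Builds a mock handler backed by an in-memory map of blockHash -> payload body. */
function createMockEngine(bodies: Map<string, ExecutionPayloadBodyV1>) {
  return function handle(req: JsonRpcRequest): JsonRpcResponse {
    if (req.method !== "engine_getPayloadBodiesByHashV1") {
      // -32601 is the standard JSON-RPC "method not found" code
      return {jsonrpc: "2.0", id: req.id, error: {code: -32601, message: "Method not found"}};
    }
    const hashes = (req.params[0] ?? []) as string[];
    // One entry per requested hash, in request order; null marks an unknown block
    const result = hashes.map((hash) => bodies.get(hash.toLowerCase()) ?? null);
    return {jsonrpc: "2.0", id: req.id, result};
  };
}
```

A pre-shanghai mock falls into the "method not found" branch, which is why the current image cannot serve the unblinding path.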

Motivation

Lodestar is saving data that is also saved in the execution client database. In particular we are persisting transactions and withdrawals in the block and blockArchive databases.

Description

Stores blinded blocks in both the hot and cold db. Modifies calls for data retrieval that require the full block, ReqResp and API, to splice in the missing transactions and withdrawals.
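
As an illustration only, the blind-on-write and splice-on-read idea can be sketched at the struct level as below. The types are hypothetical and heavily simplified (the real ones come from @lodestar/types), roots are shown as strings, and the root functions are injected rather than taken from an SSZ library. This is not the code in this PR.

```typescript
// Hypothetical, heavily simplified types for illustration.
type Withdrawal = {index: number; validatorIndex: number; address: string; amount: number};
type PayloadBody = {transactions: Uint8Array[]; withdrawals: Withdrawal[]};
type ExecutionPayload = {blockHash: string; blockNumber: number} & PayloadBody;
type ExecutionPayloadHeader = {
  blockHash: string;
  blockNumber: number;
  transactionsRoot: string;
  withdrawalsRoot: string;
};
type FullBlock = {slot: number; executionPayload: ExecutionPayload};
type BlindedBlock = {slot: number; executionPayloadHeader: ExecutionPayloadHeader};

// Root functions are injected so the sketch does not depend on a real SSZ library.
type RootFns = {
  transactionsRoot(txs: Uint8Array[]): string;
  withdrawalsRoot(withdrawals: Withdrawal[]): string;
};

/** Full -> blinded: replace the two lists with their roots before writing to the db. */
function blindBlock(block: FullBlock, roots: RootFns): BlindedBlock {
  const payload = block.executionPayload;
  return {
    slot: block.slot,
    executionPayloadHeader: {
      blockHash: payload.blockHash,
      blockNumber: payload.blockNumber,
      transactionsRoot: roots.transactionsRoot(payload.transactions),
      withdrawalsRoot: roots.withdrawalsRoot(payload.withdrawals),
    },
  };
}

/** Blinded -> full: splice in the lists fetched from the execution client. */
function unblindBlock(blinded: BlindedBlock, body: PayloadBody): FullBlock {
  const header = blinded.executionPayloadHeader;
  return {
    slot: blinded.slot,
    executionPayload: {
      blockHash: header.blockHash,
      blockNumber: header.blockNumber,
      transactions: body.transactions,
      withdrawals: body.withdrawals,
    },
  };
}
```

A caller serving ReqResp or the API would read the blinded block, fetch the body for `blockHash` from the execution client, and call `unblindBlock` before responding.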

Closes #5671

How to test

Extensive unit and perf tests were run to verify that this works correctly.

yarn test:unit
yarn benchmark:files packages/beacon-node/test/perf/util/fullOrBlindedBlock.test.ts

@github-actions
Contributor

github-actions bot commented Oct 9, 2023

Performance Report

✔️ no performance regression detected

Full benchmark results
Benchmark suite Current: 3d24afb Previous: 2b5935a Ratio
getPubkeys - index2pubkey - req 1000 vs - 250000 vc 779.92 us/op 796.24 us/op 0.98
getPubkeys - validatorsArr - req 1000 vs - 250000 vc 56.687 us/op 47.995 us/op 1.18
BLS verify - blst-native 1.1833 ms/op 1.0721 ms/op 1.10
BLS verifyMultipleSignatures 3 - blst-native 2.5468 ms/op 2.2878 ms/op 1.11
BLS verifyMultipleSignatures 8 - blst-native 5.4344 ms/op 5.0705 ms/op 1.07
BLS verifyMultipleSignatures 32 - blst-native 20.657 ms/op 18.628 ms/op 1.11
BLS verifyMultipleSignatures 64 - blst-native 38.845 ms/op 36.661 ms/op 1.06
BLS verifyMultipleSignatures 128 - blst-native 77.583 ms/op 73.533 ms/op 1.06
BLS deserializing 10000 signatures 798.46 ms/op 761.13 ms/op 1.05
BLS deserializing 100000 signatures 8.2787 s/op 7.6443 s/op 1.08
BLS verifyMultipleSignatures - same message - 3 - blst-native 1.1668 ms/op 1.1701 ms/op 1.00
BLS verifyMultipleSignatures - same message - 8 - blst-native 1.3228 ms/op 1.2694 ms/op 1.04
BLS verifyMultipleSignatures - same message - 32 - blst-native 2.3607 ms/op 1.9999 ms/op 1.18
BLS verifyMultipleSignatures - same message - 64 - blst-native 3.1314 ms/op 2.9730 ms/op 1.05
BLS verifyMultipleSignatures - same message - 128 - blst-native 5.9647 ms/op 4.9050 ms/op 1.22
BLS aggregatePubkeys 32 - blst-native 24.113 us/op 22.258 us/op 1.08
BLS aggregatePubkeys 128 - blst-native 89.529 us/op 87.332 us/op 1.03
getAttestationsForBlock 39.245 ms/op 27.307 ms/op 1.44
isKnown best case - 1 super set check 361.00 ns/op 299.00 ns/op 1.21
isKnown normal case - 2 super set checks 324.00 ns/op 302.00 ns/op 1.07
isKnown worse case - 16 super set checks 579.00 ns/op 301.00 ns/op 1.92
CheckpointStateCache - add get delete 4.3360 us/op 3.4550 us/op 1.25
validate api signedAggregateAndProof - struct 2.4828 ms/op 2.4074 ms/op 1.03
validate gossip signedAggregateAndProof - struct 2.4176 ms/op 2.3522 ms/op 1.03
validate gossip attestation - vc 640000 1.2036 ms/op 1.1280 ms/op 1.07
batch validate gossip attestation - vc 640000 - chunk 32 147.16 us/op 135.47 us/op 1.09
batch validate gossip attestation - vc 640000 - chunk 64 127.96 us/op 120.70 us/op 1.06
batch validate gossip attestation - vc 640000 - chunk 128 118.47 us/op 109.81 us/op 1.08
batch validate gossip attestation - vc 640000 - chunk 256 110.73 us/op 106.85 us/op 1.04
pickEth1Vote - no votes 886.40 us/op 871.51 us/op 1.02
pickEth1Vote - max votes 10.008 ms/op 10.367 ms/op 0.97
pickEth1Vote - Eth1Data hashTreeRoot value x2048 18.552 ms/op 18.709 ms/op 0.99
pickEth1Vote - Eth1Data hashTreeRoot tree x2048 19.039 ms/op 24.615 ms/op 0.77
pickEth1Vote - Eth1Data fastSerialize value x2048 427.00 us/op 366.67 us/op 1.16
pickEth1Vote - Eth1Data fastSerialize tree x2048 8.8883 ms/op 5.1396 ms/op 1.73
bytes32 toHexString 422.00 ns/op 391.00 ns/op 1.08
bytes32 Buffer.toString(hex) 288.00 ns/op 275.00 ns/op 1.05
bytes32 Buffer.toString(hex) from Uint8Array 426.00 ns/op 378.00 ns/op 1.13
bytes32 Buffer.toString(hex) + 0x 301.00 ns/op 277.00 ns/op 1.09
Object access 1 prop 0.19400 ns/op 0.18100 ns/op 1.07
Map access 1 prop 0.18300 ns/op 0.17800 ns/op 1.03
Object get x1000 5.6220 ns/op 4.8040 ns/op 1.17
Map get x1000 0.49900 ns/op 0.48800 ns/op 1.02
Object set x1000 23.448 ns/op 22.926 ns/op 1.02
Map set x1000 15.988 ns/op 16.235 ns/op 0.98
Return object 10000 times 0.22510 ns/op 0.21350 ns/op 1.05
Throw Error 10000 times 2.8632 us/op 2.6166 us/op 1.09
fastMsgIdFn sha256 / 200 bytes 1.9710 us/op 1.8840 us/op 1.05
fastMsgIdFn h32 xxhash / 200 bytes 317.00 ns/op 281.00 ns/op 1.13
fastMsgIdFn h64 xxhash / 200 bytes 352.00 ns/op 331.00 ns/op 1.06
fastMsgIdFn sha256 / 1000 bytes 6.0650 us/op 5.8950 us/op 1.03
fastMsgIdFn h32 xxhash / 1000 bytes 432.00 ns/op 391.00 ns/op 1.10
fastMsgIdFn h64 xxhash / 1000 bytes 411.00 ns/op 387.00 ns/op 1.06
fastMsgIdFn sha256 / 10000 bytes 52.260 us/op 51.040 us/op 1.02
fastMsgIdFn h32 xxhash / 10000 bytes 1.7790 us/op 1.7250 us/op 1.03
fastMsgIdFn h64 xxhash / 10000 bytes 1.2170 us/op 1.1830 us/op 1.03
send data - 1000 256B messages 12.615 ms/op 11.017 ms/op 1.15
send data - 1000 512B messages 16.711 ms/op 14.457 ms/op 1.16
send data - 1000 1024B messages 23.331 ms/op 21.755 ms/op 1.07
send data - 1000 1200B messages 21.759 ms/op 20.456 ms/op 1.06
send data - 1000 2048B messages 22.679 ms/op 22.924 ms/op 0.99
send data - 1000 4096B messages 22.146 ms/op 23.586 ms/op 0.94
send data - 1000 16384B messages 71.023 ms/op 56.992 ms/op 1.25
send data - 1000 65536B messages 390.46 ms/op 228.16 ms/op 1.71
enrSubnets - fastDeserialize 64 bits 1.2430 us/op 859.00 ns/op 1.45
enrSubnets - ssz BitVector 64 bits 608.00 ns/op 397.00 ns/op 1.53
enrSubnets - fastDeserialize 4 bits 295.00 ns/op 185.00 ns/op 1.59
enrSubnets - ssz BitVector 4 bits 577.00 ns/op 391.00 ns/op 1.48
prioritizePeers score -10:0 att 32-0.1 sync 2-0 120.09 us/op 62.735 us/op 1.91
prioritizePeers score 0:0 att 32-0.25 sync 2-0.25 125.45 us/op 74.079 us/op 1.69
prioritizePeers score 0:0 att 32-0.5 sync 2-0.5 219.87 us/op 105.66 us/op 2.08
prioritizePeers score 0:0 att 64-0.75 sync 4-0.75 379.69 us/op 177.99 us/op 2.13
prioritizePeers score 0:0 att 64-1 sync 4-1 251.26 us/op 197.47 us/op 1.27
array of 16000 items push then shift 1.2690 us/op 1.1970 us/op 1.06
LinkedList of 16000 items push then shift 7.3980 ns/op 6.5390 ns/op 1.13
array of 16000 items push then pop 89.461 ns/op 59.456 ns/op 1.50
LinkedList of 16000 items push then pop 5.9140 ns/op 5.5730 ns/op 1.06
array of 24000 items push then shift 1.9780 us/op 1.7747 us/op 1.11
LinkedList of 24000 items push then shift 6.5740 ns/op 6.2550 ns/op 1.05
array of 24000 items push then pop 116.54 ns/op 77.599 ns/op 1.50
LinkedList of 24000 items push then pop 6.2000 ns/op 5.5830 ns/op 1.11
intersect bitArray bitLen 8 5.4140 ns/op 5.2120 ns/op 1.04
intersect array and set length 8 79.687 ns/op 39.104 ns/op 2.04
intersect bitArray bitLen 128 25.650 ns/op 24.742 ns/op 1.04
intersect array and set length 128 590.12 ns/op 548.66 ns/op 1.08
bitArray.getTrueBitIndexes() bitLen 128 1.3730 us/op 1.1640 us/op 1.18
bitArray.getTrueBitIndexes() bitLen 248 2.1430 us/op 1.8790 us/op 1.14
bitArray.getTrueBitIndexes() bitLen 512 4.8170 us/op 3.3860 us/op 1.42
Buffer.concat 32 items 969.00 ns/op 841.00 ns/op 1.15
Uint8Array.set 32 items 1.9200 us/op 1.7780 us/op 1.08
Set add up to 64 items then delete first 1.6977 us/op 1.6944 us/op 1.00
OrderedSet add up to 64 items then delete first 2.8673 us/op 2.5875 us/op 1.11
Set add up to 64 items then delete last 1.9820 us/op 1.9366 us/op 1.02
OrderedSet add up to 64 items then delete last 2.8114 us/op 2.8678 us/op 0.98
Set add up to 64 items then delete middle 1.8927 us/op 1.9352 us/op 0.98
OrderedSet add up to 64 items then delete middle 3.9762 us/op 4.1304 us/op 0.96
Set add up to 128 items then delete first 3.7339 us/op 3.8383 us/op 0.97
OrderedSet add up to 128 items then delete first 5.8489 us/op 5.9986 us/op 0.98
Set add up to 128 items then delete last 3.6044 us/op 3.6751 us/op 0.98
OrderedSet add up to 128 items then delete last 5.3784 us/op 5.6744 us/op 0.95
Set add up to 128 items then delete middle 3.6255 us/op 3.8124 us/op 0.95
OrderedSet add up to 128 items then delete middle 12.196 us/op 10.538 us/op 1.16
Set add up to 256 items then delete first 10.159 us/op 7.4845 us/op 1.36
OrderedSet add up to 256 items then delete first 20.723 us/op 11.857 us/op 1.75
Set add up to 256 items then delete last 14.394 us/op 7.2076 us/op 2.00
OrderedSet add up to 256 items then delete last 14.941 us/op 10.956 us/op 1.36
Set add up to 256 items then delete middle 9.2424 us/op 7.1477 us/op 1.29
OrderedSet add up to 256 items then delete middle 35.033 us/op 29.953 us/op 1.17
transfer serialized Status (84 B) 1.9670 us/op 1.3980 us/op 1.41
copy serialized Status (84 B) 2.2160 us/op 1.2440 us/op 1.78
transfer serialized SignedVoluntaryExit (112 B) 2.7790 us/op 1.4640 us/op 1.90
copy serialized SignedVoluntaryExit (112 B) 1.7380 us/op 1.3250 us/op 1.31
transfer serialized ProposerSlashing (416 B) 3.1800 us/op 2.3130 us/op 1.37
copy serialized ProposerSlashing (416 B) 3.1890 us/op 2.3160 us/op 1.38
transfer serialized Attestation (485 B) 2.5820 us/op 2.4300 us/op 1.06
copy serialized Attestation (485 B) 2.4820 us/op 2.2810 us/op 1.09
transfer serialized AttesterSlashing (33232 B) 2.4490 us/op 2.3780 us/op 1.03
copy serialized AttesterSlashing (33232 B) 8.0500 us/op 5.0060 us/op 1.61
transfer serialized Small SignedBeaconBlock (128000 B) 2.6790 us/op 2.3340 us/op 1.15
copy serialized Small SignedBeaconBlock (128000 B) 10.634 us/op 11.074 us/op 0.96
transfer serialized Avg SignedBeaconBlock (200000 B) 3.1330 us/op 2.4080 us/op 1.30
copy serialized Avg SignedBeaconBlock (200000 B) 14.004 us/op 18.528 us/op 0.76
transfer serialized BlobsSidecar (524380 B) 4.0240 us/op 2.5510 us/op 1.58
copy serialized BlobsSidecar (524380 B) 141.36 us/op 71.011 us/op 1.99
transfer serialized Big SignedBeaconBlock (1000000 B) 4.5650 us/op 2.5540 us/op 1.79
copy serialized Big SignedBeaconBlock (1000000 B) 256.63 us/op 139.87 us/op 1.83
pass gossip attestations to forkchoice per slot 2.7367 ms/op 2.6015 ms/op 1.05
forkChoice updateHead vc 100000 bc 64 eq 0 616.65 us/op 431.50 us/op 1.43
forkChoice updateHead vc 600000 bc 64 eq 0 2.9726 ms/op 2.9483 ms/op 1.01
forkChoice updateHead vc 1000000 bc 64 eq 0 5.1292 ms/op 4.5480 ms/op 1.13
forkChoice updateHead vc 600000 bc 320 eq 0 3.0412 ms/op 2.6198 ms/op 1.16
forkChoice updateHead vc 600000 bc 1200 eq 0 3.2473 ms/op 2.9296 ms/op 1.11
forkChoice updateHead vc 600000 bc 7200 eq 0 3.7134 ms/op 3.4147 ms/op 1.09
forkChoice updateHead vc 600000 bc 64 eq 1000 10.321 ms/op 9.8509 ms/op 1.05
forkChoice updateHead vc 600000 bc 64 eq 10000 10.271 ms/op 9.9270 ms/op 1.03
forkChoice updateHead vc 600000 bc 64 eq 300000 16.185 ms/op 12.361 ms/op 1.31
computeDeltas 500000 validators 300 proto nodes 4.0232 ms/op 2.8485 ms/op 1.41
computeDeltas 500000 validators 1200 proto nodes 3.8653 ms/op 2.8871 ms/op 1.34
computeDeltas 500000 validators 7200 proto nodes 4.0815 ms/op 2.8293 ms/op 1.44
computeDeltas 750000 validators 300 proto nodes 5.3841 ms/op 4.3903 ms/op 1.23
computeDeltas 750000 validators 1200 proto nodes 6.5005 ms/op 4.2857 ms/op 1.52
computeDeltas 750000 validators 7200 proto nodes 7.0571 ms/op 4.2634 ms/op 1.66
computeDeltas 1400000 validators 300 proto nodes 13.189 ms/op 8.2065 ms/op 1.61
computeDeltas 1400000 validators 1200 proto nodes 12.866 ms/op 8.2389 ms/op 1.56
computeDeltas 1400000 validators 7200 proto nodes 13.438 ms/op 8.2451 ms/op 1.63
computeDeltas 2100000 validators 300 proto nodes 19.904 ms/op 12.618 ms/op 1.58
computeDeltas 2100000 validators 1200 proto nodes 18.771 ms/op 12.617 ms/op 1.49
computeDeltas 2100000 validators 7200 proto nodes 14.523 ms/op 12.940 ms/op 1.12
computeProposerBoostScoreFromBalances 500000 validators 2.9497 ms/op 2.7660 ms/op 1.07
computeProposerBoostScoreFromBalances 750000 validators 3.0279 ms/op 2.7606 ms/op 1.10
computeProposerBoostScoreFromBalances 1400000 validators 2.9826 ms/op 2.8303 ms/op 1.05
computeProposerBoostScoreFromBalances 2100000 validators 2.9215 ms/op 2.8487 ms/op 1.03
altair processAttestation - 250000 vs - 7PWei normalcase 1.8450 ms/op 1.5454 ms/op 1.19
altair processAttestation - 250000 vs - 7PWei worstcase 2.3005 ms/op 2.8214 ms/op 0.82
altair processAttestation - setStatus - 1/6 committees join 117.31 us/op 121.41 us/op 0.97
altair processAttestation - setStatus - 1/3 committees join 246.18 us/op 210.46 us/op 1.17
altair processAttestation - setStatus - 1/2 committees join 332.01 us/op 311.58 us/op 1.07
altair processAttestation - setStatus - 2/3 committees join 423.60 us/op 419.26 us/op 1.01
altair processAttestation - setStatus - 4/5 committees join 538.62 us/op 528.25 us/op 1.02
altair processAttestation - setStatus - 100% committees join 638.22 us/op 673.94 us/op 0.95
altair processBlock - 250000 vs - 7PWei normalcase 7.5212 ms/op 8.9014 ms/op 0.84
altair processBlock - 250000 vs - 7PWei normalcase hashState 29.014 ms/op 25.617 ms/op 1.13
altair processBlock - 250000 vs - 7PWei worstcase 34.934 ms/op 30.510 ms/op 1.14
altair processBlock - 250000 vs - 7PWei worstcase hashState 86.718 ms/op 76.640 ms/op 1.13
phase0 processBlock - 250000 vs - 7PWei normalcase 2.4427 ms/op 2.4757 ms/op 0.99
phase0 processBlock - 250000 vs - 7PWei worstcase 26.857 ms/op 28.806 ms/op 0.93
altair processEth1Data - 250000 vs - 7PWei normalcase 314.66 us/op 307.60 us/op 1.02
getExpectedWithdrawals 250000 eb:1,eth1:1,we:0,wn:0,smpl:15 6.6980 us/op 7.7620 us/op 0.86
getExpectedWithdrawals 250000 eb:0.95,eth1:0.1,we:0.05,wn:0,smpl:219 52.951 us/op 41.841 us/op 1.27
getExpectedWithdrawals 250000 eb:0.95,eth1:0.3,we:0.05,wn:0,smpl:42 13.939 us/op 8.2060 us/op 1.70
getExpectedWithdrawals 250000 eb:0.95,eth1:0.7,we:0.05,wn:0,smpl:18 10.530 us/op 11.508 us/op 0.92
getExpectedWithdrawals 250000 eb:0.1,eth1:0.1,we:0,wn:0,smpl:1020 153.86 us/op 129.05 us/op 1.19
getExpectedWithdrawals 250000 eb:0.03,eth1:0.03,we:0,wn:0,smpl:11777 1.1582 ms/op 678.69 us/op 1.71
getExpectedWithdrawals 250000 eb:0.01,eth1:0.01,we:0,wn:0,smpl:16384 1.0762 ms/op 913.40 us/op 1.18
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,smpl:16384 947.83 us/op 1.0932 ms/op 0.87
getExpectedWithdrawals 250000 eb:0,eth1:0,we:0,wn:0,nocache,smpl:16384 1.9349 ms/op 2.7804 ms/op 0.70
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,smpl:16384 1.7822 ms/op 1.8019 ms/op 0.99
getExpectedWithdrawals 250000 eb:0,eth1:1,we:0,wn:0,nocache,smpl:16384 3.3222 ms/op 4.8357 ms/op 0.69
Tree 40 250000 create 230.64 ms/op 252.45 ms/op 0.91
Tree 40 250000 get(125000) 111.00 ns/op 117.30 ns/op 0.95
Tree 40 250000 set(125000) 678.72 ns/op 796.94 ns/op 0.85
Tree 40 250000 toArray() 9.7575 ms/op 21.021 ms/op 0.46
Tree 40 250000 iterate all - toArray() + loop 9.9683 ms/op 21.574 ms/op 0.46
Tree 40 250000 iterate all - get(i) 43.422 ms/op 52.900 ms/op 0.82
MutableVector 250000 create 10.069 ms/op 10.444 ms/op 0.96
MutableVector 250000 get(125000) 5.5920 ns/op 5.9100 ns/op 0.95
MutableVector 250000 set(125000) 202.68 ns/op 216.86 ns/op 0.93
MutableVector 250000 toArray() 2.1597 ms/op 3.1824 ms/op 0.68
MutableVector 250000 iterate all - toArray() + loop 2.5351 ms/op 3.3455 ms/op 0.76
MutableVector 250000 iterate all - get(i) 1.3410 ms/op 1.3468 ms/op 1.00
Array 250000 create 1.9415 ms/op 2.7194 ms/op 0.71
Array 250000 clone - spread 1.0207 ms/op 987.27 us/op 1.03
Array 250000 get(125000) 0.57900 ns/op 0.58100 ns/op 1.00
Array 250000 set(125000) 0.65000 ns/op 0.61600 ns/op 1.06
Array 250000 iterate all - loop 77.744 us/op 78.219 us/op 0.99
effectiveBalanceIncrements clone Uint8Array 300000 12.567 us/op 11.511 us/op 1.09
effectiveBalanceIncrements clone MutableVector 300000 364.00 ns/op 318.00 ns/op 1.14
effectiveBalanceIncrements rw all Uint8Array 300000 169.12 us/op 173.07 us/op 0.98
effectiveBalanceIncrements rw all MutableVector 300000 64.566 ms/op 61.339 ms/op 1.05
phase0 afterProcessEpoch - 250000 vs - 7PWei 80.320 ms/op 78.789 ms/op 1.02
phase0 beforeProcessEpoch - 250000 vs - 7PWei 29.465 ms/op 32.440 ms/op 0.91
altair processEpoch - mainnet_e81889 360.36 ms/op 363.30 ms/op 0.99
mainnet_e81889 - altair beforeProcessEpoch 49.044 ms/op 46.725 ms/op 1.05
mainnet_e81889 - altair processJustificationAndFinalization 16.426 us/op 8.7960 us/op 1.87
mainnet_e81889 - altair processInactivityUpdates 5.1969 ms/op 5.0674 ms/op 1.03
mainnet_e81889 - altair processRewardsAndPenalties 58.014 ms/op 48.836 ms/op 1.19
mainnet_e81889 - altair processRegistryUpdates 3.0420 us/op 1.2150 us/op 2.50
mainnet_e81889 - altair processSlashings 821.00 ns/op 313.00 ns/op 2.62
mainnet_e81889 - altair processEth1DataReset 1.0980 us/op 310.00 ns/op 3.54
mainnet_e81889 - altair processEffectiveBalanceUpdates 924.90 us/op 902.29 us/op 1.03
mainnet_e81889 - altair processSlashingsReset 3.9540 us/op 2.0690 us/op 1.91
mainnet_e81889 - altair processRandaoMixesReset 5.2210 us/op 3.1120 us/op 1.68
mainnet_e81889 - altair processHistoricalRootsUpdate 1.0130 us/op 451.00 ns/op 2.25
mainnet_e81889 - altair processParticipationFlagUpdates 2.1860 us/op 1.2020 us/op 1.82
mainnet_e81889 - altair processSyncCommitteeUpdates 925.00 ns/op 394.00 ns/op 2.35
mainnet_e81889 - altair afterProcessEpoch 84.434 ms/op 85.684 ms/op 0.99
capella processEpoch - mainnet_e217614 1.1862 s/op 1.2378 s/op 0.96
mainnet_e217614 - capella beforeProcessEpoch 215.48 ms/op 198.75 ms/op 1.08
mainnet_e217614 - capella processJustificationAndFinalization 17.229 us/op 7.1530 us/op 2.41
mainnet_e217614 - capella processInactivityUpdates 15.022 ms/op 13.391 ms/op 1.12
mainnet_e217614 - capella processRewardsAndPenalties 269.93 ms/op 238.55 ms/op 1.13
mainnet_e217614 - capella processRegistryUpdates 20.150 us/op 12.678 us/op 1.59
mainnet_e217614 - capella processSlashings 546.00 ns/op 342.00 ns/op 1.60
mainnet_e217614 - capella processEth1DataReset 481.00 ns/op 505.00 ns/op 0.95
mainnet_e217614 - capella processEffectiveBalanceUpdates 3.3342 ms/op 3.1561 ms/op 1.06
mainnet_e217614 - capella processSlashingsReset 1.8430 us/op 1.4570 us/op 1.26
mainnet_e217614 - capella processRandaoMixesReset 3.1780 us/op 2.8830 us/op 1.10
mainnet_e217614 - capella processHistoricalRootsUpdate 565.00 ns/op 444.00 ns/op 1.27
mainnet_e217614 - capella processParticipationFlagUpdates 2.0710 us/op 883.00 ns/op 2.35
mainnet_e217614 - capella afterProcessEpoch 205.91 ms/op 198.49 ms/op 1.04
phase0 processEpoch - mainnet_e58758 339.09 ms/op 346.99 ms/op 0.98
mainnet_e58758 - phase0 beforeProcessEpoch 102.02 ms/op 95.395 ms/op 1.07
mainnet_e58758 - phase0 processJustificationAndFinalization 11.276 us/op 9.6680 us/op 1.17
mainnet_e58758 - phase0 processRewardsAndPenalties 51.597 ms/op 47.363 ms/op 1.09
mainnet_e58758 - phase0 processRegistryUpdates 9.5470 us/op 4.8730 us/op 1.96
mainnet_e58758 - phase0 processSlashings 535.00 ns/op 292.00 ns/op 1.83
mainnet_e58758 - phase0 processEth1DataReset 750.00 ns/op 289.00 ns/op 2.60
mainnet_e58758 - phase0 processEffectiveBalanceUpdates 822.89 us/op 725.29 us/op 1.13
mainnet_e58758 - phase0 processSlashingsReset 1.7730 us/op 1.4060 us/op 1.26
mainnet_e58758 - phase0 processRandaoMixesReset 3.5710 us/op 1.6370 us/op 2.18
mainnet_e58758 - phase0 processHistoricalRootsUpdate 536.00 ns/op 284.00 ns/op 1.89
mainnet_e58758 - phase0 processParticipationRecordUpdates 3.0330 us/op 2.4030 us/op 1.26
mainnet_e58758 - phase0 afterProcessEpoch 66.681 ms/op 67.389 ms/op 0.99
phase0 processEffectiveBalanceUpdates - 250000 normalcase 946.39 us/op 887.51 us/op 1.07
phase0 processEffectiveBalanceUpdates - 250000 worstcase 0.5 1.8728 ms/op 1.1074 ms/op 1.69
altair processInactivityUpdates - 250000 normalcase 16.533 ms/op 15.463 ms/op 1.07
altair processInactivityUpdates - 250000 worstcase 20.013 ms/op 14.796 ms/op 1.35
phase0 processRegistryUpdates - 250000 normalcase 7.4040 us/op 3.4560 us/op 2.14
phase0 processRegistryUpdates - 250000 badcase_full_deposits 370.23 us/op 246.78 us/op 1.50
phase0 processRegistryUpdates - 250000 worstcase 0.5 109.62 ms/op 104.27 ms/op 1.05
altair processRewardsAndPenalties - 250000 normalcase 59.657 ms/op 50.910 ms/op 1.17
altair processRewardsAndPenalties - 250000 worstcase 49.411 ms/op 53.187 ms/op 0.93
phase0 getAttestationDeltas - 250000 normalcase 5.8217 ms/op 5.0912 ms/op 1.14
phase0 getAttestationDeltas - 250000 worstcase 5.3283 ms/op 4.9587 ms/op 1.07
phase0 processSlashings - 250000 worstcase 1.4411 ms/op 1.5576 ms/op 0.93
altair processSyncCommitteeUpdates - 250000 104.70 ms/op 104.16 ms/op 1.01
BeaconState.hashTreeRoot - No change 319.00 ns/op 288.00 ns/op 1.11
BeaconState.hashTreeRoot - 1 full validator 117.50 us/op 104.33 us/op 1.13
BeaconState.hashTreeRoot - 32 full validator 1.6118 ms/op 1.3449 ms/op 1.20
BeaconState.hashTreeRoot - 512 full validator 18.321 ms/op 13.384 ms/op 1.37
BeaconState.hashTreeRoot - 1 validator.effectiveBalance 131.66 us/op 143.56 us/op 0.92
BeaconState.hashTreeRoot - 32 validator.effectiveBalance 2.1911 ms/op 1.9368 ms/op 1.13
BeaconState.hashTreeRoot - 512 validator.effectiveBalance 22.247 ms/op 25.543 ms/op 0.87
BeaconState.hashTreeRoot - 1 balances 115.05 us/op 129.02 us/op 0.89
BeaconState.hashTreeRoot - 32 balances 1.0578 ms/op 867.73 us/op 1.22
BeaconState.hashTreeRoot - 512 balances 9.1723 ms/op 10.921 ms/op 0.84
BeaconState.hashTreeRoot - 250000 balances 172.43 ms/op 168.14 ms/op 1.03
aggregationBits - 2048 els - zipIndexesInBitList 11.615 us/op 9.3830 us/op 1.24
regular array get 100000 times 30.129 us/op 30.568 us/op 0.99
wrappedArray get 100000 times 29.528 us/op 30.557 us/op 0.97
arrayWithProxy get 100000 times 9.3758 ms/op 10.008 ms/op 0.94
ssz.Root.equals 244.00 ns/op 235.00 ns/op 1.04
byteArrayEquals 237.00 ns/op 225.00 ns/op 1.05
shuffle list - 16384 els 4.5248 ms/op 4.3707 ms/op 1.04
shuffle list - 250000 els 67.088 ms/op 64.246 ms/op 1.04
processSlot - 1 slots 17.512 us/op 13.379 us/op 1.31
processSlot - 32 slots 2.3465 ms/op 2.5688 ms/op 0.91
getEffectiveBalanceIncrementsZeroInactive - 250000 vs - 7PWei 40.174 ms/op 40.429 ms/op 0.99
getCommitteeAssignments - req 1 vs - 250000 vc 2.2714 ms/op 2.3571 ms/op 0.96
getCommitteeAssignments - req 100 vs - 250000 vc 3.4412 ms/op 3.7899 ms/op 0.91
getCommitteeAssignments - req 1000 vs - 250000 vc 3.7478 ms/op 4.2685 ms/op 0.88
RootCache.getBlockRootAtSlot - 250000 vs - 7PWei 5.7200 ns/op 6.5600 ns/op 0.87
state getBlockRootAtSlot - 250000 vs - 7PWei 595.89 ns/op 1.2790 us/op 0.47
computeProposers - vc 250000 6.5371 ms/op 8.1641 ms/op 0.80
computeEpochShuffling - vc 250000 69.796 ms/op 75.813 ms/op 0.92
getNextSyncCommittee - vc 250000 114.74 ms/op 137.36 ms/op 0.84
computeSigningRoot for AttestationData 21.256 us/op 24.369 us/op 0.87
hash AttestationData serialized data then Buffer.toString(base64) 1.2717 us/op 1.2953 us/op 0.98
toHexString serialized data 799.09 ns/op 834.04 ns/op 0.96
Buffer.toString(base64) 172.89 ns/op 158.43 ns/op 1.09

by benchmarkbot/action

@dapplion
Contributor

Some todos:

  • Prove with a benchmark the need for packages/beacon-node/src/util/fullOrBlindedBlock.ts. Compare the two strategies below and, unless there's a massive difference, just use the simpler strategy of merging structs. After running the benchmarks, persist the results in code, add a new comment to this PR with the results, and delete the losing codepath.
    • deserialize, merge structs, serialize
    • serialize exec payload, merge as bytes
  • convert the fixtures to .ssz format to reduce the diff

@matthewkeil
Member Author

matthewkeil commented Oct 11, 2023

  • convert the fixtures to .ssz format to reduce the diff

@dapplion I am working on that conversion now. When I did it this evening the unit tests for capella broke. I had logic in the mock loading file to convert the mainnet mocks to work with the minimal testing preset. Tonight I manually converted them to the minimal config before saving them serialized, but something needs debugging. I was modifying the raw JSON before using @lodestar/types because of how LODESTAR_PRESET flows into the ssz types, but when I converted by hand something was not converted correctly. I'll find it in the AM and push the changes.

  • Prove with a benchmark the need for packages/beacon-node/src/util/fullOrBlindedBlock.ts. Compare the two strategies below and, unless there's a massive difference, just use the simpler strategy of merging structs. After running the benchmarks, persist the results in code, add a new comment to this PR with the results, and delete the losing codepath.

    • deserialize, merge structs, serialize
    • serialize exec payload, merge as bytes

I remembered chatting with you about this a couple weeks ago and got it ready for you :) Apologies, I should have brought this up when we spoke before standup. I forgot with all the other stuff we chatted about.

I posted those results from the perf test on the issue:
#5671 (comment)

I copied the results in that comment below so they are part of this PR too for visibility.

The test file is in this commit of this PR:
4112724

The results suggest that serialized conversion is the way to go, since it is a couple of orders of magnitude faster, but I would love to get your opinion before I delete the losing codepath. The perf test is in the commit linked above so you can check the methodology. I will leave the perf test as part of this PR if serialized conversion is how you want to go.

I was thinking about removing the generator function and just returning a promise after our discussion before standup. I will rerun the perf tests like that to compare and post them in a comment below tomorrow once I sort out the mock serialization bug.

  fullOrBlindedBlock
    BlindedOrFull to full
      phase0
        ✔ phase0 to full - deserialize first                                  9646.737 ops/s    103.6620 us/op        -       4856 runs  0.617 s
        ✔ phase0 to full - convert serialized                                  2865330 ops/s    349.0000 ns/op        -    1740003 runs  0.909 s
      altair
        ✔ altair to full - deserialize first                                  5352.431 ops/s    186.8310 us/op        -       2699 runs  0.697 s
        ✔ altair to full - convert serialized                                  2967359 ops/s    337.0000 ns/op        -    1598138 runs  0.808 s
      bellatrix
        ✔ bellatrix to full - deserialize first                               3991.474 ops/s    250.5340 us/op        -       1208 runs  0.553 s
        ✔ bellatrix to full - convert serialized                               2463054 ops/s    406.0000 ns/op        -     879455 runs  0.505 s
      capella
        ✔ capella to full - deserialize first                                 3660.175 ops/s    273.2110 us/op        -       1846 runs  0.783 s
        ✔ capella to full - convert serialized                                 2364066 ops/s    423.0000 ns/op        -    2012155 runs   1.21 s
      deneb
        ✔ deneb to full - deserialize first                                   3621.915 ops/s    276.0970 us/op        -       1827 runs  0.806 s
        ✔ deneb to full - convert serialized                                   2398082 ops/s    417.0000 ns/op        -     506726 runs  0.303 s
    BlindedOrFull to blinded
      phase0
        ✔ phase0 to blinded - deserialize first                               12937.95 ops/s    77.29200 us/op        -       4230 runs  0.404 s
        ✔ phase0 to blinded - convert serialized                           1.000000e+7 ops/s    100.0000 ns/op        -    3120420 runs  0.606 s
      altair
        ✔ altair to blinded - deserialize first                               7185.198 ops/s    139.1750 us/op        -       2170 runs  0.439 s
        ✔ altair to blinded - convert serialized                               9900990 ops/s    101.0000 ns/op        -    2588758 runs  0.505 s
      bellatrix
        ✔ bellatrix to blinded - deserialize first                            100.1679 ops/s    9.983241 ms/op        -         76 runs   1.26 s
        ✔ bellatrix to blinded - convert serialized                           92.22430 ops/s    10.84313 ms/op        -        117 runs   1.77 s
      capella
        ✔ capella to blinded - deserialize first                              45.29530 ops/s    22.07735 ms/op        -         48 runs   1.58 s
        ✔ capella to blinded - convert serialized                             43.09465 ops/s    23.20474 ms/op        -         50 runs   1.66 s
      deneb
        ✔ deneb to blinded - deserialize first                                45.42834 ops/s    22.01269 ms/op        -         51 runs   1.63 s
        ✔ deneb to blinded - convert serialized                               46.20545 ops/s    21.64247 ms/op        -         46 runs   1.50 s
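
To make the comparison easier to read, here is a small sketch that turns a few of the rows above into speedup ratios. The helper names (`toNs`, `speedup`) are hypothetical and not part of the PR; the timings are copied from the output above.

```typescript
// Conversion factors from the units used in the benchmark output to nanoseconds.
const UNIT_TO_NS = {ns: 1, us: 1e3, ms: 1e6, s: 1e9} as const;
type Unit = keyof typeof UNIT_TO_NS;
type Timing = {value: number; unit: Unit};
type Row = {name: string; deserializeFirst: Timing; convertSerialized: Timing};

function toNs(t: Timing): number {
  return t.value * UNIT_TO_NS[t.unit];
}

/** Values above 1 mean "convert serialized" is faster than "deserialize first". */
function speedup(row: Row): number {
  return toNs(row.deserializeFirst) / toNs(row.convertSerialized);
}

// Rows copied from the benchmark output in this comment.
const rows: Row[] = [
  {
    name: "phase0 to full",
    deserializeFirst: {value: 103.662, unit: "us"},
    convertSerialized: {value: 349, unit: "ns"},
  },
  {
    name: "capella to full",
    deserializeFirst: {value: 273.211, unit: "us"},
    convertSerialized: {value: 423, unit: "ns"},
  },
  {
    name: "capella to blinded",
    deserializeFirst: {value: 22.07735, unit: "ms"},
    convertSerialized: {value: 23.20474, unit: "ms"},
  },
];

for (const row of rows) {
  console.log(`${row.name}: ${speedup(row).toFixed(2)}x`);
}
```

The "to full" rows come out at several hundred times faster for the serialized path, while "to blinded" from bellatrix onward is close to 1x in both directions.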

@matthewkeil
Member Author

matthewkeil commented Oct 11, 2023

Some todos:

  • Prove with a benchmark the need for packages/beacon-node/src/util/fullOrBlindedBlock.ts. Compare the two strategies below and, unless there's a massive difference, just use the simpler strategy of merging structs. After running the benchmarks, persist the results in code, add a new comment to this PR with the results, and delete the losing codepath.

    • deserialize, merge structs, serialize
    • serialize exec payload, merge as bytes
  • convert the fixtures to .ssz format to reduce the diff

btw @dapplion I added these, and one for converting from generator and retesting perf, to the checklist above

@dapplion
Contributor

@matthewkeil thanks! the differences in performance do not justify doing the complex byte manipulation IMO. Just merge structs.

Contributor

@g11tech g11tech left a comment


Just blocking for now pending a deeper review, as this might affect some of the critical paths that I want to double check, including against the produceblockv3 PR types and helpers...

will also dig into the mergemock requirements

@matthewkeil
Member Author

⚠️ Performance Alert ⚠️

Possible performance regression was detected for some benchmarks. Benchmark result of this commit is worse than the previous benchmark result exceeding threshold.

Benchmark suite Current: 49ab90f Previous: 3a6702e Ratio
forkChoice updateHead vc 600000 bc 64 eq 300000 72.026 ms/op 18.857 ms/op 3.82
Full benchmark results

by benchmarkbot/action

@dapplion there is a benchmark regression after removing the serialized blinding/unblinding. There is not a big difference in time for the blinding process and the increase in the updateHead test seems higher than it should be.

@g11tech g11tech changed the base branch from unstable to deneb-builder October 30, 2023 07:50
return firstByte - readExtraDataOffsetAt > 92;
}

// same as isBlindedSignedBeaconBlock but without type narrowing
Contributor


what is the issue with type narrowing?

canonical,
header: {
message: blockToHeader(config, block.message),
Contributor


It's cleaner to extend blockToHeader to accept full or blinded blocks.

Also, the root above can then be calculated as the hashTreeRoot of the returned block header, which should be more efficient since the body won't be merkleized twice.
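
A rough sketch of that suggestion, under stated assumptions: the types are simplified, hex strings stand in for 32-byte roots, the hash functions are injected, and this `blockToHeader` signature is hypothetical (the real one takes `config` and `block.message`). It relies on the consensus spec property that a block header is the block with `body` replaced by its root, so the two share the same hashTreeRoot.

```typescript
type Root = string; // hex string standing in for a 32-byte root
type BlockFields = {slot: number; proposerIndex: number; parentRoot: Root; stateRoot: Root};
// `body` may be a full or a blinded body; the header only needs its root.
type Block = BlockFields & {body: unknown};
type BlockHeader = BlockFields & {bodyRoot: Root};

// Injected hashers so the sketch does not depend on a real SSZ library.
type Hashers = {
  hashBody(body: unknown): Root;
  hashHeader(header: BlockHeader): Root;
};

/** Hypothetical simplified variant: merkleizes the body exactly once. */
function blockToHeader(block: Block, hashers: Hashers): BlockHeader {
  return {
    slot: block.slot,
    proposerIndex: block.proposerIndex,
    parentRoot: block.parentRoot,
    stateRoot: block.stateRoot,
    bodyRoot: hashers.hashBody(block.body),
  };
}

/** Derives the block root from the header instead of hashing the whole block again. */
function headerAndRoot(block: Block, hashers: Hashers): {header: BlockHeader; root: Root} {
  const header = blockToHeader(block, hashers);
  return {header, root: hashers.hashHeader(header)};
}
```

Hashing the five-field header is cheap compared to the body, so the expensive body merkleization happens once rather than once for the header and again for the block root.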

Base automatically changed from deneb-builder to unstable October 30, 2023 13:42
Successfully merging this pull request may close these issues.

De-duplicate payload from persisted beacon blocks
3 participants