Mainnet full archive state database in 167.47 GiB, pruned 25.03 GiB #971
Open: jlokier wants to merge 3 commits into master from jl/tinystore2
You can now tell if it's a missing file or a problem with the EVM shared library. The non-error loading is clarified as well. Signed-off-by: Jamie Lokier <jamie@shareable.org>
Adds changes to the fixtures test harness for the new [Arrow Glacier fork](https://eips.ethereum.org/EIPS/eip-4345). While here, sorted the fork case statement into the actual order of forks, and sorted the "At5" pseudo-forks used for testing into their own section also in time order. This now more closely matches the Hive integration code in `hive_integration/nodocker/consensus/extract_consensus_data.nim`. Signed-off-by: Jamie Lokier <jamie@shareable.org>
A saving of about 24,000 GB over the current Mainnet disk space.
At last, it's feasible to work with all Mainnet states up to the current head block. With this patch, Nimbus-eth1 has access to the entire Mainnet state history, by reading from a specially-constructed database file of size 167.47 GiB which contains all the states.
For the first time ever, it's possible to run Nimbus-eth1 on high numbered Mainnet blocks! Validate the processing, and run things like current-day transactions on current-day states.
It's read-only at the moment. The format is not a fixed read-only format. It's actually designed to be part of a writable database, but it's been kept simple to ship something and be a proof of concept emphasising size.
Using this ability, Nimbus-eth1 can validate blocks throughout the whole history, and a number of blocks have been tried.
A smaller 25.03 GiB file is available for 90k blocks pruned state. Files for Goerli are also available.
These files are available on request to test this code if someone wants to. (They can also be regenerated, but doing so requires a big machine and a synced Erigon instance. You may prefer to just get the files).
This frees people up to work on other areas with _full_ access to the Mainnet states, all the way from block 0 to near today's head block.
Each value is looked up, and compared with the value stored in RocksDB. Ultimately RocksDB can be dropped, but this is meant to be a proof of concept so for now it just compares values.
Some blocks fail. Close investigations with Etherscan's help indicate the data in the database file is fine, and it is the comparison function that misses some balance updates, for example when a transaction involves the same account as the miner.
**Size figures**
**Mainnet Ethereum "archive mode" state history in 167.47 GiB**. (Blocks 0 to 13818907. The final block is dated 2021-12-16 22:38:47 UTC).
This compares extremely favourably\* with [8,646 GiB and 8,719 GiB (charts)](https://etherscan.io/chartsync/chainarchive) used by popular implementations Geth and OpenEthereum respectively at the same time frame.
It's a profound improvement over [22,911 GiB for Nimbus-eth1](#863) (= 24.6 TB), which this approach to storage was designed to address.
\* Note that those Etherscan charts show space used by other things than just state-history, but state-history accounts for almost all of that space. To finish the comparison, minimum required Merkle hashes, block bodies, block headers, contract code and receipts must be added. Some more space on top is required in an actively updated database. Some experiments have been done and there are good reasons to believe all those things can be fitted in less than 420 GiB more "estimated worst case".
**Pruned size**
"Pruned mode" state history comes to **25.03 GiB**. (Blocks 13728908-13818907, 90k history).
This also compares favourably\* with [pruned mode charts](https://etherscan.io/chartsync/chaindefault), but the picture is more complicated with pruned state, as the other things contribute more significantly to the size in those charts. Even so, the pruned state size is promising.
**Lookup performance**
Any account or storage can be looked up at any point in block time in O(log N) time using these files. This proof of concept is focused more on demonstrating small size than time, so the constant factor of the big-O notation is quite high, but when fully optimised the constant factor will have low IOPS, and reasonable for CPU and memory.
**Space first, speed second**
This is a proof of concept designed to highlight _space used_, rather than time.
The compact database is part of an implementation in progress of an on-disk data structure designed to be fast as well, for Ethereum use cases. Specifically, fast at random-access writes for EVM execution, and fast with low write-amplification for network state synchronisation. It is neither a B-tree nor an LSM-tree but has elements of both.
However the current implementation, although O(log N), has a high constant time and I/O factor. The number of I/O operations (IOPS) is significantly higher than necessary.
Speed will improve greatly when index blocks and structures inside each block are added to improve the performance. With those in place, the IOPS will drop to _less than 1 IOPS_ average per account/storage query during EVM executions, even at Mainnet archive scale.
The structure is also designed to support fast network sync, and to store the received data efficiently without write-amplification.
The ad-hoc encoding of individual values has been through many iterations to optimise the assignment of bits and opcodes to different purposes, but a number of improvements are still known that would reduce size further.
Signed-off-by: Jamie Lokier <jamie@shareable.org>
A saving of about 24,000 GB over Nimbus-eth1's current Mainnet disk usage!
A smaller 25.03 GiB file is available for 90k blocks pruned state. Files for Goerli are also available.
The data files are available on request to test this code if someone wants to. (They can also be regenerated, but doing so requires a big machine and a synced Erigon instance. You may prefer to just get the files).
What can it do?
At long last, it's feasible to work with all Mainnet states up to the current head block. With this patch, Nimbus-eth1 has access to the entire Mainnet state history, by reading from a specially-constructed database file of size 167.47 GiB which contains all the states.
For the first time ever, it's possible to run Nimbus-eth1 on high numbered Mainnet blocks. Validate the processing, and run things like current-day transactions on current-day states.
Using this ability, Nimbus-eth1 can validate blocks throughout the whole history, and a number of blocks have been tried on both Mainnet and Goerli. The data appears to be fine.
This frees people up to work on other areas with full access to the Mainnet states, all the way from block 0 to near today's head block.
Each value is looked up in the new database file and compared with the value stored in RocksDB. Ultimately RocksDB can be dropped, but as this is meant to be a proof of concept, for now it just compares values.
Some blocks fail. Close investigations with Etherscan's help indicate the data in the database file looks fine, and it is the comparison function that misses some balance updates, for example when a transaction involves the same account as the miner.
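The comparison described above can be pictured roughly as follows. This is only a sketch and not the code in this patch: `compare_stores`, `compact_lookup` and `reference_lookup` are hypothetical names, and plain dictionaries stand in for the database file and for RocksDB.

```python
def compare_stores(queries, compact_lookup, reference_lookup):
    """Return every query where the two stores disagree.

    queries: iterable of (key, block_number) pairs.
    compact_lookup / reference_lookup: hypothetical callables that
    return the stored value, or None if nothing is stored.
    """
    mismatches = []
    for key, block_number in queries:
        got = compact_lookup(key, block_number)
        expected = reference_lookup(key, block_number)
        if got != expected:
            mismatches.append((key, block_number, got, expected))
    return mismatches


# Toy data: dictionaries stand in for the two databases.
compact = {("acct-a", 10): 100, ("acct-b", 10): 7}
reference = {("acct-a", 10): 100, ("acct-b", 10): 5}

report = compare_stores(
    [("acct-a", 10), ("acct-b", 10)],
    lambda k, b: compact.get((k, b)),
    lambda k, b: reference.get((k, b)),
)
```

A mismatch reported this way says only that the two sources disagree, not which one is wrong, which is why the failing blocks needed checking against an independent source.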
What can't it do?
It's read-only at the moment. The format is not a fixed read-only format. It's actually designed to be part of a writable database, but it's been kept simple to ship something and be a proof of concept emphasising size.
The proof of concept is also the reason the ad-hoc encoding used to compress the data (which has actually been tuned over many iterations) is kept simple in this version, so that it's easy to understand. With a more fiddly encoding, smaller files are possible.
This is just the history of all account and storage states, i.e. the part which is particularly large and worth optimising down while keeping good performance. The blocks, receipts, Merkle hashes and other indexes are not included here. When those are added the total size will roughly triple, but should not grow much beyond that: 600 GiB or lower is feasible for the whole thing.
The Merkle trie nodes, or their hashes, are an area where I know some would like confirmation (a) of the size of a compact representation, and (b) that the calculations for Merkle state-root updates can be performed efficiently. The short answer is that both have been investigated, but neither is implemented in this proof of concept. It's the next obvious thing to do. There's a high probability that 70 GiB is an upper bound on the space these need.
Mainnet full archive history size
Mainnet Ethereum "archive mode" state history in 167.47 GiB.
(Blocks 0 to 13818907. The final block is dated 2021-12-16 22:38:47 UTC).
This compares extremely favourably* with [8,646 GiB and 8,719 GiB (charts)](https://etherscan.io/chartsync/chainarchive) used by popular implementations Geth and OpenEthereum respectively at the same point in time.
It's a profound improvement over [22,911 GiB for Nimbus-eth1](#863) (= 24.6 TB), which this approach to storage was designed to address.
* Note that those Etherscan charts show space used by other things than just state-history, but state-history accounts for almost all of that space. To finish the comparison, minimum required Merkle hashes, block bodies, block headers, contract code and receipts must be added. Some more space on top is required in an actively updated database. Some experiments have been done and there are good reasons to believe all those things can be fitted in less than 420 GiB more "estimated worst case".
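For anyone checking the arithmetic, the size figures quoted in this description can be reproduced with a few lines of Python. The input numbers are the ones stated here; the variable and function names are only for illustration.

```python
GIB = 2**30   # bytes in one gibibyte
GB = 10**9    # bytes in one gigabyte
TB = 10**12   # bytes in one terabyte


def gib_to_gb(gib):
    return gib * GIB / GB


def gib_to_tb(gib):
    return gib * GIB / TB


COMPACT_GIB = 167.47       # archive state history in the new file
NIMBUS_GIB = 22_911        # current Nimbus-eth1 Mainnet usage
EXTRA_WORST_GIB = 420      # "estimated worst case" for everything else

nimbus_tb = gib_to_tb(NIMBUS_GIB)                          # about 24.6 TB
saving_gb = gib_to_gb(NIMBUS_GIB) - gib_to_gb(COMPACT_GIB)  # a bit over 24,000 GB
size_ratio = NIMBUS_GIB / COMPACT_GIB                      # how many times smaller
total_worst_gib = COMPACT_GIB + EXTRA_WORST_GIB            # stays under 600 GiB

print(f"{nimbus_tb:.1f} TB, saving {saving_gb:,.0f} GB, "
      f"{size_ratio:.0f}x smaller, worst case {total_worst_gib:.2f} GiB")
```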
Pruned size
"Pruned mode" state history comes to 25.03 GiB. (Blocks 13728908-13818907, 90k history).
This also compares favourably* with [pruned mode charts](https://etherscan.io/chartsync/chaindefault), but the picture is more complicated with pruned state, as the other things contribute more significantly to the size in those charts. Even so, the pruned state size is promising.
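As a quick check of the "90k history" figure, the block range above is inclusive at both ends. The names below are illustrative only.

```python
FIRST_PRUNED_BLOCK = 13_728_908
LAST_PRUNED_BLOCK = 13_818_907


def blocks_in_range(first, last):
    """Count blocks in an inclusive range."""
    return last - first + 1


pruned_blocks = blocks_in_range(FIRST_PRUNED_BLOCK, LAST_PRUNED_BLOCK)

# Pruned file size as a fraction of the full archive file size.
pruned_fraction = 25.03 / 167.47

print(pruned_blocks, f"{pruned_fraction:.1%}")
```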
Lookup performance
Any account or storage can be looked up at any point in block time in O(log N) time using these files. This proof of concept is focused more on demonstrating small size than speed, so the constant factor hidden by the big-O notation is quite high. When fully optimised, lookups will need few I/O operations and a reasonable amount of CPU and memory.
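To illustrate what an O(log N) lookup "at any point in block time" means, here is a minimal in-memory sketch. It assumes entries sorted by (key, block number) and uses a binary search to find the latest entry at or before the requested block. This is not the on-disk format used by this patch; `HistoryIndex` and its methods are hypothetical names.

```python
from bisect import bisect_right


class HistoryIndex:
    """Toy stand-in for a state history: values keyed by (key, block)."""

    def __init__(self, entries):
        # entries: iterable of (key, block_number, value), one per change.
        rows = sorted(entries, key=lambda r: (r[0], r[1]))
        self._keys = [(key, block) for key, block, _ in rows]
        self._values = [value for _, _, value in rows]

    def lookup(self, key, block_number):
        """Value of `key` as of `block_number`, or None if not yet set."""
        # Rightmost entry that sorts at or before (key, block_number).
        i = bisect_right(self._keys, (key, block_number)) - 1
        if i < 0 or self._keys[i][0] != key:
            return None
        return self._values[i]


history = HistoryIndex([
    ("alice", 0, 10),
    ("alice", 100, 25),
    ("alice", 500, 5),
    ("bob", 50, 1),
])
```

For example, `history.lookup("alice", 99)` returns the value written at block 0, because the next change only happens at block 100.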
Space first, speed second
This is a proof of concept designed to highlight space used, rather than time.
The compact database is part of an implementation in progress of an on-disk data structure designed to be fast as well, for Ethereum use cases. Specifically, fast at random-access writes for EVM execution, fast with low write-amplification for network state synchronisation, and fast at random-access reads for EVM and Merkle updates. It is neither a B-tree nor an LSM-tree but has elements of both.
However, the current implementation, although O(log N), has a high constant factor in both time and I/O. The number of I/O operations (IOPS) is significantly higher than necessary.
Speed will improve greatly when index blocks and structures inside each block are added. With those in place, the average will drop to about 1 IOPS, and sometimes less than 1, per account/storage query during EVM executions, even at Mainnet archive scale.
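To give a feel for how an average of less than 1 I/O per query is possible, here is a rough generic model. It is not a description of this data structure: the fanout, entry count and cache hit rates are made-up assumptions, and the functions are hypothetical. The idea is simply that if the upper index levels stay in memory, only the lowest level can cost a disk read, and even that read is sometimes served from cache.

```python
def index_levels(n_entries, fanout):
    """Levels needed for an index where each block points at `fanout` children."""
    levels = 1
    capacity = fanout
    while capacity < n_entries:
        capacity *= fanout
        levels += 1
    return levels


def expected_reads(hit_rates):
    """Average disk reads per query.

    hit_rates[i] is the assumed probability that the block needed at
    level i (root first) is already in memory.
    """
    return sum(1.0 - hit for hit in hit_rates)


# Assumed numbers, for illustration only.
levels = index_levels(n_entries=10**9, fanout=256)
hit_rates = [1.0] * (levels - 1) + [0.2]   # upper levels cached, leaves mostly not
average_reads = expected_reads(hit_rates)
```

With these assumed numbers the model needs four levels and averages 0.8 reads per query.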
The structure is also designed to support fast network sync, and to store the received data efficiently without write-amplification.
The ad-hoc encoding of individual values has been through many iterations to optimise the assignment of bits and opcodes to different purposes, but a number of improvements are still known that would reduce size further.