Mainnet full archive state database in 167.47 GiB, pruned 25.03 GiB #971

Closed
wants to merge 3 commits into from

Commits on Dec 14, 2021

  1. EVMC: Improved error messages when loading another EVM

    You can now tell whether the problem is a missing file or a fault in the EVM
    shared library.  The messages for the non-error loading path are clarified as well.
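    As a rough illustration of the distinction, the sketch below checks that the file exists before asking the dynamic loader to open it, so the two failure modes produce different messages. The function name `describe_evm_load` and the message wording are hypothetical. This is not the Nimbus-eth1 (Nim) loader, and it does not attempt EVMC's symbol lookup.

```python
import ctypes
import os


def describe_evm_load(path: str) -> str:
    """Hypothetical sketch: classify why an EVM shared library failed to load.

    Illustrates the "missing file vs. broken library" split only; it is not
    the Nimbus-eth1 EVMC loader.
    """
    # Check existence first, so a missing file is not reported as a
    # generic dynamic-loader failure.
    if not os.path.exists(path):
        return f"EVM library not found: {path}"
    try:
        ctypes.CDLL(path)
    except OSError as err:
        # The file exists, so the problem lies with the library itself
        # (wrong format, missing dependencies, unresolved symbols, ...).
        return f"EVM library could not be loaded: {path}: {err}"
    return f"Loaded EVM library: {path}"
```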
    
    Signed-off-by: Jamie Lokier <jamie@shareable.org>
    jlokier committed Dec 14, 2021
    Commit: 0b5ecfd
  2. Tests: Add Arrow Glacier fork to fixtures execution test harness

    Adds changes to the fixtures test harness for the new
    [Arrow Glacier fork](https://eips.ethereum.org/EIPS/eip-4345).
    
    While here, sorted the fork case statement into the actual chronological order
    of forks, and moved the "At5" pseudo-forks used for testing into their own
    section, also in chronological order.
    
    This now more closely matches the Hive integration code in
    `hive_integration/nodocker/consensus/extract_consensus_data.nim`.
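    A minimal sketch of why chronological order helps, assuming fixture network names such as `EIP150`, `ConstantinopleFix` and `ArrowGlacier`, and transition pseudo-forks named like `FrontierToHomesteadAt5`. `FORK_ORDER` and `parse_network` are illustrative assumptions, not the harness's actual code. With the list in fork order, a transition can be sanity-checked just by comparing list positions.

```python
import re

# Assumed fixture network names in chronological fork order. The exact names
# used by the Nimbus-eth1 harness may differ; this list is illustrative.
FORK_ORDER = [
    "Frontier", "Homestead", "EIP150", "EIP158", "Byzantium",
    "Constantinople", "ConstantinopleFix", "Istanbul", "Berlin",
    "London", "ArrowGlacier",
]

# "At5"-style pseudo-forks switch from one fork to a later one at a given block.
_TRANSITION = re.compile(r"^(\w+?)To(\w+)At(\d+)$")


def parse_network(name: str):
    """Return (fork_before, fork_after, switch_block) for a fixture network name.

    A plain fork name is active from block 0 with no transition. Names not in
    FORK_ORDER raise ValueError.
    """
    if name in FORK_ORDER:
        return (name, name, 0)
    m = _TRANSITION.match(name)
    if m is None:
        raise ValueError(f"unknown fixture network: {name}")
    before, after, block = m.group(1), m.group(2), int(m.group(3))
    # Because FORK_ORDER is chronological, a valid transition must move forward.
    if FORK_ORDER.index(before) >= FORK_ORDER.index(after):
        raise ValueError(f"transition goes backwards: {name}")
    return (before, after, block)
```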
    
    Signed-off-by: Jamie Lokier <jamie@shareable.org>
    jlokier committed Dec 14, 2021
    Commit: 78a88d9

Commits on Feb 16, 2022

  1. Mainnet full archive state database in 167.47 GiB, pruned 25.03 GiB

    A saving of about 24,000 GB over the current Nimbus-eth1 Mainnet disk usage.  At
    last, it's feasible to work with all Mainnet states up to the current head block.
    
    With this patch, Nimbus-eth1 has access to the entire Mainnet state history, by
    reading from a specially-constructed database file of size 167.47 GiB which
    contains all the states.
    
    For the first time ever, it's possible to run Nimbus-eth1 on high-numbered
    Mainnet blocks!  It can now validate block processing and run things like
    current-day transactions on current-day states.
    
    It's read-only at the moment, but the format is not inherently read-only.  It's
    designed to be part of a writable database; it has been kept simple in order to
    ship something, as a proof of concept emphasising size.
    
    Using this ability, Nimbus-eth1 can validate blocks throughout the whole
    history, and a number of blocks have been tried.
    
    A smaller 25.03 GiB file containing pruned state for the most recent 90k blocks
    is also available.  Files for Goerli are available too.
    
    These files are available on request for anyone who wants to test this code.
    (They can also be regenerated, but doing so requires a big machine and a synced
    Erigon instance, so you may prefer to just get the files.)
    
    This frees people up to work on other areas with _full_ access to the Mainnet
    states, all the way from block 0 to near today's head block.
    
    Each value is looked up in the database file and compared with the value stored
    in RocksDB.  Ultimately RocksDB can be dropped, but as this is meant to be a
    proof of concept, for now it just compares values.
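    The compare-on-read approach can be sketched as a "shadow read": every lookup goes to both stores, disagreements are logged, and execution continues with the reference value. This assumes RocksDB stays authoritative during the comparison; `shadow_read`, `primary` and `reference` are hypothetical names, not identifiers from the Nimbus-eth1 code.

```python
from typing import Callable, Hashable, List, Optional, Tuple

Key = Hashable
Value = Optional[bytes]


def shadow_read(
    key: Key,
    primary: Callable[[Key], Value],
    reference: Callable[[Key], Value],
    mismatches: List[Tuple[Key, Value, Value]],
) -> Value:
    """Read `key` from both stores, record any disagreement, return the reference value.

    Hypothetical sketch: `primary` stands for the new database file and
    `reference` for RocksDB.
    """
    expected = reference(key)
    actual = primary(key)
    if actual != expected:
        mismatches.append((key, expected, actual))
    # Execution keeps using the reference value, so a bug in the new store
    # shows up as a logged mismatch rather than as a wrong block result.
    return expected
```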
    
    Some blocks fail.  Close investigations with Etherscan's help indicate the data
    in the database file is fine, and it is the comparison function that misses
    some balance updates, for example when a transaction involves the same account
    as the miner.
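    The exact defect in the comparison function is not described here, but one common way to miss a balance update when the sender is also the miner is to compute each change from the pre-state balance, so the miner's fee credit overwrites the sender's debit instead of adding to it. The sketch contrasts that pattern with accumulating deltas per address; the function names and figures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Change = Tuple[str, int]  # (address, balance delta in wei)


def apply_overwriting(pre: Dict[str, int], changes: List[Change]) -> Dict[str, int]:
    """Buggy pattern: each change is computed from the *pre-state* balance,
    so a later change to the same address overwrites an earlier one."""
    post = dict(pre)
    for addr, delta in changes:
        post[addr] = pre.get(addr, 0) + delta
    return post


def apply_accumulating(pre: Dict[str, int], changes: List[Change]) -> Dict[str, int]:
    """Correct pattern: sum all deltas per address, then apply each total once."""
    totals: Dict[str, int] = defaultdict(int)
    for addr, delta in changes:
        totals[addr] += delta
    post = dict(pre)
    for addr, total in totals.items():
        post[addr] = pre.get(addr, 0) + total
    return post
```

    For example, if the miner sends 100 wei to another account and pays a 10 wei fee that it also receives, the changes are `[("miner", -110), ("bob", 100), ("miner", 10)]`; only the accumulating version leaves the miner 100 wei poorer.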
    
    **Size figures**
    
    **Mainnet Ethereum "archive mode" state history in 167.47 GiB**.
    (Blocks 0 to 13818907.  The final block is dated 2021-12-16 22:38:47 UTC).
    
    This compares extremely favourably\* with [8,646 GiB and 8,719 GiB
    (charts)](https://etherscan.io/chartsync/chainarchive) used by popular
    implementations Geth and OpenEthereum respectively in the same time frame.
    
    It's a profound improvement over [22,911 GiB for
    Nimbus-eth1](#863)
    (= 24.6 TB), which this approach to storage was designed to address.
    
    \* Note that those Etherscan charts show space used by things other than state
    history, but state history accounts for almost all of that space.  To complete
    the comparison, the minimum required Merkle hashes, block bodies, block headers,
    contract code and receipts must be added, and an actively updated database needs
    some more space on top.  Some experiments have been done, and there are good
    reasons to believe all of those things can fit in less than 420 GiB more, as an
    estimated worst case.
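    The size figures above mix binary (GiB) and decimal (GB, TB) units. This short calculation uses only the numbers quoted above to show how they relate, including the estimated worst-case total once the extra 420 GiB is added. Variable names are illustrative.

```python
GIB = 2**30   # bytes per GiB (binary)
GB = 10**9    # bytes per GB (decimal)
TB = 10**12   # bytes per TB (decimal)

archive_gib = 167.47          # this patch: Mainnet archive state history
nimbus_gib = 22_911           # existing Nimbus-eth1 archive database (#863)
geth_gib, oe_gib = 8_646, 8_719
extra_worst_case_gib = 420    # estimated extra: hashes, bodies, headers, code, receipts

nimbus_tb = nimbus_gib * GIB / TB
saving_gb = (nimbus_gib - archive_gib) * GIB / GB
estimated_total_gib = archive_gib + extra_worst_case_gib

print(f"Existing Nimbus-eth1 database: {nimbus_tb:.1f} TB")
print(f"Saving over it: {saving_gb:,.0f} GB")
print(f"Estimated worst-case total: {estimated_total_gib:.2f} GiB")
print(f"Geth / estimate: {geth_gib / estimated_total_gib:.1f}x")
print(f"OpenEthereum / estimate: {oe_gib / estimated_total_gib:.1f}x")
```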
    
    **Pruned size**
    
    "Pruned mode" state history comes to **25.03 GiB**.  (Blocks
    13728908-13818907, 90k blocks of history).
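    The block range above is exactly 90,000 blocks, counting both ends. The sketch shows the availability check a pruned database of this kind implies, assuming the window is the most recent `WINDOW` blocks ending at the head; the names are hypothetical.

```python
HEAD = 13_818_907   # final block in the figures above
WINDOW = 90_000     # blocks of state history kept in pruned mode


def oldest_retained(head: int, window: int) -> int:
    """First block whose state is still available when `window` blocks are kept."""
    return head - window + 1


def has_state(block: int, head: int = HEAD, window: int = WINDOW) -> bool:
    """True if a pruned database covering the last `window` blocks can serve `block`."""
    return oldest_retained(head, window) <= block <= head
```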
    
    This also compares favourably\* with [pruned mode
    charts](https://etherscan.io/chartsync/chaindefault), but the picture is more
    complicated with pruned state, as the other things contribute more
    significantly to the size in those charts.  Even so, the pruned state size is
    promising.
    
    **Lookup performance**
    
    Any account or storage value can be looked up at any point in block time in
    O(log N) time using these files.  This proof of concept focuses on demonstrating
    small size rather than speed, so the constant factor in the big-O bound is quite
    high, but when fully optimised the constant factor will be low in IOPS and
    reasonable in CPU and memory.
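    To show what "looked up at any point in block time in O(log N)" means, here is a toy in-memory version: each key keeps its change history sorted by block number, and a binary search finds the last change at or before the requested block. This is a generic technique chosen for illustration; `HistoricalState` and its methods are hypothetical and do not reflect the on-disk format in this patch.

```python
import bisect
from typing import Dict, List, Optional


class HistoricalState:
    """Toy versioned key-value store: per key, change records sorted by block.

    Illustrative only; not the on-disk format from this patch.
    """

    def __init__(self) -> None:
        self._blocks: Dict[bytes, List[int]] = {}
        self._values: Dict[bytes, List[Optional[bytes]]] = {}

    def record(self, key: bytes, block: int, value: Optional[bytes]) -> None:
        """Record that `key` took `value` at `block` (None means deleted).
        Writes must arrive in non-decreasing block order per key."""
        blocks = self._blocks.setdefault(key, [])
        if blocks and block < blocks[-1]:
            raise ValueError("writes must be in block order")
        blocks.append(block)
        self._values.setdefault(key, []).append(value)

    def get_at(self, key: bytes, block: int) -> Optional[bytes]:
        """Value of `key` as of the end of `block`, in O(log N) comparisons."""
        blocks = self._blocks.get(key)
        if not blocks:
            return None
        # Index of the last change at or before `block`.
        i = bisect.bisect_right(blocks, block) - 1
        return self._values[key][i] if i >= 0 else None
```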
    
    **Space first, speed second**
    
    This is a proof of concept designed to highlight _space used_, rather than time.
    
    The compact database is part of an implementation in progress of an on-disk
    data structure designed to be fast as well, for Ethereum use cases.
    Specifically, fast at random-access writes for EVM execution, and fast with low
    write-amplification for network state synchronisation.  It is neither a B-tree
    nor an LSM-tree but has elements of both.
    
    However, the current implementation, although O(log N), has high constant
    factors for time and I/O.  The number of I/O operations (IOPS) per query is
    significantly higher than necessary.
    
    Speed will improve greatly once index blocks, and structures inside each block,
    are added.  With those in place, IOPS will drop to _less than 1 IOPS_ on average
    per account/storage query during EVM execution, even at Mainnet archive scale.
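    One generic way index blocks cut I/O is a sparse index: keep the first key of each on-disk block in memory, binary-search it to pick the block, then search inside that single block. The toy below counts block reads to show at most one read per lookup. It is a sketch of the general technique with hypothetical names, not the design planned for this database.

```python
import bisect
from typing import List, Optional, Tuple

Record = Tuple[bytes, bytes]


class BlockFile:
    """Toy sorted file split into fixed-size blocks, with an in-memory sparse
    index of each block's first key. `reads` counts simulated disk I/Os."""

    def __init__(self, records: List[Record], block_size: int = 4) -> None:
        records = sorted(records)
        self._blocks = [records[i:i + block_size]
                        for i in range(0, len(records), block_size)]
        # Sparse index: first key of each block, held in memory.
        self._index = [blk[0][0] for blk in self._blocks]
        self.reads = 0

    def _read_block(self, n: int) -> List[Record]:
        self.reads += 1  # stands in for one disk read
        return self._blocks[n]

    def get(self, key: bytes) -> Optional[bytes]:
        # Choosing the block needs no I/O: binary search over the in-memory index.
        n = bisect.bisect_right(self._index, key) - 1
        if n < 0:
            return None
        blk = self._read_block(n)
        # Binary search within the one block that was read.
        keys = [k for k, _ in blk]
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return blk[i][1]
        return None
```

    Reaching an average below one I/O per query, as described above, would additionally depend on hot blocks already being cached in memory; the toy does not model that.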
    
    The structure is also designed to support fast network sync, and to store the
    received data efficiently without write-amplification.
    
    The ad-hoc encoding of individual values has been through many iterations to
    optimise the assignment of bits and opcodes to different purposes, but a number
    of improvements are still known that would reduce size further.
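    The value encoding used here is ad hoc and not described in detail. As a generic example of trading bits for size, unsigned LEB128 stores 7 value bits per byte and uses the high bit to say "more bytes follow", so small numbers take a single byte. This is a well-known encoding chosen for illustration, not the format in this patch.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Unsigned LEB128: 7 value bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one value starting at `pos`; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```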
    
    Signed-off-by: Jamie Lokier <jamie@shareable.org>
    jlokier committed Feb 16, 2022
    Commit: 08fbe1d