Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: Optimize serialization and enhance metadata of dumptxoutset output #29612

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

fjahr
Copy link
Contributor

@fjahr fjahr commented Mar 10, 2024

The second attempt at implementing the dumptxoutset space optimization as suggested in #25675. Closes #25675.

This builds on the work done in #26045, addresses open feedback, adds some further improvements (most importantly usage of compact size), documentation, and an additional test.

The original snapshot at height 830,000 came in at 10.82 GB. With this change, the same snapshot is 8.94 GB, a reduction of 17.4%.

This also enhances the metadata of the output file and adds the following data to allow for better error handling and make future upgrades easier:

  • A newly introduced utxo set magic
  • A version number
  • The network magic
  • The block height

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 10, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK theStack
Stale ACK TheCharlatan, mzumsande, Sjors

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29775 (Testnet4 including PoW difficulty adjustment fix by fjahr)
  • #29307 (util: check for errors after close and read in AutoFile by vasild)
  • #28670 (assumeutxo, rpc: Improve EOF error when reading snapshot metadata in loadtxoutset by pablomartin4btc)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22485931752

@Sjors
Copy link
Member

Sjors commented Mar 11, 2024

我最近正在休假中

Well, I'm not on holiday :-)

tACK b0cfbce

I do suggest adding a comment to PopulateAndValidateSnapshot in validation.h:

To reduce space, this function takes advantage of the guarantee by leveldb that keys are lexicographically sorted.

I generated a snapshot and loaded it using the patch in #29551 (only waited for it to reach the tip, not a full background sync).

For those testing who don't want to wait out the long flush, see #28358.

@fjahr fjahr force-pushed the 2024-03-pr26045-reopen branch 2 times, most recently from 33aa19d to 97eb214 Compare March 11, 2024 20:59
@fjahr
Copy link
Contributor Author

fjahr commented Mar 11, 2024

Thanks for the review and testing!

I do suggest adding a comment to PopulateAndValidateSnapshot in validation.h:

To reduce space, this function takes advantage of the guarantee by leveldb that keys are lexicographically sorted.

As it is worded, wouldn't this be more appropriate to add in CreateUTXOSnapshot? That's where we are iterating with the pcursor, leveraging the sorting, and writing the coins to file, so I would say that's also where the space savings are realized.

I have now amended the comment a bit and added a version of it to both CreateUTXOSnapshot and PopulateAndValidateSnapshot. Please let me know if you think this is alright: 923ec90

@Sjors
Copy link
Member

Sjors commented Mar 12, 2024

re-ACK 923ec90

That looks good.

The reason I suggested (also) writing something in the header is because it's more likely to be noticed by some future dev looking through existing leveldb uses, seeing if they can replace it.

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, ACK 923ec90

Just some comment nits that could easily be done in a follow-up too.

src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
src/rpc/blockchain.cpp Outdated Show resolved Hide resolved
Coin coin;
unsigned int iter{0};
std::vector<std::pair<uint32_t, Coin>> coins;

// To reduce space the serialization format of the snapshot avoids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Might it be useful to add a small diagram of the format to the dumptxoutset help output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will think about what that could look like if I have to retouch or in the follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time thinking about a diagram that would work but I wasn't very happy what I could come up with. I instead copied the expression used in the issue. It shows the format in a concise way that I find understandable: list(Txid, list((vout,Coin)). Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add the size of the coins list list(Txid, number_of_coins, list((vout,Coin)))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add this if I have to retouch, but if not I can do this as a follow-up in one of my other assumeutxo PRs.

test/functional/feature_assumeutxo.py Outdated Show resolved Hide resolved
Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-ACK 71d41d5

@DrahtBot DrahtBot requested a review from Sjors March 20, 2024 15:03
@Sjors
Copy link
Member

Sjors commented Apr 5, 2024

I like magic bytes. For PSBT it's just "psbt" in ascii followed by 0xff: https://en.bitcoin.it/wiki/BIP_0174

So you could use "utxo" in ascii: 0x75 0x74 0x78 0x6F 0xFF

But then while you're at it, you could also append:

  • ChainTypeToString ("main", "testnet", "signet", "regtest")
  • height (so we can have an error like "you tried to load a snapshot at height X, but only heights Y, Z are supported by this version of PACKAGE_NAME. Newer releases may support more recent snapshots.")
  • for signet: the signet challenge bytes

@fjahr
Copy link
Contributor Author

fjahr commented Apr 21, 2024

So far I have addressed the minor feedback in the latest push and I am now working on the extended metadata for the snapshot. I am currently planning to add the following:

  1. magic bytes - the ones suggested by @Sjors
  2. snapshot version
  3. Network magic (pchMessageStart) - This lets us distinguish between the different networks and since it’s generated from the challenge for signets, we can deduce that it’s probably a custom signet if we don’t recognize them and give an appropriate feedback to the user
  4. block height

Let me know if you disagree with this choice or have something else to add, otherwise I will have the code ready soon.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/24078947679

@Sjors
Copy link
Member

Sjors commented Apr 22, 2024

@fjahr sounds good. Let me know when that's ready and I'll bake a halving block snapshot...

@fjahr fjahr force-pushed the 2024-03-pr26045-reopen branch 4 times, most recently from 4aae939 to cfc780f Compare April 26, 2024 22:14
@fjahr
Copy link
Contributor Author

fjahr commented Apr 26, 2024

Alright, ready for review again. I hope I have addressed all the ideas that were mentioned as intended. Sorry but the enhancement of the metadata is now one big commit for now, I can try to split it up if it makes review easier.

@fjahr fjahr changed the title rpc: Optimize serialization disk space of dumptxoutset - Reloaded rpc: Optimize serialization and enhance metadata of dumptxoutset output Apr 26, 2024
@Sjors
Copy link
Member

Sjors commented Apr 29, 2024

I can try to split it up if it makes review easier.

Doesn't matter that much, but it's probably to put the kernel/chainparams.h change in its own commit so e.g. @TheCharlatan can do a drive-by review.

Here's some new snapshot torrents (not sure if I'm seeding them correctly...).

Torrent for testnet: magnet:?xt=urn:btih:787f5917029876c83301c49dc97bb69224597285&dn=utxo-testnet-2500000.dat&tr=udp%3A%2F%2Ftracker.bitcoin.sprovoost.nl%3A6969

Torrent for signet: magnet:?xt=urn:btih:715a8797d6154a2474f225f6019364b6230eed17&dn=utxo-signet-160000.dat&tr=udp%3A%2F%2Ftracker.bitcoin.sprovoost.nl%3A6969

Torrent for mainnet at the halving block (I'll update #28553 to use this): magnet:?xt=urn:btih:55cb2ea9b6c48aa11d676de871f9639e3fa3dd7e&dn=utxo-840000.dat&tr=udp%3A%2F%2Ftracker.bitcoin.sprovoost.nl%3A6969

@@ -183,4 +185,6 @@ class CChainParams
ChainTxData chainTxData;
};

std::optional<std::string> GetNetworkForMagic(MessageStartChars& pchMessageStart);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6ed6bff: this doesn't seem kernel-worthy. chaintype.h is a better home for it, next to ChainTypeToString.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine where it is, I like keeping these single type utilities small.

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK on the enhanced metadata, I think this includes now everything we need. Will update #27432 in a bit to the new format and test.

src/node/utxo_snapshot.h Outdated Show resolved Hide resolved
src/node/utxo_snapshot.h Outdated Show resolved Hide resolved
fjahr and others added 4 commits May 5, 2024 13:18
Co-authored-by: Aurèle Oulès <aurele@oules.com>
Co-authored-by: TheCharlatan <seb.kung@gmail.com>
The following data is added:
- A newly introduced utxo set magic
- A version number
- The network magic
- The block height
@fjahr
Copy link
Contributor Author

fjahr commented May 5, 2024

Rebased, resolved conflicts (the functional tests have changed a bit now), and addressed the open feedback. Thanks everyone for reviewing!

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, just some small comments, will review again quickly once they are addressed.

const auto mainnet_msg = CChainParams::Main()->MessageStart();
const auto testnet_msg = CChainParams::TestNet()->MessageStart();
auto regtest_opts = CChainParams::RegTestOptions{};
const auto regtest_msg = CChainParams::RegTest(regtest_opts)->MessageStart();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit b2cf3f7:
Nit: Could just do const auto regtest_msg = CChainParams::RegTest({})->MessageStart(); here and for the signet_msg.

@@ -14,6 +14,7 @@
#include <logging.h>
#include <primitives/block.h>
#include <primitives/transaction.h>
#include <protocol.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit b2cf3f7:
Is this include from a prior iteration? It seems unused (also in the header file).

@@ -2712,7 +2712,7 @@ UniValue CreateUTXOSnapshot(
auto write_coins_to_file = [&](AutoFile& afile, const Txid& last_hash, const std::vector<std::pair<uint32_t, Coin>>& coins, size_t& written_coins_count) {
afile << last_hash;
WriteCompactSize(afile, coins.size());
for (auto [n, coin] : coins) {
for (const auto& [n, coin] : coins) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit b2cf3f7:
Nit: I think this should be moved to the first commit.

@@ -542,3 +543,33 @@ std::unique_ptr<const CChainParams> CChainParams::TestNet()
{
return std::make_unique<const CTestNetParams>();
}

std::vector<int> CChainParams::GetAvailableSnapshotHeights() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit b2cf3f7
Nit (clang-format-diff): Open braces on new line (here and the other new functions and methods below).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

Possible savings in dumptxoutset serialization format (~20%)
7 participants