Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blockstorage: Separate reindexing from saving new blocks #29975

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

mzumsande
Copy link
Contributor

@mzumsande mzumsande commented Apr 26, 2024

SaveBlockToDisk / FindBlockPos are used for two purposes, depending on whether they are called during reindexing (dbp set,  fKnown = true) or in the "normal" case when adding new blocks (dbp == nullptr,  fKnown = false).
The actual tasks are quite different

  • In normal mode, preparations for saving a new block are made, which is then saved: find the correct position on disk (maybe skipping to a new blk file), check for available disk space, update the blockfile info db, save the block.
  • during reindex, most of this is not necessary (the block is already on disk after all), only the blockfile info needs to rebuilt because reindex wiped the leveldb it's saved in.

Using one function with many conditional statements for this leads to code that is hard to read / understand and bug-prone:

  • many code paths in FindBlockPos are conditional on fKnown or !fKnown
  • It's not really clear what actually needs to be done during reindex (we don't need to "save a block to disk" or "find a block pos" as the function names suggest)
  • logic that should be applied to only one of the two modes is sometimes applied to both (see first commit, or blockstorage: do not flush block to disk if it is already there #27039)

#24858 and #27039 were recent bugs directly related to the differences between reindexing and normal mode, and in both cases the simple fix took a long time to be reviewed and merged.

This PR proposes to clean this code up by splitting out the reindex logic into a separate function (AddToBlockFileInfo) which will be called directly from validation. As a result, SaveBlockToDisk and FindBlockPos only need to cover the non-reindex logic.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 26, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan
Concept ACK furszy, BrandonOdiwuor
Stale ACK ryanofsky

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/24317664317

@TheCharlatan
Copy link
Contributor

Concept ACK

Copy link
Contributor

@paplorinc paplorinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recreated the change to understand it better, commented on what I've noticed.

@@ -189,7 +189,7 @@ BOOST_AUTO_TEST_CASE(blockmanager_flush_block_file)
// to block 2 location.
CBlockFileInfo* block_data = blockman.GetBlockFileInfo(0);
BOOST_CHECK_EQUAL(block_data->nBlocks, 2);
BOOST_CHECK(blockman.SaveBlockToDisk(block3, /*nHeight=*/3, /*dbp=*/&pos2) == pos2);
blockman.AddToBlockFileInfo(block3, /*nHeight=*/3, /*pos=*/pos2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to me the related comments needs to be updated in this file (e.g line 184 and 199)

Copy link
Contributor Author

@mzumsande mzumsande Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated some comments. The entire test setup probably makes a bit less sense after the refactor, users unfamiliar
with the history might ask themselves why someone could think that Reindex / AddToBlockFileInfo would change the block files so that we'd require a test making sure it doesn't.

src/node/blockstorage.cpp Outdated Show resolved Hide resolved
src/node/blockstorage.cpp Show resolved Hide resolved
src/node/blockstorage.cpp Outdated Show resolved Hide resolved
src/node/blockstorage.h Outdated Show resolved Hide resolved
src/validation.cpp Outdated Show resolved Hide resolved
src/node/blockstorage.cpp Outdated Show resolved Hide resolved
src/node/blockstorage.cpp Outdated Show resolved Hide resolved
Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Copy link
Contributor

@BrandonOdiwuor BrandonOdiwuor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@mzumsande mzumsande force-pushed the 202404_blockstorage_split_reindex branch from 39ad8d8 to 194e84a Compare April 29, 2024 19:32
@mzumsande
Copy link
Contributor Author

39ad8d8 to 194e84a: Addressed review feedback by @paplorinc, thanks!

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 194e84a. I left a lot of comments, but everything looks right here and the code is a lot nicer than before.

return false;
FlatFilePos blockPos{};
if (dbp) {
blockPos = *dbp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "validation, blockstorage: Separate code paths for reindex and saving new blocks" (a17eaca)

It looks like previously there would have been an error here if dbp->IsNull() was true, and now there will not be an error. This is probably a good change, since AcceptBlock should not be looking at block positions, just passing them on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the reindex case; In this case the passed dbp isn't changed (it's now a const arg to AddToBlockFileInfo). If a dpb was passed to AcceptBlock for which dbp->IsNull(), the error message ("Failed to find position to write new block to disk") would have been very confusing anyway, because we don't write a block to disk during reindex anyway.

src/node/blockstorage.cpp Outdated Show resolved Hide resolved
src/node/blockstorage.cpp Outdated Show resolved Hide resolved
src/node/blockstorage.cpp Outdated Show resolved Hide resolved
src/node/blockstorage.cpp Outdated Show resolved Hide resolved
Comment on lines 163 to 164
* @param[in] pos the position of the block on disk. This must point *after* the
* 8 byte serialization header, at the beginning of the actual block data.
Copy link
Contributor

@ryanofsky ryanofsky May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "blockstorage: split up FindBlockPos function" (a2ae0a3)

I think this description is technically accurate, but I got confused by it and thought it was wrong because "the 8 byte serialization header" sounds like something that is part of CBlock serialization, when actually it is referring to separator fields written by WriteBlockToDisk before the serialized CBlock.

Would suggest changing comment to "pos: the position of the serialized CBlock on disk. This is the position returned by WriteBlockToDisk pointing at the CBlock, not the separator fields before it."

I would also suggesting adding two more comments to this commit to make it clear what it happening at this stage of the PR.

In WriteBlockToDisk documentation, "// The pos argument passed to this function is modified by this call. Before this call, it should point to an unused file location where separator fields will be written followed by the serialized CBlock data. After this call, it will point to the beginning of the serialized CBlock data, after the separator fields"

In FindBlockPos documentation, "// The nAddSize argument passed to this function should include not just the size of the serialized CBlock, but the also size of separator fields which are written before it by WriteBlockToDisk (BLOCK_SERIALIZATION_HEADER_SIZE).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done that, but bundled most of the doc changes to a doc-only commit at the beginning in order to not have unrelated things in the "blockstorage: split up FindBlockPos function" commit.

src/node/blockstorage.cpp Outdated Show resolved Hide resolved
src/node/blockstorage.h Outdated Show resolved Hide resolved
src/node/blockstorage.h Outdated Show resolved Hide resolved
src/test/blockmanager_tests.cpp Outdated Show resolved Hide resolved
@ryanofsky
Copy link
Contributor

ryanofsky commented May 7, 2024

With 194e84a, since reindexing regenerates undo data, and undo data shouldn't be added until all existing blocks are, it seems like there is no reason for the AddToBlockFileInfo function to worry about resetting the BlockfileCursor::undo_file field or even accessing the block storage cursors at all. So I think the following simplification would make sense: [ EDIT: This is wrong and the cursors do need to be updated to record the max file number. The diff below is wrong and a better change would be #29975 (comment) ]

--- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -941,22 +941,11 @@ bool BlockManager::FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigne
 void BlockManager::AddToBlockFileInfo(const CBlock& block, unsigned int nHeight, const FlatFilePos& pos)
 {
     LOCK(cs_LastBlockFile);
-
     const unsigned int added_size = ::GetSerializeSize(TX_WITH_WITNESS(block));
-    const BlockfileType chain_type = BlockfileTypeForHeight(nHeight);
-    // -reindex and -reindex-chainstate delete any snapshot chainstate during init
-    Assume(chain_type == BlockfileType::NORMAL);
-
     const int nFile = pos.nFile;
     if (static_cast<int>(m_blockfile_info.size()) <= nFile) {
         m_blockfile_info.resize(nFile + 1);
     }
-
-    const int last_blockfile = m_blockfile_cursors[chain_type]->file_num;
-    if (nFile != last_blockfile) {
-        // No undo data yet in the new file, so reset our undo-height tracking.
-        m_blockfile_cursors[chain_type] = BlockfileCursor{nFile};
-    }
     m_blockfile_info[nFile].AddBlock(nHeight, block.GetBlockTime());
     m_blockfile_info[nFile].nSize = std::max(pos.nPos + added_size, m_blockfile_info[nFile].nSize);
     m_dirty_fileinfo.insert(nFile);

fKnown is true during reindex (and only then), which deletes
any existing snapshot chainstate. As a result, this function can never
be called wth fKnown set and a snapshot chainstate.
Add an Assume for this, and make the code initializing a blockfile cursor
for the snapshot conditional on !fKnown.

This is a preparation for splitting the reindex logic out of
FindBlockPos in the following commits.
@mzumsande mzumsande force-pushed the 202404_blockstorage_split_reindex branch from 194e84a to 6a22eed Compare May 8, 2024 22:22
@mzumsande
Copy link
Contributor Author

Thanks for the detailed review @ryanofsky!
With the latest push, I addressed the feedback partially, see in particular #29975 (comment). I will address the remaining comments soon.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 6a22eed. Just some suggested changes were made since the last review: adding more Assume checks, renaming a function, and moving a declaration. Everything still looks good.

src/node/blockstorage.cpp Outdated Show resolved Hide resolved

const int last_blockfile = m_blockfile_cursors[chain_type]->file_num;
if (nFile != last_blockfile) {
// No undo data yet in the new file, so reset our undo-height tracking.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "blockstorage: split up FindBlockPos function" (2b6d274)

I think this comment is not really accurate in this context. There should be no need to reset BlockfileCursor::undo_height here, only to set a new BlockfileCursor::file_num value so MaxBlockfileNum() returns the right thing. So the comment could be changed to something like "update the cursor so it points to the last file".

Maybe it would also make sense to make this a little more robust so doesn't move the cursor backwards if UpdateBlockInfo calls are made out of order. We are already doing this for blocks within the file by using std::max to set the file size below. Generalizing this would also allow dropping the Assume() checks:

--- a/src/node/blockstorage.cpp
+++ b/src/node/blockstorage.cpp
@@ -942,24 +942,19 @@ void BlockManager::UpdateBlockInfo(const CBlock& block, unsigned int nHeight, co
 {
     LOCK(cs_LastBlockFile);
 
+    // Update the cursor so it points to the last file.
+    const BlockfileType chain_type{BlockfileTypeForHeight(nHeight)};
+    auto& cursor{m_blockfile_cursors[chain_type]};
+    if (!cursor || cursor->file_num < pos.nFile) {
+        m_blockfile_cursors[chain_type] = BlockfileCursor{pos.nFile};
+    }
+
+    // Update the file information so it points to the last block.
     const unsigned int added_size = ::GetSerializeSize(TX_WITH_WITNESS(block));
-    const BlockfileType chain_type = BlockfileTypeForHeight(nHeight);
-    // Check that chain type is NORMAL, because this function is only
-    // called during reindexing, and reindexing deletes snapshot chainstates, so
-    // chain_type will not be SNAPSHOT. Also check that cursor exists, because
-    // the normal cursor should never be null.
-    Assume(chain_type == BlockfileType::NORMAL);
-    Assume(m_blockfile_cursors[chain_type]);
     const int nFile = pos.nFile;
     if (static_cast<int>(m_blockfile_info.size()) <= nFile) {
         m_blockfile_info.resize(nFile + 1);
     }
-
-    const int last_blockfile = m_blockfile_cursors[chain_type]->file_num;
-    if (nFile != last_blockfile) {
-        // No undo data yet in the new file, so reset our undo-height tracking.
-        m_blockfile_cursors[chain_type] = BlockfileCursor{nFile};
-    }
     m_blockfile_info[nFile].AddBlock(nHeight, block.GetBlockTime());
     m_blockfile_info[nFile].nSize = std::max(pos.nPos + added_size, m_blockfile_info[nFile].nSize);
     m_dirty_fileinfo.insert(nFile);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current logic seems really brittle to me, I suspected that there might be a bug hiding somewhere, but it barely works out in all scenarios I could think of.
With blocks being save on disk out of order, it is possible that at the end of a reindex, the cursor points to an older block file. Yet, it appears that nothing really bad happens as a result: When a new blocks arrives, FindBlockPos will still find the correct position in this while loop, skipping ahead.
In a similar way, MaxBlockfileNum() being incorrect can result in an incorrect DB_LAST_BLOCK being written to disk on shutdown. However, we also recover from that because at next startup, LoadBlockIndexDB uses DB_LAST_BLOCK basically only for logging but doesn't trust it's actually pointing to the last block, searching for more db entries here, so only the log messages on startup will be wrong.

This seems really fragile to me, so I think that your suggestion is a good idea, and I'll add a commit for it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this with the latest push. Changed the // Update the file information so it points to the last block. comment slightly because the file information doesn't really point to anything.

mzumsande and others added 3 commits May 10, 2024 17:08
In particular, document the flat file positions expected and
returned by functions better.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
FindBlockPos does different things depending on whether the block is known
or not, as shown by the fact that much of the existing code is conditional on fKnown set or not.

If the block position is known (during reindex) the function only updates the block info
statistics. It doesn't actually find a block position in this case.

This commit removes fKnown and splits up these two code paths by introducing a separate function
for the reindex case when the block position is known.
It doesn't change behavior.
…new blocks

By calling SaveBlockToDisk only when we actually want to save a new
block to disk. In the reindex case, we now call UpdateBlockInfo
directly from validation.

This commit doesn't change behavior.
@mzumsande mzumsande force-pushed the 202404_blockstorage_split_reindex branch from 6a22eed to ff2ef7d Compare May 10, 2024 21:15
The new name reflects that it is no longer called with existing blocks
for which the position is already known.

Returning a FlatFilePos directly simplifies the interface.
Previously, it was possible to move the cursor back to an older file
during reindex if blocks are enocuntered out of order during reindex.
This would means that MaxBlockfileNum() would be incorrect, and
a wrong DB_LAST_BLOCK could be written to disk.

This improves the logic by only ever moving the cursor forward (if possible)
but not backwards.

Co-authored-by: Martin Zumsande <mzumsande@gmail.com>
@mzumsande mzumsande force-pushed the 202404_blockstorage_split_reindex branch from ff2ef7d to 9cf475f Compare May 10, 2024 21:18
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/24838737665

@mzumsande
Copy link
Contributor Author

6a22eed to 9cf475f:

I think I addressed all feedback now. Added 2 additional commits (one for renaming / updating FindNextBlockPos, one for not moving the cursor backwards in UpdateBlockInfo) and reorganized the other commit slightly by having a doc-only commit at the beginning.

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 9cf475f

* block file depending on nAddSize. May flush the previous blockfile to disk if full, updates
* blockfile info, and checks if there is enough disk space to save the block.
*
* The nAddSize argument passed to this function should include not just the size of the serialized CBlock, but the also size of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/the also/also the/

@DrahtBot DrahtBot requested a review from ryanofsky May 11, 2024 20:38
Copy link
Contributor

@paplorinc paplorinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, only left a few nit comments

* @returns in case of success, the position to which the block was written to
* in case of an error, an empty FlatFilePos
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra line

@@ -164,6 +168,12 @@ class BlockManager

AutoFile OpenUndoFile(const FlatFilePos& pos, bool fReadOnly = false) const;

/**
* Write a block to disk. The pos argument passed to this function is modified by this call. Before this call, it should
* point to an unused file location where separator fields will be written followed by the serialized CBlock data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
* point to an unused file location where separator fields will be written followed by the serialized CBlock data.
* point to an unused file location where separator fields will be written, followed by the serialized CBlock data.

@@ -312,7 +322,17 @@ class BlockManager
bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex& block)
EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

/** Store block on disk. If dbp is not nullptr, then it provides the known position of the block within a block file on disk. */
/** Store block on disk and update block file statistics.
* During reindex, only the block file statistics is updated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
* During reindex, only the block file statistics is updated.
* During reindex, only the block file statistics are updated.

return FlatFilePos();
}
// Must also account for the serialization header
// (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
// (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE)
// Account for the 4 magic message start bytes + the 4 length bytes (8 bytes total, defined as BLOCK_SERIALIZATION_HEADER_SIZE).

* The nAddSize argument passed to this function should include not just the size of the serialized CBlock, but the also size of
* separator fields which are written before it by WriteBlockToDisk (BLOCK_SERIALIZATION_HEADER_SIZE).
*/
[[nodiscard]] bool FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigned int nHeight, uint64_t nTime);
[[nodiscard]] FlatFilePos FindNextBlockPos(unsigned int nAddSize, unsigned int nHeight, uint64_t nTime);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does [[nodiscard]] still make sense now?

@@ -848,7 +848,7 @@ fs::path BlockManager::GetBlockPosFilename(const FlatFilePos& pos) const
return BlockFileSeq().FileName(pos);
}

bool BlockManager::FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigned int nHeight, uint64_t nTime)
FlatFilePos BlockManager::FindNextBlockPos(unsigned int nAddSize, unsigned int nHeight, uint64_t nTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -943,24 +943,19 @@ void BlockManager::UpdateBlockInfo(const CBlock& block, unsigned int nHeight, co
{
LOCK(cs_LastBlockFile);

// Update the cursor so it points to the last file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (in commit message):

This would means that MaxBlockfileNum

vs

This would mean that MaxBlockfileNum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants