Merge #29975: blockstorage: Separate reindexing from saving new blocks
e41667b blockstorage: Don't move cursor backwards in UpdateBlockInfo (Ryan Ofsky)
1710363 blockstorage: Rename FindBlockPos and have it return a FlatFilePos (Martin Zumsande)
d9e477c validation, blockstorage: Separate code paths for reindex and saving new blocks (Martin Zumsande)
064859b blockstorage: split up FindBlockPos function (Martin Zumsande)
fdae638 doc: Improve doc for functions involved in saving blocks to disk (Martin Zumsande)
0d114e3 blockstorage: Add Assume for fKnown / snapshot chainstate (Martin Zumsande)

Pull request description:

  `SaveBlockToDisk` / `FindBlockPos` are used for two purposes, depending on whether they are called during reindexing (`dbp` set, `fKnown = true`) or in the "normal" case when adding new blocks (`dbp == nullptr`, `fKnown = false`).
  The actual tasks are quite different:
  - In normal mode, preparations are made for saving a new block, which is then saved: find the correct position on disk (possibly skipping to a new blk file), check for available disk space, update the blockfile info db, save the block.
  - During reindex, most of this is not necessary (the block is already on disk, after all); only the blockfile info needs to be rebuilt, because reindex wiped the LevelDB database it is saved in.

  Using one function with many conditional statements for this leads to code that is bug-prone and hard to read and understand:
  - Many code paths in `FindBlockPos` are conditional on `fKnown` or `!fKnown`.
  - It's not really clear what actually needs to be done during reindex (we don't need to "save a block to disk" or "find a block pos" as the function names suggest).
  - Logic that should be applied to only one of the two modes is sometimes applied to both (see first commit, or #27039).

  #24858 and #27039 were recent bugs directly related to the differences between reindexing and normal mode, and in both cases the simple fix took a long time to be reviewed and merged.

  This PR proposes to clean this code up by splitting out the reindex logic into a separate function (`UpdateBlockInfo`) which will be called directly from validation. As a result, `SaveBlockToDisk` and `FindBlockPos` only need to cover the non-reindex logic.
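
The split described above can be illustrated with a toy model. This is only a sketch, not Bitcoin Core code: `ToyPos`, `ToyFileInfo`, `ToyBlockStore`, `FindNextPos` and `UpdateInfo` are hypothetical names, and locking, flushing, disk allocation and pruning are all left out. It assumes the only state is a per-file block count and size plus a "last file" cursor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the real Bitcoin Core types.
struct ToyPos {
    int file{-1};
    unsigned int offset{0};
    bool IsNull() const { return file == -1; }
};

struct ToyFileInfo {
    unsigned int blocks{0};
    unsigned int size{0};
};

class ToyBlockStore
{
public:
    explicit ToyBlockStore(unsigned int max_file_size) : m_max_file_size{max_file_size} {}

    // "Normal" path: choose where a NEW block goes, moving on to the next
    // file when the current one cannot hold add_size more bytes.
    ToyPos FindNextPos(unsigned int add_size)
    {
        assert(add_size < m_max_file_size);
        EnsureFile(m_last_file);
        while (m_files[Index(m_last_file)].size + add_size >= m_max_file_size) {
            ++m_last_file; // simplification: the next file number is always free
            EnsureFile(m_last_file);
        }
        ToyFileInfo& info = m_files[Index(m_last_file)];
        ToyPos pos{m_last_file, info.size};
        info.blocks += 1;
        info.size += add_size; // appending always grows the file
        return pos;
    }

    // Reindex path: the block is already on disk at pos, so only the
    // bookkeeping is rebuilt. Nothing is allocated or written.
    void UpdateInfo(const ToyPos& pos, unsigned int block_size)
    {
        if (m_last_file < pos.file) m_last_file = pos.file; // never move the cursor backwards
        EnsureFile(pos.file);
        ToyFileInfo& info = m_files[Index(pos.file)];
        info.blocks += 1;
        info.size = std::max(pos.offset + block_size, info.size);
    }

    const ToyFileInfo& Info(int file) const { return m_files.at(Index(file)); }
    int LastFile() const { return m_last_file; }

private:
    static std::size_t Index(int file) { return static_cast<std::size_t>(file); }
    void EnsureFile(int file)
    {
        if (m_files.size() <= Index(file)) m_files.resize(Index(file) + 1);
    }

    unsigned int m_max_file_size;
    int m_last_file{0};
    std::vector<ToyFileInfo> m_files;
};
```

In this sketch the append path always adds to the recorded file size, while the reindex path takes the maximum of the recorded size and the end of the known block, so replaying a block that lies inside already-counted data cannot change the recorded size.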

ACKs for top commit:
  paplorinc:
    ACK e41667b
  TheCharlatan:
    Re-ACK e41667b
  ryanofsky:
    Code review ACK e41667b. Just improvements to comments since last review.

Tree-SHA512: a14ff9a0facf6b1e3c1cd724a2d19a79a25d4b48de64398fdd172671532a472bc10a20cbb64ac3a3e55814dcc877d0597a3e1699cabc4f9d9a86b439b6eaba20
ryanofsky committed May 16, 2024
2 parents 75118a6 + e41667b commit 2f53f22
Showing 5 changed files with 143 additions and 105 deletions.
2 changes: 1 addition & 1 deletion src/bench/readblock.cpp
@@ -18,7 +18,7 @@ static FlatFilePos WriteBlockToDisk(ChainstateManager& chainman)
CBlock block;
stream >> TX_WITH_WITNESS(block);

return chainman.m_blockman.SaveBlockToDisk(block, 0, nullptr);
return chainman.m_blockman.SaveBlockToDisk(block, 0);
}

static void ReadBlockFromDiskTest(benchmark::Bench& bench)
167 changes: 86 additions & 81 deletions src/node/blockstorage.cpp
@@ -848,7 +848,7 @@ fs::path BlockManager::GetBlockPosFilename(const FlatFilePos& pos) const
return BlockFileSeq().FileName(pos);
}

bool BlockManager::FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigned int nHeight, uint64_t nTime, bool fKnown)
FlatFilePos BlockManager::FindNextBlockPos(unsigned int nAddSize, unsigned int nHeight, uint64_t nTime)
{
LOCK(cs_LastBlockFile);

@@ -863,88 +863,101 @@ bool BlockManager::FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigne
}
const int last_blockfile = m_blockfile_cursors[chain_type]->file_num;

int nFile = fKnown ? pos.nFile : last_blockfile;
int nFile = last_blockfile;
if (static_cast<int>(m_blockfile_info.size()) <= nFile) {
m_blockfile_info.resize(nFile + 1);
}

bool finalize_undo = false;
if (!fKnown) {
unsigned int max_blockfile_size{MAX_BLOCKFILE_SIZE};
// Use smaller blockfiles in test-only -fastprune mode - but avoid
// the possibility of having a block not fit into the block file.
if (m_opts.fast_prune) {
max_blockfile_size = 0x10000; // 64kiB
if (nAddSize >= max_blockfile_size) {
// dynamically adjust the blockfile size to be larger than the added size
max_blockfile_size = nAddSize + 1;
}
unsigned int max_blockfile_size{MAX_BLOCKFILE_SIZE};
// Use smaller blockfiles in test-only -fastprune mode - but avoid
// the possibility of having a block not fit into the block file.
if (m_opts.fast_prune) {
max_blockfile_size = 0x10000; // 64kiB
if (nAddSize >= max_blockfile_size) {
// dynamically adjust the blockfile size to be larger than the added size
max_blockfile_size = nAddSize + 1;
}
assert(nAddSize < max_blockfile_size);

while (m_blockfile_info[nFile].nSize + nAddSize >= max_blockfile_size) {
// when the undo file is keeping up with the block file, we want to flush it explicitly
// when it is lagging behind (more blocks arrive than are being connected), we let the
// undo block write case handle it
finalize_undo = (static_cast<int>(m_blockfile_info[nFile].nHeightLast) ==
Assert(m_blockfile_cursors[chain_type])->undo_height);

// Try the next unclaimed blockfile number
nFile = this->MaxBlockfileNum() + 1;
// Set to increment MaxBlockfileNum() for next iteration
m_blockfile_cursors[chain_type] = BlockfileCursor{nFile};

if (static_cast<int>(m_blockfile_info.size()) <= nFile) {
m_blockfile_info.resize(nFile + 1);
}
}
assert(nAddSize < max_blockfile_size);

while (m_blockfile_info[nFile].nSize + nAddSize >= max_blockfile_size) {
// when the undo file is keeping up with the block file, we want to flush it explicitly
// when it is lagging behind (more blocks arrive than are being connected), we let the
// undo block write case handle it
finalize_undo = (static_cast<int>(m_blockfile_info[nFile].nHeightLast) ==
Assert(m_blockfile_cursors[chain_type])->undo_height);

// Try the next unclaimed blockfile number
nFile = this->MaxBlockfileNum() + 1;
// Set to increment MaxBlockfileNum() for next iteration
m_blockfile_cursors[chain_type] = BlockfileCursor{nFile};

if (static_cast<int>(m_blockfile_info.size()) <= nFile) {
m_blockfile_info.resize(nFile + 1);
}
pos.nFile = nFile;
pos.nPos = m_blockfile_info[nFile].nSize;
}
FlatFilePos pos;
pos.nFile = nFile;
pos.nPos = m_blockfile_info[nFile].nSize;

if (nFile != last_blockfile) {
if (!fKnown) {
LogPrint(BCLog::BLOCKSTORAGE, "Leaving block file %i: %s (onto %i) (height %i)\n",
last_blockfile, m_blockfile_info[last_blockfile].ToString(), nFile, nHeight);

// Do not propagate the return code. The flush concerns a previous block
// and undo file that has already been written to. If a flush fails
// here, and we crash, there is no expected additional block data
// inconsistency arising from the flush failure here. However, the undo
// data may be inconsistent after a crash if the flush is called during
// a reindex. A flush error might also leave some of the data files
// untrimmed.
if (!FlushBlockFile(last_blockfile, !fKnown, finalize_undo)) {
LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning,
"Failed to flush previous block file %05i (finalize=%i, finalize_undo=%i) before opening new block file %05i\n",
last_blockfile, !fKnown, finalize_undo, nFile);
}
LogPrint(BCLog::BLOCKSTORAGE, "Leaving block file %i: %s (onto %i) (height %i)\n",
last_blockfile, m_blockfile_info[last_blockfile].ToString(), nFile, nHeight);

// Do not propagate the return code. The flush concerns a previous block
// and undo file that has already been written to. If a flush fails
// here, and we crash, there is no expected additional block data
// inconsistency arising from the flush failure here. However, the undo
// data may be inconsistent after a crash if the flush is called during
// a reindex. A flush error might also leave some of the data files
// untrimmed.
if (!FlushBlockFile(last_blockfile, /*fFinalize=*/true, finalize_undo)) {
LogPrintLevel(BCLog::BLOCKSTORAGE, BCLog::Level::Warning,
"Failed to flush previous block file %05i (finalize=1, finalize_undo=%i) before opening new block file %05i\n",
last_blockfile, finalize_undo, nFile);
}
// No undo data yet in the new file, so reset our undo-height tracking.
m_blockfile_cursors[chain_type] = BlockfileCursor{nFile};
}

m_blockfile_info[nFile].AddBlock(nHeight, nTime);
if (fKnown) {
m_blockfile_info[nFile].nSize = std::max(pos.nPos + nAddSize, m_blockfile_info[nFile].nSize);
} else {
m_blockfile_info[nFile].nSize += nAddSize;
m_blockfile_info[nFile].nSize += nAddSize;

bool out_of_space;
size_t bytes_allocated = BlockFileSeq().Allocate(pos, nAddSize, out_of_space);
if (out_of_space) {
m_opts.notifications.fatalError(_("Disk space is too low!"));
return {};
}
if (bytes_allocated != 0 && IsPruneMode()) {
m_check_for_pruning = true;
}

if (!fKnown) {
bool out_of_space;
size_t bytes_allocated = BlockFileSeq().Allocate(pos, nAddSize, out_of_space);
if (out_of_space) {
m_opts.notifications.fatalError(_("Disk space is too low!"));
return false;
}
if (bytes_allocated != 0 && IsPruneMode()) {
m_check_for_pruning = true;
}
m_dirty_fileinfo.insert(nFile);
return pos;
}

void BlockManager::UpdateBlockInfo(const CBlock& block, unsigned int nHeight, const FlatFilePos& pos)
{
LOCK(cs_LastBlockFile);

// Update the cursor so it points to the last file.
const BlockfileType chain_type{BlockfileTypeForHeight(nHeight)};
auto& cursor{m_blockfile_cursors[chain_type]};
if (!cursor || cursor->file_num < pos.nFile) {
m_blockfile_cursors[chain_type] = BlockfileCursor{pos.nFile};
}

// Update the file information with the current block.
const unsigned int added_size = ::GetSerializeSize(TX_WITH_WITNESS(block));
const int nFile = pos.nFile;
if (static_cast<int>(m_blockfile_info.size()) <= nFile) {
m_blockfile_info.resize(nFile + 1);
}
m_blockfile_info[nFile].AddBlock(nHeight, block.GetBlockTime());
m_blockfile_info[nFile].nSize = std::max(pos.nPos + added_size, m_blockfile_info[nFile].nSize);
m_dirty_fileinfo.insert(nFile);
return true;
}

bool BlockManager::FindUndoPos(BlockValidationState& state, int nFile, FlatFilePos& pos, unsigned int nAddSize)
@@ -1014,7 +1027,7 @@ bool BlockManager::WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValid
// we want to flush the rev (undo) file once we've written the last block, which is indicated by the last height
// in the block file info as below; note that this does not catch the case where the undo writes are keeping up
// with the block writes (usually when a synced up node is getting newly mined blocks) -- this case is caught in
// the FindBlockPos function
// the FindNextBlockPos function
if (_pos.nFile < cursor.file_num && static_cast<uint32_t>(block.nHeight) == m_blockfile_info[_pos.nFile].nHeightLast) {
// Do not propagate the return code, a failed flush here should not
// be an indication for a failed write. If it were propagated here,
@@ -1130,28 +1143,20 @@ bool BlockManager::ReadRawBlockFromDisk(std::vector<uint8_t>& block, const FlatF
return true;
}

FlatFilePos BlockManager::SaveBlockToDisk(const CBlock& block, int nHeight, const FlatFilePos* dbp)
FlatFilePos BlockManager::SaveBlockToDisk(const CBlock& block, int nHeight)
{
unsigned int nBlockSize = ::GetSerializeSize(TX_WITH_WITNESS(block));
FlatFilePos blockPos;
const auto position_known {dbp != nullptr};
if (position_known) {
blockPos = *dbp;
} else {
// when known, blockPos.nPos points at the offset of the block data in the blk file. that already accounts for
// the serialization header present in the file (the 4 magic message start bytes + the 4 length bytes = 8 bytes = BLOCK_SERIALIZATION_HEADER_SIZE).
// we add BLOCK_SERIALIZATION_HEADER_SIZE only for new blocks since they will have the serialization header added when written to disk.
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);
}
if (!FindBlockPos(blockPos, nBlockSize, nHeight, block.GetBlockTime(), position_known)) {
LogError("%s: FindBlockPos failed\n", __func__);
// Account for the 4 magic message start bytes + the 4 length bytes (8 bytes total,
// defined as BLOCK_SERIALIZATION_HEADER_SIZE)
nBlockSize += static_cast<unsigned int>(BLOCK_SERIALIZATION_HEADER_SIZE);
FlatFilePos blockPos{FindNextBlockPos(nBlockSize, nHeight, block.GetBlockTime())};
if (blockPos.IsNull()) {
LogError("%s: FindNextBlockPos failed\n", __func__);
return FlatFilePos();
}
if (!position_known) {
if (!WriteBlockToDisk(block, blockPos)) {
m_opts.notifications.fatalError(_("Failed to write block."));
return FlatFilePos();
}
if (!WriteBlockToDisk(block, blockPos)) {
m_opts.notifications.fatalError(_("Failed to write block."));
return FlatFilePos();
}
return blockPos;
}
39 changes: 35 additions & 4 deletions src/node/blockstorage.h
@@ -155,7 +155,16 @@ class BlockManager
/** Return false if undo file flushing fails. */
[[nodiscard]] bool FlushUndoFile(int block_file, bool finalize = false);

[[nodiscard]] bool FindBlockPos(FlatFilePos& pos, unsigned int nAddSize, unsigned int nHeight, uint64_t nTime, bool fKnown);
/**
* Helper function performing various preparations before a block can be saved to disk:
* Returns the correct position for the block to be saved, which may be in the current or a new
* block file depending on nAddSize. May flush the previous blockfile to disk if full, updates
* blockfile info, and checks if there is enough disk space to save the block.
*
* The nAddSize argument passed to this function should include not just the size of the serialized CBlock, but also the size of
* separator fields which are written before it by WriteBlockToDisk (BLOCK_SERIALIZATION_HEADER_SIZE).
*/
[[nodiscard]] FlatFilePos FindNextBlockPos(unsigned int nAddSize, unsigned int nHeight, uint64_t nTime);
[[nodiscard]] bool FlushChainstateBlockFile(int tip_height);
bool FindUndoPos(BlockValidationState& state, int nFile, FlatFilePos& pos, unsigned int nAddSize);

@@ -164,6 +173,12 @@ class BlockManager

AutoFile OpenUndoFile(const FlatFilePos& pos, bool fReadOnly = false) const;

/**
* Write a block to disk. The pos argument passed to this function is modified by this call. Before this call, it should
* point to an unused file location where separator fields will be written, followed by the serialized CBlock data.
* After this call, it will point to the beginning of the serialized CBlock data, after the separator fields
* (BLOCK_SERIALIZATION_HEADER_SIZE)
*/
bool WriteBlockToDisk(const CBlock& block, FlatFilePos& pos) const;
bool UndoWriteToDisk(const CBlockUndo& blockundo, FlatFilePos& pos, const uint256& hashBlock) const;

@@ -206,7 +221,7 @@ class BlockManager
//! effectively.
//!
//! This data structure maintains separate blockfile number cursors for each
//! BlockfileType. The ASSUMED state is initialized, when necessary, in FindBlockPos().
//! BlockfileType. The ASSUMED state is initialized, when necessary, in FindNextBlockPos().
//!
//! The first element is the NORMAL cursor, second is ASSUMED.
std::array<std::optional<BlockfileCursor>, BlockfileType::NUM_TYPES>
@@ -312,8 +327,24 @@ class BlockManager
bool WriteUndoDataForBlock(const CBlockUndo& blockundo, BlockValidationState& state, CBlockIndex& block)
EXCLUSIVE_LOCKS_REQUIRED(::cs_main);

/** Store block on disk. If dbp is not nullptr, then it provides the known position of the block within a block file on disk. */
FlatFilePos SaveBlockToDisk(const CBlock& block, int nHeight, const FlatFilePos* dbp);
/** Store block on disk and update block file statistics.
*
* @param[in] block the block to be stored
* @param[in] nHeight the height of the block
*
* @returns in case of success, the position to which the block was written to
* in case of an error, an empty FlatFilePos
*/
FlatFilePos SaveBlockToDisk(const CBlock& block, int nHeight);

/** Update blockfile info while processing a block during reindex. The block must be available on disk.
*
* @param[in] block the block being processed
* @param[in] nHeight the height of the block
* @param[in] pos the position of the serialized CBlock on disk. This is the position returned
* by WriteBlockToDisk pointing at the CBlock, not the separator fields before it
*/
void UpdateBlockInfo(const CBlock& block, unsigned int nHeight, const FlatFilePos& pos);

/** Whether running in -prune mode. */
[[nodiscard]] bool IsPruneMode() const { return m_prune_mode; }
24 changes: 10 additions & 14 deletions src/test/blockmanager_tests.cpp
@@ -35,20 +35,20 @@ BOOST_AUTO_TEST_CASE(blockmanager_find_block_pos)
};
BlockManager blockman{*Assert(m_node.shutdown), blockman_opts};
// simulate adding a genesis block normally
BOOST_CHECK_EQUAL(blockman.SaveBlockToDisk(params->GenesisBlock(), 0, nullptr).nPos, BLOCK_SERIALIZATION_HEADER_SIZE);
BOOST_CHECK_EQUAL(blockman.SaveBlockToDisk(params->GenesisBlock(), 0).nPos, BLOCK_SERIALIZATION_HEADER_SIZE);
// simulate what happens during reindex
// simulate a well-formed genesis block being found at offset 8 in the blk00000.dat file
// the block is found at offset 8 because there is an 8 byte serialization header
// consisting of 4 magic bytes + 4 length bytes before each block in a well-formed blk file.
FlatFilePos pos{0, BLOCK_SERIALIZATION_HEADER_SIZE};
BOOST_CHECK_EQUAL(blockman.SaveBlockToDisk(params->GenesisBlock(), 0, &pos).nPos, BLOCK_SERIALIZATION_HEADER_SIZE);
const FlatFilePos pos{0, BLOCK_SERIALIZATION_HEADER_SIZE};
blockman.UpdateBlockInfo(params->GenesisBlock(), 0, pos);
// now simulate what happens after reindex for the first new block processed
// the actual block contents don't matter, just that it's a block.
// verify that the write position is at offset 0x12d.
// this is a check to make sure that https://github.com/bitcoin/bitcoin/issues/21379 does not recur
// 8 bytes (for serialization header) + 285 (for serialized genesis block) = 293
// add another 8 bytes for the second block's serialization header and we get 293 + 8 = 301
FlatFilePos actual{blockman.SaveBlockToDisk(params->GenesisBlock(), 1, nullptr)};
FlatFilePos actual{blockman.SaveBlockToDisk(params->GenesisBlock(), 1)};
BOOST_CHECK_EQUAL(actual.nPos, BLOCK_SERIALIZATION_HEADER_SIZE + ::GetSerializeSize(TX_WITH_WITNESS(params->GenesisBlock())) + BLOCK_SERIALIZATION_HEADER_SIZE);
}
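
The offset arithmetic in the test comments above (8 + 285 = 293, then 293 + 8 = 301 = 0x12d) can be checked with a small compile-time sketch. The names `HEADER_SIZE`, `GENESIS_SIZE` and `DataOffset` are hypothetical, the value 285 is taken from the comment rather than computed, and the helper assumes every preceding block in the file has the same serialized size.

```cpp
#include <cassert>

// Assumed values, copied from the test comments: an 8-byte serialization
// header (4 magic bytes + 4 length bytes) and a 285-byte serialized genesis block.
constexpr unsigned int HEADER_SIZE{8};
constexpr unsigned int GENESIS_SIZE{285};

// Offset at which the data of block number `index` (0-based) starts, assuming
// `index` equally sized blocks precede it, each with its own header.
constexpr unsigned int DataOffset(unsigned int index, unsigned int block_size)
{
    return index * (HEADER_SIZE + block_size) + HEADER_SIZE;
}

static_assert(DataOffset(0, GENESIS_SIZE) == 8, "first block data follows one header");
static_assert(DataOffset(1, GENESIS_SIZE) == 301, "8 + 285 + 8");
static_assert(DataOffset(1, GENESIS_SIZE) == 0x12d, "301 decimal is 0x12d");
```

If the reindex step recorded a file size that omitted the first header, the second block's offset would come out lower than 301, which is the kind of regression the test is meant to catch.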

@@ -156,12 +156,11 @@ BOOST_AUTO_TEST_CASE(blockmanager_flush_block_file)
// Blockstore is empty
BOOST_CHECK_EQUAL(blockman.CalculateCurrentUsage(), 0);

// Write the first block; dbp=nullptr means this block doesn't already have a disk
// location, so allocate a free location and write it there.
FlatFilePos pos1{blockman.SaveBlockToDisk(block1, /*nHeight=*/1, /*dbp=*/nullptr)};
// Write the first block to a new location.
FlatFilePos pos1{blockman.SaveBlockToDisk(block1, /*nHeight=*/1)};

// Write second block
FlatFilePos pos2{blockman.SaveBlockToDisk(block2, /*nHeight=*/2, /*dbp=*/nullptr)};
FlatFilePos pos2{blockman.SaveBlockToDisk(block2, /*nHeight=*/2)};

// Two blocks in the file
BOOST_CHECK_EQUAL(blockman.CalculateCurrentUsage(), (TEST_BLOCK_SIZE + BLOCK_SERIALIZATION_HEADER_SIZE) * 2);
@@ -181,22 +180,19 @@ BOOST_AUTO_TEST_CASE(blockmanager_flush_block_file)
BOOST_CHECK_EQUAL(read_block.nVersion, 2);
}

// When FlatFilePos* dbp is given, SaveBlockToDisk() will not write or
// overwrite anything to the flat file block storage. It will, however,
// update the blockfile metadata. This is to facilitate reindexing
// when the user has the blocks on disk but the metadata is being rebuilt.
// During reindex, the flat file block storage will not be written to.
// UpdateBlockInfo will, however, update the blockfile metadata.
// Verify this behavior by attempting (and failing) to write block 3 data
// to block 2 location.
CBlockFileInfo* block_data = blockman.GetBlockFileInfo(0);
BOOST_CHECK_EQUAL(block_data->nBlocks, 2);
BOOST_CHECK(blockman.SaveBlockToDisk(block3, /*nHeight=*/3, /*dbp=*/&pos2) == pos2);
blockman.UpdateBlockInfo(block3, /*nHeight=*/3, /*pos=*/pos2);
// Metadata is updated...
BOOST_CHECK_EQUAL(block_data->nBlocks, 3);
// ...but there are still only two blocks in the file
BOOST_CHECK_EQUAL(blockman.CalculateCurrentUsage(), (TEST_BLOCK_SIZE + BLOCK_SERIALIZATION_HEADER_SIZE) * 2);

// Block 2 was not overwritten:
// SaveBlockToDisk() did not call WriteBlockToDisk() because `FlatFilePos* dbp` was non-null
blockman.ReadBlockFromDisk(read_block, pos2);
BOOST_CHECK_EQUAL(read_block.nVersion, 2);
}