blockstorage: do not flush block to disk if it is already there #27039
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage: For detailed information about the code coverage, see the test coverage report.
Reviews: See the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts: Reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
force-push from 7a2e5cd to a8db898
force-push from a8db898 to a6b05f8
Approach and code review ACK a6b05f8
I don't understand why the test needs to start a second node or why -fastprune is needed (but that may just be me!).
EDIT: never mind, I see now why it's needed, you've documented it well.
> reviewers can let me know if the changes should be combined
It seems fine to me that this is a separate PR, but I don't have a strong opinion.
force-push from a6b05f8 to 386faaf
crtACK 386faaf
I'm still not able to figure out why two nodes are needed in the test. I tried modifying the test to use just one node, but the new test reindex_readonly() didn't pass (the log file contained the unexpected string "failed to open file"). So the second node does seem to be needed.
force-push from 386faaf to ed57565
ACK ed57565
Concept ACK
force-push from ec7dab6 to e7a0339
ACK e7a0339
Rebased on master and prepended an extra commit with a unit test for BlockManager. The test passes on master and on the branch without modification, hopefully illustrating that at least some behaviors are not affected. It's hard to test precisely for the branch's change in a unit test because file-flushing is not something we can really predict or test for (sometimes it happens as soon as we …). The actual error in issue #2039 comes from an attempt to OPEN a file (see lines 81 to 86 in 82793f1), and this PR avoids the call to …

@jonatack let me know if the new test is reassuring enough, or if you have advice for better coverage?
force-push from b787dac to 5afd6bf
Tested that this PR indeed fixes #2039:
- on master (82793f1):
$ chmod u-w ~/.bitcoin/signet/blocks/blk00003.dat
$ ./src/bitcoind -signet -reindex
...
2023-02-27T17:58:31Z Reindexing block file blk00003.dat...
2023-02-27T17:58:31Z Unable to open file /home/hebasto/.bitcoin/signet/blocks/blk00003.dat
2023-02-27T17:58:31Z ERROR: Flush: failed to open file 3
2023-02-27T17:58:31Z *** Flushing block file to disk failed. This is likely the result of an I/O error.
2023-02-27T17:58:31Z Error: A fatal internal error occurred, see debug.log for details
Error: A fatal internal error occurred, see debug.log for details
...
- this PR (5afd6bf):
$ chmod u-w ~/.bitcoin/signet/blocks/blk00003.dat
$ ./src/bitcoind -signet -reindex # no errors
re-ACK, although now a rebase is needed
Concept ACK, will review more closely soon.
The added unit test seems to fail under Win64.
force-push from a9a9c86 to e600eb4
force-push to e600eb4:
Code review ACK 8a1ab9a
test: ensure we can reindex from read-only block files now
force-push from 8a1ab9a to dfcef53
Thanks for the reviews, @mzumsande @furszy @pablomartin4btc @LarryRuane. I just rebased on master to clean up CI.
Rebase diff ACK dfcef53.
Please try not to rebase the PR when it is not needed. An unrelated or spurious CI failure will not block a PR from getting merged.
utACK dfcef53
re-ACK dfcef53
ACK dfcef53
self.nodes[0].assert_start_raises_init_error(extra_args=['-reindex', '-fastprune'],
                                             expected_msg="Error: A fatal internal error occurred, see debug.log for details")
self.log.debug("Attempt to restart and reindex the node with the unwritable block file")
with self.nodes[0].wait_for_debug_log([b"Reindexing finished"]):
Any reason to change this to the aggressive busy-loop helper? The previous approach should work just fine with the right timeout. Also, I think wait_for_debug_log should be renamed to busy_wait_for_debug_log.
I'm not sure I understand. The previous code expected a crash so the test would know exactly when to check for the expected log message. In the new code bitcoind just runs fine so we have to poll the log for the success message.
wait_for_debug_log (the variant of the function that accepts raw bytes) does not have a sleep, so it will try to max out IO at 100%, no?
OK, I see now, thanks. So I guess... start_node calls wait_for_rpc_connection. Is RPC available before reindexing is finished? If we can rely on that to prevent race conditions then this could just be assert_debug_log.
> If we can rely on that to prevent race conditions then this could just be assert_debug_log.

assert_debug_log is also syncing (it has a timeout argument), so no race should happen. My comment is only about the IO usage. assert_debug_log is not using a busy wait, but a sleepy wait. Otherwise they are exactly identical.

However, if assert_debug_log is used to sync, my personal preference is to provide the timeout argument, so that the code is self-documenting.
I think in this context my feedback is mostly of stylistic nature, but I could imagine other scenarios where a more expensive reindex in combination with the busy wait may lead to test timeouts on slow storage.
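The busy-wait vs sleepy-wait distinction discussed here can be sketched roughly as follows. This is a simplified illustration, not the real test_framework code: the helper names and signatures below are hypothetical, and the actual framework methods take a node's debug.log rather than a callback.

```python
import time

def busy_wait_for_log(read_log, wanted, timeout=60):
    """Poll the log as fast as possible (no sleep) -- can max out IO."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if wanted in read_log():
            return
    raise TimeoutError(f"{wanted!r} not found in log")

def sleepy_wait_for_log(read_log, wanted, timeout=60, poll_interval=0.5):
    """Poll the log, sleeping between attempts -- far cheaper on IO."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if wanted in read_log():
            return
        time.sleep(poll_interval)
    raise TimeoutError(f"{wanted!r} not found in log")
```

Both variants sync on the same condition with the same timeout; the only difference is whether the loop yields the CPU between polls, which is exactly the IO-usage concern raised above.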
Got it, thanks. Worth opening a new PR?
If you want, yes, I am happy to review. Though, I won't create a pull myself.
test: ensure we can reindex from read-only block files now

Github-Pull: bitcoin#27039
Rebased-From: dfcef53
fd6a7d3 test: use sleepy wait-for-log in reindex readonly (Matthew Zipkin)

Pull request description:

Also rename the busy wait-for-log method to prevent recurrence. See #27039 (comment)

ACKs for top commit:
maflcko: utACK fd6a7d3
achow101: ACK fd6a7d3
tdb3: ACK for fd6a7d3
rkrux: ACK fd6a7d3

Tree-SHA512: 7ff0574833df1ec843159b35ee88b8bb345a513ac13ed0b72abd1bf330c454a3f9df4d927871b9e3d37bfcc07542b06ef63acef8e822cd18499adae8cbb0cda8
e41667b blockstorage: Don't move cursor backwards in UpdateBlockInfo (Ryan Ofsky)
1710363 blockstorage: Rename FindBlockPos and have it return a FlatFilePos (Martin Zumsande)
d9e477c validation, blockstorage: Separate code paths for reindex and saving new blocks (Martin Zumsande)
064859b blockstorage: split up FindBlockPos function (Martin Zumsande)
fdae638 doc: Improve doc for functions involved in saving blocks to disk (Martin Zumsande)
0d114e3 blockstorage: Add Assume for fKnown / snapshot chainstate (Martin Zumsande)

Pull request description:

`SaveBlockToDisk` / `FindBlockPos` are used for two purposes, depending on whether they are called during reindexing (`dbp` set, `fKnown = true`) or in the "normal" case when adding new blocks (`dbp == nullptr`, `fKnown = false`). The actual tasks are quite different:

- In normal mode, preparations for saving a new block are made, which is then saved: find the correct position on disk (maybe skipping to a new blk file), check for available disk space, update the blockfile info db, save the block.
- During reindex, most of this is not necessary (the block is already on disk after all); only the blockfile info needs to be rebuilt, because reindex wiped the leveldb it's saved in.

Using one function with many conditional statements for this leads to code that is hard to read/understand and bug-prone:

- many code paths in `FindBlockPos` are conditional on `fKnown` or `!fKnown`
- it's not really clear what actually needs to be done during reindex (we don't need to "save a block to disk" or "find a block pos" as the function names suggest)
- logic that should be applied to only one of the two modes is sometimes applied to both (see first commit, or #27039)

#24858 and #27039 were recent bugs directly related to the differences between reindexing and normal mode, and in both cases the simple fix took a long time to be reviewed and merged.

This PR proposes to clean this code up by splitting out the reindex logic into a separate function (`UpdateBlockInfo`) which will be called directly from validation. As a result, `SaveBlockToDisk` and `FindBlockPos` only need to cover the non-reindex logic.

ACKs for top commit:
paplorinc: ACK e41667b
TheCharlatan: Re-ACK e41667b
ryanofsky: Code review ACK e41667b. Just improvements to comments since last review.

Tree-SHA512: a14ff9a0facf6b1e3c1cd724a2d19a79a25d4b48de64398fdd172671532a472bc10a20cbb64ac3a3e55814dcc877d0597a3e1699cabc4f9d9a86b439b6eaba20
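The split described above can be sketched as a toy model. This is not the real C++ (`SaveBlockToDisk`, `FindBlockPos`, `UpdateBlockInfo` live in Bitcoin Core's blockstorage module); the Python names and the single-file `state` dict are illustrative assumptions, showing only that the reindex path touches metadata but never the disk.

```python
def find_free_pos(state, size):
    """Normal mode only: pick the next free offset in block file 0."""
    file_num = 0
    offset = state["fileinfo"].get(file_num, {"size": 0})["size"]
    return (file_num, offset)

def write_at(state, pos, block):
    """Normal mode only: model an actual disk write."""
    state["disk"].append((pos, block))

def save_block_to_disk(state, block):
    """Normal mode: find a position, write the block, record metadata."""
    pos = find_free_pos(state, len(block))
    write_at(state, pos, block)
    update_block_info(state, pos, len(block))
    return pos

def update_block_info(state, pos, size):
    """Reindex mode entry point: rebuild blockfile info, no disk writes."""
    file_num, offset = pos
    info = state["fileinfo"].setdefault(file_num, {"size": 0, "blocks": 0})
    info["size"] = max(info["size"], offset + size)
    info["blocks"] += 1
```

Keeping `update_block_info` as its own entry point means the reindex caller never even sees the write/flush machinery, which is the readability argument made in the PR description above.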
Closes #2039

When reindexing from flat-file block storage there is no need to write anything back to disk, since the block data is already there. This PR skips flushing to disk those blocks that already have a known position in the datastore. Skipping this means that users can write-protect the blk files on disk, which may be useful for security or even for safely sharing that data between multiple bitcoind instances.

FindBlockPos() may also flush the undo data file, but again this is skipped if the corresponding block position is known, as during the initial stage of a reindex when block data is being indexed. Once the block index is complete, the validation mechanism will call ConnectBlock(), which will save undo data at that time.

The call stack looks like this: …

A larger refactor of this part of the code was started by mzumsande here: https://github.com/mzumsande/bitcoin/tree/202207_refactor_findblockpos, including this fix; reviewers can let me know if the changes should be combined.
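The core idea of the fix can be sketched as a toy model: flush only when new data was actually written. The class and method names below are hypothetical illustrations, not Bitcoin Core's actual C++ (where the logic sits around FindBlockPos() and the flat-file flush).

```python
class ToyBlockStore:
    """Minimal sketch: flush block files only when new data was written."""

    def __init__(self):
        self.file_size = 0    # bytes written to the current block file
        self.flush_count = 0  # how many times we flushed to disk

    def flush(self):
        self.flush_count += 1

    def save_block(self, block, known_pos=None):
        """Return the block's file offset.

        During reindex the position is already known, so we neither
        write nor flush -- the data is already on disk, and the file
        may even be write-protected.
        """
        if known_pos is not None:
            return known_pos
        pos = self.file_size
        self.file_size += len(block)
        self.flush()  # persist newly written data
        return pos
```

During a reindex every call passes `known_pos`, so `flush_count` never increases and read-only blk files are never opened for writing, which is exactly why the chmod'd-file reproduction above stops failing.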