blockstorage: do not flush block to disk if it is already there #27039
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage: For detailed information about the code coverage, see the test coverage report.
Reviews: See the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
Conflicts: Reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
force-push from 7a2e5cd to a8db898
force-push from a8db898 to a6b05f8
Approach and code review ACK a6b05f8
I don't understand why the test needs to start a second node or why -fastprune is needed (but that may just be me!).
EDIT: never mind, I see now why it's needed, you've documented it well.
> reviewers can let me know if the changes should be combined
It seems fine to me that this is a separate PR, but I don't have a strong opinion.
force-push from a6b05f8 to 386faaf
crtACK 386faaf
I'm still not able to figure out why two nodes are needed in the test. I tried modifying the test to use just one node, but the new test reindex_readonly() didn't pass (the log file contained the unexpected string "failed to open file"). So the second node does seem to be needed.
force-push from 386faaf to ed57565
ACK ed57565
Concept ACK
force-push from ec7dab6 to e7a0339
ACK e7a0339
Rebased on master and prepended an extra commit with a unit test for BlockManager. The test passes on master and on the branch without modification, hopefully illustrating that at least some behaviors are not affected. It's hard to test precisely for the branch's change in a unit test because file-flushing is not something we can really predict or test for (sometimes it happens as soon as we …). The actual error in issue #2039 comes from an attempt to OPEN a file (see lines 81 to 86 in 82793f1), and this PR avoids the call to …

@jonatack let me know if the new test is reassuring enough, or if you have advice for better coverage?
force-push from b787dac to 5afd6bf
Tested that this PR indeed fixes #2039:
- on master (82793f1):
$ chmod u-w ~/.bitcoin/signet/blocks/blk00003.dat
$ ./src/bitcoind -signet -reindex
...
2023-02-27T17:58:31Z Reindexing block file blk00003.dat...
2023-02-27T17:58:31Z Unable to open file /home/hebasto/.bitcoin/signet/blocks/blk00003.dat
2023-02-27T17:58:31Z ERROR: Flush: failed to open file 3
2023-02-27T17:58:31Z *** Flushing block file to disk failed. This is likely the result of an I/O error.
2023-02-27T17:58:31Z Error: A fatal internal error occurred, see debug.log for details
Error: A fatal internal error occurred, see debug.log for details
...
- this PR (5afd6bf):
$ chmod u-w ~/.bitcoin/signet/blocks/blk00003.dat
$ ./src/bitcoind -signet -reindex # no errors
re-ACK, although now a rebase is needed
Concept ACK, will review more closely soon.
The added unit test seems to fail under Win64.
force-push from a9a9c86 to e600eb4
force-push to e600eb4:
Code review ACK 8a1ab9a
test: ensure we can reindex from read-only block files now
force-push from 8a1ab9a to dfcef53
Thanks for the reviews, @mzumsande @furszy @pablomartin4btc @LarryRuane. I just rebased on master to clean up CI.
Rebase diff ACK dfcef53.
Please try not to rebase the PR when it is not needed. An unrelated or spurious CI failure will not block a PR from getting merged.
utACK dfcef53
re-ACK dfcef53
ACK dfcef53
self.nodes[0].assert_start_raises_init_error(extra_args=['-reindex', '-fastprune'],
                                             expected_msg="Error: A fatal internal error occurred, see debug.log for details")
self.log.debug("Attempt to restart and reindex the node with the unwritable block file")
with self.nodes[0].wait_for_debug_log([b"Reindexing finished"]):
Any reason to change this to the aggressive busy-loop helper? The previous approach should work just fine with the right timeout. Also, I think wait_for_debug_log should be renamed to busy_wait_for_debug_log.
I'm not sure I understand. The previous code expected a crash so the test would know exactly when to check for the expected log message. In the new code bitcoind just runs fine so we have to poll the log for the success message.
wait_for_debug_log (the variant of the function that accepts raw bytes) does not have a sleep, so it will try to max out IO at 100%, no?
OK, I see now, thanks. So I guess... start_node calls wait_for_rpc_connection. Is RPC available before reindexing is finished? If we can rely on that to prevent race conditions then this could just be assert_debug_log.
> If we can rely on that to prevent race conditions then this could just be assert_debug_log.

assert_debug_log is also syncing (it has a timeout argument), so no race should happen. My comment is only about the IO usage. assert_debug_log is not using a busy wait, but a sleepy wait. Otherwise they are exactly identical.

However, if assert_debug_log is used to sync, my personal preference is to provide the timeout argument, so that the code is self-documenting.
I think in this context my feedback is mostly of stylistic nature, but I could imagine other scenarios where a more expensive reindex in combination with the busy wait may lead to test timeouts on slow storage.
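The busy-wait vs sleepy-wait distinction discussed here can be sketched roughly as follows. This is a simplified illustration, not the real test_framework code: the helper names and signatures below are hypothetical, and the actual framework methods take a node's debug.log rather than a callback.

```python
import time

def busy_wait_for_log(read_log, wanted, timeout=60):
    """Poll the log as fast as possible (no sleep) -- can max out IO."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if wanted in read_log():
            return
    raise TimeoutError(f"{wanted!r} not found in log")

def sleepy_wait_for_log(read_log, wanted, timeout=60, poll_interval=0.5):
    """Poll the log, sleeping between attempts -- far cheaper on IO."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if wanted in read_log():
            return
        time.sleep(poll_interval)
    raise TimeoutError(f"{wanted!r} not found in log")
```

Both variants sync on the same condition with the same timeout; the only difference is whether the loop yields the CPU between polls, which is exactly the IO-usage concern raised above.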
Got it, thanks. Worth opening a new PR?
If you want, yes, I am happy to review. Though, I won't create a pull myself.
test: ensure we can reindex from read-only block files now

Github-Pull: bitcoin#27039
Rebased-From: dfcef53
fd6a7d3 test: use sleepy wait-for-log in reindex readonly (Matthew Zipkin)

Pull request description:

Also rename the busy wait-for-log method to prevent recurrence. See #27039 (comment)

ACKs for top commit:
maflcko: utACK fd6a7d3
achow101: ACK fd6a7d3
tdb3: ACK for fd6a7d3
rkrux: ACK fd6a7d3

Tree-SHA512: 7ff0574833df1ec843159b35ee88b8bb345a513ac13ed0b72abd1bf330c454a3f9df4d927871b9e3d37bfcc07542b06ef63acef8e822cd18499adae8cbb0cda8
e41667b blockstorage: Don't move cursor backwards in UpdateBlockInfo (Ryan Ofsky)
1710363 blockstorage: Rename FindBlockPos and have it return a FlatFilePos (Martin Zumsande)
d9e477c validation, blockstorage: Separate code paths for reindex and saving new blocks (Martin Zumsande)
064859b blockstorage: split up FindBlockPos function (Martin Zumsande)
fdae638 doc: Improve doc for functions involved in saving blocks to disk (Martin Zumsande)
0d114e3 blockstorage: Add Assume for fKnown / snapshot chainstate (Martin Zumsande)

Pull request description:

`SaveBlockToDisk` / `FindBlockPos` are used for two purposes, depending on whether they are called during reindexing (`dbp` set, `fKnown = true`) or in the "normal" case when adding new blocks (`dbp == nullptr`, `fKnown = false`). The actual tasks are quite different:

- In normal mode, preparations for saving a new block are made, which is then saved: find the correct position on disk (maybe skipping to a new blk file), check for available disk space, update the blockfile info db, save the block.
- During reindex, most of this is not necessary (the block is already on disk after all); only the blockfile info needs to be rebuilt, because reindex wiped the leveldb it's saved in.

Using one function with many conditional statements for this leads to code that is hard to read/understand and bug-prone:

- many code paths in `FindBlockPos` are conditional on `fKnown` or `!fKnown`
- it's not really clear what actually needs to be done during reindex (we don't need to "save a block to disk" or "find a block pos" as the function names suggest)
- logic that should be applied to only one of the two modes is sometimes applied to both (see first commit, or #27039)

#24858 and #27039 were recent bugs directly related to the differences between reindexing and normal mode, and in both cases the simple fix took a long time to be reviewed and merged.

This PR proposes to clean this code up by splitting out the reindex logic into a separate function (`UpdateBlockInfo`) which will be called directly from validation. As a result, `SaveBlockToDisk` and `FindBlockPos` only need to cover the non-reindex logic.

ACKs for top commit:
paplorinc: ACK e41667b
TheCharlatan: Re-ACK e41667b
ryanofsky: Code review ACK e41667b. Just improvements to comments since last review.

Tree-SHA512: a14ff9a0facf6b1e3c1cd724a2d19a79a25d4b48de64398fdd172671532a472bc10a20cbb64ac3a3e55814dcc877d0597a3e1699cabc4f9d9a86b439b6eaba20
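The split described above can be sketched as a toy model. This is not the real C++ (`SaveBlockToDisk`, `FindBlockPos`, `UpdateBlockInfo` live in Bitcoin Core's blockstorage module); the Python names and the single-file `state` dict are illustrative assumptions, showing only that the reindex path touches metadata but never the disk.

```python
def find_free_pos(state, size):
    """Normal mode only: pick the next free offset in block file 0."""
    file_num = 0
    offset = state["fileinfo"].get(file_num, {"size": 0})["size"]
    return (file_num, offset)

def write_at(state, pos, block):
    """Normal mode only: model an actual disk write."""
    state["disk"].append((pos, block))

def save_block_to_disk(state, block):
    """Normal mode: find a position, write the block, record metadata."""
    pos = find_free_pos(state, len(block))
    write_at(state, pos, block)
    update_block_info(state, pos, len(block))
    return pos

def update_block_info(state, pos, size):
    """Reindex mode entry point: rebuild blockfile info, no disk writes."""
    file_num, offset = pos
    info = state["fileinfo"].setdefault(file_num, {"size": 0, "blocks": 0})
    info["size"] = max(info["size"], offset + size)
    info["blocks"] += 1
```

Keeping `update_block_info` as its own entry point means the reindex caller never even sees the write/flush machinery, which is the readability argument made in the PR description above.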
Closes #2039

When reindexing from flat-file block storage there is no need to write anything back to disk, since the block data is already there. This PR skips flushing to disk those blocks that already have a known position in the datastore. Skipping this means that users can write-protect the blk files on disk, which may be useful for security or even for safely sharing that data between multiple bitcoind instances.

FindBlockPos() may also flush the undo data file, but again this is skipped if the corresponding block position is known, as during the initial stage of a reindex when block data is being indexed. Once the block index is complete, the validation mechanism will call ConnectBlock(), which will save undo data at that time.

The call stack looks like this: …

A larger refactor of this part of the code was started by mzumsande here: https://github.com/mzumsande/bitcoin/tree/202207_refactor_findblockpos, including this fix; reviewers can let me know if the changes should be combined.
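The core idea of the fix can be sketched as a toy model: flush only when new data was actually written. The class and method names below are hypothetical illustrations, not Bitcoin Core's actual C++ (where the logic sits around FindBlockPos() and the flat-file flush).

```python
class ToyBlockStore:
    """Minimal sketch: flush block files only when new data was written."""

    def __init__(self):
        self.file_size = 0    # bytes written to the current block file
        self.flush_count = 0  # how many times we flushed to disk

    def flush(self):
        self.flush_count += 1

    def save_block(self, block, known_pos=None):
        """Return the block's file offset.

        During reindex the position is already known, so we neither
        write nor flush -- the data is already on disk, and the file
        may even be write-protected.
        """
        if known_pos is not None:
            return known_pos
        pos = self.file_size
        self.file_size += len(block)
        self.flush()  # persist newly written data
        return pos
```

During a reindex every call passes `known_pos`, so `flush_count` never increases and read-only blk files are never opened for writing, which is exactly why the chmod'd-file reproduction above stops failing.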