
test: improve BDB parser (handle internal/overflow pages, support all page sizes) #30125

Open · wants to merge 2 commits into master
Conversation

theStack (Contributor) commented May 16, 2024

This PR adds missing features to our test framework's BDB parser, with the goal of being able to read all legacy wallets created with current and past versions of Bitcoin Core. This could be useful both for making review of #26606 easier and for improving our functional tests for the wallet BDB-ro parser by additionally validating it against an alternative implementation. The second commit introduces a test that creates a legacy wallet with huge label strings (in order to create overflow pages, i.e. pages needed for key/value data that is larger than the page size) and compares the dump outputs of the wallet tool and the extended test framework BDB parser.
It can be exercised via $ ./test/functional/tool_wallet.py --legacy. BDB support has to be compiled in (obviously).

For some manual tests regarding different page sizes, the following patch can be used:

diff --git a/src/wallet/bdb.cpp b/src/wallet/bdb.cpp
index 38cca32f80..1bf39323d3 100644
--- a/src/wallet/bdb.cpp
+++ b/src/wallet/bdb.cpp
@@ -395,6 +395,7 @@ void BerkeleyDatabase::Open()
                             DB_BTREE,                                 // Database type
                             nFlags,                                   // Flags
                             0);
+            pdb_temp->set_pagesize(1<<9); /* valid BDB pagesizes are from 1<<9 (=512) to 1<<16 (=65536) */
 
             if (ret != 0) {
                 throw std::runtime_error(strprintf("BerkeleyDatabase: Error %d, can't open database %s", ret, strFile));

I verified that the newly introduced test passes with all valid page sizes between 512 and 65536.
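To make the overflow-page mechanism concrete, here is an illustrative sketch (not code from the PR): it estimates how many overflow pages a value of a given length occupies, under the assumption that each overflow page carries `pagesize - PAGE_HEADER_SIZE` bytes of payload. The 26-byte page header matches the `PAGE_HEADER_SIZE` used in the parser snippets below; the exact per-page payload capacity is an assumption for illustration.

```python
# Rough estimate only: how many BDB overflow pages a key/value payload of
# value_len bytes needs, assuming each overflow page can hold
# (pagesize - PAGE_HEADER_SIZE) bytes of data.
PAGE_HEADER_SIZE = 26  # BDB pages start with a 26-byte header

def overflow_pages_needed(value_len: int, pagesize: int) -> int:
    payload_per_page = pagesize - PAGE_HEADER_SIZE
    return -(-value_len // payload_per_page)  # ceiling division

# A 100 kB label needs far more pages at the minimum page size (512)
# than at the maximum (65536), which is why the test sweeps page sizes.
for pagesize in (512, 65536):
    print(pagesize, overflow_pages_needed(100_000, pagesize))
```

This also shows why testing only the default page size would leave the overflow code paths mostly unexercised at large page sizes.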

DrahtBot (Contributor) commented May 16, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type: Concept ACK · Reviewers: laanwj, BrandonOdiwuor

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@DrahtBot DrahtBot added the Tests label May 16, 2024
laanwj (Member) commented May 16, 2024

Code review ACK; from what I learned about how BDB databases work internally, the implementation looks correct at first glance.

test: improve BDB parser (handle internal/overflow pages, support all page sizes)

This aims to complete our test framework BDB parser to reflect
our read-only BDB parser in the wallet codebase. This could be
useful both for making review of bitcoin#26606 easier and to also possibly
improve our functional tests for the BDB parser by comparing with
an alternative implementation.
@theStack theStack force-pushed the complete_bdb-ro_python_parser2 branch from d166e88 to 361c3b0 Compare May 21, 2024 17:17
theStack (Contributor, Author) commented:

Rebased on master (necessary since #26606 was merged and also touched the same functional test file).

@theStack theStack force-pushed the complete_bdb-ro_python_parser2 branch from 361c3b0 to ad0a8ff Compare May 22, 2024 09:58
BrandonOdiwuor (Contributor) left a comment:

Concept ACK

rkrux left a comment:

Incomplete review, untested as yet; I left a couple of comments off the top of my head. Will do another thorough round in the next few days.

entry['data'] = data[offset + 3:offset + 3 + e_len]
record_header = data[offset:offset + 3]
offset += 3
e_len, record_type = struct.unpack('HB', record_header)
There's a separate effort from @maflcko to remove the usage of struct.pack from all places in tests. Here it's struct.unpack rather than struct.pack, but I guess the reasoning for removing struct.pack is still applicable:
#29401

A Member replied:

There is a small benefit to struct.pack: it is less code when more than one item is packed or unpacked, so either option seems fine in this case, up to the author. In most other cases, however, where only a single item is (un)packed, the modern alternatives are less code and clearer.
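The two styles under discussion can be sketched side by side on a sample 3-byte record header (a 2-byte length followed by a 1-byte record type). Note the sketch pins little-endian layout with the `'<'` prefix for determinism, whereas the snippet under review uses native byte order.

```python
import struct

record_header = bytes([0x34, 0x12, 0x01])  # sample: length 0x1234, type 0x01

# one struct.unpack call decodes both fields at once ('<HB': LE u16 + u8)
e_len, record_type = struct.unpack('<HB', record_header)

# the "modern alternative": int.from_bytes / direct indexing, one per field
e_len_alt = int.from_bytes(record_header[0:2], 'little')
record_type_alt = record_header[2]

assert (e_len, record_type) == (e_len_alt, record_type_alt) == (0x1234, 0x01)
```

With two fields the single unpack call is about as short as the alternative; with one field, `int.from_bytes` avoids the format-string mini-language entirely.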

# Determine pagesize first
data = f.read(PAGE_HEADER_SIZE)
pagesize = struct.unpack('I', data[20:24])[0]
assert pagesize in (512, 1024, 2048, 4096, 8192, 16384, 32768, 65536)
WDYT about creating a constant tuple for these values? They seem generic enough.
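The suggestion sketched out (the constant name is hypothetical, not from the PR): the valid BDB page sizes are exactly the powers of two from 1<<9 to 1<<16, so the tuple can be generated rather than spelled out.

```python
# Hypothetical module-level constant for valid BDB page sizes,
# i.e. the powers of two from 1 << 9 (512) up to 1 << 16 (65536).
VALID_PAGESIZES = tuple(1 << exp for exp in range(9, 17))

# equivalent to the literal tuple in the assert under review
assert VALID_PAGESIZES == (512, 1024, 2048, 4096, 8192, 16384, 32768, 65536)
```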


6 participants