New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Model the bech32 charlimit as an Enum #30047
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK |
1cfe953
to
7cb6dac
Compare
I think the original bug here illustrates that the enum is not well suited for this. Definitely not with a value of 0 for NO_LIMIT anyway. Got any other ideas? :) |
Yeah, I was getting an icky feeling while updating the PR for 2 reasons: My ideal solution is IIRC, when I first wrote this code a year ago, there wasn't much appetite for moving the charlimit out of |
7cb6dac
to
696e0a5
Compare
@theuni updated to remove I'd be open to a more expansive refactor if there was a) conceptual buy in and b) additional use cases beyond supporting silent payments addresses (e.g. parsing bolt11/12 bech32 strings in Bitcoin Core, mercury layer bech32 addresses, etc). |
I'm not sure I see the value in using an enum (and forcing it to be passed in) over simply defining a constant? If there's more than 1 possible value, sure. But for now, why? |
This commit is broken out from #28122 , where we add a new value for silent payment addresses (limit of 1024). Here is the commit that adds a new value for silent payments: 622c7a9 My motivation for this PR is to prep for the silent payments PR, trying to break out smaller bits from that PR to make it more reviewable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I spent some time working on a correct-by-construction approach here with a new Bech32Encoded
string type which knows its own size limit. Ultimately I don't think it's worth the complexity.
What's here doesn't seem ideal to me, but I can't come up with anything better and it's simple enough that it's obviously correct (without NO_LIMIT
at least :).
utACK 696e0a5.
std::vector<int> error_locations{}; | ||
|
||
if (str.size() > 90) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worked around this 90 in my pr in https://github.com/bitcoin/bitcoin/pull/29607/files#diff-f146300624c06d2e08aadf500952294148a1785edd6ff2e8b50f13b2c08255edL315 (though I'm not sure why it's not replaced here with limit
) - I will try to review this in more detail next week
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, can't believe I missed that. Nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctrl+f "90", whew that's the only one. This is exactly why named constants are so important to name. I think it's fair to call this a nice cleanup (getting rid of the constant) in addition to the new feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, also can't believe I missed this. Thanks for taking a look @paplorinc ! Took a look at your PR and I think this is the most elegant way to solve getting rid of the 90 in ExpandHRP, so I cherry-picked your last commit in (added CHECKSUM_SIZE
and modified the commit message to fit this PR a bit more). As a nice side benefit, just this last commit improves the benchmark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
I just noticed my email was wrong there, would you be so kind and change it to Lőrinc <pap.lorinc@gmail.com>
?
ACK retracted until resolving #30047 (comment). |
ed37065
to
0e431c5
Compare
Cherry picked b3b84c3 (h/t @paplorinc) to resolve #30047 (comment). Ran the benchmarks and saw an improvement from the last commit, which is a nice side benefit! Before
After
|
src/bech32.cpp
Outdated
std::vector<int> errors; | ||
if (!CheckCharacters(str, errors)) return {}; | ||
size_t pos = str.rfind('1'); | ||
if (str.size() > 90 || pos == str.npos || pos == 0 || pos + 7 > str.size()) { | ||
if (str.size() > limit) return {}; | ||
if (pos == str.npos || pos == 0 || pos + 7 > str.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this 7 actually a CHECKSUM_SIZE + 1
, or completely unrelated?
if (pos == str.npos || pos == 0 || pos + 7 > str.size()) { | |
if (pos == str.npos || pos == 0 || pos + CHECKSUM_SIZE >= str.size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is checking that "The data part, which is at least 6 characters long" (from BIP173). Changed pos + 7 >
to pos + CHECKSUM_SIZE >=
throughout.
src/bech32.cpp
Outdated
data enc = Cat(ExpandHRP(hrp), values); | ||
enc.resize(enc.size() + 6); // Append 6 zeroes | ||
auto enc = PreparePolynomialCoefficients(hrp, values, CHECKSUM_SIZE); | ||
enc.insert(enc.end(), CHECKSUM_SIZE, 0x00); | ||
uint32_t mod = PolyMod(enc) ^ EncodingConstant(encoding); // Determine what to XOR into those 6 zeroes. | ||
data ret(6); | ||
for (size_t i = 0; i < 6; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could use CHECKSUM_SIZE
constant here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/bech32.cpp
Outdated
data enc = Cat(ExpandHRP(hrp), values); | ||
enc.resize(enc.size() + 6); // Append 6 zeroes | ||
auto enc = PreparePolynomialCoefficients(hrp, values, CHECKSUM_SIZE); | ||
enc.insert(enc.end(), CHECKSUM_SIZE, 0x00); | ||
uint32_t mod = PolyMod(enc) ^ EncodingConstant(encoding); // Determine what to XOR into those 6 zeroes. | ||
data ret(6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of preallocation we could do a reserve here:
data ret;
ret.reserve(CHECKSUM_SIZE);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer to keep this PR focused on the limit
and CHECKSUM_SIZE
change, but perhaps a good change for #29607
src/bech32.h
Outdated
* convert the CharLimit::VALUE to an int. In practice, this should never happen since this CharLimit applies to an address encoding | ||
* and we would never encode an address with such a massive value */ | ||
enum CharLimit : size_t { | ||
SEGWIT = 90, //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to just call this BECH32, and come up with another name for variants that have other limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
0e431c5
to
df20572
Compare
Changed the limit name to BECH32 per @sipa 's suggestion and updated to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
src/bech32.h
Outdated
@@ -28,6 +28,14 @@ enum class Encoding { | |||
BECH32M, //!< Bech32m encoding as defined in BIP350 | |||
}; | |||
|
|||
/** Character limits for bech32(m) encoded strings. Character limits are how we provide error location guarantees. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: to be consistent with the casing below
/** Character limits for bech32(m) encoded strings. Character limits are how we provide error location guarantees. | |
/** Character limits for Bech32(m) encoded strings. Character limits are how we provide error location guarantees. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/bech32.h
Outdated
* convert the CharLimit::VALUE to an int. In practice, this should never happen since this CharLimit applies to an address encoding | ||
* and we would never encode an address with such a massive value */ | ||
enum CharLimit : size_t { | ||
BECH32 = 90, //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
BECH32 = 90, //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors | |
BECH32 = 90, //!< BIP173/350 imposed a 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping it present tense feels more correct, perhaps:
BECH32 = 90, //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors | |
BECH32 = 90, //!< BIP173/350 imposed character limit for Bech32(m) encoded addresses. This guarantees finding up to 4 errors. |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Bech32(m) was defined with a 90 character limit so that certain guarantees for error detection could be made for segwit addresses. However, there is nothing about the encoding scheme itself that requires a limit and in practice bech32(m) has been used without the 90 char limit (e.g. lightning invoices). Further, increasing the character limit doesn't do away with error detection, it simply lessons the guarantees. Model charlimit as an Enum, so that if a different address scheme is using bech32(m), the character limit for that address scheme can be used, rather than always using the 90 charlimit defined for segwit addresses. upate comment
df20572
to
d196442
Compare
Updated comment per @paplorinc 's suggestions. |
@paplorinc I noticed you approved the PR, but we don't really use github's approval/review flow for determining the review status of a PR. Can you instead leave an |
ACK d196442 |
@theuni #30047 (comment) has been resolved by d196442, if you have time for a second look/reACK |
src/bech32.cpp
Outdated
@@ -308,18 +311,18 @@ bool CheckCharacters(const std::string& str, std::vector<int>& errors) | |||
return errors.empty(); | |||
} | |||
|
|||
/** Expand a HRP for use in checksum computation. */ | |||
data ExpandHRP(const std::string& hrp) | |||
std::vector<unsigned char> PreparePolynomialCoefficients(const std::string& hrp, const data& values, const size_t extra) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, the only reason for passing extra
here is so that it can be reserved?
If so, that's not very intuitive and seems kinda like overkill. Seems like always reserving an extra 6 bytes fine, even if they don't end up getting used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's either 0 or 6.
I agree that we can always add it at the end, amended my original PR: https://github.com/bitcoin/bitcoin/pull/29607/files#diff-f146300624c06d2e08aadf500952294148a1785edd6ff2e8b50f13b2c08255edR317
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to remove the extra
argument.
Replace ExpandHRP with a PreparePolynomialCoefficients function. Instead of using a hardcoded value for the size of the array (90 in this case) and a hardcoded value for the checksum, use the actual values vector and define checksum size as a constexpr. Use the new CHECKSUM_SIZE throughout instead 6. Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
d196442
to
7f3f6c6
Compare
re-ACK 7f3f6c6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 7f3f6c6
Broken out from #28122
Bech32(m) was defined with a 90 character limit so that certain guarantees for error detection could be made for segwit addresses (see https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#checksum-design).
However, there is nothing about the encoding scheme itself that requires a limit of 90 and in practice bech32(m) is being used without the 90 char limit (e.g. lightning invoices, silent payments). Further, increasing the character limit doesn't do away with error detection, it simply changes the guarantee.
The primary motivation for this change is for being able to parse BIP352 v0 silent payment addresses (see 622c7a9), which require up to 118 characters. In addition to BIP352, modeling the character limit as an enum allows us to easily support new address types that use bech32m and specify their own character limit.