Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Model the bech32 charlimit as an Enum #30047

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

josibake
Copy link
Member

@josibake josibake commented May 6, 2024

Broken out from #28122


Bech32(m) was defined with a 90 character limit so that certain guarantees for error detection could be made for segwit addresses (see https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#checksum-design).

However, there is nothing about the encoding scheme itself that requires a limit of 90 and in practice bech32(m) is being used without the 90 char limit (e.g. lightning invoices, silent payments). Further, increasing the character limit doesn't do away with error detection, it simply changes the guarantee.

The primary motivation for this change is for being able to parse BIP352 v0 silent payment addresses (see 622c7a9), which require up to 118 characters. In addition to BIP352, modeling the character limit as an enum allows us to easily support new address types that use bech32m and specify their own character limit.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 6, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK paplorinc, theuni
Concept ACK katesalazar

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29607 (refactor: Reduce memory copying operations in bech32 encoding/decoding by paplorinc)
  • #27260 (Enhanced error messages for invalid network prefix during address parsing. by russeree)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

src/bech32.cpp Outdated Show resolved Hide resolved
@katesalazar
Copy link
Contributor

Concept ACK

@theuni
Copy link
Member

theuni commented May 7, 2024

I think the original bug here illustrates that the enum is not well suited for this. Definitely not with a value of 0 for NO_LIMIT anyway. Got any other ideas? :)

@josibake
Copy link
Member Author

josibake commented May 7, 2024

I think the original bug here illustrates that the enum is not well suited for this. Definitely not with a value of 0 for NO_LIMIT anyway. Got any other ideas? :)

Yeah, I was getting an icky feeling while updating the PR for 2 reasons: NO_LIMIT of 0 is crappy and also the caller needs to make sure to use the same CharLimit for Decode and again with LocateErrors, which seems foot-gunny.

My ideal solution is Decode/LocateErrors shouldn't have a character limit at all (or use 1024, considering BIP173 mentions that bech32 error detection works reasonably well up to 1024 characters). Then, character limits for specific address types (e.g. bc1, sp1) should be enforced in the Destination Encoder/Decoder in src/key_io.cpp.

IIRC, when I first wrote this code a year ago, there wasn't much appetite for moving the charlimit out of Decode. If thats still the case, my next best solution would be keep the Enum and remove the NO_LIMIT case.

@josibake josibake force-pushed the model-bech32-limit-as-enum branch from 7cb6dac to 696e0a5 Compare May 8, 2024 10:43
@josibake
Copy link
Member Author

josibake commented May 8, 2024

@theuni updated to remove NO_LIMIT. I also looked at a few alternatives to using an Enum but I think given the relatively simple usecase, an Enum works here with the benefit of not requiring an expansive refactor.

I'd be open to a more expansive refactor if there was a) conceptual buy in and b) additional use cases beyond supporting silent payments addresses (e.g. parsing bolt11/12 bech32 strings in Bitcoin Core, mercury layer bech32 addresses, etc).

@josibake josibake mentioned this pull request May 8, 2024
15 tasks
@theuni
Copy link
Member

theuni commented May 8, 2024

I'm not sure I see the value in using an enum (and forcing it to be passed in) over simply defining a constant? If there's more than 1 possible value, sure. But for now, why?

@josibake
Copy link
Member Author

josibake commented May 8, 2024

I'm not sure I see the value in using an enum (and forcing it to be passed in) over simply defining a constant? If there's more than 1 possible value, sure. But for now, why?

This commit is broken out from #28122 , where we add a new value for silent payment addresses (limit of 1024). Here is the commit that adds a new value for silent payments: 622c7a9

My motivation for this PR is to prep for the silent payments PR, trying to break out smaller bits from that PR to make it more reviewable.

Copy link
Member

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I spent some time working on a correct-by-construction approach here with a new Bech32Encoded string type which knows its own size limit. Ultimately I don't think it's worth the complexity.

What's here doesn't seem ideal to me, but I can't come up with anything better and it's simple enough that it's obviously correct (without NO_LIMIT at least :).

utACK 696e0a5.

std::vector<int> error_locations{};

if (str.size() > 90) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worked around this 90 in my pr in https://github.com/bitcoin/bitcoin/pull/29607/files#diff-f146300624c06d2e08aadf500952294148a1785edd6ff2e8b50f13b2c08255edL315 (though I'm not sure why it's not replaced here with limit) - I will try to review this in more detail next week

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, can't believe I missed that. Nice catch!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctrl+f "90", whew that's the only one. This is exactly why named constants are so important to name. I think it's fair to call this a nice cleanup (getting rid of the constant) in addition to the new feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, also can't believe I missed this. Thanks for taking a look @paplorinc ! Took a look at your PR and I think this is the most elegant way to solve getting rid of the 90 in ExpandHRP, so I cherry-picked your last commit in (added CHECKSUM_SIZE and modified the commit message to fit this PR a bit more). As a nice side benefit, just this last commit improves the benchmark.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!
I just noticed my email was wrong there, would you be so kind and change it to Lőrinc <pap.lorinc@gmail.com>?

@theuni
Copy link
Member

theuni commented May 10, 2024

ACK retracted until resolving #30047 (comment).

@josibake josibake force-pushed the model-bech32-limit-as-enum branch 2 times, most recently from ed37065 to 0e431c5 Compare May 11, 2024 09:36
@josibake
Copy link
Member Author

Cherry picked b3b84c3 (h/t @paplorinc) to resolve #30047 (comment).

Ran the benchmarks and saw an improvement from the last commit, which is a nice side benefit!

Before

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                5.15 |      194,058,250.69 |    0.5% |           81.62 |           12.86 |  6.348 |          12.63 |    0.0% |      0.01 | `Bech32Decode`
|               11.08 |       90,289,310.89 |    2.3% |          153.69 |           27.64 |  5.559 |          20.00 |    0.1% |      0.01 | `Bech32Encode`

After

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|                4.73 |      211,584,025.79 |    0.3% |           67.67 |           11.78 |  5.743 |          10.85 |    0.0% |      0.01 | `Bech32Decode`
|               10.44 |       95,767,220.36 |    2.6% |          138.84 |           26.06 |  5.327 |          17.25 |    0.0% |      0.01 | `Bech32Encode`

src/bech32.cpp Outdated
std::vector<int> errors;
if (!CheckCharacters(str, errors)) return {};
size_t pos = str.rfind('1');
if (str.size() > 90 || pos == str.npos || pos == 0 || pos + 7 > str.size()) {
if (str.size() > limit) return {};
if (pos == str.npos || pos == 0 || pos + 7 > str.size()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this 7 actually a CHECKSUM_SIZE + 1, or completely unrelated?

Suggested change
if (pos == str.npos || pos == 0 || pos + 7 > str.size()) {
if (pos == str.npos || pos == 0 || pos + CHECKSUM_SIZE >= str.size()) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is checking that "The data part, which is at least 6 characters long" (from BIP173). Changed pos + 7 > to pos + CHECKSUM_SIZE >= throughout.

src/bech32.cpp Outdated
data enc = Cat(ExpandHRP(hrp), values);
enc.resize(enc.size() + 6); // Append 6 zeroes
auto enc = PreparePolynomialCoefficients(hrp, values, CHECKSUM_SIZE);
enc.insert(enc.end(), CHECKSUM_SIZE, 0x00);
uint32_t mod = PolyMod(enc) ^ EncodingConstant(encoding); // Determine what to XOR into those 6 zeroes.
data ret(6);
for (size_t i = 0; i < 6; ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use CHECKSUM_SIZE constant here as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/bech32.cpp Outdated
data enc = Cat(ExpandHRP(hrp), values);
enc.resize(enc.size() + 6); // Append 6 zeroes
auto enc = PreparePolynomialCoefficients(hrp, values, CHECKSUM_SIZE);
enc.insert(enc.end(), CHECKSUM_SIZE, 0x00);
uint32_t mod = PolyMod(enc) ^ EncodingConstant(encoding); // Determine what to XOR into those 6 zeroes.
data ret(6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of preallocation we could do a reserve here:

    data ret;
    ret.reserve(CHECKSUM_SIZE);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer to keep this PR focused on the limit and CHECKSUM_SIZE change, but perhaps a good change for #29607

src/bech32.h Outdated
* convert the CharLimit::VALUE to an int. In practice, this should never happen since this CharLimit applies to an address encoding
* and we would never encode an address with such a massive value */
enum CharLimit : size_t {
SEGWIT = 90, //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to just call this BECH32, and come up with another name for variants that have other limits.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@josibake josibake force-pushed the model-bech32-limit-as-enum branch from 0e431c5 to df20572 Compare May 13, 2024 08:26
@josibake
Copy link
Member Author

Changed the limit name to BECH32 per @sipa 's suggestion and updated to use CHECKSUM_SIZE throughout per @paplorinc 's review.

Copy link
Contributor

@paplorinc paplorinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

src/bech32.h Outdated
@@ -28,6 +28,14 @@ enum class Encoding {
BECH32M, //!< Bech32m encoding as defined in BIP350
};

/** Character limits for bech32(m) encoded strings. Character limits are how we provide error location guarantees.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: to be consistent with the casing below

Suggested change
/** Character limits for bech32(m) encoded strings. Character limits are how we provide error location guarantees.
/** Character limits for Bech32(m) encoded strings. Character limits are how we provide error location guarantees.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

src/bech32.h Outdated
* convert the CharLimit::VALUE to an int. In practice, this should never happen since this CharLimit applies to an address encoding
* and we would never encode an address with such a massive value */
enum CharLimit : size_t {
BECH32 = 90, //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
BECH32 = 90, //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors
BECH32 = 90, //!< BIP173/350 imposed a 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it present tense feels more correct, perhaps:

Suggested change
BECH32 = 90, //!< BIP173/350 imposed 90 character limit on Bech32(m) encoded addresses. This guarantees finding up to 4 errors
BECH32 = 90, //!< BIP173/350 imposed character limit for Bech32(m) encoded addresses. This guarantees finding up to 4 errors.

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Bech32(m) was defined with a 90 character limit so that certain
guarantees for error detection could be made for segwit addresses.
However, there is nothing about the encoding scheme itself that requires
a limit and in practice bech32(m) has been used without the 90 char
limit (e.g. lightning invoices).

Further, increasing the character limit doesn't do away with error
detection, it simply lessons the guarantees.

Model charlimit as an Enum, so that if a different address scheme is
using bech32(m), the character limit for that address scheme can be
used, rather than always using the 90 charlimit defined for segwit
addresses.

upate comment
@josibake josibake force-pushed the model-bech32-limit-as-enum branch from df20572 to d196442 Compare May 13, 2024 10:08
@josibake
Copy link
Member Author

Updated comment per @paplorinc 's suggestions.

@josibake
Copy link
Member Author

@paplorinc I noticed you approved the PR, but we don't really use github's approval/review flow for determining the review status of a PR.

Can you instead leave an ACK <commit> message per https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#code-review ? This way, intent is clear to the maintainers and DrahtBot can track the status and update the summary comment.

@paplorinc
Copy link
Contributor

ACK d196442

@DrahtBot DrahtBot requested a review from theuni May 13, 2024 10:32
@josibake
Copy link
Member Author

@theuni #30047 (comment) has been resolved by d196442, if you have time for a second look/reACK

src/bech32.cpp Outdated
@@ -308,18 +311,18 @@ bool CheckCharacters(const std::string& str, std::vector<int>& errors)
return errors.empty();
}

/** Expand a HRP for use in checksum computation. */
data ExpandHRP(const std::string& hrp)
std::vector<unsigned char> PreparePolynomialCoefficients(const std::string& hrp, const data& values, const size_t extra)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the only reason for passing extra here is so that it can be reserved?

If so, that's not very intuitive and seems kinda like overkill. Seems like always reserving an extra 6 bytes fine, even if they don't end up getting used?

Copy link
Contributor

@paplorinc paplorinc May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's either 0 or 6.
I agree that we can always add it at the end, amended my original PR: https://github.com/bitcoin/bitcoin/pull/29607/files#diff-f146300624c06d2e08aadf500952294148a1785edd6ff2e8b50f13b2c08255edR317

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to remove the extra argument.

@DrahtBot DrahtBot requested a review from theuni May 20, 2024 18:12
Replace ExpandHRP with a PreparePolynomialCoefficients function. Instead
of using a hardcoded value for the size of the array (90 in this case)
and a hardcoded value for the checksum, use the actual values vector and
define checksum size as a constexpr. Use the new CHECKSUM_SIZE
throughout instead 6.

Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
@josibake josibake force-pushed the model-bech32-limit-as-enum branch from d196442 to 7f3f6c6 Compare May 21, 2024 07:15
@paplorinc
Copy link
Contributor

re-ACK 7f3f6c6

Copy link
Member

@theuni theuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 7f3f6c6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants