LIP: 0018
Title: Use long hash of public key for address and Base32 encoding for UI
Author: Andrea Kendziorra <andreas.kendziorra@lightcurve.io>
Discussions-To: https://research.lisk.com/t/use-base32-encoding-of-long-hash-of-public-key-plus-checksum-for-address/
Status: Active
Type: Standards Track
Created: 2019-03-08
Updated: 2021-12-01

Abstract

This LIP proposes a new format for addresses (also referred to as Lisk ID). The proposed address format uses 160 bits of the SHA-256 hash of the public key. On the user interface level, addresses are displayed in the Lisk32 representation. This representation uses a customized Base32 encoding and extends addresses with a 30-bit checksum generated by the BCH code proposed in BIP 173.

The new address format makes the "registration process" (performing at least one transaction per account to associate a public key to an address) obsolete. Moreover, the collision resistance gets significantly increased, and accidentally sending funds to a wrong address becomes almost impossible due to the checksum.

Copyright

This LIP is licensed under the Creative Commons Zero 1.0 Universal.

Motivation

The current address system was designed with the aim to be convenient for users when reading, typing, inserting or communicating an address. This was done by choosing a short length. In fact, the addresses are generated by taking the first 64 bits of the SHA-256 hash of the public key (plus a few small modifications). Despite the advantages, the address system has at least three significant shortcomings. First, an account holder is currently strongly advised to perform at least one transaction from an account to associate a public key with the address. Otherwise, an adversary could find another key pair that yields the same address and use this one to spend funds from this account.

Secondly, the addresses do not have a checksum or any other error detection mechanism. Hence, mistyping a single character of the address for a balance transaction results in sending funds to a different account (with very high probability, no user has a corresponding key pair for this account).

The third disadvantage is the low resistance against collisions of addresses. If the number of created Lisk accounts will rise significantly in the future, the probability for a collision could become non-negligible.

Therefore, a new address system is proposed that eliminates the mentioned issues. The first and the third issue get resolved by using a 160-bit hash of the public key. The second one by extending the address by a checksum on the user interface level. Since the overall bit length of the hash value plus checksum is rather large, a Base32 encoding is used on the user interface level. This results in shorter addresses than in hexadecimal representation while still being human-friendly.

Specification

The specification is split into three parts. The first one defines the new address format. The second part defines a representation of addresses, by adding a checksum and using a Base32 encoding, that is proposed to be used on the user interface level. The third part specifies the migration from the current address system to the new one.

Address Specification

This section covers the definition of the address and further protocol related specifications.

Address Computation

The address for an account with public key, pubkey, is generated by computing SHA-256(pubkey) and truncating the output to its first 160 bits.

Usage in the Protocol

The 160-bit addresses have to be used for the recipientID property of balance transactions. Moreover, if LIP-0023 becomes active, then these addresses have to be used for

the delegateAddress property of votes, i.e., the entries in asset.votes of vote transactions, and
the delegateAddress property of unlock objects, i.e., the entries of asset.unlockObjects in unlock transactions.

Address Length Validation in Transaction Validation

When a transaction object contains an address, the length has to be validated. That means, the address must consist of 20 bytes. Otherwise, the transaction has to be rejected.

Serialization

Whenever an address value has to be serialized, the 20 bytes of the public key hash are taken without any modifications.

Address Representation for User Interface Level

This sections defines a representation (called Lisk32) that extends the addresses defined on the protocol level by a checksum and that uses a custom Base32 encoding. This representation is proposed to be used on the user interface level throughout the whole Lisk ecosystem. Notice that this representation only modifies the way addresses are displayed, but not the addresses themselves. Therefore it is safe to name them just "Lisk address" or "address" in user interfaces.

Encoding Steps

Let addr be an address, i.e., a byte sequence of length 20. Then addr is encoded as follows:

Compute the checksum of addr using the BCH code described below.
Append the output of step 1 to addr.
Encode the output of step 2 using a custom Base32 encoding (see below for details).
Add the prefix "lsk" to the output of step 3.

The output of step 4 is the Lisk32 representation of addr. An example of the required computation to generate the Lisk32 representation of an address can be found in the appendix. In the following subsections, some details of the steps above get clarified.

Step 1: BCH Checksum

The checksum is computed using the BCH code proposed in BIP 173. In this section, we only provide example JavaScript code that computes and validates the checksum correctly (based on the code from bech32). For a more formal description, see the appendix.

Creating Checksum

The checksum can be computed by the function createChecksum below. For this, the 160 bits of addr have to be split into 5-bit strings, and each bit string has to be converted into an integer. Hence, the input for createChecksum is a list of 32 integers between 0 and 31. The output of createChecksum is a sequence of 6 integers between 0 and 31, where each integer represents a 5-bit string. Thus, the checksum consists of 30 bits.

function polymod (values) {
  var GENERATOR = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3];
  var chk = 1;
  for (var p = 0; p < values.length; ++p) {
    var top = chk >> 25;
    chk = (chk & 0x1ffffff) << 5 ^ values[p];
    for (var i = 0; i < 5; ++i) {
      if ((top >> i) & 1) {
        chk ^= GENERATOR[i];
      }
    }
  }
  return chk;
}

function createChecksum (data) {
  var values = data.concat([0, 0, 0, 0, 0, 0]);
  var mod = polymod(values) ^ 1;
  var ret = [];
  for (var p = 0; p < 6; ++p) {
    ret.push((mod >> 5 * (5 - p)) & 31);
  }
  return ret;
}

Validating Checksum

To verify if a code word has a correct checksum, the function verifyChecksum below can be used.

function verifyChecksum (codeword) {
  return polymod(codeword) === 1;
}

As for the checksum creation, the input for verifyChecksum must be a sequence of integers between 0 and 31. The 6 most right integers of the codeword represent the checksum.

Step 3: Base32

The Base32 encoding in step 3 is a customized version of Base32. It uses lower case letters and digits. The alphabet consists of the following characters:

abcdefghjkmnopqrstuvwxyz23456789

The characters "i", "l" (lowercase L), "0" (zero) and "1" are excluded.

In the following subsections, the encoding and decoding for bit lengths that are multiples of 5 are described. Note that the encoding and decoding could easily be extended to arbitrary bit lengths.

Encoding

First, the bit string is split into substrings of 5 bits and the substrings are processed from left to right. Each substring is mapped to a character from the alphabet set according to the following mapping f, where each 5-bit string is interpreted as an integer:

x	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
f(x)	z	x	v	c	p	m	b	n	3	4	6	5	o	9	7	8

x	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31
f(x)	u	y	r	t	k	q	e	w	2	a	d	s	j	h	f	g

In the case of step 3 of the Lisk32 representation computation, the input string consisting of 190 bits is split into 38 substrings consisting of 5 bits, and each substring gets converted to a character by the mapping f .

Decoding

Each character of the encoded string gets converted to a 5-bit string by the inverse of the mapping f above. The resulting bit string represents the output. In our specific case, the decoder has to convert 38 characters into a sequence of 38 integers between 0 and 31.

Step 4: Prefix

The prefix "lsk" is prepended to the output of step 3, yielding a string of length 41.

Validation of Lisk32 Representation

The following steps have to be done to validate if a given string, addressLisk32, satisfies the proposed Lisk32 representation for addresses. Note that these steps validate only the format of the address. It does not guarantee that any user or even a specific user is in possession of a key pair that corresponds to this address. The validation consists of:

Verify that addressLisk32 is a string of length 41.
Verify that the substring of the first 3 characters of addressLisk32 equals "lsk".
Let substr be the substring of the last 38 characters of addressLisk32. Verify that every character in substr is an element of the used Base32 alphabet. Uppercase letters are not allowed.
Decode substr into sequence of 38 integers between 0 and 31, denoted by intseq, by using the inverse of the mapping f above. Verify that verifyChecksum(intseq) returns True.

If any of these validation steps fails, then the address does not have a valid format.

Usage of Lisk32 Representation of Addresses

This subsection does not contain rules for the protocol, but recommendations for the design of user interfaces and front end tools to make use of all capabilities of the Lisk32 representation of addresses. The recommendations are:

Whenever addresses are communicated between front end tools and/or users, e.g., to request a balance transfer transaction, addresses should be displayed and requested in their Lisk32 representation.
Whenever addresses in their Lisk32 representation are handled, they should be validated first with the steps listed above. If the validation fails, the addresses should not be used for any further processing.
When entering addresses in their Lisk32 representation into an input field, letters should be automatically converted to lowercase letters.

Migration

This section explains how nodes migrate to the new address system.

We distinguish between accounts with and without associated public key. Note that an account has an associated key if and only if an outgoing transaction from this account is included in the blockchain.

Accounts with Public Key

For accounts with an associated public key, the new 20-byte address has to be computed and stored such that the account is accessible via the new address.

Accounts without Public Key

Funds that were sent to an address of the current format and for which no public key was associated before the implementation of the new address system are not automatically accessible. To make them accessible, a special transaction, called reclaim transaction, has to be included in the blockchain. The set of addresses of the current format without associated public key and for which no reclaim transaction was included in the blockchain have to be persisted along with their balances. In the following, we denote this set of addresses by unregisteredAddresses.

Reclaim Transaction

Type

The type of the reclaim transaction is the lowest positive integer that is not used for any other transaction type on the Lisk blockchain (8 at the time of writing).

Schema

The asset property of a reclaim transaction needs to contain the amount property which is of type uint64. The value of asset.amount must be equal to the balance associated to the address of the current format in Beddows.

A reclaim transaction is of the following form:

{
  "type": <type>, // At the time of writing, the value must be 8
  "senderPublicKey": <Public key of the sender>,
  ...
  "asset" : {
    "amount" : <Amount in Beddows>
  }
}

Validity

A reclaim transaction, tx, must fulfill the following points to be valid:

Let old_addr be the address according to the old format for tx.senderPublicKey. old_addr must be contained in unregisteredAddresses.
Let old_balance be the sum of all funds send to old_addr in Beddows. Then tx.asset must contain the property amount, and tx.asset.amount must be equal to old_balance.

Notice that the base properties of the transaction also have to be valid. In particular, this implies that the transaction fee is paid by the account corresponding to tx.senderPublicKey. Hence this account needs to exist beforehand and have enough balance to pay for the fee.

State Changes

We use the same notation as in the Validity section. If the transaction tx is included in the blockchain, the following state changes apply:

old_addr is removed from unregisteredAddresses.
tx.asset.amount is added to the balance of the account corresponding to tx.senderPublicKey.

Serialization

The asset property of a reclaim transaction tx needs to be serialized to an array of 8 bytes containing the big endian encoding of tx.asset.amount, a 64-bit unsigned integer.

Rationale

Distinct Representation on the UI Level

Whenever a user inserts an address into a user interface, the validation of the checksum should happen there. Therefore, all validly signed transactions arriving at a node should contain the address that the user intended to provide. Hence, the checksum is not required on the protocol level and would unnecessarily consume space resources. The same holds for a user friendly encoding. Therefore, compact addresses are used on the protocol level and the longer, more user friendly Lisk32 representation on the user interface level. Note that this is similar to the current protocol where 8 bytes are used when an address is serialized, and a string that is a concatenation of a decimal number and the character "L" in user interfaces and in JSON documents.

Hashing Step

To keep the length of the addresses at a reasonable level, a hash of the public key with a length smaller than 256 bits is used instead of the public key itself. When choosing the hash length, one crucial requirement has to be kept in mind: For a given address, it should be computationally infeasible to find a key pair (public key and private key) that yields this address. This requirement prevents that an attacker can find a key pair for an address for which no public key is yet associated to (a public key gets associated to an address the first time a transaction outgoing from this account is included in a block). If an attacker were able to find such a key pair, they could transfer the funds from this account. In the current Lisk protocol, it is computationally very expensive to find a key pair for a given address, but it is not infeasible. For an account with no associated public key, such an attack may be economically profitable if there are enough funds in the account. Therefore, it is currently recommended to do at least one transaction per account such that the address is associated to the account holder's public key. This process is sometimes referred to as the "address registration" process. However, the address registration process is an additional step required from the user, and it is highly desired to make it obsolete.

The length of the hash output determines the resistance against the mentioned attack. Although a security level of 128 bits is considered to be sufficient for the next years (see the security recommendations from ECRYPT [1, Chapter 2] and NIST [2, Section 5.6.2]), we choose a hash output length of 160 bits which provides 160-bit resistance against such attacks. Besides the strong pre-image resistance, this output length also provides sufficient collision resistance, namely 80-bit resistance. Note that collisions do not provide any value to an attacker. Therefore, the hash function does not need to be resistant against brute force attacks for collisions. However, collisions of addresses could result in “accidental” losses of money: If a user creates a new account by creating a key pair and provides the resulting address to another user in order to receive a payment, and the address is already associated to another public key, then the user will never be able to spend the funds send to this address. Only the key pair associated with the address can be used to spend the funds. With the chosen hash length, the probability for an address collision becomes negligible, even for very large numbers of addresses. For example, the probability of having an address collision after 10¹⁵ addresses is smaller than 10^-18.

Why SHA-256?

SHA-256 is the currently used hash function in the Lisk protocol. To keep the protocol simple, we propose to not introduce a new hash function and to use SHA-256 here as well.

Address Representation for User Interface Level

Checksum

Several checksum options were considered and are discussed in the following.

Hash Functions

A very common choice in blockchain projects is to use a truncated hash value, where a length of 32 bits is typically used. However, hash functions do not provide any error detection guarantees and were discarded for this reason.

Reed-Solomon Codes

Some versions of shortened Reed-Solomon codes were investigated.

The first one uses the alphabet size 32, which is very suitable for using a Base32 encoding. That means, each character of an address (represented by 5 bits) corresponds to one symbol of a codeword. Since a Reed-Solomon code with alphabet size q allows only a maximum block length of q-1, the maximum block length for q=32 is 31. Hence, a codeword would consist of maximum 31*5 = 165 bits. Since this is too short for a 160-bit input plus an adequate checksum, the input has to be split into parts which results in two checksums. More precisely, the input (the 160-bit hash value) is split into two inputs of 80 bits, and both inputs are encoded with a [31, 28, 4]₃₂Reed-Solomon code, where the output is shortened. Hence, there is a 3 symbol (15 bits) checksum for each input. The distance of the code, d=4, guarantees error detection for up to 3 errors in the address, which is less than for the chosen BCH code. Moreover, the error detection capabilities are very poor for more than 3 errors. For example, the false positive rate (false positive means to not detect errors) for 4 errors in an address is ~3*10^-6.

The second considered Reed-Solomon code is the one with the parameters [1023, 1020, 3]₁₀₂₄. The checksum consists of 3 symbols with 10 bits each. One symbol of a codeword always represents 2 characters in the Base32-encoded address. The distance of the code, d=4, guarantees detection of up to 3 errors in the Base32-encoded address. This is again worse than for the chose BCH code. Moreover, the error detection capabilities for more errors are worse than the ones for the BCH code.

Other alphabet size were disregarded due to the worse error detection guarantees in the Base32 encoding (unless one uses more than 32 bits for the checksum).

Cyclic Redundancy Check Codes

Several cyclic redundancy check (CRC) codes were considered, mainly 32-bit CRCs. The best performing CRC code is the one by Castagnoli (also referred to as CRC-32/4 in [3]). It's error detection capabilities for more than 4 errors in the Base32-encoded addresses are better than for the chosen BCH code. However, the error detection guarantee is worse. The code guarantees detection of up to 7 bit inversions which results in an error detection guarantee of up to one character in Base32 encoding. Moreover, the overall length of the address increases by one with a 32-bit checksum. Therefore, this CRC code was not selected.

BCH Codes

BCH codes have many free parameters which results in many possible choices. Finding a suitable choice of parameters requires an extensive computational search. We simply use the result presented in BIP 173 without performing the search ourselves. This code is well suited for a Base32 encoding. It provides detection guarantee of up to 4 characters in Base32 encoding for the input length of 38 characters, has good error detection capabilities for more than 4 errors and is relatively efficient. Moreover, the authors of BIP 173 claim that it was optimized for detecting low numbers of bit inversions which fits well together with the Base32 encoding proposed here (see below for the reasoning of the specific encoding function).

Base32 Encoding

Why Base32?

A Base32 encoding is clearly preferable over the decimal or hexadecimal system as it is shorter (~20% shorter than hexadecimal). Base64 or any other encodings using a larger alphabet were excluded due to the usage of special characters that can have disadvantages when double-clicking on an address. Any alphabet size that is not a power of 2 was excluded as well, as it disallows to map consecutive strings of bits to characters of the alphabet. Hence, a single mistyped character in an address using such an alphabet could result in many inverted bits which makes it unsuitable for our checksum choice. Base58 is an example of such an alphabet. Hence, a Base32 encoding is the preferred choice.

Although there exist already some versions of Base32, for instance RFC 4648 and z-base-32, none of them are satisfactory. This is in most cases due to the selection of the alphabet. The Base32 encoding that matches our requirements the most (but not sufficiently) is the one proposed in BIP 173. Due to the unsatisfactory design of the existing Base32 versions, a custom version is defined in this protocol.

The Alphabet

The alphabet should not contain mixed case letters to ease the communication, for instance, via phone. Lowercase letters are preferred over uppercase letter as they are easier to read. Using an alphabet size of 32 allows to exclude 4 characters from the set of all lowercase letters and all digits. The characters "1", "i", "l" (lowercase L), "0" (zero) and "o" are for many fonts the most ambiguous ones. Restricting the alphabet to lowercase letters makes "o" and "i" less ambiguous than the other ones, which led to the decision in keeping one of the two ("o" is part of the alphabet).

The Encoding Function

The encoding function which maps each 5-bit string to a character of the alphabet was chosen in the way to have few bit inversions for some error types that are expected to be common. As the checksum is optimized for detecting low numbers of bit inversions, the probability of detecting more than 4 errors of these types increases. The types of error considered are:

mistyping a character by a left or right adjacent character on the keyboard, and
mistyping a character by a similar looking character.

Errors of the first kind were given a priority. The 32-bit Gray code was used to ensure that two elements of the alphabet that are in the same row and adjacent on a QWERTY keyboard differ in exactly one bit. In fact, all possible permutations of the contiguous rows of the QWERTY keyboard consisting of elements from the alphabet, i.e. the rows:

23456789, qwertyu, op, asdfghjk, zxcvbnm

fulfill this condition when their concatenation is mapped to the Gray code. Moreover, using the reversed rows and rotations of the concatenations fulfills the same condition. Hence, all these different orderings were considered, and the one that is the best with respect to the second kind of error was chosen. More precisely, the bit representations of any of the pairs^†:

(b, 6), (g, q), (g, 9), (q, 9), (s, 5), (v, u), (z, 2), (q, p), (r, v)

should differ in at most two bits, and the number of pairs differing in two bits should be as small as possible. The lowest number of pairs differing in two bits is 7. In fact, there are 40 different orderings that fulfill this number. The encoding function f defined in the specification is just a random choice from these 40 possibilities.

Prefix

A prefix is added to the address to minimize the risk that a Lisk address gets confused with an address of another blockchain project. This can be helpful, for instance, when users use applications/clients that handle several cryptocurrencies including Lisk.

Migration

The usage of a reclaim transaction for accessing funds send to "unregistered" addresses was preferred over solutions that automatically merge account information of old and new addresses (e.g., as in this post) for the following architectural reason: It allows that the SDK does eventually not need to implement the current address format. In contrast, an automatic merge solution that is part of the transaction verification is hard to extract from general transaction verification logic, and therefore hard to remove from the SDK. Moreover, an automatic merge solution requires additional computational steps whenever a transaction contains a public key that was not contained in the database before.

Amount Property in Reclaim Transactions

One may argue that the amount property of a reclaim transaction is not required to process the transaction. This is true. However, we think that the state of a given account should be easily computed with the incoming and outgoing transaction from that account. In that regard, the reclaim transaction must indicate how many tokens were added to the account balance.

Backwards Compatibility

This change introduces a hard fork because:

balance transactions using the current address format get rejected by nodes following the proposed rules, and vice versa, and
reclaim transactions get rejected by nodes following the current protocol.

References

[1] M. Abdalla et. al, “Algorithms, Key Size and Protocols Report (2018)”, ECRYPT-CSA, Feb. 2018, http://www.ecrypt.eu.org/csa/documents/D5.4-FinalAlgKeySizeProt.pdf

[2] E. Barker, “Recommendation for key management”, NIST Special Publication 800-57 Part 1 Rev. 4, Jan. 2016, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r4.pdf

[3] G. Castagnoli, S. Brauer, M. Herrmann, “Optimization of cyclic redundancy-check codes with 24 and 32 parity bits”, IEEE Transactions on Communications, Vol. 41 (6), pp. 883 - 892, Jun. 1993

Reference Implementation

Mitsuaki Uchimoto: LiskHQ/lisk-sdk#5337
Nazar Hussain: LiskHQ/lisk-sdk#5333
Shusetsu Toda: LiskHQ/lisk-sdk#5311

Appendix

Examples

Address Computation and UI Representation

In the following, the address for the public key

pubkey = 0x0eb0a6d7b862dc35c856c02c47fde3b4f60f2f3571a888b9a8ca7540c6793243

and its Lisk32 representation get computed.

Let

H' = SHA-256(pubkey) = 0xc247a42e09e6aafd818821f75b2f5b0de47c8235b580881bd7750c9365993d25

and let H be the first 160 bits of H', i.e.

H = 0xc247a42e09e6aafd818821f75b2f5b0de47c8235

Then, H is the address for pubkey. In order to get its Lisk32 representation, one needs to represent H as a sequence of 5-bit integers:

H_5bit = [24 9 3 26 8 11 16 9 28 26 21 15 27 0 12 8 4 7 27 21 22 11 26 27 1 23 18 7 25 0 17 21]

Let CS be the checksum of H, i.e.

CS = createChecksum(H_5bit) = [21 21 31 11 22 16]

Let HCS be the concatenation of H and CS in 5-bit representation, i.e.

HCS = [24 9 3 26 8 11 16 9 28 26 21 15 27 0 12 8 4 7 27 21 22 11 26 27 1 23 18 7 25 0 17 21 21 21 31 11 22 16]

Let B be the Base32 encoding of HCS, i.e.

B = 24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu

Adding the prefix "lsk" to B yields the Lisk32 representation of the address:

lsk24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu

Valid Lisk32 Representation of Addresses

We list some example addresses for a given public key. Note that the provided public keys are not necessarily valid public keys (curve points for which there exists a private key).

public key = 0x0000000000000000000000000000000000000000000000000000000000000000
address in hexadecimal           = 0x66687aadf862bd776c8fc18b8e9f8e2008971485
Lisk32 representation of address = lskoaknq582o6fw7sp82bm2hnj7pzp47mpmbmux2g

public key = 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
address in hexadecimal           = 0xaf9613760f72635fbdb44a5a0a63c39f12af30f9
Lisk32 representation of address = lskqf5xbhu874yqg89k449zk2fctj46fona9bafgr

public key = 0x00ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
address in hexadecimal           = 0xc946da78163c094fd8310efc9a81be13cac6a518
Lisk32 representation of address = lskamc9kfzenupkgexyxsf4qz9fv8mo9432of9p5j

public key = 0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff00
address in hexadecimal           = 0x506c2d6c08d12bed8710517e8e05a0a3c35d3002
Lisk32 representation of address = lsk6xevdsz3dpqfsx2u6mg3jx9zk8xqdozvn7x5ur

public key = 0x749b88baa787e83b5a06bbfee95002ac5a9925dcaeea28262e683498147b8fce
address in hexadecimal           = 0x0dce64c0d36a3e04b6e8679eb5c62d800f3d6a27
Lisk32 representation of address = lskxwnb4ubt93gz49w3of855yy9uzntddyndahm6s

public key = 0xe877c250d725ac7ca6c5d9d83b39645de7eebe6c4ae6e076da7163434d297c7d
address in hexadecimal           = 0x053d7733df22210dd0e6b4ec595a29cdb33ffb07
Lisk32 representation of address = lskzkfw7ofgp3uusknbetemrey4aeatgf2ntbhcds

Invalid Lisk32 Representation of Addresses

In the following, we list some invalid Lisk32 representations of addresses and for each at least one reason for the invalidity.

24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu      # missing prefix

lsk24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5e    # incorrect length (length 40 instead of 41)

lsk24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5euu  # incorrect length (length 42 instead of 41)

LSK24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu   # incorrect prefix

tsk24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu   # incorrect prefix

lsk24cd35u4jdq8sz03pnsqe5dsxwrnazyqqqg5eu   # invalid character (contains a zero)

lsk24Cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu   # invalid character (contains an upper case 'C' instead of lower case 'c')

LSK24CD35U4JDQ8SZO3PNSQE5DSXWRNAZYQQQG5EU   # invalid characters (all letters in upper case)

lsk24dc35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu   # invalid checksum due to character swap

Formal Description of the BCH Code

The specification section provides example code for creating and validating the checksum, but not a formal description. This shall be done here (see also the documentation of the Bitcoin implementation of bech32 for some details; for example, the representation of the generator polynomial as an array of 32 bit numbers is explained there).

Let F₃₂ be the finite field with 32 elements. We use the representation F₂[x]/(x⁵+x³+1) for F₃₂, i.e., every element in F₃₂ is interpreted as a polynomial with coefficients in F₂ modulo x⁵+x³+1. Each polynomial is identified by the decimal number whose bit representation matches with the polynomial. For example, the decimal number 23 represents the polynomial x⁴+x²+x+1 because the bit representation of 23 equals 10111. Let g(x) be the polynomial with coefficients in F₃₂ of the following form:

g(x) = x⁶ + 29x⁵ + 22x⁴ + 20x³ + 21x² + 29x + 18

g(x) is the generator polynomial of the BCH code. Every input for the checksum is interpreted as a polynomial with coefficients in F₃₂ too. For example, the input:

d = 0xc247a42e09e6aafd818821f75b2f5b0de47c8235

is interpreted as the polynomial:

d(x) = 24x³¹ + 9x³⁰ + 3x²⁹ + … + 25x³ + 0x² + 17x + 21

where each coefficient represents 5 bits of d (see the example above for all coefficients). To achieve that a codeword can be represented as the concatenation of the input and the checksum, systematic encoding is used. In this specific case, the checksum polynomial, chk(x), can be computed by:

chk(x) = - (d(x) * x⁶ mod g(x))

for an input polynomial d(x). Then, the whole codeword, represented as a polynomial c(x), has the form:

c(x) = d(x) * x⁶ + chk(x)

A polynomial p(x) with coefficients in F₃₂ is then a valid codeword if and only if:

0 = p(x) mod g(x)

holds. However, two more tweaks were used for the design of the bech32 checksum. Both are discussed in the following subsections.

Adding a Leading Term to the Input Polynomial

A leading term is added to the input polynomial in bech32. More precisely, if k is the length of the input sequence of 5-bit integers, then let:

d’(x) = x^k + d(x)

The polynomial d’(x) is then used for the checksum computation instead of d(x). This ensures that the checksum differs if some zeros are added to the left of the input sequence of 5-bit integers. E.g., the inputs:

[5, 21, 31] and [0, 0, 5, 21, 31]

yield different checksums in bech32. Here, this property is not needed due the fixed length input, but was kept to be compatible with bech32. Since the input sequence has always length 32, we always have:

d’(x) = x³² + d(x)

in our case.

Codewords Are not Multiples of Generator Polynomial

The second tweak is to add 1 to d’(x) * x⁶ when computing the checksum, i.e., the checksum is computed as:

chk’(x) = - (d’(x) * x⁶ + 1 mod g(x))

Then, a polynomial p(x) with coefficients in F₃₂ is a valid codeword if and only if:

-1 = p(x) mod g(x)

holds (note that 1=-1 holds in F₃₂). This avoids that appending some zeros to the right of a codeword yields again a valid codeword. If the checksum were computed by:

chk(x) = - (d(x) * x⁶ mod g(x))

then appending n zeros to the right of a valid codeword would yield again a valid codeword. This is because:

0 = c(x) mod g(x) implies 0 = c(x) * xⁿ mod g(x)

This property is not strictly required for this proposal due to the fixed length of codewords, but was kept to be compatible with bech32.

Notes

[^†]: Note that this list of pairs of similar looking characters is based on subjective perception since no scientific result is known to the author.

Files

lip-0018.md

Latest commit

History

lip-0018.md

File metadata and controls

Abstract

Copyright

Motivation

Specification

Address Specification

Address Computation

Usage in the Protocol

Address Length Validation in Transaction Validation

Serialization

Address Representation for User Interface Level

Encoding Steps

Step 1: BCH Checksum

Creating Checksum

Validating Checksum

Step 3: Base32

Encoding

Decoding

Step 4: Prefix

Validation of Lisk32 Representation

Usage of Lisk32 Representation of Addresses

Migration

Accounts with Public Key

Accounts without Public Key

Reclaim Transaction

Type

Schema

Validity

State Changes

Serialization

Rationale

Distinct Representation on the UI Level

Hashing Step

Why SHA-256?

Address Representation for User Interface Level

Checksum

Hash Functions

Reed-Solomon Codes

Cyclic Redundancy Check Codes

BCH Codes

Base32 Encoding

Why Base32?

The Alphabet

The Encoding Function

Prefix

Migration

Amount Property in Reclaim Transactions

Backwards Compatibility

References

Reference Implementation

Appendix

Examples

Address Computation and UI Representation

Valid Lisk32 Representation of Addresses

Invalid Lisk32 Representation of Addresses

Formal Description of the BCH Code

Adding a Leading Term to the Input Polynomial

Codewords Are not Multiples of Generator Polynomial

Notes