
Enhance Large Input Decoding Performance #101

Open
malvidin opened this issue Nov 23, 2022 · 5 comments

@malvidin

Using rug or num_bigint for larger inputs significantly increases performance. BigUint is slower for 10-byte inputs, but for 255 bytes it is ~10x faster, and for 10k bytes it is ~50x faster.

```rust
use num_bigint::BigUint;

// `translated_input` holds base58 digit values (alphabet indices), not raw characters.
let translated_input = vec![1u8, 1, 1];
let b58_biguint = BigUint::from_radix_be(&translated_input, 58).unwrap();
let decoded_vec = b58_biguint.to_bytes_be();
assert_eq!(decoded_vec, "\r_".as_bytes());
```
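
An end-to-end version of the same idea might look like the following (a sketch; the function name and the alphabet-translation step are mine, assuming the Bitcoin alphabet):

```rust
use num_bigint::BigUint;

const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Hypothetical decode: map each character to its alphabet index,
/// then let BigUint perform the whole base conversion in one call.
fn decode_biguint(input: &str) -> Option<Vec<u8>> {
    let digits = input
        .bytes()
        .map(|c| ALPHABET.iter().position(|&a| a == c).map(|i| i as u8))
        .collect::<Option<Vec<u8>>>()?;
    // Leading '1' characters encode leading zero bytes, which BigUint drops.
    let zeros = digits.iter().take_while(|&&d| d == 0).count();
    let mut out = vec![0u8; zeros];
    if digits.len() > zeros {
        out.extend(BigUint::from_radix_be(&digits, 58)?.to_bytes_be());
    }
    Some(out)
}
```

With the digits above, decode_biguint("222") produces the same b"\r_", since '2' is index 1 in the alphabet.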

Related issue: kevinheavey/based58#5

@Nemo157
Member

Nemo157 commented Nov 24, 2022

Base58 isn't really intended to be used for large values. The primary point of using it is human readability and transcribability, which I would estimate starts to break down due to sheer size at ~32 bytes; for larger values that don't need to be as human readable, base64 gives much better efficiency.

@malvidin
Author

Looking at the benchmarks included in this project, using integers appears faster. For inputs of 0-10 characters, u64 is faster; for 10-20 characters, u128 is faster; and for encoded outputs of 32 bytes and larger, BigUint is faster.

Teal uses u64/u128/BigUint; orange uses the original nested loops.

[Violin plots of benchmark timings for input sizes 0, 1, 5, 10min, 10, 10max, 32, 256, and 10k]
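
A concrete sketch of that fast path (my code, not the crate's; it assumes the input is already translated to alphabet indices) accumulates into a machine integer instead of carrying through output bytes:

```rust
/// Decode up to 21 base58 digits into a u128 accumulator.
/// 58^22 exceeds u128::MAX, so longer inputs need a wider intermediate.
fn decode_short(digits: &[u8]) -> Option<u128> {
    if digits.len() > 21 {
        return None;
    }
    let mut acc: u128 = 0;
    for &d in digits {
        debug_assert!(d < 58, "digits must already be alphabet indices");
        acc = acc * 58 + d as u128; // cannot overflow for <= 21 digits
    }
    Some(acc)
}
```

The decoded bytes are then acc.to_be_bytes() with leading zeros stripped; the u64 variant is identical with a lower length cutoff.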

@malvidin
Author

For decoding the following input sizes, using the corresponding intermediate is comparable to or faster than nested loops over the output [u8].

| Input Size (chars) | Intermediate Primitive / Struct |
| --- | --- |
| 1..=10 | u64 |
| 11..=21 | u128 |
| 22..=43 | [u64; 4] |
| 44.. | Vec<u64> |
| 44.. | Vec<u8>, BigUint::from_radix_be |

For 32-byte inputs, similar in size to Bitcoin addresses, Vec<u64> is ~40% faster, and for 256 and 1024 bytes it is ~70% faster.
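
A rough sketch of the Vec<u64> variant (my code, not necessarily what #102 does): perform the same multiply-add per 64-bit limb instead of per output byte, cutting the inner-loop iteration count by ~8x.

```rust
/// One decode step over little-endian u64 limbs: limbs = limbs * 58 + digit.
/// A u128 temporary holds limb * 58 + carry exactly (always < 2^70).
fn mul_add_58(limbs: &mut Vec<u64>, digit: u8) {
    let mut carry = digit as u128;
    for limb in limbs.iter_mut() {
        let t = (*limb as u128) * 58 + carry;
        *limb = t as u64;
        carry = t >> 64;
    }
    if carry != 0 {
        limbs.push(carry as u64);
    }
}
```

Decoding calls this once per input digit; a fixed [u64; 4] version for the 22..=43 range additionally avoids Vec growth.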

See #102

@xpe

xpe commented Dec 19, 2023

@Nemo157 is correct. See also https://digitalbazaar.github.io/base58-spec/#rfc.section.7.1

> 7.1 Quadratic behaviour of base58 algorithms
>
> In general, when converting from a source base-encoding to a target base-encoding, if the target encoding is not a numerical power of the source encoding, it is possible for the algorithm to degrade from linear algorithm complexity into quadratic algorithm complexity. That is, converting from base-2 (binary) to base-32 or base-64 can be done with a worst-case algorithm complexity of O(n) time, but converting from base-2 to base-58 has a worst-case complexity of O(n^2). In short, base-58 does not work well for encoding large byte values.
>
> Implementers are urged to ensure that their production software imposes limits on input size. Encoding values greater than 256 bytes in base58 is NOT RECOMMENDED.
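
For reference, the byte-wise nested loop behind that quadratic bound looks roughly like this (a sketch, not this crate's exact implementation); the inner loop walks the entire accumulator for every input digit, hence O(n^2):

```rust
/// Classic byte-wise base58 decode: multiply the whole big-endian
/// accumulator by 58 and add the next digit, once per input digit.
/// O(n) digits x O(n)-byte inner loop = O(n^2) overall.
fn decode_bytewise(digits: &[u8]) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::new();
    for &d in digits {
        let mut carry = d as u32;
        for byte in out.iter_mut().rev() {
            let t = (*byte as u32) * 58 + carry;
            *byte = t as u8;
            carry = t >> 8;
        }
        if carry > 0 {
            out.insert(0, carry as u8); // carry < 58, so at most one new byte
        }
    }
    out
}
```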

@malvidin
Author

I understand the poor performance of converting to a base that does not align with the source.

With a target of ~32 bytes, decoding performance can still be increased by using [u64; 4] instead of looping over [u8].

Handling much larger inputs adds complexity, but it reduces resource utilization. That was my concern, as there are other uses of base58 where I cannot control the size of the input.
