
Enhance Large Input Decoding Performance #101

Open
malvidin opened this issue Nov 23, 2022 · 5 comments

@malvidin

Using rug or num_bigint for larger inputs significantly increases performance. BigUint is slower for 10-byte inputs, but for 255 bytes it is ~10x faster, and for 10k bytes it is ~50x faster.

```rust
use num_bigint::BigUint;

// `translated_input` holds base58 digit values (alphabet indices), not raw characters.
let translated_input = vec![1u8, 1, 1];
let b58_biguint = BigUint::from_radix_be(&translated_input, 58).unwrap();
let decoded_vec = b58_biguint.to_bytes_be();
assert_eq!(decoded_vec, "\r_".as_bytes());
```
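
An end-to-end version of the same idea might look like the following (a sketch; the function name and the alphabet-translation step are mine, assuming the Bitcoin alphabet):

```rust
use num_bigint::BigUint;

const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Hypothetical decode: map each character to its alphabet index,
/// then let BigUint perform the whole base conversion in one call.
fn decode_biguint(input: &str) -> Option<Vec<u8>> {
    let digits = input
        .bytes()
        .map(|c| ALPHABET.iter().position(|&a| a == c).map(|i| i as u8))
        .collect::<Option<Vec<u8>>>()?;
    // Leading '1' characters encode leading zero bytes, which BigUint drops.
    let zeros = digits.iter().take_while(|&&d| d == 0).count();
    let mut out = vec![0u8; zeros];
    if digits.len() > zeros {
        out.extend(BigUint::from_radix_be(&digits, 58)?.to_bytes_be());
    }
    Some(out)
}
```

With the digits above, decode_biguint("222") produces the same b"\r_", since '2' is index 1 in the alphabet.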

Related issue: kevinheavey/based58#5

@Nemo157
Member

Nemo157 commented Nov 24, 2022

Base58 isn't really intended to be used for large values. The primary point of using it is human readability and transcribability, which I would estimate starts to break down due to sheer size at ~32 bytes; for larger values that don't need to be as human readable, base64 gives much better efficiency.

@malvidin
Author

Looking at the benchmarks included in this project, using integers appears faster. For inputs of 0-10 characters, u64 is faster; for 10-20 characters, u128 is faster; and for encoded outputs of 32 bytes and larger, BigUint is faster.

Teal uses u64/u128/BigUint; orange uses the original nested loops.

[Violin plots of benchmark timings for input sizes 0, 1, 5, 10min, 10, 10max, 32, 256, and 10k]
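
A concrete sketch of that fast path (my code, not the crate's; it assumes the input is already translated to alphabet indices) accumulates into a machine integer instead of carrying through output bytes:

```rust
/// Decode up to 21 base58 digits into a u128 accumulator.
/// 58^22 exceeds u128::MAX, so longer inputs need a wider intermediate.
fn decode_short(digits: &[u8]) -> Option<u128> {
    if digits.len() > 21 {
        return None;
    }
    let mut acc: u128 = 0;
    for &d in digits {
        debug_assert!(d < 58, "digits must already be alphabet indices");
        acc = acc * 58 + d as u128; // cannot overflow for <= 21 digits
    }
    Some(acc)
}
```

The decoded bytes are then acc.to_be_bytes() with leading zeros stripped; the u64 variant is identical with a lower length cutoff.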

@malvidin
Author

For decoding the following input sizes, using the corresponding intermediate is comparable to or faster than nested loops over the output [u8].

| Input Size (chars) | Intermediate Primitive / Struct |
| --- | --- |
| 1..=10 | u64 |
| 11..=21 | u128 |
| 22..=43 | [u64; 4] |
| 44.. | Vec<u64> |
| 44.. | Vec<u8>, BigUint::from_radix_be |

For 32-byte inputs, similar in size to Bitcoin addresses, Vec<u64> is ~40% faster, and for 256 and 1024 bytes it is ~70% faster.
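
A rough sketch of the Vec<u64> variant (my code, not necessarily what #102 does): perform the same multiply-add per 64-bit limb instead of per output byte, cutting the inner-loop iteration count by ~8x.

```rust
/// One decode step over little-endian u64 limbs: limbs = limbs * 58 + digit.
/// A u128 temporary holds limb * 58 + carry exactly (always < 2^70).
fn mul_add_58(limbs: &mut Vec<u64>, digit: u8) {
    let mut carry = digit as u128;
    for limb in limbs.iter_mut() {
        let t = (*limb as u128) * 58 + carry;
        *limb = t as u64;
        carry = t >> 64;
    }
    if carry != 0 {
        limbs.push(carry as u64);
    }
}
```

Decoding calls this once per input digit; a fixed [u64; 4] version for the 22..=43 range additionally avoids Vec growth.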

See #102

@xpe

xpe commented Dec 19, 2023

@Nemo157 is correct. See also https://digitalbazaar.github.io/base58-spec/#rfc.section.7.1

> 7.1 Quadratic behaviour of base58 algorithms
>
> In general, when converting from a source base-encoding to a target base-encoding, if the target encoding is not a numerical power of the source encoding, it is possible for the algorithm to degrade from linear algorithm complexity into quadratic algorithm complexity. That is, converting from base-2 (binary) to base-32 or base-64 can be done with a worst-case algorithm complexity of O(n) time, but converting from base-2 to base-58 has a worst-case complexity of O(n^2). In short, base-58 does not work well for encoding large byte values.
>
> Implementers are urged to ensure that their production software imposes limits on input size. Encoding values greater than 256 bytes in base58 is NOT RECOMMENDED.
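
For reference, the byte-wise nested loop behind that quadratic bound looks roughly like this (a sketch, not this crate's exact implementation); the inner loop walks the entire accumulator for every input digit, hence O(n^2):

```rust
/// Classic byte-wise base58 decode: multiply the whole big-endian
/// accumulator by 58 and add the next digit, once per input digit.
/// O(n) digits x O(n)-byte inner loop = O(n^2) overall.
fn decode_bytewise(digits: &[u8]) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::new();
    for &d in digits {
        let mut carry = d as u32;
        for byte in out.iter_mut().rev() {
            let t = (*byte as u32) * 58 + carry;
            *byte = t as u8;
            carry = t >> 8;
        }
        if carry > 0 {
            out.insert(0, carry as u8); // carry < 58, so at most one new byte
        }
    }
    out
}
```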

@malvidin
Author

I understand the poor performance of converting to a base that does not align with the source.

With a target of ~32 bytes, decoding performance can still be increased by using [u64; 4] instead of looping over [u8].

Handling much larger inputs adds complexity, but it reduces resource utilization. That was my concern, as there are other uses of base58 where I cannot control the size of the input.
