[LSP2] CompactBytesArray - complete technical overview with examples #172

Open
JeneaVranceanu opened this issue Feb 6, 2023 · 6 comments

@JeneaVranceanu
Member

JeneaVranceanu commented Feb 6, 2023

This issue aims to produce a complete, low-level technical overview of the CompactBytesArray type, with examples that together cover it fully. LSP2 should describe all of its valueTypes the same way the Solidity documentation describes its data types and their ABI encoding/decoding.


TL;DR

There is not enough documentation on how to encode/decode [CompactBytesArray]. LSP2 explains bytes[CompactBytesArray] and bytesN[CompactBytesArray], but LSP6 uses (bytes4,address,bytes4)[CompactBytesArray], and neither that type nor its encoding matches what LSP2 describes.


The issue

Currently (06.02.2023) in LSP2 we have the bytes[CompactBytesArray] and bytesN[CompactBytesArray] value types. These are covered and well understood. But that documentation falls short of the explanation required since the introduction of LSP6's AllowedCalls. Both of these were released with v0.8.0 (see CHANGELOG8.pdf).

LSP6's AllowedCalls schema uses the valueType (bytes4,address,bytes4)[CompactBytesArray], and the custom case of encoding a [CompactBytesArray] of tuples is described in LSP6 (not the right place, IMO). First of all, this case must not be custom: it must be covered by LSP2, and it is not.

On top of that, since we have (bytes4,address,bytes4)[CompactBytesArray] in LSP6 and it doesn't match bytes[CompactBytesArray] and bytesN[CompactBytesArray] it implicitly allows developers to use any other type with [CompactBytesArray], like address[CompactBytesArray][CompactBytesArray] (2D CompactBytesArray). And the question is - how will it be encoded? There is no single source of truth that everyone can refer to at the moment to implement the correct encoding/decoding functions of CBAs.

LSP2 is a standard. We must define low-level specifications on which others will build their custom structures, the same way we rely on Solidity and its specification to ABI-encode and decode the types it provides. These custom structures will use a specific, known and well-documented set of small building blocks (value types), and everyone will be able to refer to LSP2 to understand how to assemble and disassemble these blocks (encode and decode them).


Additional question: if the compact bytes array has 2 or more levels of nested types, do we compact only the first layer or all of them? My answer is: only the first layer, i.e. the type declared directly to the left of [CompactBytesArray].

Example: (address[],(bool,bytes4))[CompactBytesArray] (a type that doesn't make sense but is technically possible).

Do we encode the ...,(bool,bytes4)... tuple the default way or do we reduce/compact it as well?

My understanding is that we encode it the default way. In my opinion, the same applies to the question above about arrays in CompactBytesArray.

Encoding example of (address[],(bool,bytes4))[CompactBytesArray]:

// The `(address[],...)` part of `(address[],(bool,bytes4))[CompactBytesArray]`
web3.eth.abi.encodeParameter('address[]', ['0x98d2fF3907A4a9dEb10B1F2F79EBF078984501BF'])
=>
0x0000000000000000000000000000000000000000000000000000000000000020
0000000000000000000000000000000000000000000000000000000000000001
00000000000000000000000098d2ff3907a4a9deb10b1f2f79ebf078984501bf

// The `(...,(bool,bytes4))` part of `(address[],(bool,bytes4))[CompactBytesArray]`
web3.eth.abi.encodeParameter('(bool,bytes4)', [true, '0xffbbffbb'])
=>
0x0000000000000000000000000000000000000000000000000000000000000001
ffbbffbb00000000000000000000000000000000000000000000000000000000

// And instead of encoding these values as default array (with the total count of bytes = 256):
web3.eth.abi.encodeParameter('(address[],(bool,bytes4))[]', [[['0x98d2fF3907A4a9dEb10B1F2F79EBF078984501BF'], [true, '0xffbbffbb']]])
=>
0x0000000000000000000000000000000000000000000000000000000000000020
0000000000000000000000000000000000000000000000000000000000000001
0000000000000000000000000000000000000000000000000000000000000020
0000000000000000000000000000000000000000000000000000000000000060
0000000000000000000000000000000000000000000000000000000000000001
ffbbffbb00000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000001
00000000000000000000000098d2ff3907a4a9deb10b1f2f79ebf078984501bf

// We will encode these values as CompactBytesArray (with the total count of bytes = 194: a 2-byte length prefix + 192 bytes of data).
// Note: erc725js.encodeParameter is not a real function! Used just for explanation.
erc725js.encodeParameter('(address[],(bool,bytes4))[CompactBytesArray]', [[['0x98d2fF3907A4a9dEb10B1F2F79EBF078984501BF'], [true, '0xffbbffbb']]])
=>
00C0
0000000000000000000000000000000000000000000000000000000000000020
0000000000000000000000000000000000000000000000000000000000000060
0000000000000000000000000000000000000000000000000000000000000001
ffbbffbb00000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000001
00000000000000000000000098d2ff3907a4a9deb10b1f2f79ebf078984501bf
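
The byte count of this example can be checked with a few lines of Python. This is only a sketch, not erc725.js code: `compact_bytes_array` is a hypothetical helper, and the tuple bytes are rebuilt word by word from the standard ABI encoding shown in the example.

```python
from typing import List


def compact_bytes_array(elements: List[bytes]) -> bytes:
    """Hypothetical helper: prefix each already-encoded element with its
    length as a 2-byte big-endian integer and concatenate the results."""
    out = b""
    for element in elements:
        if len(element) > 0xFFFF:
            raise ValueError("element does not fit a 2-byte length prefix")
        out += len(element).to_bytes(2, "big") + element
    return out


# Standard ABI encoding of one (address[],(bool,bytes4)) value,
# rebuilt from the six 32-byte words of the example.
tuple_abi = (
    (0x20).to_bytes(32, "big")                        # offset to the tuple
    + (0x60).to_bytes(32, "big")                      # offset to the address[] data
    + (1).to_bytes(32, "big")                         # bool: true
    + bytes.fromhex("ffbbffbb").ljust(32, b"\x00")    # bytes4, right-padded
    + (1).to_bytes(32, "big")                         # address[] length
    + bytes.fromhex("98d2ff3907a4a9deb10b1f2f79ebf078984501bf").rjust(32, b"\x00")
)

encoded = compact_bytes_array([tuple_abi])
print(encoded[:2].hex())  # 00c0 (192 bytes of element data)
print(len(encoded))       # 194
```

Only the outer array layer is compacted here; the tuple itself keeps its default ABI encoding, which is the behaviour argued for in this comment.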

After a discussion with @b00ste on Friday (04.02.2023) it also came to mind that [CompactBytesArray] could completely replace []. Any thoughts on that idea are welcome.

But [CompactBytesArray] has two downsides in comparison to [], one of which can be fixed:

  • (fixable) there is no way to declare a fixed-size [CompactBytesArray]. For the default array type, you can place a number between the square brackets to define the array's size, e.g. [2]. As a solution, the size could be added after a colon: [CompactBytesArray:2];
  • (non-fixable*) given only the encoded bytes of a [CompactBytesArray], you have no way to tell whether they are valid. The ABI encoding of [] (as of any other Solidity type) must have a byte count that is a multiple of 32. One byte off and it's not valid. This is not the case with [CompactBytesArray] and cannot be.
    * if the solution mentioned above for fixed-size CBAs is applied, there is a set of conditions that gives us a type whose encoded data can be validated by its length before any decoding is attempted. The conditions are: (1) all subtypes have a fixed size and (2) the CBA itself is fixed-size.
    Example: the expected byte count for bytes8[CompactBytesArray:10] is (8 bytes + 2 bytes size prefix) * 10 = 100 bytes.
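
The length check from the footnote can be written down directly. A minimal sketch, assuming the proposed `[CompactBytesArray:N]` syntax and fixed-size subtypes; both function names are made up for illustration.

```python
LENGTH_PREFIX_SIZE = 2  # bytes used by the current CBA element length prefix


def expected_cba_size(element_size: int, count: int) -> int:
    """Byte length of a fixed-size CBA whose elements all take `element_size` bytes."""
    return (element_size + LENGTH_PREFIX_SIZE) * count


def has_expected_size(data: bytes, element_size: int, count: int) -> bool:
    """Cheap pre-check: the total length must match before any element is decoded."""
    return len(data) == expected_cba_size(element_size, count)


print(expected_cba_size(8, 10))  # bytes8[CompactBytesArray:10] -> 100
```

A matching length is necessary but not sufficient: each 2-byte prefix still has to be read and compared with the element size during decoding.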
@JeneaVranceanu
Copy link
Member Author

JeneaVranceanu commented Feb 6, 2023

[WIP].

The encoding/decoding is implemented in Swift but is not yet mapped to something simpler for everyone to read, understand and discuss. I will update this comment soon. The main points are already laid out.

Solution for encoding

Code snippets are not written in any specific language

General encoding

How the encode function should handle types that are CBA or have nested CBAs:

func encode(type, value, isSubtypeOfCBA = false)
    if type is CBA:
        // CBA must have a subtype otherwise even the syntax is not valid
        subtype = type.subtype
        encoded_data: bytes = '0x'
        for subvalue in value:
            // Also setting third parameter `isSubtypeOfCBA` as true 
            // The encoding of non-complex types assumes we do not add prefixes 
            // and suffixes when encoding for CBA.
            data: bytes = encode(subtype, subvalue, true)
            data = data.size.leftPadded(toBytes: 2) + data
            encoded_data.append(data)

        return encoded_data

    // If type is not a CBA but has nested CBA, e.g. `bytes[CompactBytesArray][]` 
    // it must be custom encoded first before it is encoded as a part of the outer type
    else if type.hasNestedCBA:
        // example: `type` is `bytes[CompactBytesArray][]`
        // `value` is an array of CBAs
        // `func preencode_CBAs` will replace each entry in `value`
        // with encoded CBA and return new version of `value` 
        value = preencode_CBAs(type, value)

        // each `some_type[CompactBytesArray]` from the perspective of Solidity ABI is just `bytes`
        // `func replace_CBAs` will return a new type where each `some_type[CompactBytesArray]` is replaced
        // with `bytes`.
        // examples:
        //  - `bytes[CompactBytesArray][]` => `bytes[]`
        //  - `(address,uint32,int64,address[])[CompactBytesArray][]` => `bytes[]`
        //  - `(address[CompactBytesArray],uint256)` => `(bytes,uint256)`
        type = replace_CBAs(type)
    else if !type.isComplexType && isSubtypeOfCBA:
        // A complex type is a type that has nested types. 
        // Complex types are: array, tuple, CompactBytesArray.
        // The rest can be directly represented as bytes.
        return convertToBytes(value)
    return web3.eth.abi.encodeParameter(type, value)
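
The `replace_CBAs` step from the pseudocode can be sketched on plain type strings. This assumes types are written without whitespace; `replace_cbas` and `_subtype_start` are hypothetical names, not functions of any existing library.

```python
MARKER = "[CompactBytesArray]"


def _subtype_start(type_str: str, end: int) -> int:
    """Index where the type that ends just before `end` begins."""
    pos = end - 1
    # Skip array suffixes such as the `[]` in `address[][CompactBytesArray]`.
    while pos >= 0 and type_str[pos] == "]":
        pos = type_str.rindex("[", 0, pos) - 1
    if pos >= 0 and type_str[pos] == ")":
        # Tuple: walk back to the matching opening parenthesis.
        depth = 0
        while pos >= 0:
            if type_str[pos] == ")":
                depth += 1
            elif type_str[pos] == "(":
                depth -= 1
                if depth == 0:
                    return pos
            pos -= 1
        raise ValueError("unbalanced parentheses in " + type_str)
    # Elementary type: walk back over its name.
    while pos >= 0 and type_str[pos].isalnum():
        pos -= 1
    return pos + 1


def replace_cbas(type_str: str) -> str:
    """Replace every `some_type[CompactBytesArray]` with `bytes`."""
    while MARKER in type_str:
        end = type_str.index(MARKER)
        start = _subtype_start(type_str, end)
        type_str = type_str[:start] + "bytes" + type_str[end + len(MARKER):]
    return type_str


print(replace_cbas("bytes[CompactBytesArray][]"))            # bytes[]
print(replace_cbas("(address[CompactBytesArray],uint256)"))  # (bytes,uint256)
```

Processing the leftmost marker first means nested CBAs collapse from the inside out, so `address[CompactBytesArray][CompactBytesArray]` also ends up as `bytes`.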

Explanation of the overall idea: CBA has a custom encoding format, but the values encoded by

Encoding with a custom case for tuples

We can add a custom case for a tuple that is the direct child of CBA:

func encode(type, value, isSubtypeOfCBA = false)
    if type is CBA:
        // CBA must have a subtype otherwise even the syntax is not valid
        subtype = type.subtype
        encoded_data: bytes = '0x'
        for subvalue in value:
            if subtype is tuple:
                // subvalue must be an array of arguments
                tuple_data: bytes = '0x'
                tuple_types = subtype.nested_types
                for index in subvalue.indices():        
                    // Also setting the third parameter `isSubtypeOfCBA` as true 
                    // The encoding of non-complex types assumes we do not add prefixes 
                    // and suffixes when encoding for CBA.
                    data: bytes = encode(tuple_types[index], subvalue[index], true)
                    // This line is what makes it different from LSP6 AllowedCalls:
                    // each value in a tuple has its own size prefix because dynamic types
                    // can also be in a tuple.
                    data = data.size.leftPadded(toBytes: 2) + data
                    tuple_data.append(data)
                
                tuple_data = tuple_data.size.leftPadded(toBytes: 2) + tuple_data
                encoded_data.append(tuple_data)    
            else:
                data: bytes = encode(subtype, subvalue, true)
                data = data.size.leftPadded(toBytes: 2) + data
                encoded_data.append(data)

        return encoded_data

...
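
To make the difference concrete, the sketch below compares the two layouts for a single (bytes4,address,bytes4) entry. It assumes, as the comment in the pseudocode implies, that LSP6 concatenates the tuple fields under a single prefix; the field values are made up and the function names are hypothetical.

```python
from typing import Sequence


def prefixed(data: bytes) -> bytes:
    """Prepend the 2-byte big-endian length used by CompactBytesArray."""
    return len(data).to_bytes(2, "big") + data


def encode_tuple_packed(fields: Sequence[bytes]) -> bytes:
    """One prefix for the whole tuple; fields are concatenated as-is."""
    return prefixed(b"".join(fields))


def encode_tuple_prefixed(fields: Sequence[bytes]) -> bytes:
    """Variant from the pseudocode: each field gets its own prefix, then the tuple does."""
    return prefixed(b"".join(prefixed(field) for field in fields))


# Illustrative (bytes4,address,bytes4) entry; the values are made up.
fields = [
    bytes.fromhex("ffffffff"),
    bytes.fromhex("98d2ff3907a4a9deb10b1f2f79ebf078984501bf"),
    bytes.fromhex("ffbbffbb"),
]

packed = encode_tuple_packed(fields)
nested = encode_tuple_prefixed(fields)
print(packed[:2].hex(), len(packed))  # 001c 30
print(nested[:2].hex(), len(nested))  # 0022 36
```

The per-field prefixes cost 2 bytes per field, but they are what would let a tuple carry dynamic fields such as `bytes`.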

CompactBytesArray of fixed size

Since writing CompactBytesArray between square brackets (or anything other than a number) is already not supported by Solidity as far as I can tell (please correct me if I'm wrong), we can go further and add an optional attribute to CompactBytesArray, separated by a character that is not in the set ()[],a-zA-Z0-9. I suggest a colon (:).

Example: [CompactBytesArray:10] - a CompactBytesArray of the fixed size of 10 elements.
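
Parsing the optional size is straightforward. A minimal sketch, assuming the colon syntax proposed here; `parse_cba_type` is a hypothetical helper, not part of any existing tool.

```python
import re
from typing import Optional, Tuple

# Matches `subtype[CompactBytesArray]` and `subtype[CompactBytesArray:N]`.
CBA_PATTERN = re.compile(
    r"^(?P<subtype>.+)\[CompactBytesArray(?::(?P<size>[1-9][0-9]*))?\]$"
)


def parse_cba_type(type_str: str) -> Tuple[str, Optional[int]]:
    """Return (subtype, size); size is None for a dynamic-size CBA."""
    match = CBA_PATTERN.match(type_str)
    if match is None:
        raise ValueError("not a CompactBytesArray type: " + type_str)
    size = match.group("size")
    return match.group("subtype"), (int(size) if size is not None else None)


print(parse_cba_type("bytes8[CompactBytesArray:10]"))  # ('bytes8', 10)
print(parse_cba_type("address[CompactBytesArray]"))    # ('address', None)
```

Because the colon cannot appear in a Solidity type name, existing types without a size keep parsing exactly as before.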

@YamenMerhi
Member

CompactBytesArray

Thanks @JeneaVranceanu for taking the time to write this detailed explanation!
I agree that the CBA ValueContent needs more explanation, and more work and thought for different types of values, as it's still described in a very basic way. The idea of CBA emerged from the need for an efficient way to store data without ABI-encoding it and ending up with more bytes in storage.

Although one can argue that the probability of someone writing 2D arrays on the blockchain is minimal, it's still worth having a general rule that applies to CBA regardless of what and how much will be written.

We're not seeing much traction on this issue, as there are higher-priority discussions happening around features to be implemented in contracts that should not change, but we'll try to keep the discussion going until we reach a proper solution.

I don't have strong opinions on the topics mentioned, but we should discuss whether the CBA format needs to be a general type of encoding or a limited type just for the sake of storing data efficiently. Because that makes me think of a proposal mentioned by @skimaharvey , to have 1 byte that determines the length of the byte that determines the length of the element. In this way, CBA can have elements that have a very big length in terms of bytes. We went with a fixed 2-byte length prefix because 2 bytes can express element lengths of up to 2^16 - 1 = 65,535 bytes, which is more than enough for the current blockchain capacity.

If CBA ends up being a general type of encoding, then let's make it as generic as possible and start drafting rules for it. I guess this is where the discussion should start, because discussing supporting very nested types for an encoding scheme that is meant to be defined just for efficiency is a waste of time.

So I would like to discuss here whether it makes sense to make CBA a general type of encoding or a limited one with support for a few elements.

CompactBytesArray of fixed size

I agree that there is currently no way to tell whether the array is fixed-size, e.g. for address[CBA]. I'm not sure what the use case is, but I agree with adding an optional colon followed by the number of elements for a fixed-size array.

@JeneaVranceanu
Member Author

So I would like to discuss here whether it makes sense to make CBA a general type of encoding or a limited one with support for a few elements.

TL;DR - IMHO, it could be the general type of arrays used off-chain.

Initially, I was thinking that it would make sense to make CBA the default type for keys that represent arrays. Still, that is convenient only if those keys are used, roughly speaking, off-chain, because (and please correct me here if I'm wrong) default arrays can easily be decoded from their ABI encoding back to a representation usable in Solidity. Meaning: data of type address[CompactBytesArray] will have to be decoded in a custom way within a smart contract, while data of type address[] can be decoded with something like abi.decode(...), and I'm not sure how much of a hassle it would be to implement decoding for CBAs in Solidity.
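
Off-chain, the custom decoding amounts to a short loop over the 2-byte prefixes. The sketch below is Python, not Solidity, and `decode_compact_bytes_array` is a hypothetical name; an on-chain decoder would presumably have to perform the same walk by hand.

```python
from typing import List


def decode_compact_bytes_array(data: bytes) -> List[bytes]:
    """Split CBA data into its elements by reading each 2-byte length prefix."""
    elements = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated length prefix")
        length = int.from_bytes(data[offset:offset + 2], "big")
        offset += 2
        if offset + length > len(data):
            raise ValueError("element runs past the end of the data")
        elements.append(data[offset:offset + length])
        offset += length
    return elements


# address[CompactBytesArray] with two made-up 20-byte entries, each prefixed by 0x0014.
first = bytes.fromhex("98d2ff3907a4a9deb10b1f2f79ebf078984501bf")
second = bytes.fromhex("cafecafecafecafecafecafecafecafecafecafe")
data = b"\x00\x14" + first + b"\x00\x14" + second

print([element.hex() for element in decode_compact_bytes_array(data)])
```

The decoder returns raw element bytes; turning them into typed values (addresses, tuples, nested arrays) is exactly the part that still needs a specification.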

@JeneaVranceanu
Member Author

Because that makes me think of a proposal mentioned by @skimaharvey , to have 1 byte that determines the length of the byte that determines the length of the element. In this way, CBA can have elements that have a very big length in terms of bytes.

This is an interesting idea and I think it is worth implementing, even though it will probably be rare for someone to exceed a length of 0xffff for an element in the CBA.
Why it is worth implementing: in most cases having 2 separate bytes 1 for the length of the length and 1 for the length of the element will still occupy the same 2 bytes as CBA does now; when an element is longer than 255 bytes, the length section will occupy 1 byte more (3 bytes in total); exceeding the next threshold of 65535 bytes is even rarer and adds only 1 more byte (4 bytes in total). I think @skimaharvey's idea is very flexible: it costs the same as the current implementation for elements of up to 255 bytes and has almost limitless boundaries.
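
The thresholds can be tabulated with a small sketch. It assumes the length is stored in the minimal number of big-endian bytes; the function names are made up for illustration.

```python
def current_prefix_size(element_length: int) -> int:
    """Size of today's fixed CBA prefix; lengths above 0xFFFF cannot be expressed."""
    if element_length > 0xFFFF:
        raise ValueError("does not fit the fixed 2-byte prefix")
    return 2


def length_of_length_prefix_size(element_length: int) -> int:
    """1 byte for the length of the length + the minimal big-endian length."""
    length_bytes = max(1, (element_length.bit_length() + 7) // 8)
    return 1 + length_bytes


for size in (10, 255, 256, 65535, 65536):
    print(size, length_of_length_prefix_size(size))
# 10 2
# 255 2
# 256 3
# 65535 3
# 65536 4
```

So elements between 256 and 65535 bytes pay one extra byte compared to the current format, in exchange for removing the 65535-byte ceiling.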

@CJ42
Member

CJ42 commented Mar 9, 2023

@JeneaVranceanu

in most cases having 2 separate bytes 1 for the length of the length and 1 for the length of the element will still occupy the same 2 bytes as CBA does now;

Unless I am misunderstanding, I think @skimaharvey's proposed design is different from that.

There would not be precisely 2 bytes anymore.
Instead, there will be:

  • (A) 1 byte for the length of the length. This byte can specify a value between 1 and 32.
  • (B) the length itself, N bytes long, where 1 ≤ N ≤ 32.
  • (C) followed by the value itself.

example 1:

   (A)  (B)  (C)
    v    v    v
0x  01   0a   cafecafecafecafecafe

so for example 1, the full encoded entry would look like this:

0x010acafecafecafecafecafe

example 2:

   (A)  (B)   (C) value is 300 bytes long (0x012c in hex = 300 in decimal)
    v    v     v
0x  02   012C  beefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeef...

so for example 2, the full encoded entry would look like this:

0x02012Cbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeefbeef
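
Both examples can be reproduced with a short sketch of this layout. It assumes the length in section (B) is written in the minimal number of big-endian bytes; `encode_entry` is a hypothetical name.

```python
def encode_entry(value: bytes) -> bytes:
    """Encode one entry as (A) length of the length, (B) length, (C) value."""
    length_bytes = max(1, (len(value).bit_length() + 7) // 8)
    if length_bytes > 32:
        raise ValueError("length does not fit in 32 bytes")
    return (
        bytes([length_bytes])                         # (A)
        + len(value).to_bytes(length_bytes, "big")    # (B)
        + value                                       # (C)
    )


example_1 = encode_entry(bytes.fromhex("cafe") * 5)    # 10-byte value
example_2 = encode_entry(bytes.fromhex("beef") * 150)  # 300-byte value

print("0x" + example_1.hex())      # 0x010acafecafecafecafecafe
print("0x" + example_2[:3].hex())  # 0x02012c
print(len(example_2))              # 303
```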

@JeneaVranceanu
Member Author

@CJ42

The one difference I see between your description and mine is that sections (A) and (B) are for some reason limited in length in your example.
Could you or maybe @skimaharvey give a link to the proposed design regarding that CBA element length feature?

I do not see, at least it's not obvious to me, why we should limit the length.

What I thought is that we could have the following:

  • (A) 1 byte for the length of the length. This byte can specify a value between 1 and 255.
  • (B) the length will be N bytes long, where N is the value stored in (A), so 1 ≤ N ≤ 255.
  • (C) followed by the value itself.

Example:

1 byte, section (A)      0F (or 15) bytes, section (B)     the value, section (C)
      0x0F                       01 ... FF                  abCdE0987 ... 0001

// Quite a long encoded value
0x0F01...FFabCdE0987...0001
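
A decoder for this layout does not need to know the upper bound on section (A) in advance, which is one way to see that the limit is a policy choice rather than a technical necessity. A minimal sketch, with `decode_entries` as a hypothetical name:

```python
from typing import List


def decode_entries(data: bytes) -> List[bytes]:
    """Decode a sequence of (A)(B)(C) entries back into their values."""
    values = []
    offset = 0
    while offset < len(data):
        n = data[offset]  # (A): how many bytes the length occupies
        if n == 0:
            raise ValueError("length of the length must be at least 1")
        offset += 1
        if offset + n > len(data):
            raise ValueError("truncated length section")
        length = int.from_bytes(data[offset:offset + n], "big")  # (B)
        offset += n
        if offset + length > len(data):
            raise ValueError("value runs past the end of the data")
        values.append(data[offset:offset + length])  # (C)
        offset += length
    return values


short = bytes.fromhex("cafe") * 5
long_value = bytes.fromhex("beef") * 150
data = bytes.fromhex("010a") + short + bytes.fromhex("02012c") + long_value

decoded = decode_entries(data)
print(len(decoded), len(decoded[0]), len(decoded[1]))  # 2 10 300
```

A consumer that wants a cap, such as 32 bytes for section (B), can add it as an extra check on `n` without changing the format.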
