Allow extra fields in tuple consensus deserialization #307

obycode · 2024-01-25T02:17:25Z

To match interpreter behavior, from-consensus-buff? needs to allow unused fields in tuples. For example, the consensus serialization of {n: 42, extra: u32} is 0x0c000000020565787472610100000000000000000000000000000020016e000000000000000000000000000000002a.

One might expect the following expression to return none, since the buffer contains an extra field, but to match the interpreter it needs to return (some {n: 42}) instead.

(from-consensus-buff? {n: int} 0x0c000000020565787472610100000000000000000000000000000020016e000000000000000000000000000000002a)
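For reference, the layout of that buffer can be reproduced with a short Python sketch. It assumes the standard Clarity consensus encoding (type prefix 0x0c for tuples, 0x00 for int, 0x01 for uint, 16-byte big-endian integers, a 4-byte field count, and 1-byte length-prefixed keys). The helper names are made up for illustration and are not part of clarity-wasm.

```python
# Sketch only: rebuild and walk the serialization of {n: 42, extra: u32}.
TYPE_INT, TYPE_UINT, TYPE_TUPLE = 0x00, 0x01, 0x0C


def ser_int(v: int) -> bytes:
    # int: prefix 0x00 followed by a 16-byte big-endian two's complement value
    return bytes([TYPE_INT]) + v.to_bytes(16, "big", signed=True)


def ser_uint(v: int) -> bytes:
    # uint: prefix 0x01 followed by a 16-byte big-endian unsigned value
    return bytes([TYPE_UINT]) + v.to_bytes(16, "big")


def ser_tuple(fields: dict) -> bytes:
    # fields maps a key to an already serialized value; keys are written sorted
    out = bytes([TYPE_TUPLE]) + len(fields).to_bytes(4, "big")
    for name in sorted(fields):
        key = name.encode("ascii")
        out += bytes([len(key)]) + key + fields[name]
    return out


def walk_flat_tuple(buf: bytes) -> list:
    """Decode a tuple whose values are only int or uint into (name, value) pairs."""
    assert buf[0] == TYPE_TUPLE
    count = int.from_bytes(buf[1:5], "big")
    pos, pairs = 5, []
    for _ in range(count):
        key_len = buf[pos]
        name = buf[pos + 1 : pos + 1 + key_len].decode("ascii")
        pos += 1 + key_len
        tag = buf[pos]
        value = int.from_bytes(buf[pos + 1 : pos + 17], "big", signed=(tag == TYPE_INT))
        pos += 17
        pairs.append((name, value))
    return pairs


buf = ser_tuple({"n": ser_int(42), "extra": ser_uint(32)})
print(buf.hex())
print(walk_flat_tuple(buf))
```

In the walked result, extra (32) comes before n (42), so a deserializer that only expects n has to get past extra first.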


Acaccia · 2024-03-18T13:07:24Z

I will have a try at this, it looks fun.

Acaccia · 2024-04-18T15:02:31Z

Since this task is taking a lot of time, I will do a write-up about it.

This task is much bigger than just allowing extra fields, because we also need to support these capabilities at the same time:

tuple fields can arrive in any order
tuple fields cannot be repeated
all fields defined by the type must be present

To fix this, the deserialization uses a bitset, so that we know that all fields are present and that none of them is repeated. To handle fields in any order, we just have locals to hold the values of all the tuple's fields, and a "switch-case-like" structure decides which ones to fill depending on the parsed clarity-name key.
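The bitset idea can be sketched in Python as follows. This is only an illustration of the technique, not the Wat that was written: collect_fields and its arguments are hypothetical, the slots list stands in for the Wasm locals, and the dictionary lookup stands in for the switch-case on the key.

```python
def collect_fields(expected, parsed):
    """Match parsed (name, value) pairs against the expected field names.

    Returns a dict of the expected fields, or None when an expected field is
    repeated or missing. Names outside `expected` are treated as extra fields
    and dropped.
    """
    index = {name: i for i, name in enumerate(expected)}
    slots = [None] * len(expected)  # one slot per expected field
    seen = 0                        # bit i is set once expected[i] was filled

    for name, value in parsed:
        i = index.get(name)
        if i is None:
            continue                # extra field: not part of the result
        bit = 1 << i
        if seen & bit:
            return None             # the same expected field appeared twice
        seen |= bit
        slots[i] = value

    if seen != (1 << len(expected)) - 1:
        return None                 # at least one expected field is missing
    return dict(zip(expected, slots))
```

Note that this sketch does not detect duplicates among the extra fields themselves; that would need separate bookkeeping.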

This was the easy part.

Now the problem is the extra fields. Those are not available in the expression type, so we have to be ready to parse for anything:

we have to make sure that the key is a valid clarity name
we have to skip the adequate number of bytes for the unknown type of the value

Validating the name is easy: the original regex that validates names is easy to translate to Wat.
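As a rough illustration of that check, here is a Python sketch. The pattern and the 128-byte limit are written from memory of the Clarity reference implementation and should be treated as assumptions to verify against the source; is_valid_clarity_name is a hypothetical helper.

```python
import re

# Assumed name grammar: a letter followed by letters, digits or a few symbols,
# or one of the bare operator names. Check against the reference regex.
CLARITY_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9\-_!?+<>=/*]*|[-+=/*]|[<>]=?")
MAX_NAME_LEN = 128  # assumed maximum key length in bytes


def is_valid_clarity_name(raw: bytes) -> bool:
    """Return True when the raw key bytes look like a valid clarity name."""
    if not 0 < len(raw) <= MAX_NAME_LEN:
        return False
    try:
        name = raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    # fullmatch requires the whole key to match, so no anchors are needed
    return CLARITY_NAME.fullmatch(name) is not None
```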

For the second part, I went with a simple recursive algorithm that goes through the bytes and, depending on the value of the first byte, skips the adequate number of following bytes. It's recursive because we have composite types such as lists or tuples.
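A minimal Python sketch of such a recursive skip is shown below. It assumes the standard Clarity type prefixes (0x00 to 0x0e) and payload layouts; skip_value and its helpers are hypothetical names, and the sketch only measures lengths. It does not validate key names, string contents, or list homogeneity.

```python
def _need(buf: bytes, pos: int, n: int) -> None:
    # Fail early when fewer than n bytes remain at pos
    if pos + n > len(buf):
        raise ValueError("truncated buffer")


def _read_u32(buf: bytes, pos: int):
    _need(buf, pos, 4)
    return int.from_bytes(buf[pos : pos + 4], "big"), pos + 4


def skip_value(buf: bytes, pos: int = 0) -> int:
    """Return the offset just past the serialized value starting at pos."""
    _need(buf, pos, 1)
    tag = buf[pos]
    pos += 1
    if tag in (0x00, 0x01):            # int, uint: 16-byte payload
        _need(buf, pos, 16)
        pos += 16
    elif tag in (0x02, 0x0D, 0x0E):    # buffer, string-ascii, string-utf8
        n, pos = _read_u32(buf, pos)   # 4-byte length, then the payload
        _need(buf, pos, n)
        pos += n
    elif tag in (0x03, 0x04, 0x09):    # true, false, none: no payload
        pass
    elif tag == 0x05:                  # standard principal: version + 20-byte hash
        _need(buf, pos, 21)
        pos += 21
    elif tag == 0x06:                  # contract principal: adds a 1-byte length + name
        _need(buf, pos, 22)
        pos += 21
        name_len = buf[pos]
        _need(buf, pos, 1 + name_len)
        pos += 1 + name_len
    elif tag in (0x07, 0x08, 0x0A):    # ok, err, some: wrap exactly one value
        pos = skip_value(buf, pos)
    elif tag == 0x0B:                  # list: 4-byte count, then the elements
        count, pos = _read_u32(buf, pos)
        for _ in range(count):
            pos = skip_value(buf, pos)
    elif tag == 0x0C:                  # tuple: 4-byte count, then key/value pairs
        count, pos = _read_u32(buf, pos)
        for _ in range(count):
            _need(buf, pos, 1)
            key_len = buf[pos]
            _need(buf, pos, 1 + key_len)
            pos += 1 + key_len
            pos = skip_value(buf, pos)
    else:
        raise ValueError(f"unknown type prefix 0x{tag:02x}")
    return pos
```

The returned offset is where parsing of the next field would resume; a caller can also compare it with the buffer length to reject trailing bytes.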

Here is now a new issue: let's imagine that we have an extra field whose type is a list of tuples. Since it is an extra field, we do not know its type. Also, due to clarity restrictions, the list must be homogeneous. However, the serialization allows extra fields for the tuples. How should we make sure that the tuples are all of the same type if we are allowed to not consider all the fields?

obycode · 2024-04-18T16:01:00Z

Ah, I didn't realize this was going to be so complex. We do have the option of just changing this behavior beginning in Clarity 3, though that would prevent us from using the clarity-wasm runtime to boot from genesis.

tuple fields can arrive in any order

I didn't realize that this was allowed 😮

tuple fields cannot be repeated

all fields defined by the type must be present

This should have already been the case, no?

I'm not sure that I follow exactly what you are asking. In the serialized buffer, we do have the type. It is encoded as part of the serialization. Could you maybe expand and provide an example?

Acaccia · 2024-04-18T16:43:52Z

I didn't realize that this was allowed 😮

Me neither at first, and this is so annoying! But this is solved.

This should have already been the case, no?

Yup, but to achieve this, we relied on the fact that the serialized fields would arrive in the alphabetical order of the keys. But anyway, this is solved.

I'm not sure that I follow exactly what you are asking. In the serialized buffer, we do have the type. It is encoded as part of the serialization. Could you maybe expand and provide an example?

Yeah sorry, I should have put an example directly.

When you use from-consensus-buff?, the first argument is the expected type of the deserialization. So for example, from-consensus-buff? {a: int} expects as its second argument a serialized buffer holding a tuple that contains at least the key a with an int value. The tuple could also contain any number of extra fields, each of which needs a valid clarity name as its key and a valid serialized value.
So let's imagine the notation where what is between <...> is a serialized value.
With our example, all of those serialized buffers could work:

< {a: 42} >
< {a: 42, extra: 0xdeadbeef} > where we don't know the type of extra but we can infer it
< {a: 42, extra1: "hello", extra2: u"world"} > where we don't know the types of extra1 and extra2 but we can infer them
...

Lists being homogeneous, we should check that all elements are of the same type, a type that we don't know in advance. This is a bit difficult, but doable.

< {a: 42, extra: [1, 2, 3]} > we don't know the type of extra when we deserialize, but we can infer it, and it works because all elements are int
< {a: 42, extra: [1, "hello"]} > we don't know the type of extra again, but we can understand that both elements have different types and the deserialization should fail (even if extra is not part of the result)

But now, what about lists of tuples?
In this example, < {a: 42, extra: [{x: "hello"}, {x: "world"}]} >, we have a list of tuples with one key x and a string value, no problem.

But now, let's imagine we have < {a: 42, extra: [{x: "hello"}, {x: "world", y: u1}]} >. In this case, we have a list of tuples, and the second tuple has an extra field y. However, this is a serialized tuple, so y could be ignored, and the type of extra could be a list of tuples with one key x and a string value.

Is this behavior documented anywhere? What should we do in this case? Should we ignore y because serialized tuples can have extra fields?

obycode · 2024-04-18T17:18:26Z

Is this behavior documented anywhere?

No, I don't think it is.

In this case, I believe the type is decided by the first element in the list. This is true even outside of [de]serialization. For example, (list {x: 1} {x:2, y:3}) is valid, and evaluates as (list {x:1} {x:2}). However if you switch the order, this is no longer valid: (list {x:2, y:3} {x: 1}). I think the same will be true for deserialization -- you can determine the tuple type from the first element in the list.
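The "first element decides" rule described above can be sketched in Python on already decoded values. This is an assumption-laden illustration, not interpreter code: Python dicts stand in for tuples, Python lists for Clarity lists, string lengths are ignored, and type_of, admits and list_is_homogeneous are hypothetical helpers.

```python
def type_of(value):
    """Derive a simplified type description from a decoded value."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return ("tuple", {k: type_of(v) for k, v in value.items()})
    if isinstance(value, list):
        # The entry type is taken from the first element only
        return ("list", type_of(value[0]) if value else None)
    raise TypeError(f"unsupported value: {value!r}")


def admits(expected, value) -> bool:
    """True when value fits expected; tuples may carry extra fields."""
    if isinstance(expected, tuple) and expected[0] == "tuple":
        if not isinstance(value, dict):
            return False
        return all(k in value and admits(t, value[k]) for k, t in expected[1].items())
    if isinstance(expected, tuple) and expected[0] == "list":
        if not isinstance(value, list):
            return False
        if expected[1] is None:          # type derived from an empty list
            return len(value) == 0
        return all(admits(expected[1], v) for v in value)
    return type_of(value) == expected


def list_is_homogeneous(items) -> bool:
    """Check every element against the type of the first element."""
    if not items:
        return True
    entry = type_of(items[0])
    return all(admits(entry, item) for item in items)
```

Under this rule, [{x: "hello"}, {x: "world", y: u1}] is accepted because y is simply ignored, while the reversed order is rejected because the first element makes y mandatory.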

Again, we can also choose to disallow these extra fields in Clarity 3.

Acaccia · 2024-04-18T17:41:29Z

Again, we can also choose to disallow these extra fields in Clarity 3.

I do not have issues with extra fields myself, and the feature should be done in a few days. But please keep me posted if it's removed later.

UPDATE: no actually I lie, I have an issue: I think this feature is extremely difficult to test.

obycode added the bug Something isn't working label Jan 25, 2024

obycode added a commit that referenced this issue Jan 25, 2024

test: ignore from_consensus_buff_tuple_extra_pair until #307 is fixed

594e4ca

smcclellan added this to the WASM Phase 1 milestone Mar 12, 2024

Acaccia self-assigned this Mar 18, 2024
