Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow extra fields in tuple consensus deserialization #307

Open
obycode opened this issue Jan 25, 2024 · 6 comments
Open

Allow extra fields in tuple consensus deserialization #307

obycode opened this issue Jan 25, 2024 · 6 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@obycode
Copy link
Collaborator

obycode commented Jan 25, 2024

To match interpreter behavior, from-consensus-buff? needs to allow unused fields in tuples. For example, the consensus serialization of {n: 42, extra: u32} is 0x0c000000020565787472610100000000000000000000000000000020016e000000000000000000000000000000002a.

We would expect the following expression to return none, since it contains extra fields, but it needs to instead return (some {n: 42}).

(from-consensus-buff? {n: int} 0x0c000000020565787472610100000000000000000000000000000020016e000000000000000000000000000000002a)
@obycode obycode added the bug Something isn't working label Jan 25, 2024
@smcclellan smcclellan added this to the WASM Phase 1 milestone Mar 12, 2024
@Acaccia Acaccia self-assigned this Mar 18, 2024
@Acaccia
Copy link
Collaborator

Acaccia commented Mar 18, 2024

I will have a try at this, it looks fun.

@Acaccia
Copy link
Collaborator

Acaccia commented Apr 18, 2024

Since this task is taking a lot of time, I will do a write-up about it.

This task is way bigger than adding just a "extra fields", because we also need to add those capabilities at the same time:

  • tuple fields can arrive in any order
  • tuple fields cannot be repeated
  • all fields defined by the type must be present

To fix this, I went with the deserialization needing a bitset so that we know that we have all fields and they don't repeat each other. For the fields in any order, we just have locals to handle all the values needed for the all the tuples fields, and there is a "switch-case-like" structure to know which ones to fill depending on the parsed clarity-name key.

This was the easy part.

Now the problem is the extra fields. Those are not available in the expression type, so we have to be ready to parse for anything:

  • we have to make sure that the key is a valid clarity name
  • we have to skip the adequate number of bytes for the unknown type of the value

Validating the name is easy, the original regex that validates the names is easy to translate to Wat.

For the second part, I went with a simple recursive algorithm that goes through the bytes and depending on the value of the first byte, skip the adequate number of following bytes. It's recursive due to the fact that we have composite types such as lists or tuples.

Here is now a new issue: let's imagine that we have an extra field whose type is a list of tuples. Since it is an extra field, we do not know its type. Also, due to clarity restrictions, the list must be homogeneous. However, the serialization allows extra fields for the tuples. How should we make sure that the tuples all are of the same type if we are allowed to not consider all the fields?

@obycode
Copy link
Collaborator Author

obycode commented Apr 18, 2024

Ah, I didn't realize this was going to be so complex. We do have the option of just changing this behavior beginning in Clarity 3, though that would prevent us from using the clarity-wasm runtime to boot from genesis.

  • tuple fields can arrive in any order

I didn't realize that this was allowed 😮

  • tuple fields cannot be repeated
  • all fields defined by the type must be present

This should have already been the case, no?

I'm not sure that I follow exactly what you are asking. In the serialized buffer, we do have the type. It is encoded as part of the serialization. Could you maybe expand and provide an example?

@Acaccia
Copy link
Collaborator

Acaccia commented Apr 18, 2024

I didn't realize that this was allowed 😮

Me neither at first, and this is so annoying! But this is solved.

This should have already been the case, no?

Yup, but to achieve this, we relied on the fact that the serialized fields would arrive in the alphabetical order of the keys. But anyway, this is solved.

I'm not sure that I follow exactly what you are asking. In the serialized buffer, we do have the type. It is encoded as part of the serialization. Could you maybe expand and provide an example?

Yeah sorry, I should have put an example directly.

When you use from-consensus-buff?, the first element should be the expected type of the deserialization. So for example, from-consensus-buff? {a: int} expects as an argument a serialized buffer which is a tuple that contains at least the key a with an int value, but it could contain multiple extra keys, which should contain a valid clarity name for the key and a valid value for the value.
So let's imagine the notation where what is between <...> is a serialized value.
With our example, all of those serialized buffers could work:

  • < {a: 42} >
  • < {a: 42, extra: 0xdeadbeef} > where we don't know the type of extra but we can infer it
  • < {a: 42, extra1: "hello", extra2: u"world"} where we don't know the types of extra1 and extra2 and we can infer them
  • ...

List being homegenous, we should check that each elements are of the same type that we don't know. This is a bit difficult, but doable.

  • <{ a: 42, extra: [1, 2, 3]} > we don't know the type of extra when we deserialize, but we can infer it, and it works because all elements are int
  • < {a: 42, extra: [1, "hello"]} > we don't know the type of extra again, but we can understand that both elements have different types and the deserialization should fail (even if extra is not part of the result)

But now, what about lists of tuples?
In this example, < {a: 42, extra: [{x: "hello"}, {x: "world"}]} >, we have list of tuples with one key x and a string value, no problem.

But now, let's imagine we have < {a: 42, extra: [{x: "hello"}, {x: "world", y: u1}] >. In this case, we have a list of tuples, and the second tuple has an extra field y. However, this is a serialized tuple, so y could be ignored, and the type of extra could be a list of tuple with one key x and a string value.

Is this behavior documented anywhere? What should we do in this case? Should we ignore y because serialized tuples can have extra fields?

@obycode
Copy link
Collaborator Author

obycode commented Apr 18, 2024

Is this behavior documented anywhere?

No, I don't think it is.

In this case, I believe the type is decided by the first element in the list. This is true even outside of [de]serialization. For example, (list {x: 1} {x:2, y:3}) is valid, and evaluates as (list {x:1} {x:2}). However if you switch the order, this is no longer valid: (list {x:2, y:3} {x: 1}). I think the same will be true for deserialization -- you can determine the tuple type from the first element in the list.

Again, we can also choose to disallow these extra fields in Clarity 3.

@Acaccia
Copy link
Collaborator

Acaccia commented Apr 18, 2024

Again, we can also choose to disallow these extra fields in Clarity 3.

I do not have issues with extra fields myself, and the feature should be done in a few days. But please keep me posted if it's removed later.

UPDATE: no actually I lie, I have an issue: I think this feature is extremely difficult to test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: Status: 💻 In Progress
Development

No branches or pull requests

3 participants