Feat: support for multiformat representation in IPLD schema #241

Gozala · 2022-09-13T00:56:48Z

Today there is no good way to describe multiformats like multihash (https://github.com/multiformats/multihash) in a schema. The best we can do is something along the following lines:

type SignedMessage struct {
  payload: Bytes
  -- <varint pub key type code><varint sig size in bytes><actual signature>
  sig: Bytes
}

It would be a lot nicer if we could define the structure of the sig field in the schema itself instead of in a comment, e.g.:

type SignedMessage struct {
  payload: Bytes
  sig: Signature
}


type Signature struct {
  alg: Integer
  size: Integer
  out: Bytes
} representation multiformat {
  order ["alg", "size", "out"]
}

In practice this could be something between the tuple and bytesprefix representations, where:

All fields MUST have either Bytes or Integer representation.
Integers are encoded as varints.
Bytes are just appended.
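
As a rough illustration of the three rules above, here is a minimal JavaScript sketch. It is not an existing API: `encodeVarint` and `encodeMultiformat` are hypothetical names, the unsigned varint (LEB128) encoder is written inline so the example has no dependencies, and the field values are arbitrary.

```javascript
// Encode a non-negative integer as an unsigned varint (LEB128):
// 7 bits per byte, least significant group first, high bit set on
// every byte except the last.
function encodeVarint(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError('varint value must be a non-negative safe integer')
  }
  const out = []
  while (value >= 0x80) {
    out.push((value % 0x80) | 0x80)
    value = Math.floor(value / 0x80)
  }
  out.push(value)
  return Uint8Array.from(out)
}

// Hypothetical encoder for the proposed representation: walk the fields
// in the declared order, emit integers as varints, append bytes as is.
function encodeMultiformat(order, value) {
  const parts = order.map((name) => {
    const field = value[name]
    if (field instanceof Uint8Array) return field
    if (typeof field === 'number') return encodeVarint(field)
    throw new TypeError(`field "${name}" must be an integer or bytes`)
  })
  const total = parts.reduce((n, part) => n + part.length, 0)
  const bytes = new Uint8Array(total)
  let offset = 0
  for (const part of parts) {
    bytes.set(part, offset)
    offset += part.length
  }
  return bytes
}

// Arbitrary example values for the Signature struct sketched above.
const sig = { alg: 0xed, size: 4, out: Uint8Array.of(1, 2, 3, 4) }
const encoded = encodeMultiformat(['alg', 'size', 'out'], sig)
```

Here `encoded` is the varint for `alg`, followed by the varint for `size`, followed by the raw `out` bytes, in the order the representation declares.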

Not only would this give us better schemas when multiformats are involved, it would also provide a better language for describing multiformats than markdown files with some prose.


Gozala · 2022-09-13T00:57:04Z

@warpfork would love to see what you think about it

rvagg · 2022-09-13T06:50:02Z

@Gozala can you talk more about the use-cases you have in mind for this?

I'm a little wary about this because of the history with bytesprefix. I originally introduced bytesprefix because it shows up in the Filecoin chain: there's a "Signature" type that's prefixed by a byte (not a varint) that differentiates the signature type for the bytes that follow that initial byte.

When I came to actually implementing something that used it, it turned out that it was just easier and cleaner to do it in the application layer and treat that field as Bytes; although there is now a feature in bindnode that let me do this with an intermediate layer so I was able to write converters to and from the Go types that hold the information: https://github.com/filecoin-project/go-state-types/pull/49/files#diff-8239cfa16621038547161095ccd3f084ae9be51b47b1c8f3f8d91cc96163cf21R77-R104
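
For readers unfamiliar with the pattern, the application-layer approach can be sketched roughly as follows. The linked change is in Go; this JavaScript version is only an illustration, and `splitPrefixed` / `joinPrefixed` are hypothetical names, not functions from any library.

```javascript
// The schema treats the field as plain Bytes; user code splits off the
// single type byte (a raw byte, not a varint) when reading...
function splitPrefixed(bytes) {
  if (bytes.length < 1) {
    throw new RangeError('missing type prefix byte')
  }
  // subarray returns a view over the same buffer, so nothing is copied
  return { type: bytes[0], data: bytes.subarray(1) }
}

// ...and puts it back in front of the payload when writing.
function joinPrefixed({ type, data }) {
  if (!Number.isInteger(type) || type < 0 || type > 0xff) {
    throw new RangeError('type prefix must fit in one byte')
  }
  const out = new Uint8Array(1 + data.length)
  out[0] = type
  out.set(data, 1)
  return out
}

// Arbitrary example: type byte 2 followed by three payload bytes.
const parts = splitPrefixed(Uint8Array.of(2, 9, 8, 7))
const roundTripped = joinPrefixed(parts)
```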

It's possible that in doing that I'm missing out on some benefits of properly representing them fully in the Node interface and maybe we'll come back to it, but maybe it gets to be an ADL at that future date which would still invalidate the need for bytesprefix. So the fact is that for now bytesprefix just isn't used for anything even though it seemed like it would be helpful at the time.

So it'd be good to avoid adding bespoke representations unless we have solid (and ideally multiple) use-cases for them.

Also the term "multiformat" doesn't necessarily imply a byte representation, there are various string forms of multiformat things too (multibase, multiaddr).

Gozala · 2022-09-13T20:09:03Z

> @Gozala can you talk more about the use-cases you have in mind for this?

The primary use case that came up is discussed in ucan-wg/ucan-ipld#4, which also motivated me to open multiformats/multicodec#289.

The secondary motivation (although it might outweigh the primary one in value) is the ability to define schemas for various multiformats in schema syntax, which I find much better than markdown descriptions like the one used in the multihash repo:

Format

<varint hash function code><varint digest size in bytes><hash function output>
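
To make that layout concrete, here is a small dependency-free JavaScript sketch that splits such a value into its three parts. `decodeVarint` and `decodeMultihash` are hypothetical helper names, and the input bytes are an arbitrary example (code 0x12 with a two-byte digest).

```javascript
// Decode an unsigned varint starting at `offset`; returns the value and
// the number of bytes it occupied.
function decodeVarint(bytes, offset = 0) {
  let value = 0
  let scale = 1
  for (let i = offset; i < bytes.length; i++) {
    const byte = bytes[i]
    value += (byte & 0x7f) * scale
    // a clear high bit marks the last byte of the varint
    if ((byte & 0x80) === 0) return { value, length: i - offset + 1 }
    scale *= 0x80
  }
  throw new RangeError('unterminated varint')
}

// Split <varint code><varint size><output> into its three parts and
// check that the declared size matches what is actually there.
function decodeMultihash(bytes) {
  const code = decodeVarint(bytes, 0)
  const size = decodeVarint(bytes, code.length)
  const start = code.length + size.length
  if (bytes.length - start !== size.value) {
    throw new RangeError('digest length does not match declared size')
  }
  return { code: code.value, size: size.value, digest: bytes.subarray(start) }
}

const decoded = decodeMultihash(Uint8Array.of(0x12, 0x02, 0xaa, 0xbb))
```

This is the decoding and validation that currently has to be written by hand for every field that is declared as plain Bytes.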

> When I came to actually implementing something that used it, it turned out that it was just easier and cleaner to do it in the application layer and treat that field as Bytes; although there is now a feature in bindnode that let me do this with an intermediate layer so I was able to write converters to and from the Go types that hold the information: https://github.com/filecoin-project/go-state-types/pull/49/files#diff-8239cfa16621038547161095ccd3f084ae9be51b47b1c8f3f8d91cc96163cf21R77-R104

This is interesting and relevant context, because the goal here is also to represent signatures. That said, I wonder if the problem there was more with the IPLD schema stuff than with the concept itself.

On our side the IPLD schema does not really serve any purpose other than as a protocol definition that implementations can claim compatibility with. At the end of the day I imagine us passing bytes as is. That said, I don't think passing a decoded struct would prove to be less useful. In fact (at least in JS) we use a hybrid approach for this kind of thing, where we represent things as views over some bytes, e.g. in this case it would be something like:

// Assumes the varint helpers from the `multiformats` package, where
// decode(bytes, offset) returns [value, bytesRead].
import { varint } from 'multiformats'

class Signature extends Uint8Array {
   // varint code identifying the signature algorithm
   get algorithm() {
      const [code] = varint.decode(this)
      // cache the decoded value as an own property that shadows this getter
      Object.defineProperties(this, {
        algorithm: { value: code }
      })
      return code
   }
   // varint giving the length of the raw signature in bytes
   get size() {
      const [size] = varint.decode(this, varint.encodingLength(this.algorithm))
      Object.defineProperties(this, {
        size: { value: size }
      })
      return size
   }
   // raw signature bytes, as a view over the same buffer (no copy)
   get bytes() {
      const { buffer, byteOffset, algorithm, size } = this
      const offset = varint.encodingLength(algorithm) + varint.encodingLength(size) + byteOffset
      const bytes = new Uint8Array(buffer, offset, size)
      Object.defineProperties(this, {
        bytes: { value: bytes }
      })
      return bytes
   }
}

> So it'd be good to avoid adding bespoke representations unless we have solid (and ideally multiple) use-cases for them.

I think the broader problem I'm getting at is:

We use multiformats all over the place.
We have no way of defining their format in IPLD schema.

Which implies the following:

1. You just use Bytes as the type for all the multiformats.
2. You usually add a comment saying it's actually a multiformat, e.g. multihash.
3. You then have to validate / decode all such fields manually.
4. Oftentimes you also unwrap the multiformat to get to the actual hash, signature, etc.

Having first-class support for multiformats would help address 1 to 3 above. It could be that 4 is not always desired, but I would expect that could be improved on the tooling side that does the actual decoding, e.g. by generating view structs like the one in the example, which would allow you to use them either as an ordinary Uint8Array or as a struct.

BigLep · 2022-09-13T22:36:47Z

@Gozala : are you planning to take on this work?

Gozala · 2022-09-14T15:38:36Z

> @Gozala : are you planning to take on this work?

I'm not sure what taking this on would entail, but I'm happy to help where I can. I believe there is quite a bit of a toolchain built around IPLD schemas. I'm happy to tackle spec work and some work on the JS tools; however, I don't think I'm qualified to tackle the Go tools, as I'm not familiar with them and don't have much experience with the language.

In other words, if someone can help scope down the necessary work, I can speak to what I'll be able to help with.

rvagg · 2022-09-20T03:57:18Z

yeah, ok, I think this seems reasonable and could potentially be leveraged for other things

I think the main areas of work to get new features into IPLD schemas, in rough order of importance:

1. Documentation in this repo. There are both the "docs" and "specs" forms; you'd have to decide how much detail to put in the docs section (maybe use bytesprefix as a precedent for how much to bother, possibly not a whole lot). The specs section is more important for spelling out what it's doing.
   - schema-schema.ipldsch and schema-schema.ipldsch.json are the two critical pieces for specs. They're the canonical go-to reference for this stuff. Note there's a lot of docs in there, and they're fairly important. The .json is ideally compiled from the .ipldsch, but for new features that's a chicken-and-egg thing (I'll often add the feature to a parser first just so I can do an automatic compile, but that's up to you as long as it ends up the same in the end).
   - There's also a TypeScript form of schema-schema that will need updating, but that might be interesting for you to get your head around what's going on in these.
2. Fixtures in this repo, see https://github.com/ipld/ipld/tree/master/specs/schemas/tests. Include as many files as you think are necessary to cover the feature set (there aren't enough fixtures in here for what we already have; it's a start, but ideally we'd be adding more to get better coverage). Each file should have a schema and an expected section. You'll note some have blocks, which might be interesting for documenting, but are currently not used anywhere for testing.
3. go-ipld-prime implementation. Not critical, we still have a bunch of unimplemented features here, but it is nice to have descriptive panic()s with TODO in them to point people to the need to do some implementation work.
   - parser in https://github.com/ipld/go-ipld-prime/tree/master/schema/dsl
   - DMT in https://github.com/ipld/go-ipld-prime/tree/master/schema/dmt
4. js-ipld-schema implementation. Similar to the Go one, but we have more of the feature set implemented in the parser & printer than in Go at least, and this shouldn't be too hard for you to figure out (although the pegjs definition is a little funky tbqh).
   - from-dsl (parser) is the most important to implement
   - to-dsl (printer) is nice to have so you can round-trip (I also think the tests require round-tripping?)
   - typed (validator / transformer) is less critical, but maybe helpful for what you have in mind for this? Up to you; there are a couple of features not implemented in there yet (like stringjoin), so not a big deal if nobody's going to use it.

Gozala mentioned this issue Sep 13, 2022

Capturing signing algorithm ucan-wg/ucan-ipld#4

Closed

BigLep added help wanted P3 labels Sep 13, 2022

Gozala mentioned this issue Dec 2, 2022

First Version ChainAgnostic/varsig#6

Merged
