Feat: support for multiformat representation in IPLD schema #241

Open
Gozala opened this issue Sep 13, 2022 · 6 comments

@Gozala

Gozala commented Sep 13, 2022

Today there is no good way to describe multiformats like https://github.com/multiformats/multihash; the best we can do is something along the following lines:

type SignedMessage struct {
  payload: Bytes
  -- <varint pub key type code><varint sig size in bytes><actual signature>
  sig: Bytes
}

It would be a lot nicer if we could actually define the structure of the sig field there instead of relying on a comment, e.g.:

type SignedMessage struct {
  payload: Bytes
  sig: Signature
}


type Signature struct {
  alg: Integer
  size: Integer
  out: Bytes
} representation multiformat {
  order ["alg", "size", "out"]
}

In practice this could be something along the lines of the tuple and bytesprefix representations, where:

  1. All fields MUST have either Bytes or Integer representation.
  2. Integers are encoded as varints.
  3. Bytes are just appended.

Not only would this help us have better schemas when multiformats are involved, it would also provide a better language for describing multiformats than markdown files with some prose.
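As a rough illustration of the three rules above, here is a minimal JavaScript sketch of what encoding and decoding the proposed Signature struct could look like. The helper names (`encodeVarint`, `decodeVarint`, `encodeSignature`, `decodeSignature`) are hypothetical and do not come from any IPLD or multiformats library. The sketch also assumes that `size` always equals the length of `out`, which the proposal implies but does not state.

```javascript
// Hypothetical helpers for illustration only.

// Encode a non-negative safe integer as an unsigned varint (7 bits per byte,
// least significant group first, high bit set on every byte but the last).
function encodeVarint(n) {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError('expected a non-negative safe integer')
  }
  const out = []
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80)
    n = Math.floor(n / 0x80)
  }
  out.push(n)
  return Uint8Array.from(out)
}

// Decode a varint starting at `offset`; returns [value, bytesRead].
function decodeVarint(bytes, offset = 0) {
  let value = 0
  let scale = 1
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * scale
    if ((bytes[i] & 0x80) === 0) return [value, i - offset + 1]
    scale *= 0x80
  }
  throw new RangeError('truncated varint')
}

// Rules 2 and 3: integers become varints, bytes are appended as-is,
// in the order alg, size, out. `size` is derived from `out` (assumption).
function encodeSignature({ alg, out }) {
  const algBytes = encodeVarint(alg)
  const sizeBytes = encodeVarint(out.length)
  const result = new Uint8Array(algBytes.length + sizeBytes.length + out.length)
  result.set(algBytes, 0)
  result.set(sizeBytes, algBytes.length)
  result.set(out, algBytes.length + sizeBytes.length)
  return result
}

// Inverse of encodeSignature; rejects input whose size field is inconsistent.
function decodeSignature(bytes) {
  const [alg, algLength] = decodeVarint(bytes, 0)
  const [size, sizeLength] = decodeVarint(bytes, algLength)
  const start = algLength + sizeLength
  if (bytes.length - start !== size) {
    throw new RangeError('size field does not match payload length')
  }
  return { alg, size, out: bytes.subarray(start) }
}

// The alg value 300 is arbitrary, chosen only because it needs two varint bytes.
const exampleBytes = encodeSignature({ alg: 300, out: Uint8Array.from([1, 2, 3]) })
console.log(decodeSignature(exampleBytes))
```

A real representation would need to specify how the length of a Bytes field is determined when it is not the last field; this sketch sidesteps that by letting `out` take everything after the two varints.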

@Gozala
Author

Gozala commented Sep 13, 2022

@warpfork would love to see what you think about it

@rvagg
Member

rvagg commented Sep 13, 2022

@Gozala can you talk more about the use-cases you have in mind for this?

I'm a little wary about this because of the history with bytesprefix. I originally introduced bytesprefix because it shows up in the Filecoin chain: there's a "Signature" type that's prefixed by a byte (not a varint) that differentiates the signature type for the bytes that follow that initial byte.

When I came to actually implementing something that used it, it turned out that it was just easier and cleaner to do it in the application layer and treat that field as Bytes; although there is now a feature in bindnode that let me do this with an intermediate layer so I was able to write converters to and from the Go types that hold the information: https://github.com/filecoin-project/go-state-types/pull/49/files#diff-8239cfa16621038547161095ccd3f084ae9be51b47b1c8f3f8d91cc96163cf21R77-R104

It's possible that in doing that I'm missing out on some benefits of properly representing them fully in the Node interface and maybe we'll come back to it, but maybe it gets to be an ADL at that future date which would still invalidate the need for bytesprefix. So the fact is that for now bytesprefix just isn't used for anything even though it seemed like it would be helpful at the time.

So it'd be good to avoid adding bespoke representations unless we have solid (and ideally multiple) use-cases for them.

Also, the term "multiformat" doesn't necessarily imply a byte representation; there are various string forms of multiformat things too (multibase, multiaddr).

@Gozala
Author

Gozala commented Sep 13, 2022

> @Gozala can you talk more about the use-cases you have in mind for this?

The primary use case that came up is discussed in ucan-wg/ucan-ipld#4, which also motivated me to open multiformats/multicodec#289.

The secondary motivation (although it might outweigh the primary one in value) is the ability to define schemas for various multiformats in schema syntax, which I find much better than markdown descriptions like the one used in the multihash repo:

Format

<varint hash function code><varint digest size in bytes><hash function output>

> When I came to actually implementing something that used it, it turned out that it was just easier and cleaner to do it in the application layer and treat that field as Bytes; although there is now a feature in bindnode that let me do this with an intermediate layer so I was able to write converters to and from the Go types that hold the information: https://github.com/filecoin-project/go-state-types/pull/49/files#diff-8239cfa16621038547161095ccd3f084ae9be51b47b1c8f3f8d91cc96163cf21R77-R104

This is interesting and relevant context, because the goal here is also to represent signatures. That said, I wonder if the problem there was more with the IPLD schema tooling than with the concept itself.

On our side, the IPLD schema does not really serve any purpose other than as a protocol definition that implementations can claim compatibility with. At the end of the day I imagine us passing bytes as is. That said, I don't think passing a decoded struct would prove to be less useful. In fact (at least in JS) we use a hybrid approach for this kind of thing, where we represent things as views over some bytes, e.g. in this case it would be something like:

// Assumes `varint` is the helper exported by the `multiformats` package,
// whose `decode(bytes, offset)` returns `[value, bytesRead]`.
import { varint } from 'multiformats'

class Signature extends Uint8Array {
   // Signature algorithm code: the first varint in the buffer.
   get algorithm() {
      const [code] = varint.decode(this)
      // Cache the decoded value as an own property that shadows this getter.
      Object.defineProperties(this, {
        algorithm: { value: code }
      })
      return code
   }
   // Declared signature length: the second varint in the buffer.
   get size() {
      const [size] = varint.decode(this, varint.encodingLength(this.algorithm))
      Object.defineProperties(this, {
        size: { value: size }
      })
      return size
   }
   // Raw signature: a view over the bytes that follow the two varints.
   get bytes() {
      const { buffer, byteOffset, algorithm, size } = this
      const offset = varint.encodingLength(algorithm) + varint.encodingLength(size) + byteOffset
      const bytes = new Uint8Array(buffer, offset, size)
      Object.defineProperties(this, {
        bytes: { value: bytes }
      })
      return bytes
   }
}

> So it'd be good to avoid adding bespoke representations unless we have solid (and ideally multiple) use-cases for them.

I think the broader problem I'm getting at is:

  1. We use multiformats all over the place.
  2. We have no way of defining their format in IPLD schema.

Which implies the following:

  1. You just use Bytes as a type for all the multiformats.
  2. You usually add a comment saying it's actually a multiformat e.g. multihash.
  3. You then manually have to validate / decode all such fields.
  4. Often you also unwrap the multiformat to get to the actual hash, signature, etc.

Having first-class support for multiformats would help address 1 to 3 above. Item 4 may not always be desired, but I expect that could be improved on the tooling side that does the actual decoding, e.g. by generating view structs like the one in the example, which can be used either as an ordinary Uint8Array or as a struct.
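To make the tooling idea more concrete, the following is a minimal sketch of a generic decoder driven by a field layout, which is roughly what a schema-aware tool could derive from a multiformat representation. Everything here is hypothetical: the layout shape, the `lengthFrom` key, and the function names are assumptions made for illustration and are not part of the proposal or of any existing library. In particular, the proposal does not say how the length of a Bytes field is determined, so the sketch assumes it is taken from an earlier integer field.

```javascript
// Hypothetical layout mirroring an `order` list, here for multihash:
// <varint hash function code><varint digest size in bytes><hash function output>
const MULTIHASH_LAYOUT = [
  { name: 'code', kind: 'int' },
  { name: 'size', kind: 'int' },
  // `lengthFrom` is an assumption of this sketch, not part of the proposal.
  { name: 'digest', kind: 'bytes', lengthFrom: 'size' },
]

// Read an unsigned varint at `offset`; returns the value and the next offset.
function readVarint(bytes, offset) {
  let value = 0
  let scale = 1
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * scale
    if ((bytes[i] & 0x80) === 0) return { value, next: i + 1 }
    scale *= 0x80
  }
  throw new RangeError('truncated varint')
}

// Validate and decode `bytes` field by field, in layout order.
function decodeWithLayout(layout, bytes) {
  const result = {}
  let offset = 0
  for (const field of layout) {
    if (field.kind === 'int') {
      const { value, next } = readVarint(bytes, offset)
      result[field.name] = value
      offset = next
    } else if (field.kind === 'bytes') {
      const length = result[field.lengthFrom]
      if (!Number.isInteger(length)) {
        throw new TypeError(`no length available for ${field.name}`)
      }
      if (offset + length > bytes.length) {
        throw new RangeError(`${field.name} is truncated`)
      }
      // A view over the input, so no bytes are copied.
      result[field.name] = bytes.subarray(offset, offset + length)
      offset += length
    } else {
      throw new TypeError(`unsupported field kind: ${field.kind}`)
    }
  }
  if (offset !== bytes.length) throw new RangeError('unexpected trailing bytes')
  return result
}

// A sha2-256 multihash header (code 0x12, size 0x20) followed by a placeholder
// all-zero digest; the digest is not a real hash of anything.
const exampleMultihash = new Uint8Array(34)
exampleMultihash[0] = 0x12
exampleMultihash[1] = 0x20
console.log(decodeWithLayout(MULTIHASH_LAYOUT, exampleMultihash))
```

With something like this, items 1 to 3 collapse into a single call per field, and the returned `digest` view covers the unwrapping in item 4 for callers that want it.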

@BigLep
Contributor

BigLep commented Sep 13, 2022

@Gozala : are you planning to take on this work?

@Gozala
Author

Gozala commented Sep 14, 2022

> @Gozala : are you planning to take on this work?

I'm not sure what taking this on would entail, but I'm happy to help where I can. I believe there is quite a bit of a toolchain built around IPLD schemas. I'm happy to tackle spec work and some work around the JS tools; however, I don't think I'm qualified to tackle the Go tools, as I'm not familiar with them and don't have much experience with the language.

In other words, if someone can help scope down the necessary work, I can speak to what I'll be able to help with.

@rvagg
Member

rvagg commented Sep 20, 2022

yeah, ok, I think this seems reasonable and could potentially be leveraged for other things

I think the main areas of work to get new features into IPLD schemas, in rough order of importance:

  1. Documentation in this repo: there are both "docs" and "specs" forms. You'd have to decide how much detail to put in the docs section (maybe use bytesprefix as a precedent for how much to bother, possibly not a whole lot). The specs section is more important to spell out what it's doing.
    • schema-schema.ipldsch and schema-schema.ipldsch.json are the two critical pieces for specs. That's the canonical go-to reference for this stuff. Note there's a lot of docs in there, they're fairly important. The .json is ideally compiled from the .ipldsch, but for new features that's a chicken-and-egg thing (I'll often add the feature to a parser first just so I can do an automatic compile, but that's up to you as long as it's going to end up the same in the end).
    • There's also a TypeScript form of schema-schema that will need updating, which might be a useful way for you to get your head around what's going on in these.
  2. Fixtures in this repo, see https://github.com/ipld/ipld/tree/master/specs/schemas/tests - include as many files as you think are necessary to cover the feature set (there's not enough fixtures in here for what we already have, it's a start but ideally we'd be adding more to get better coverage). Each file should have a schema and expected section. You'll note some have blocks, which might be interesting for documenting, but are currently not used anywhere for testing.
  3. go-ipld-prime implementation - not critical, we still have a bunch of unimplemented features here but it is nice to have descriptive panic()s with TODO in them to point people to the need to do some implementation work.
  4. js-ipld-schema implementation - similar to the Go one, but we have more of the feature set implemented in the parser & printer than in Go at least, and this shouldn't be too hard for you to figure out (although the pegjs definition is a little funky tbqh).
    • from-dsl (parser) is the most important to implement
    • to-dsl (printer) is nice to have so you can round-trip (I also think the tests require round-tripping?)
    • typed (validator / transformer) is less critical, but maybe helpful for what you have in mind for this? up to you, there's a couple of features not implemented in there yet (like stringjoin) so not a big deal if nobody's going to use it.
