Schemas importing other schemas #243

Open
RangerMauve opened this issue Sep 14, 2022 · 10 comments

@RangerMauve
Contributor

Looking at schemas and integrating dynamic loading of schemas, some stuff has stuck out to me.

  1. Schema sizes are limited by block size restrictions, which relate to encoding formats
  2. Schemas don't have a way to extend / import types from other schemas and require hard-importing to have reuse.

I think these things could be addressed by adding the ability to import schema types from another schema using a CID.

The syntax could look something like the ESM imports API:

# Imports all the types in the schema
import "ipld://CID_HERE/"
import {Example1, Example2 as SomethingElse} from "ipld://CID_HERE"

The second line could show how we can rename a type when importing it to avoid conflicts, or import just a subset of types.
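
To make the proposal concrete, here is a minimal sketch of how a preprocessor might parse the two statement forms above. The grammar is only the one suggested in this comment and is not part of the IPLD Schema spec; `ImportStatement` and `parse_import` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Hypothetical grammar for the proposal above; not part of any spec.
IMPORT_ALL = re.compile(r'^import\s+"(?P<source>[^"]+)"$')
IMPORT_NAMED = re.compile(
    r'^import\s*\{(?P<names>[^}]*)\}\s*from\s+"(?P<source>[^"]+)"$'
)


@dataclass
class ImportStatement:
    source: str
    # Maps remote type name -> local name; empty means "import everything".
    names: dict = field(default_factory=dict)


def parse_import(line: str) -> ImportStatement:
    line = line.strip()
    match = IMPORT_ALL.match(line)
    if match:
        return ImportStatement(match.group("source"))
    match = IMPORT_NAMED.match(line)
    if not match:
        raise ValueError(f"not an import statement: {line!r}")
    names = {}
    for item in match.group("names").split(","):
        parts = item.split()
        if len(parts) == 1:
            # Plain import: the local name equals the remote name.
            names[parts[0]] = parts[0]
        elif len(parts) == 3 and parts[1] == "as":
            # Renamed import: "Remote as Local".
            names[parts[0]] = parts[2]
        else:
            raise ValueError(f"bad import item: {item!r}")
    return ImportStatement(match.group("source"), names)
```

Keeping the remote-to-local mapping explicit means a rename is just a different value in the same dictionary, so conflict checks can look at local names only.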

I think this will be really useful for future integration with other data ecosystems like schema.org

I'm down to work on speccing this out and adding something to the JS side of things.

This does mean that schema validators would end up depending on IPLD URLs and LinkSystems. 😅

cc @rvagg @warpfork @Gozala What do y'all think?

@Gozala

Gozala commented Sep 14, 2022

I would personally prefer not to introduce a dependence on URLs and instead embrace IPLD native links, e.g. the syntax for bringing in defs from another schema may look like a regular type def / alias

type {
  Foo
  Beep as Bar
} from bafy...hash

@rvagg
Member

rvagg commented Sep 20, 2022

type { Foo Beep as Bar } from

is really just import { Foo, Beep as Bar } from ... but less clear about what's going on I think? First there's ambiguity about what comes after the type token since we now branch between a type name and a {, the from is in the same place as representation and then inside the type there's the new as thing going on too. I think I'd prefer to bundle all of this new syntax into an entirely new construct and import seems like a decent pattern, so there's no confusion or weird branching at each of these points through the tokens.

But yeah, just CIDs would be good, no need for URLs, especially if we get CIDv2 with its additional funky possibilities for use here.

There is a bit of weirdness though - the link should point to the DMT form of a schema, but you're going to be mostly representing it inside a DSL, presumably to be compiled into a DMT. I wonder then what a workflow would look like for a set of linked schemas? Perhaps this is a non-problem where you're building on an existing schema that's already "shipped", but perhaps you have a whole set of complex schemas you want to join together — like the ones Vulcanize did for the ETH chain structure @ https://ipld.io/specs/codecs/dag-eth/. What would be the process to get from the DSL to the DMT for these? Would we need a placeholder prior to some kind of "compile" step, or just accept the fact that you need to jump through some hoops to compile the schemas yourself in dependent order?
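
For the "compile in dependency order" option, a rough sketch of the ordering step, assuming the toolchain has already extracted which schema files import which. The file names are invented for illustration, and `compile_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: schema file -> set of files it imports.
deps = {
    "block.ipldschema": {"header.ipldschema", "tx.ipldschema"},
    "tx.ipldschema": {"receipt.ipldschema"},
    "header.ipldschema": set(),
    "receipt.ipldschema": set(),
}


def compile_order(dependencies):
    # static_order() yields every file after all of the files it imports,
    # so each CID is known before a dependent schema needs to link to it.
    # A cyclic import raises graphlib.CycleError.
    return list(TopologicalSorter(dependencies).static_order())


order = compile_order(deps)
```

A cycle between schemas cannot be resolved this way at all, since each CID depends on the content of the schemas it links to.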

@RangerMauve
Contributor Author

Cool, I agree that a new keyword would be useful for encapsulating this new functionality.

Also, agreed about not involving URLs in this. 😁

It does feel like being able to use import "./example.ipldschema" in the DSL in addition to the CID would be useful in some sort of preprocessor step.

We could have the DMT of the dependency imported alongside, and the import statement rewritten to point at the CID instead of the DMT.

This would result in the DMT/DSL specs diverging a bit more, but I don't think it'd be the end of the world given we already diverge for stuff like comments.
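
A rough sketch of that preprocessor step, assuming the dependencies have already been compiled to DMT. `placeholder_cid` is a stand-in that hashes canonical JSON; a real tool would encode the DMT with an IPLD codec and build a proper CID. All names here are hypothetical.

```python
import hashlib
import json
import re

# Matches a quoted relative path such as "./example.ipldschema".
QUOTED_PATH = re.compile(r'"(\.{1,2}/[^"]+)"')


def placeholder_cid(dmt: dict) -> str:
    # Stand-in only: NOT a real CID. A real toolchain would encode the DMT
    # with an IPLD codec and wrap the multihash in a CID.
    data = json.dumps(dmt, sort_keys=True, separators=(",", ":")).encode()
    return "sha256-" + hashlib.sha256(data).hexdigest()


def rewrite_imports(dsl: str, compiled: dict) -> str:
    """Replace relative paths in import lines with the dependency's hash.

    `compiled` maps each relative path to its already-compiled DMT.
    """
    out = []
    for line in dsl.splitlines():
        if line.lstrip().startswith("import"):
            line = QUOTED_PATH.sub(
                lambda m: placeholder_cid(compiled[m.group(1)]), line
            )
        out.append(line)
    return "\n".join(out)
```

Only lines starting with import are touched, so the rest of the DSL passes through unchanged.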

@rvagg
Member

rvagg commented Sep 21, 2022

In that case we'd need a rule like "DMT refers to imports strictly by CID, but DSL can refer by url-ish as long as the toolchain used to create the DMT supports it".

Thinking through some use-cases here one obvious one is Bindnode on the Go side where we have a higher-level abstraction that allows you to register a type using a schema DSL and it'll do the compiling for you, e.g. https://github.com/filecoin-project/go-fil-markets/blob/727a2b14a263ebfaead1a1bacd56e1149234d549/retrievalmarket/types.go#L543

We could also come up with a "SchemaLoader" interface that you could pass in as one of the options to this thing that would allow you to both load a schema by CID or by URL/path and have that fed into the compile process.
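
As a language-neutral illustration of that loader idea (the real thing would live in Go next to Bindnode), here is a minimal sketch. `SchemaLoader`, `PathLoader`, `CidLoader`, and `ChainLoader` are hypothetical names, and the rule for telling a path from a CID is a simplifying assumption.

```python
from pathlib import Path
from typing import Callable, Protocol


class SchemaLoader(Protocol):
    def load(self, ref: str) -> str:
        """Return schema DSL text for a reference (path or CID string)."""
        ...


class PathLoader:
    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def load(self, ref: str) -> str:
        return (self.root / ref).read_text()


class CidLoader:
    def __init__(self, fetch_block: Callable[[str], bytes]) -> None:
        # fetch_block is supplied by the caller, e.g. backed by a blockstore.
        self.fetch_block = fetch_block

    def load(self, ref: str) -> str:
        return self.fetch_block(ref).decode("utf-8")


class ChainLoader:
    """Routes path-like references to one loader and the rest to another."""

    def __init__(self, by_path: SchemaLoader, by_cid: SchemaLoader) -> None:
        self.by_path = by_path
        self.by_cid = by_cid

    def load(self, ref: str) -> str:
        # Assumption: anything that looks like a file path is a path,
        # everything else is treated as a CID string.
        if ref.startswith(("./", "../", "/")):
            return self.by_path.load(ref)
        return self.by_cid.load(ref)
```

Passing the loader in as an option keeps the compile step itself unaware of where schemas come from.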

So would it be better for the DSL to use CIDs or actual URLs, including file URLs? Or only CIDs or file paths? It's getting pretty complicated if we go with URLs but I can imagine that being quite handy.

With that example above, we're already pushing to have some of these components spread across repos, specifically in filecoin-project/go-state-types#49 where there are generic Filecoin types which we'd be pulling in to various places where we use Bindnode. So you might end up with schemas that want to pull in pieces from other repos. Of course, having a fixed CID for that would be nice, but where do we record the CID for the version we want? Perhaps it'd be more useful to be able to refer to a github raw url to the version (commit) you want and have the toolchain compile and work out that CID for you.

Of course then there's questions of what you're doing with these multiple things when you do compile them, at least with a DSL->DMT transformation we just hand you a single Node (i.e. root of the single DMT block), now we'd need to get a block interface involved and hand you a root CID to a graph.

@BigLep
Contributor

BigLep commented Oct 4, 2022

2022-10-04 triage conversation: @RangerMauve will turn this into an exploration report.

@Gozala

Gozala commented Oct 5, 2022

@RangerMauve I think it would be a great idea to also evaluate the idea from the Unison language, which uses hash-based, as opposed to named, referencing, which in turn eliminates the need for imports and seems to be a natural fit for hash-linked data.

@jaredly has also written a whole new language which, among other things, explores this idea in TypeScript. This is something I always wished I had time to explore but never got around to, so I thought I'd surface it here.

Conceptually the idea is pretty simple: you ignore type names and instead replace them with the CID of the definition, and swap all references by that name with the CID. Locally you could build up a db of name -> CID mappings, e.g. from local source (which could be included through git submodules, package managers, or whatever; the tool just needs to know what to index).

If I'm not mistaken both unison and jerd use field order to determine naming which is probably not great from the schema extensibility perspective (e.g. if I add a new field at the top that would make new type incompatible with an old one), but then again maybe that is ok and users just need a heads up or maybe it makes more sense to retain field names here
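
A toy sketch of the hash-based referencing idea, under heavy assumptions: types are flat structs given as field name -> type name, a plain SHA-256 hex digest stands in for a CID, and field names are retained in the hash. This is not how Unison or jerd implement it, and `definition_hash` is a hypothetical helper.

```python
import hashlib
import json


def definition_hash(name, defs, cache=None):
    """Hash a type by its structure, replacing type names with hashes.

    `defs` maps type name -> {field name: referenced type name}.
    Names missing from `defs` (e.g. String, Int) are treated as builtins.
    Recursive types are not handled in this sketch.
    """
    cache = {} if cache is None else cache
    if name in cache:
        return cache[name]
    if name not in defs:
        return name  # builtin kind, referenced by its plain name
    fields = {
        field: definition_hash(ref, defs, cache)
        for field, ref in defs[name].items()
    }
    digest = hashlib.sha256(
        json.dumps(fields, sort_keys=True).encode()
    ).hexdigest()
    cache[name] = digest
    return digest


defs = {
    "Point": {"x": "Int", "y": "Int"},
    "Coord": {"x": "Int", "y": "Int"},
    "Line": {"a": "Point", "b": "Point"},
    "Segment": {"a": "Coord", "b": "Coord"},
}
# The local name -> hash index mentioned above.
index = {name: definition_hash(name, defs) for name in defs}
```

Because the type's own name never enters the hash, Point and Coord collapse to the same identifier here, and so do Line and Segment.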

@jaredly

jaredly commented Oct 5, 2022

fwiw I'm rewriting jerd to use field names instead of field order (I determined that names have important semantic value that I don't want to discard)

@RangerMauve
Contributor Author

Hmm. Regarding unison and jerd, these are whole new schema languages and it feels like a separate set of constraints to what IPLD Schemas already do.

IPLD Schemas have the DMT format and a way to hash data in that format already so it seems more straightforward to link to that than reevaluate how schemas should work.

I'll defs read up on those for inspo though.

Gonna work on an exploration report tonight to dream up the UX and talk about tradeoffs / caveats.

@RangerMauve
Contributor Author

One thing that might be useful is to avoid import * functionality and opt for requiring types to be explicitly imported by name.

This would make some of the preprocessor steps simpler: one could check whether the types in the imports are being used in the current schema before bothering to check whether they exist in the remote schema, and there would be a quick way to verify that they exist in the remote schema.

Otherwise we might run into weird cases where types cause conflicts over time.
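
The two checks described above could be sketched roughly like this. The token scan is a simplification rather than a real schema parser, and `unused_imports` and `missing_remote` are hypothetical helpers.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def unused_imports(local_names, schema_body):
    """Imported local names that never appear in the schema body.

    `schema_body` is the DSL text without its import lines.
    """
    identifiers = set(IDENTIFIER.findall(schema_body))
    return sorted(name for name in local_names if name not in identifiers)


def missing_remote(remote_names, remote_types):
    """Requested names that the remote schema does not define."""
    return sorted(name for name in remote_names if name not in remote_types)
```

Running the local check first means the remote schema only has to be fetched for imports that are actually used.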

@rvagg
Member

rvagg commented Oct 18, 2022

Thinking through the mechanics of this, it might be good to namespace types such that they have to be explicitly brought into scope and not just imported by name and copied. E.g. type Foo in schema bafyaaaaa might be brought into scope with an import like import Foo from bafyaaaaa as Foo, the as being required, otherwise Foo remains a property of bafyaaaaa. That means that if Foo is dependent on type Bar and Baz inside bafyaaaaa, those two additional types are hidden because we haven't explicitly imported them. Foo refers to them, but they don't pollute the local scope because we haven't asked for them. Naming the import means we can also easily rename it, so if we have a local Foo we can import Foo from bafyaaaaa as FooImported and it'll be known by this new name in the local scope.

Internally, when building a DMT to represent all of this, this could be represented with prepended CIDs, so as soon as you refer to some other Schema, that whole schema (or perhaps just the tree we care about, that probably wouldn't be too hard) is brought in and the types become bafyaaaaa$Foo, bafyaaaaa$Bar, bafyaaaaa$Baz etc. until you start naming them with your import statements, at which point they get unprefixed names.

One might opt to import all of the types from a schema, but you have to do so explicitly; and we could defer any * semantics until we decide they really are needed.
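
A minimal sketch of the scoping rule described in this comment, assuming the CID$Name prefix convention from the paragraph above. `resolve_scope` and its tuple layout are hypothetical.

```python
def resolve_scope(local_types, imports):
    """Build the local scope: visible name -> fully qualified name.

    `imports` is a list of (remote_name, cid, alias) tuples. Only aliased
    types become visible; anything else in the remote schema stays hidden
    behind its "cid$Name" form.
    """
    scope = {name: name for name in local_types}
    for remote_name, cid, alias in imports:
        if alias in scope:
            raise ValueError(f"{alias} is already defined in the local scope")
        scope[alias] = f"{cid}${remote_name}"
    return scope


scope = resolve_scope(
    ["Foo"],
    [("Foo", "bafyaaaaa", "FooImported")],
)
```

Bar and Baz never appear in the resulting scope, so they cannot collide with local definitions even though the imported Foo still refers to them.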
