Skip to content
This repository has been archived by the owner on Aug 17, 2022. It is now read-only.

Motivation for banning intertype recursion is not 100% clear #137

Open
matklad opened this issue Jun 23, 2021 · 12 comments
Open

Motivation for banning intertype recursion is not 100% clear #137

matklad opened this issue Jun 23, 2021 · 12 comments

Comments

@matklad
Copy link

matklad commented Jun 23, 2021

Mostly drive by observation, but the explainer currently says:

intertype type definitions are required to be acyclic. This restriction follows from the fact that interface types are meant to be used at inter-language boundaries to copy values and matches the restrictions of other IPC/RPC schema languages.

I am not sure it’s fair to say that the restriction (logically) follows from IPC requirement: there are IPC/RPC systems which are fine with recursive type. To give one example from a domain I am familiar with, DocumetSymbol from language server protocol is sent over the wire and is recursive. Apparently, protobuf also allows recursive types: https://stackoverflow.com/questions/5945136/self-referencing-fields-in-protobuf-messages

@fgmccabe
Copy link
Contributor

Personally, I am not very familiar with LSP. Generally, just because some IPC schema languages support recursive types (it is obviously possible to support them) does not mean that we should support recursive types.
Our main inspiration came from WebIDL which does not support recursive types.
I think that a primary driver here is that Interface Types are structural. This, in turn, is motivated by a combination of the type proposal in the Wasm GC, the WebIDL type system, and the expected use cases for Interface Types.
Defining structural recursive types is possible but nasty™.
That leaves the expected use cases; which revolve around interchange across limited trust boundaries. Here a key choice is whether we want to allow user-defined types to be communicated – as Protobuf does, for example. We felt that that was inconsistent with the idea of communicating data values. (There is a wrinkle here: we do expect to be using nominal types to model communicating resource identifiers. However, we plan for those to be inherently opaque.)
Example: we do not allow something like a hash map to be communicated as a value in an IT signature. You can have a hash map based resource, but access to that resource will be via additional functions (with their own IT signatures) that allow you to get/put entries on the map. You can, of course, communicate the contents of a hash map – as a list of key/value pairs.

@theduke
Copy link

theduke commented Dec 15, 2021

I can appreciate the technical difficulties involved and the added complexity of generating efficient conversion code.

But I'd like to argue for and urge a reconsideration.

Recursive types are really common.

Supporting them is important for IT to be useful outside of low level domains like WASI, and to fulfill the promise of seamless language interoperability.

If they can't be represented, a lot of code will skip defining any IT types and just pass around byte slices with a custom encoding (Json, Proto/flatbuf, ...) instead, because it doesn't make sense to have 2/3 of your API defined as native interface types and the rest as serialized byte slices.

Especially because a lot of the value would come from auto-generating language bindings based on just the IT definitions. So if you need a generator like Protobuf or a json-schema anyway, you'll just keep everything in there, except possibly function and resource definitions.

Replacing recursive types with resource handles would be possible, but would kill performance (due to the reasons outlined in the explainer), and would introduce a lot of friction, since you now need to write custom glue code to map back to a sane representation.

Consider that even something basic like a JSON-compatible Value type requires recursion.

variant Value {
    Null,
    Number,
    String,
    Array(list<Value>),
    Object(map<string, Value>), 
    // with map<K, V> = list< tuple<K, V> >
}

Representing that with resources doesn't seem feasible (to the point that you'd be much better off just serializing to JSON).

As a motivating example: today I tried to sketch out the typings for a plugin system with wit-bindgen. After an hour I had about 5 instances of recursive types, which is incidentally why I noticed that IT doesn't currently plan for recursion. Some of those could be worked around by changing the representation (eg flattening, arenas), but that would be intrusive and require custom glue code for each language again.

I do believe that IT has the potential to enabling a truly interoperable ecosystem where you can seamlessly share code between languages. But for that to be realized common basic data structures need to be representable.

@fgmccabe
Copy link
Contributor

There may be some additional misconceptions here. We do not just 'ban' recursive types. In the base Interface Types language, it is not possible to define any new types. This is justified by a combination of factors: not permitting any user defined types maximizes the potential for interoperability (at some cost of course) and the domain is one of limited trust crossing language boundaries so having a small type language makes that easier to manage.
Having said all that, IMO, the resource story is very important and it does seem to change the calculus somewhat.

@lukewagner
Copy link
Member

Just expanding on that last sentence of @fgmccabe's a bit: one way to express recursive values would be to use resources and handles. They aren't in the IT explainer yet, but prototype support is in wit-bindgen and a rough sketch is described here. Resources are also have the advantage of being able to describe more than just trees.

@theduke
Copy link

theduke commented Dec 15, 2021

@lukewagner

one way to express recursive values would be to use resources and handles

I did cover that above.

Resources/handles are awesome for certain use cases where the data actually represents resources with associated actions or where you only want partial access. But they are basically a hack for cases where you just want to transfer data.

  • When performance is important, that approach is not feasible because it requires way too many intermediate calls and allocations to fully transfer types between the native representations

  • If performance isn't critical, in many contexts you still lose the ability to auto-generate language bindings that feel native, because the types would usually just be nested structs/sum types/classes/objects/arrays. So you end up writing custom glue code (or a custom glue generator) for each language to provide a decent API, negating most of the usefulness of IT. Because you may as well just dump to JSON or use a schema-based system like Proto/flatbuf and not bother with intermediate/partial IT representations but just copy a byte slice and deserialize instead.

@theduke
Copy link

theduke commented Dec 15, 2021

Or said another way: resources/handles are great for sharing behaviour, but a very limiting solution for exchanging data.

@theduke
Copy link

theduke commented Dec 15, 2021

@fgmccabe

Just to be clear, I would not expect interface types to have any meaning outside of adapters.

The "only" additional requirement would be allowing the definition of recursive structures.
I can see how that is an extension to the capabilities of the type system.

But I don't see how this causes a burden on interoperability.
Lists of arbitrary length already impose a requirement for storing data outside of statically known structures and need dynamic memory management, so the ability to box recursive structures doesn't seem like a fundamental addition.

@bathos
Copy link

bathos commented Dec 15, 2021

Our main inspiration came from WebIDL which does not support recursive types.

WebAssembly is not my domain, but I work with Web IDL a fair amount and I’m curious what this means. I know Web IDL disallows a typedef’s type descriptor from directly being a typedef-ref (which prevents totally tautological definitions) and it forbids dictionaries from recursing in any manner (which I think stems from the risk of default dictionaries expanding infinitely, though I’m uncertain about that). However AFAICT recursive types are generally possible with both sequence and record types at least, e.g.

typedef sequence<(DOMString or DOMStringTreeThing)> DOMStringTreeThing;

Perhaps I have misunderstood the meaning of recursive type as used by @fgmccabe here?

(Since my question is off-topic — and possibly just stems from me totally misunderstanding the subject — I’ll hide it later, but this piqued my curiosity at random and maybe it’s a quickie to answer.)

@lukewagner
Copy link
Member

@theduke oops, sorry, I missed that. Yes, no argument that they're not as performant or ergonomic. I think the current design could be relaxed to allow recursive type definitions, and there would be some clear expressivity benefits, but they also make the implementation (particularly around adapter fusion) significantly more complicated, so we were inclined to hold off on this in the initial "MVP" version. This has the added benefit of, in the meantime, letting us benefit from the significant, ongoing work in the GC subgroup around defining structural recursive types and their subtyping1,2.

@bathos The dictionary restriction seems fairly general (specifically the list starting with A type includes a dictionary D if at least one of the following is true:), covering not just direct self-nesting but also indirect recursion through sequences, unions, records, etc. I had assumed this reflected a more general restriction on type recursion, although I can't find explicit wording preventing the sequence+union recursion you're describing. Are you aware of this actually being used in practice in any Web IDL-defined APIs? It's possible this is a spec-omission and not actually supported by the browsers' Web IDL C++ glue-code generators (for which support would require non-trivial additional complexity, similar to what we're concerned with here).

@bathos
Copy link

bathos commented Dec 15, 2021

@lukewagner I don't know if it's leveraged anywhere currently. My own Web IDL tools permit it because I aimed for spec compliance and, as far as I can tell, the spec also does. Dictionary is the only "includes" definition that pierces generic params like that, but I've never been sure why.

It's true that the spec could just be omitting intended constraints though. Notably, the specified constraint about typedefs doesn't prevent tautological nullable, annotated, or union types. Since it would be a paradox if those weren't forbidden, there is at least one example of constraints being left unstated, which does hint at the possibility of others.

Edit: I just realized the spec is arguably sound regarding tautological nullable, annotated, or union types given rules associated with the definitions of those types (nullable inner cannot be nullable, union cannot contain itself, and annotations are applicable to only specific types).

FWIW, supporting such cases in my tools did not seem to change much at all, but maybe that's just because I only had to deal with the abstract definitions + the ES conversion algorithms, not serialized representations etc.

@badeend
Copy link

badeend commented Jan 9, 2022

Just for clarity; are recursive datatypes themselves the problem, or the fact that they might form circular references at runtime? The former does not automatically introduce the latter.

@fgmccabe
Copy link
Contributor

@badeend neither. Although circular structures are definitely not wanted. It has more to do with the intended scope of interface types. Recall that values are coerced across the function boundary. That implies copying the recursive structure during a function call.
IMO it is more important to resolve the typing issue for resources than to allow recursive types.
Other issues with user defined types of any description involve the automatic construction of adaptive functions.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants