Working with GC types without copy #305

Open
oovm opened this issue Feb 20, 2024 · 5 comments · May be fixed by #317

Comments

@oovm

oovm commented Feb 20, 2024

I'm having some trouble switching to WASI Preview 2.

For example, the following interface:

```wit
package wasi:random@0.2.0;
interface random {
    get-random-bytes: func(len: u64) -> list<u8>;
}
```

The function signature is func (u64) -> (list<u8>).

But its lowered type is core func (i64, i32) -> (), which is very difficult to use.

If I want to convert the result to the core type (array (mut u8)), a lot of glue code is required.
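For context, the (i64, i32) -> () shape follows from how the Canonical ABI flattens the signature: a list<u8> result is a pointer/length pair, which is more than the single flat result allowed, so the caller passes an extra i32 return pointer and the pair is written to linear memory. Below is a minimal Python sketch of that calling convention. The `ToyInstance` class, its bump-style `realloc`, and the fake host function are all hypothetical stand-ins for illustration, not part of any real runtime.

```python
import struct


class ToyInstance:
    """Hypothetical model of a core instance: linear memory plus a bump realloc."""

    def __init__(self) -> None:
        self.memory = bytearray(64 * 1024)  # assumption: a single 64 KiB page
        self.next_free = 16                 # bytes 0..15 are reserved for the return area

    def realloc(self, old_ptr: int, old_size: int, align: int, new_size: int) -> int:
        # Same parameter order as the canonical realloc; this toy version never frees.
        ptr = (self.next_free + align - 1) & ~(align - 1)
        self.next_free = ptr + new_size
        return ptr

    def host_get_random_bytes(self, length: int, retptr: int) -> None:
        """Stand-in for the lowered import: core func (i64, i32) -> ()."""
        data = bytes(i & 0xFF for i in range(length))  # deterministic fake "random" bytes
        ptr = self.realloc(0, 0, 1, length)
        self.memory[ptr:ptr + length] = data
        # The (ptr, len) pair is stored in the caller-provided return area.
        struct.pack_into("<II", self.memory, retptr, ptr, length)

    def guest_get_random_bytes(self, length: int) -> bytes:
        """The glue a guest has to write to get a usable byte sequence back."""
        retptr = 0                                     # 8-byte return area at address 0
        self.host_get_random_bytes(length, retptr)
        ptr, size = struct.unpack_from("<II", self.memory, retptr)
        return bytes(self.memory[ptr:ptr + size])      # copy out of linear memory
```

Every call costs an allocation, two loads from the return area, and a copy before the guest holds anything it can treat as a byte array.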


I'd like to see a GC-mode canon option that makes the lowered type look like core func (i64) -> (array u8).

For complex nested types, reaching a specific piece of data currently requires very complex pointer arithmetic, whereas with arrays it only requires a few array.get instructions.

I think this helps simplify the use of some external interfaces, such as:

```wit
package wasi:filesystem@0.2.0;
interface preopens {
    get-directories: func() -> list<tuple<descriptor, string>>;
}
```
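To make the pointer arithmetic concrete, here is a rough Python sketch of how a list<tuple<descriptor, string>> sits in linear memory under the current ABI: each element is 12 bytes (an i32 handle, then an i32 string pointer and an i32 string length). The helper names and the toy memory are hypothetical and exist only for illustration.

```python
import struct

ELEM_SIZE = 12  # i32 handle + i32 string pointer + i32 string length


def build_directories(entries):
    """Lay out list<tuple<descriptor, string>> in a toy linear memory."""
    memory = bytearray(4096)
    base = 0
    cursor = base + ELEM_SIZE * len(entries)  # string bytes go after the element array
    for i, (handle, name) in enumerate(entries):
        raw = name.encode("utf-8")
        memory[cursor:cursor + len(raw)] = raw
        struct.pack_into("<III", memory, base + i * ELEM_SIZE, handle, cursor, len(raw))
        cursor += len(raw)
    return memory, base, len(entries)


def read_name(memory, base, index):
    """Fetch the string of element `index` using explicit offsets."""
    elem = base + index * ELEM_SIZE                         # address of the element
    str_ptr, = struct.unpack_from("<I", memory, elem + 4)   # skip the 4-byte handle
    str_len, = struct.unpack_from("<I", memory, elem + 8)
    return memory[str_ptr:str_ptr + str_len].decode("utf-8")
```

With a GC lowering along the lines proposed here, `read_name` would presumably reduce to an array.get on the outer array followed by a field access on the element.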
@lukewagner
Member

Yes, agreed. It's definitely the plan of record to add a gc canonical ABI option, just like you're describing. (It's even one of the original motivations for having an IDL that abstracts low-level memory representation.) We've mostly been waiting for (1) wasm-gc to be finalized, which it now is, and (2) an implementation of wasm-gc to show up in a runtime that also implements components (e.g., one is in progress in Wasmtime). But if you or anyone else wants to run ahead and create a PR adding the gc option to the proposal (Explainer.md, Binary.md and, most significantly, CanonicalABI.md), that would be welcome too.

@oovm
Author

oovm commented Feb 21, 2024

We have enough time to discuss how GC languages should obtain WASI data before ref-types, gc-types, stringref and other features are stable.

In fact, once gc types are taken into account, the WASI types and the wasm types correspond more closely.

No option means pointer mode; adding reference-type (tentative name) means converting to an immutable reference; adding mutable-reference (tentative name) means converting to an internally mutable reference.

| Upper Type | Lower Type | Canonical Options | Requisite |
| --- | --- | --- | --- |
| `u32` | `i32` | | |
| `tuple<u32, u32>` | `(i32, i32)` | | |
| `tuple<u32, u32>` | `(struct (field i32) (field i32))` | reference-type | gc |
| `tuple<u32, u32>` | `(struct (field mut i32) (field mut i32))` | mutable-reference | gc |
| `record {a: u32, b: u32}` | (flatten layout) `(i32, i32)` | | |
| `record {a: u32, b: u32}` | `(struct (field $a i32) (field $b i32))` | reference-type | gc |
| `list<u8>` | `(i32, i32)` | | |
| `list<u8>` | `(array u8)` | reference-type | gc |
| `list<u8>` | `(array mut u8)` | mutable-reference | gc |
| `string` | `(i32, i32)` | | |
| `string` | `stringref` | reference-type | gc, stringref |
| `string` | `(string.encode_utf8 stringref)` | reference-type + string-encoding=utf8 | gc, stringref |
| `borrow<string>` | `string_view` | reference-type | gc, stringref |
| `resource` | `i32` | | |
| `resource` | `externref` | reference-type | ref-types |
| `flags` | (flatten layout) `(i32 × ⌈flags / 32⌉)` | | |
| `enum` | `i32` | | |
| `option<u32>` | `(ref null i32)` / `i31ref` | reference-type | gc |
| `option<t>` | `(ref null T)` | reference-type | gc |
| `result<t, e>` | ? | ? | ? |
| `variant` | ? | ? | ? |

In a gc context, variant could perhaps be modeled as a set of subtypes that are told apart with a downcast.
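As a rough illustration of the subtype-with-downcast idea, the following Python sketch models result<u32, u32> as a base class with one subclass per case. This is only an analogy: `isinstance` stands in for a ref.test / ref.cast style check, and the class names are hypothetical rather than part of any proposal.

```python
from dataclasses import dataclass


class Result:
    """Base type; plays the role of the variant's supertype."""


@dataclass(frozen=True)
class Ok(Result):
    value: int


@dataclass(frozen=True)
class Err(Result):
    code: int


def unwrap_or(result: Result, default: int) -> int:
    # isinstance stands in for a downcast from the supertype to one case.
    if isinstance(result, Ok):
        return result.value
    return default
```

Each case carries only its own payload, so no discriminant-plus-union layout has to be decoded by hand.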

@oovm
Author

oovm commented Feb 21, 2024

Another benefit is that if only gc types are used, there is no need to bring in a memory allocator, which helps reduce binary size and speeds up warm-up.

rustc's cabi_export_realloc takes about 27000 lines of wasm instructions (release mode), and libc is even larger.

Other, smaller allocators sacrifice either speed or security.

```wat
(component
    ;; Define a memory allocator
    (core module $MockMemory ;; Replace here by an actual allocator module, such as libc
        (func $realloc (export "realloc") (param i32 i32 i32 i32) (result i32)
            (i32.const 0)
        )
        (memory $memory (export "memory") 255)
    )
    (core instance $mock_memory (instantiate $MockMemory))
    ;; import wasi function
    (import "wasi:random/random@0.2.0" (instance $wasi:random/random@0.2.0
        (export "get-random-bytes" (func (param "length" u64) (result (list u8))))
    ))
    ;; wasi function to wasm function
    (core func $wasi:random/random@0.2.0/get-random-bytes (canon lower
        (func $wasi:random/random@0.2.0 "get-random-bytes")
        (memory $mock_memory "memory")
        (realloc (func $mock_memory "realloc"))
    ))
    ;; import wasm function
    (core module $TestRandom
        (type (func (param i64 i32)))
        (import "wasi:random/random@0.2.0" "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes (type 0)))
    )
    ;; instantiate wasm module with wasi instance
    (core instance $test_random (instantiate $TestRandom
        (with "wasi:random/random@0.2.0" (instance (export "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes))))
    ))
)
```

If using the gc type, this can be simplified to:

```wat
(component
    ;; import wasi function
    (import "wasi:random/random@0.2.0" (instance $wasi:random/random@0.2.0
        (export "get-random-bytes" (func (param "length" u64) (result (list u8))))
    ))
    ;; wasi function to wasm function
    (core func $wasi:random/random@0.2.0/get-random-bytes (canon lower
        (func $wasi:random/random@0.2.0 "get-random-bytes")
        reference-type
    ))
    ;; import wasm function
    (core module $TestRandom
        (type (func (param i64) (result (array u8))))
        (import "wasi:random/random@0.2.0" "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes (type 0)))
    )
    ;; instantiate wasm module with wasi instance
    (core instance $test_random (instantiate $TestRandom
        (with "wasi:random/random@0.2.0" (instance (export "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes))))
    ))
)
```

Reading a field of a gc type takes a single instruction and requires no pointer arithmetic (which takes at least three instructions), further reducing binary size.

@lukewagner
Member

Yes, really good point regarding mutability vs. immutability; we probably do want both as ABI options. A really nice benefit of immutability is that if both sides of a component-to-component call use immutable GC references, no copy needs to be made when passing a reference across the boundary. OTOH, if your language ultimately does need a mutable array of bytes, then the immutable GC option may impose an extra unnecessary copy; thus having both options makes sense.
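The copy/no-copy distinction can be sketched in Python, using the immutable `bytes` type and the mutable `bytearray` type as stand-ins for immutable and mutable GC arrays. The two function names are hypothetical; this only illustrates the reasoning and says nothing about how a runtime would implement it.

```python
def pass_immutable(data: bytes) -> bytes:
    # Immutable value: handing over the same reference is safe, so no copy is made.
    return data


def pass_mutable(data: bytearray) -> bytearray:
    # Mutable value: copy so that neither side can observe the other's later writes.
    return bytearray(data)
```

In the immutable case the receiver gets the very same object; in the mutable case a later write by the sender does not show up in the receiver's copy.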

String is its own story, but definitely a Unicode-encoded (array u8) makes sense (if we treat string-encoding as orthogonal, then all three of utf8, utf16 and latin1+utf16 could be encoded into this array of u8/u16). Based on the last CG meeting, stringref is either not going to happen or not any time soon. However, we could add something stringref-y at the Component Model level in which we lower string values to a reference type (externref initially, later we could eliminate dynamic type checks with type imports) and supply canonical built-ins for operating on these strings (being quite careful to support only basic operations that have the same O(1)/O(n) cost on all host string representations, such as sequential code-point iteration or bulk-copy-into-linear-memory, and are trivial to implement w/o giant Unicode tables). But (array u8) is probably the right place to start.
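As a sketch of what "encoded into this array of u8/u16" could look like for the three string encodings, the Python function below returns the element width and the code units for a given string. The function name and return shape are hypothetical, and the latin1+utf16 branch is simplified to "use Latin-1 when every character fits, otherwise UTF-16".

```python
import struct


def encode_units(text: str, encoding: str):
    """Return (element width in bits, list of code units) for a string."""
    if encoding == "utf8":
        return 8, list(text.encode("utf-8"))
    if encoding == "utf16":
        raw = text.encode("utf-16-le")
        return 16, list(struct.unpack(f"<{len(raw) // 2}H", raw))
    if encoding == "latin1+utf16":
        try:
            return 8, list(text.encode("latin-1"))  # compact form when every char fits
        except UnicodeEncodeError:
            return encode_units(text, "utf16")
    raise ValueError(f"unknown encoding: {encoding}")
```

The same string can therefore land in either an 8-bit or a 16-bit array depending on the option, which is why treating string-encoding as orthogonal to the gc option keeps the design simple.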

@oovm
Author

oovm commented Feb 21, 2024

Considering the complexity of mutability and some upcoming features such as partially mutable, readonly and freeze, mutability may need to be expressed as a parameter of reference-type.

Taking into account proposals such as threads and shared-everything-threads, this feature could be implemented in stages.

The initial version would provide only immutable types, which do not require copying.

Mutability would be post-MVP; until then, users who need mutable data would have to sacrifice some performance and write glue code by hand to copy into the required types.

@oovm oovm linked a pull request Mar 13, 2024 that will close this issue