Array passing back and forth between JS and AS #263

Closed
jbousquie opened this issue Sep 12, 2018 · 34 comments

Comments

@jbousquie

jbousquie commented Sep 12, 2018

Hello,

I'm one of the core team members of the BabylonJS 3D framework (https://github.com/BabylonJS/Babylon.js) and I'm evaluating the opportunity to port some parts of BabylonJS to AS.
A 3D engine needs to do a lot of math quickly between two frames (ideally in less than 16 milliseconds to reach a framerate of 60 fps) and to lower the GC impact for performance reasons.

In my test, I would like to:

  • allocate once a big fixed-size float32 typed array, say more than 1 million floats. It doesn't matter where it's allocated, JS or WASM side, as long as it's shared and reusable throughout the render loop
  • populate this array with float32 values once on the JS side
    then each frame:
  • update the array on the JS side with new values (new mesh coordinates over time)
  • call an AS function that computes some trigonometry on the array values and stores the results (also a million float32) in the array in a free reserved zone (note that we could also use 2 arrays, one for the input, the other for the output, as long as they remain allocated forever, shared, accessible and updatable/readable from JS as well as from AS).
  • read the results, i.e. the updated shared array, from JS and copy them to the WebGL buffer.

What I found so far in the Wiki or in the github issues is how to allocate an array AS side and return the pointer to JS, then to read its values with the right typed view or the right offset.
Our real concern is allocating the memory only once and sharing it between JS and AS. I suspect my AS test so far copied or reallocated the memory on each call, either on the JS side or on the AS side, because I didn't understand how to access the same initial array from both sides.

Would you mind providing a JS-side and AS-side snippet to achieve this? Mainly the creation of the persistent array shared between JS and TS, and how to access it again and again across the computation calls.

@dcodeIO
Member

dcodeIO commented Sep 12, 2018

In such a case, I'd most likely allocate a Float32Array on the JS side, use it as the memory of the WebAssembly module, and manually load<f32>(atOffset) / store<f32>(atOffset) them in AS to achieve the best performance. This way around you can access the values, without any sort of copying involved, by keeping a Float32Array view on the same buffer around on the JS side. When copying to a WebGL buffer, you'd then just .set the region of interest from WASM memory to a WebGL buffer to issue a memcpy.

If more memory is necessary than just those f32 values, there's the --memoryBase compiler option that reserves a fixed amount of memory in front, while putting anything else (i.e. static strings or anything otherwise allocated on the AS side) behind it, so with --memoryBase 4000000 there'd be enough space for 1,000,000 32-bit floats from offset 0 to 3999996.

It's not strictly necessary to import the memory, actually, as you could simply new Float32Array(myModule.memory.buffer, 0, 1000000) on the memory as well to obtain the view on the reserved f32 values after instantiating the module.

One thing to take care of when keeping a view on the memory around in JS: if the WASM memory resizes (i.e. through a memory.grow instruction), the view is invalidated because the original buffer becomes detached, and a new view on the updated memory must be created. Pre-allocating a sufficient number of pages works around this (see also).
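
A minimal sketch of the two points above, runnable without any compiled module: it sizes a memory for 1,000,000 floats and shows what happens to a view when the memory grows. The names FLOAT_COUNT, pagesFor and makeView are illustrative only, not part of AssemblyScript or its loader.

```javascript
const PAGE_SIZE = 65536;      // bytes per WebAssembly page (64 KiB)
const FLOAT_COUNT = 1000000;  // number of f32 values to reserve (assumed size)

// Number of whole pages needed to hold `byteLength` bytes.
function pagesFor(byteLength) {
  return Math.ceil(byteLength / PAGE_SIZE);
}

// 4,000,000 bytes do not fit in 61 pages, so 62 are needed.
const pages = pagesFor(FLOAT_COUNT * Float32Array.BYTES_PER_ELEMENT);
const memory = new WebAssembly.Memory({ initial: pages });

// Create a view over the reserved f32 region at offset 0.
function makeView(mem) {
  return new Float32Array(mem.buffer, 0, FLOAT_COUNT);
}

let view = makeView(memory);
view[0] = 1.5;

// Growing detaches the old ArrayBuffer: the old view now has length 0.
memory.grow(1);
const staleLength = view.length;

// The contents survive; only the view has to be recreated.
view = makeView(memory);
const preserved = view[0];
```

Pre-allocating enough pages up front, as suggested above, avoids having to recreate views inside the render loop.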

@jbousquie
Author

jbousquie commented Sep 12, 2018

Thank you for your fast answer and the warning about the memory resize concern :-)

Do you mean to load and to store float by float in a big loop ?
How to get AS side the pointer on the first position of the allocated Float32Array JS side ?

@dcodeIO
Member

dcodeIO commented Sep 12, 2018

Yeah, you'd simply write load<f32>(index << 2) in AS instead of myArray[index] in JS. Here, load simply compiles to a f32.load instruction. Same goes for store<f32>(index << 2, someValue).

The pointer in this case is index << 2, i.e. 0 for the first element (JS: myArray[0]), 4 for the second (JS: myArray[1]) etc. This works because your f32 array is at the beginning of the WASM memory. If you had two arrays (another one not starting at 0), you'd simply add its start offset, like so: load<f32>(startOffset + (index << 2)). If all array sizes are constant, you can also provide the start offset as an additional constant argument, like so: load<f32>(index << 2, startOffset), which is equivalent to f32.load offset=CONSTANT, if startOffset is constant.

Example:

const INPUT_START: usize = 0; // 500000 elements
const OUTPUT_START: usize = 4 * 500000; // 500000 elements

@inline function getInput(index: i32): f32 {
  return load<f32>(index << 2, INPUT_START);
}

@inline function setOutput(index: i32, value: f32): void {
  store<f32>(index << 2, value, OUTPUT_START);
}

The Game of Life example does something like this as well.
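
For readers more at home in JS, the byte addressing above can be mimicked with a DataView. This is only a sketch: a plain ArrayBuffer stands in for the WASM memory, loadF32 and storeF32 are hypothetical helpers standing in for load<f32> and store<f32>, and the region size is shrunk to 8 elements.

```javascript
const ELEMENTS = 8;               // small stand-in for 500000 elements per region
const INPUT_START = 0;            // byte offset of the input region
const OUTPUT_START = 4 * ELEMENTS; // byte offset of the output region

const buffer = new ArrayBuffer(2 * 4 * ELEMENTS);
const bytes = new DataView(buffer);

// WebAssembly memory is little-endian, hence the `true` flag.
function loadF32(ptr, constantOffset = 0) {
  return bytes.getFloat32(ptr + constantOffset, true);
}

function storeF32(ptr, value, constantOffset = 0) {
  bytes.setFloat32(ptr + constantOffset, value, true);
}

// Typed views over the same two regions (assumes a little-endian host).
const input = new Float32Array(buffer, INPUT_START, ELEMENTS);
const output = new Float32Array(buffer, OUTPUT_START, ELEMENTS);

input.set([0.5, 1.5, 2.5]);

// index << 2 is the byte address of element `index` inside its region.
for (let index = 0; index < ELEMENTS; index++) {
  storeF32(index << 2, loadF32(index << 2, INPUT_START) * 2, OUTPUT_START);
}
```

After the loop, output[i] holds twice input[i], which is what the typed views and the byte-addressed helpers both observe because they share one buffer.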

@kyr0

kyr0 commented Sep 12, 2018

Also:

WebAssembly/design#1231

@MaxGraey
Member

@jbousquie I do similar interop for earcut AS port. See this and this. Also you could do even faster and avoid TypedArray#set: WebAssembly/design#1231 (comment)

@kyr0

kyr0 commented Sep 12, 2018

@dcodeIO and @MaxGraey Are you fine with me implementing the WebAssembly/design#1231 proposals in lib/loader and PR?

@dcodeIO
Member

dcodeIO commented Sep 12, 2018

Possibly related: #136

Note though, that an implementation of getArray etc. would have one level of indirection, whereas the solution proposed here is as direct as it can get.

@MaxGraey
Member

MaxGraey commented Sep 12, 2018

@dcode I think it's much better to use code generation as a postprocess for js <-> wasm interop, like wasm-bindgen does.

@dcodeIO
Member

dcodeIO commented Sep 12, 2018

Well, that would still use the loader, or something very similar, for array interop, because arrays in AS can be at any offset anyway (with a buffer being referenced from anywhere), and there is no concept of a C-like static array in TS (hence the proposed load/store here).

@kyr0

kyr0 commented Sep 12, 2018

@MaxGraey Yeah, I was about to write another comment that code generation would be much better. But then I thought again... maybe not everybody wants to use a generated facade? So maybe having both options would be nice. Having it available in the lib, and code generation probably using this lib code.

@kyr0

kyr0 commented Sep 12, 2018

> Well, that would still use the loader, or something very similar, for array interop, because arrays in AS can be at any offset anyway, and there is no concept of a C-like static array in TS (hence the proposed load/store here).

Yeah, and it would be cool if there would be a separation of the API's for the two different behaviors: copy or "pass-by-reference". And having both available in a simple, yet easy to understand way. And to put some docs towards this in the wiki etc. as I see more and more issues here popping up just to ask for help how to do it :)

@kyr0

kyr0 commented Sep 12, 2018

I'll write a PoC tonight... and maybe it is good or you decide to throw it ;)

@dcodeIO
Member

dcodeIO commented Sep 12, 2018

Also note that these things will become more convenient once reference-types/GC are there. With that, there can be typed array types on the AS side that can flow out to/in from JS naturally.

@kyr0

kyr0 commented Sep 12, 2018

> Also note that these things will become more convenient once reference-types/GC are there. With that, there can be typed array types on the AS side that can flow out to JS naturally.

Good point. I just feel that it is maybe a good idea not to wait until this lands in all WASM supporting engines as this might take some time.

But it basically means that the support for this should probably be flagged as experimental in the loader lib because the internal implementation will surely change once reference-types/GC lands and so will the lib's API as ptr arguments might not be necessary anymore by then (?)

But being one of the first projects to support passing TypedArrays back and forth between JS/WASM easily might be a big deal for this project, as I guess many devs are not so much into "bit dancing" :))

@jbousquie
Author

thanks guys, I'll make a test these next days with your proposal :-)

@jbousquie
Author

jbousquie commented Sep 13, 2018

mmh... I guess I need to compile also with the flag --importMemory then (just using npm run asbuild so far)

@jbousquie
Author

@MaxGraey I just couldn't find where you used load/store in your port of earcut. Maybe you use another method.
@dcodeIO Another naive question please :
I understood quite well the use of load/store AS side in the shared buffer.
I just wonder if, instead of using load/store, we could also use views like new Float32Array( _referenceToSharedBufferASside_, offset, size) to access the existing shared buffer from AS ?
My concern here is to lower the porting cost from the BabylonJS existing TS code to AS while keeping the sharing method you proposed. That said, I have no idea about how to get the reference to the shared buffer in AS.

@MaxGraey
Member

@jbousquie I used ordinary arrays instead of load/store because the input and output arrays don't have a fixed length. If you want to see how to use load/store for reading or writing static memory, you should follow the mandelbrot or game-of-life examples

@jbousquie
Author

jbousquie commented Sep 13, 2018

thanks a lot.
It seems that load and store require the optional offset to be a real constant.
So no way ("Operation not supported") to do something like this :

function getInput(index: i32, byteOffset: usize): f32 {
  return load<f32>(index << 2, byteOffset);
}
function setOutput(index: i32, value: f32, byteOffset: usize): void {
  store<f32>(index << 2, value, byteOffset);
}

This is sad because the big buffer that I share between JS and AS contains many different data types (mesh geometry, mesh normals, transformations, results), although all float32, and I would like to pass the WASM module each byteOffset, meaning each data type or each buffer section. Actually although the size of the buffer is fixed once for all, it depends on the 3D mesh geometry and complexity. So a fixed buffer (and offsets) but only once the mesh is built.

Maybe should I get rid of the byteOffset and compute each index directly from zero each time or something like this ?

return f32(load<f32>((index + offset) << 2));

I'll have a try...
tried in vain (maybe my error is somewhere else though, but I'm not really sure to access the right floats yet)
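
A small check of that arithmetic from the JS side, as a sketch only: f32Pointer is a hypothetical helper and the numbers are made up. It shows that a runtime offset counted in 4-byte elements can be folded into the pointer, and that (index + offset) << 2 addresses the same float as a typed view created at byte offset offset * 4. A common source of wrong reads is passing a byte offset where an element offset is expected, or the reverse.

```javascript
// Byte address of element `index` in a section that begins
// `elementOffset` 4-byte elements into the buffer.
function f32Pointer(index, elementOffset) {
  return (index + elementOffset) << 2;
}

const buffer = new ArrayBuffer(64);
const all = new Float32Array(buffer);

// A section of 3 floats starting 5 elements (20 bytes) into the buffer.
const elementOffset = 5;
const section = new Float32Array(buffer, elementOffset * 4, 3);
section.set([10, 20, 30]);

// Address of section[2], computed the way the AS side would compute it.
const ptr = f32Pointer(2, elementOffset);

// Read the float at that byte address (WASM memory is little-endian).
const viaPointer = new DataView(buffer).getFloat32(ptr, true);
```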

@kyr0

kyr0 commented Sep 15, 2018

So, after working on the code and a PR I realized that if we:

  • assume that we only need to exchange TypedArray in one specific place in memory
  • we know the address space in memory upfront
  • we only have to deal with one datatype (say f32)
  • process the data in a WASM module and access it in JS context afterwards

...we can easily go with the API's we have already (more or less).

Here is a working example for f32 / Float32Array:

JS context:

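const env = {
    memoryBase: 0,
    tableBase: 0,
    // both constructors require a descriptor; the sizes here are placeholders
    memory: new WebAssembly.Memory({ initial: 1 }),
    table: new WebAssembly.Table({ initial: 0, element: "anyfunc" })
};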

var yourModule = loader.instantiateBuffer(source, {
    env
});

// reading is only safe up to the length of F32.length (cleared values)
yourModule.F32.set([33.333, 44.4444, 55.555, 0, 0, 0, 0, 0, 0, 0]);

console.log('Typed data to process:', yourModule.F32);

// assuming the C-like array memory is stored in modules memory from address 0x0000 on
// you can omit this parameter as the first byte is located at 0x000 all the time
// it's only written here for the better understanding (not having a magical 0x0000 in AS context)
yourModule.processTypedArray(0x0000, yourModule.F32.length);

// here, the WASM module has already processed the data and stored the results back into the same address space
// that is viewed by the .view -> we can access the result directly without any additional operation
console.log('Processed typed data:', yourModule.F32);

WASM module:

import "allocator/tlsf";

export function processTypedArray(ptr: usize, length: i32): void {

    const inputPtr: Pointer<f32> = new Pointer<f32>(ptr);

    for (let i: i32 = 0; i < length; i++) {
        // mutate the values directly in memory
        inputPtr[i] = inputPtr[i] * 2;
    }
}

The Experimental Pointer<T> implementation by @dcodeIO saves you the manual pointer arithmetic impl.

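This example works well for up to 16384 values in F32 (Float32Array). As far as I understood, that is the maximum addressable space for the Float32Array view, though I might be wrong here: 16384 floats is exactly one 64 KiB WebAssembly page (65536 / 4 bytes), so the limit probably reflects the initial memory size.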

Other TypedArray subclasses:
All other TypedArray types work in a similar way. You just use a different exported TypedArray view reference in JS context and the corresponding numeric value type in WASM module context.

You can look them up in the type mapping for the JS context and the type declaration for the AS context.

On memory allocation:
It is worth noting that the module.F32 etc. exported TypedArray view members "virtually" reserve the maximum length, but they don't actually call .allocate() in the lib/loader.js impl. This is fine because the chosen WASM module's allocator handles the auto-grow if necessary (@dcodeIO please correct me if this is wrong). At least this is my experimental observation and understanding after a quick look at the std/allocator/* implementations.

So, if you go with import "allocator/tlsf"; the auto-grow works without any issues.
If you go with import "allocator/arena"; it will throw wasm execution error RangeError: Source is too large.

On memory clearing:
Because there is no active memory management, memory is not cleared automatically. If you set 10 values using .set([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) you can't know what you will find at index 10 (the 11th value), yet you can still read it, because the Float32Array lets you access the memory up to index 16383. So it's important not to read beyond the values you wrote, even though it is possible. It is also better to clear all the values that you plan to access with 0 first, otherwise it may end up like this:

[Image: screenshot taken 2018-09-15 at 15:41:57]
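
One way to stay inside the populated range is to zero the region first and then hand out a subarray that is exactly as long as the data written. The following is a sketch over a bare WebAssembly.Memory, with no compiled module involved; writeValues is a hypothetical helper, not part of the loader.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 }); // one page = 65536 bytes
const whole = new Float32Array(memory.buffer);          // 16384 floats

// Pretend an earlier computation left data behind.
whole[3] = 123.25;

// Zero the first `capacity` floats, write `values`, and return a view
// that is limited to the values actually written.
function writeValues(target, values, capacity) {
  target.fill(0, 0, capacity);
  target.set(values);
  return target.subarray(0, values.length);
}

const data = writeValues(whole, [1, 2, 3], 10);
```

Reading data[3] now yields undefined instead of leftover memory, while whole still exposes the full page for code that needs it.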

So far so good, all my personal needs would be already served if the Pointer<T> impl. would not be experimental anymore and lands in AssemblyScript officially.

Also, I remember that it was mentioned (somewhere here) that there are plans for more complex datatype interop between JS and WASM for upcoming WebAssembly standards already. Given this, it may be seen as questionable, if we should add special API's for complex datatype interop now or just go with what we have plus Pointer<T> and wait for the spec updates to land. I would be super happy with the latter as it is a simple, but working approach, solves all problems and is agnostic to upcoming spec changes.

What do you guys think?

@jbousquie
Author

jbousquie commented Sep 16, 2018

Nice approach.
Actually, in my case, I need to pass float32 data and uInt32 data to compute float32 results. In order to limit the porting cost from the current TS code to AS and to abstract as much as possible the memory management for the developer, I'm thinking about something like this (not tested yet), assuming we've imported enough memory once for all and that all arrays have a fixed size once created :

JS side : initialization phase

// in a persistent object, call AS array creation functions
this.arrIN1 = module.createFloatArray(size1);   // returns the pointer on the array, right ?
this.arrIN2 = module.createIntArray(size2);
this.arrIN3 = ...
this.result = module.createFloatArray(size3); 

AS side : initialization exported function

export function createFloatArray(size: i32): Float32Array {
    return new Float32Array(size);    // should allocate a f32 array in the imported memory and return the pointer
}
// ... same for function createIntArray(size)

We don't need to know where the arrays are allocated in the memory (successive locations or not) and we can pass and get back different types of data.

JS side, initial array population :
Let's access and store data in all IN arrays like described here : #105 (comment)
by using this.arrIN1, this.arrIN2, etc as array pointer in the linked example.

Then still JS side, let's call the exported function compute() as often as necessary, say 60 times per second in a render loop.
AS side : let's compute things by accessing values with the syntax arr[i]

export function compute(arr1: f32[], arr2: u32[], arr3: f32[], ..., result: f32[]): void {
   for (let i = 0; i < arr2.length; i++) {
     for (let j = 0; j < arr1.length; j++) {
        result[i] = computeStuff(arr1[i], arr3[j]); // internal computations, values stored in result
     }
   }
}

This code and its logic really look like the current TS code so far.

JS side :

// update the IN arrays with fresh values each frame like the array population example, 
// then simply call compute() to update the result array
module.compute(this.arrIN1, this.arrIN2, this.arrIN3, ..., this.result);

// read back updated data from result to pass them to the WebGL buffer

Ending, once the render loop is over :
JS side : let's free the memory

module.free(this.arrIN1);
module.free(this.arrIN2);
module.free(this.arrIN3);

AS side

export function free(ptr: usize): void {
  GC.collect(ptr); // or something else setting the memory free
}

Not sure this could work.

@kyr0

kyr0 commented Sep 17, 2018

> Not sure this could work.

I see it working until the point where the Float32Array "initialized" on the JS side should write to the same backing memory as the Float32Array in AS context. To make a long story short: I bet it won't :) Also, the Float32Array in AS context has a slightly different memory layout. It stores a byteLength value in front of the actual array entries, whereas JS TypedArrays are C-like arrays starting with values at 0x0000.

But, aside from these nifty details: yes, in my PR I was working on exactly what you proposed, but using a slightly different implementation. Now that I see there is a real need for more than one initial data passing, I will finish my PR, as it seems to need only a few more LoC to cover all your needs.

eta. this evening, if all runs well :)

@jbousquie
Author

thank you
I'll go on my tests anyway and wait for your smart PR

@kyr0

kyr0 commented Sep 17, 2018

@jbousquie So, the PR is ready. I guess the code is almost self-explaining:

// create a TypedArray subtype like Float32Array, set the length
var float32Array = module.newTypedArray(2, Float32Array);

// set the data, in JS context
float32Array.view.set([1.2e-38, 3.4e38]);

// process the data in AS context
module.processFloat32Array(float32Array.ptr, float32Array.view.length);

// AS mutates the memory state directly, so the view in JS reflects all changes 
float32Array.view // is still a standard JS Float32Array

float32Array.view[0] // first value of the processed Float32Array

You can free the memory by calling memory.free(...) in AS context or module.memory.free(...) in JS context, as you like.

All https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays subtypes are supported.

@jbousquie
Author

Huge ! I'll test all of this soon

@jbousquie
Author

Finally I used a simpler way, based on the first proposition, to pass floats, ints and get back computed floats. I'll describe this later in this post.
Here's the working version : http://jerome.bousquie.fr/BJS/test/SPSWasm/spsWasm.html
compared to the same JS version : http://jerome.bousquie.fr/BJS/test/spsBuffer.html
Beware, it's CPU intensive as it animates 40K solid particles.
gain in my Chrome browser : fps x4.4

@jbousquie
Author

jbousquie commented Sep 19, 2018

Ok, here's how I did it up to now, assuming it's compiled using --importMemory
The inline comments should be enough to understand the process.
JS side :

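    var memory = new WebAssembly.Memory({
            initial: 1000
    });
    this.wasmBuffer = memory.buffer;
    // with --importMemory the module imports its memory as env.memory
    // (any other imports the module needs would go here as well)
    var imports = { env: { memory: memory } };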

    // Views on different parts of the wasm buffer
    // all types are 4 bytes long here
    this.offset1 = this.localPos.length;        // don't care the var values, they are just array lengths
    this.offset2 = this.offset1 + this.localNor.length;
    this.offset3 = this.offset2 + this.shapeLengths.length;
    this.offset4 = this.offset3 + this.transforms.length;
    this.offset5 = this.offset4 + this.localPos.length;

    this.byteOffset1 = this.offset1 * 4;
    this.byteOffset2 = this.offset2 * 4;
    this.byteOffset3 = this.offset3 * 4;
    this.byteOffset4 = this.offset4 * 4;
    this.byteOffset5 = this.offset5 * 4;

    WebAssembly.instantiateStreaming(fetch(wasmURL), imports).then(obj => {
        this.wasmModule = obj.instance.exports;
        this.compiled = true;

        // All this must come AFTER the module instantiation, otherwise something written to the memory buffer alters the data
        // Keep a specific typed view on each part of the global shared buffer
        this.wasmPos = new Float32Array(this.wasmBuffer, 0, this.localPos.length);
        this.wasmNor = new Float32Array(this.wasmBuffer, this.byteOffset1, this.localNor.length);
        this.wasmShp = new Uint32Array(this.wasmBuffer, this.byteOffset2, this.shapeLengths.length);
        this.wasmTransforms = new Float32Array(this.wasmBuffer, this.byteOffset3, this.transforms.length);
        this.wasmTransformedPos = new Float32Array(this.wasmBuffer, this.byteOffset4, this.localPos.length);
        this.wasmTransformedNor = new Float32Array(this.wasmBuffer, this.byteOffset5, this.localNor.length);
        // init the buffer with some existing array values
        this.wasmPos.set(this.posBuffer);
        this.wasmNor.set(this.norBuffer);
        this.wasmShp.set(this.shapeLengths);
        this.wasmTransforms.set(this.transforms);
        this.wasmTransformedPos.set(this.localPos);
        this.wasmTransformedNor.set(this.localNor);
    });

    // .... further in the render loop, each frame
    // call the wasm function "transform" that updates the wasm buffer
    this.wasmModule.transform(this.particleNb, this.offset1, this.offset2, this.offset3, this.offset4, this.offset5);
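
The offset bookkeeping above can also be generated from a list of section lengths. The following is a sketch under stated assumptions: layoutSections and the section names are illustrative, the lengths are made up, and a fresh one-page memory stands in for the imported one.

```javascript
// Lay out consecutive sections of 4-byte elements in one buffer.
// Returns a typed view per section plus its offset counted in elements.
function layoutSections(buffer, sections) {
  const views = {};
  const offsets = {};
  let elementOffset = 0;
  for (const { name, type: ViewType, length } of sections) {
    if (ViewType.BYTES_PER_ELEMENT !== 4) {
      throw new RangeError(name + ": only 4-byte element types are supported");
    }
    offsets[name] = elementOffset;
    // Typed array constructors take the offset in bytes, not elements.
    views[name] = new ViewType(buffer, elementOffset * 4, length);
    elementOffset += length;
  }
  return { views, offsets, totalElements: elementOffset };
}

const memory = new WebAssembly.Memory({ initial: 1 });
const { views, offsets, totalElements } = layoutSections(memory.buffer, [
  { name: "pos", type: Float32Array, length: 12 },
  { name: "nor", type: Float32Array, length: 12 },
  { name: "shp", type: Uint32Array, length: 2 },
  { name: "transforms", type: Float32Array, length: 18 },
  { name: "transformedPos", type: Float32Array, length: 12 },
  { name: "transformedNor", type: Float32Array, length: 12 },
]);
```

The element offsets in offsets are what would be passed to the exported function, while the views are kept on the JS side. As in the code above, the views should only be filled after the module has been instantiated.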

Now AS side (I won't write all the code, you can find the source here : http://jerome.bousquie.fr/BJS/test/SPSWasm/index.ts ), just :

  • pick the data (floats or ints) from each part of the global buffer using the passed offset parameters as "array"-like indexes
  • compute 3D stuff
  • store the results in the right part (the last one) of the global buffer
  • return nothing

// function transform(particleNb, offset1, offset2, offset3, offset4, offset5)
// particleNb : number of particles
// offsets are counted in 4-byte elements, not in bytes
// the buffer is populated like this :
//       0..offset1 : float32 localPositions
// offset1..offset2 : float32 localNormals
// offset2..offset3 : uInt32  shapeLengths (one per particle)
// offset3..offset4 : float32 transformations (one set of 9 floats per particle : position/rotation/scaling)
// offset4..offset5 : float32 transformedPositions
// offset5..        : float32 transformedNormals (up to the end of the data)

export function transform(particleNb: u32, offset1: u32, offset2: u32, offset3: u32, offset4: u32, offset5: u32): void {
// subpart example ...

    for (let p: u32 = 0; p < particleNb; p++) {
        // get the current particle transformation
        // transformations are stored from offset3
        tIdx = p * stride;          
        offsetTransforms = offset3 + tIdx;

        pos_x = load<f32>((offsetTransforms) << 2);
        pos_y = load<f32>((offsetTransforms + 1) << 2);
        pos_z = load<f32>((offsetTransforms + 2) << 2);

        rot_x = load<f32>((offsetTransforms + 3) << 2);
        rot_y = load<f32>((offsetTransforms + 4) << 2);
        rot_z = load<f32>((offsetTransforms + 5) << 2);

        scl_x = load<f32>((offsetTransforms + 6) << 2);
        scl_y = load<f32>((offsetTransforms + 7) << 2);
        scl_z = load<f32>((offsetTransforms + 8) << 2);
       ... etc
    }
}

Back JS side :
As the memory buffer is directly updated by the WASM module, there's nothing more to do in the JS array after the call. We can pass the related updated sub-buffers (typed arrays) straight to the VBO (WebGL buffer)

    // each frame
    // call the wasm function "transform" that updates the wasm buffer
    this.wasmModule.transform(this.particleNb, this.offset1, this.offset2, this.offset3, this.offset4, this.offset5);

    // and update the actual mesh computed positions and normals
    this.mesh.updateVerticesData(BABYLON.VertexBuffer.PositionKind, this.wasmTransformedPos, false, false);
    this.mesh.updateVerticesData(BABYLON.VertexBuffer.NormalKind, this.wasmTransformedNor, false, false);

Thank you a lot for your support, answers and suggestion.
Next step : try the @kyr0 's newTypedArray :-)

@kyr0

kyr0 commented Sep 25, 2018

Example code (using current impl. by @dcodeIO):

https://github.com/kyr0/assemblyscript-js-wasm-interop-example

@stale

stale bot commented Feb 8, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@ericblade

forgive the resurrection, but ... it seems like either none of this is relevant in the modern assemblyscript, or none of this is exposed in the same way, making it a wee bit difficult to figure out --

i'm here just trying to figure out how to write a simple test to prove this out, from 'asinit', i've got a Assemblyscript module that exports an 'add' function by default, and a Typescript module that imports it, and it runs in node.

Trying to figure out how to get to the point where I can access a Uint8Array shared between both sides, and I am completely missing how to do anything here. There is no lib/loader, no export { memory }, no newTypedArray etc etc.

Help? Thanks :)

@MaxGraey
Member

@ericblade

I am aware that the default passing of arrays is by copy, but I need to understand how to allocate an array on one side or the other and modify it in place from either side. :|
The official manual seems to be missing a LOT of information.

@MaxGraey
Member

Currently, AssemblyScript generates all the necessary glue code automatically. Just add --bindings raw or --bindings esm. If you are really interested in how this happens under the hood, you can check the generated release.js file, which contains the __lowerArray / __liftArray helper methods

@ericblade

That does not supply enough information to get there, given the documentation, examples, and what is prebuilt with 'asinit'. This is incredibly frustrating with the documentation being full of 404s, examples being years out of date, and the prebuilt sample being the absolute bare minimum to be functional -- if it even is.
