WebGPURenderer: RenderBundle #28347

RenaudRohlinger · 2024-05-12T13:49:58Z

Description
WebGPU RenderBundle offers performance benefits and introduces a new approach to batch processing the instructions of our scene, reducing the amount of CPU time spent issuing repeated render commands.

This PR adds WebGPU support for RenderBundle and a new, faster renderer pipeline that reduces JS overhead. It is the first step towards three.js static scenes.

The WebGLBackend also works and still benefits from reduced JS overhead, because most of the renderer pipeline code is skipped.

Example with 8000 meshes: 23.7 ms average in the default renderer versus 15.9 ms average in RenderBundle mode (a reduction of about 33%):
https://raw.githack.com/renaudrohlinger/three.js/utsubo/feat/render-bundles/examples/webgpu_renderbundle.html

This contribution is funded by Utsubo and Plasticity

github-actions · 2024-05-12T13:51:52Z

📦 Bundle size

Full ESM build, minified and gzipped.

Filesize `dev`	Filesize PR	Diff
678.9 kB (168.2 kB)	678.9 kB (168.2 kB)	+21 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

Filesize `dev`	Filesize PR	Diff
456.9 kB (110.3 kB)	457 kB (110.3 kB)	+21 B

examples/jsm/nodes/accessors/NormalNode.js

examples/jsm/nodes/accessors/PositionNode.js

examples/webgpu_renderbundle.html

sunag · 2024-05-12T18:38:07Z

I'm getting this error message in the example:

I was imagining something like renderer.renderBundle = true, and maybe hashing the frustum to make the update automatic.
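
A rough sketch of what "hashing the frustum" could look like: fold the camera's projection and view matrices into one cheap numeric hash and only re-record when the hash changes. The names `hashMatrices` and `createFrustumWatcher`, and the plain 16-element arrays standing in for camera matrices, are assumptions for illustration and not part of the renderer.

```javascript
// Hypothetical helper: FNV-1a style hash over the raw 32-bit patterns of the matrix elements.
function hashMatrices( ...matrices ) {

  const f32 = new Float32Array( 1 );
  const u32 = new Uint32Array( f32.buffer );

  let hash = 0x811c9dc5;

  for ( const elements of matrices ) {

    for ( let i = 0; i < elements.length; i ++ ) {

      f32[ 0 ] = elements[ i ]; // reinterpret the float as 32 raw bits
      hash ^= u32[ 0 ];
      hash = Math.imul( hash, 0x01000193 ) >>> 0;

    }

  }

  return hash;

}

// Returns a function that reports whether the camera changed since the previous call.
function createFrustumWatcher() {

  let lastHash = null;

  return function hasChanged( projectionElements, viewElements ) {

    const hash = hashMatrices( projectionElements, viewElements );
    const changed = hash !== lastHash;
    lastHash = hash;
    return changed;

  };

}
```

A 32-bit hash can collide in principle, so a real implementation might prefer comparing the matrices directly, which costs about the same for two 4x4 matrices.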

RenaudRohlinger · 2024-05-12T23:29:47Z

OK, I will investigate this error today.

Regarding renderer.renderBundle = true, what do you think about renderer.recordBundles(), which returns an array of render bundles, and renderer.renderBundles(bundle)?

const bundle = renderer.recordBundles(scene, camera)

// your anim loop
function animate() {
  // your static scene gets drawn; in a video game, for example, this would be all the background elements
  renderer.renderBundles(bundle)
  // render the complex stuff that needs to be updated every frame (or run a post-processing pipeline here)
  renderer.render(sceneComplex, camera)
}

I understand that it might seem optimal to be able to record and always execute in one batch all the renderer commands as it sounds like a super optimization, but in reality, it will just end up becoming so restrictive that there is nothing you can do with it. Cherry-picking a specific part of your pipeline to freeze it as you like sounds a lot more useful, in my opinion.

Also, I still much prefer handling this per scene, as I initially proposed, since it shouldn't change the renderer API much, unlike such a structure. On top of that, @mrdoob seems to like the idea of a static scene:
#26876 (comment)

By the way the example should be fixed! 😄
https://raw.githack.com/renaudrohlinger/three.js/utsubo/feat/render-bundles/examples/webgpu_renderbundle.html

/cc @sunag

sunag · 2024-05-13T02:34:18Z

I found your second idea interesting... maybe something like?

const renderBundle = new RenderBundle( scene, camera );
renderBundle.transparent = false;
// renderBundle.needsUpdate = true;

renderer.renderBundle( renderBundle );

I think RenderBundle should not be seen as being limited to static scenes, as any uniform can be updated before executeBundles using writeBuffer if necessary. Its basic optimization is to reduce CPU load by avoiding JS calls.
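
To make that point concrete, here is a minimal sketch against the raw WebGPU API (not the three.js renderer): commands are recorded once, and per frame only buffer contents are refreshed before the bundle is replayed. The function names and the shape of `objects` and `uniformUpdates` are assumptions for illustration. A real bundle would also need `depthStencilFormat` and `sampleCount` to match the render pass.

```javascript
// Record the draw commands for a list of objects once.
// `objects` is a hypothetical array of { pipeline, bindGroup, vertexBuffer, vertexCount }.
function recordBundle( device, objects, colorFormat ) {

  const encoder = device.createRenderBundleEncoder( { colorFormats: [ colorFormat ] } );

  for ( const object of objects ) {

    encoder.setPipeline( object.pipeline );
    encoder.setBindGroup( 0, object.bindGroup );
    encoder.setVertexBuffer( 0, object.vertexBuffer );
    encoder.draw( object.vertexCount );

  }

  return encoder.finish(); // GPURenderBundle, reusable across frames

}

// Per frame: refresh uniform data, then replay the recorded commands.
// `uniformUpdates` is a hypothetical array of { buffer, data }.
function renderFrame( device, passEncoder, bundle, uniformUpdates ) {

  for ( const { buffer, data } of uniformUpdates ) {

    // The buffer contents change; the recorded commands do not.
    device.queue.writeBuffer( buffer, 0, data );

  }

  passEncoder.executeBundles( [ bundle ] );

}
```

The per-object JS calls happen only in `recordBundle`. Each frame then costs one `writeBuffer` per changed buffer plus a single `executeBundles` call.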

RenaudRohlinger · 2024-05-13T03:32:02Z

I like the concept of having a RenderBundle interface. I can update this PR to focus solely on the RenderBundle part for now.

In that case, I will propose another API for the static part in a dedicated pull request.
As mentioned in this PR, it would be great if you could guide me on how you envision the creation of a shared UBO dedicated to the scene (camera, fog), which will be shared among all the objects in the scene per render.

sunag · 2024-05-13T17:23:41Z

> As mentioned in this PR, it would be great if you could guide me on how you envision the creation of a shared UBO dedicated to the scene (camera, fog), which will be shared among all the objects in the scene per render.

My hypothesis would be: during renderer.renderBundle(), generate a renderBundleData from the RenderBundle class, register all RenderObjects (perhaps using _handleObjectFunction), and render "normally" without frustum culling. In the second rendering, try to use this._nodes.updateForRender( renderObject ) and this._bindings.updateForRender( renderObject ) for all previously registered objects, and then call executeBundles. This already saves a lot of JS from the second call onwards. It certainly won't be compatible with other features like backdrop, for example.

UBO optimization should be independent of RenderBundle. It is certainly a very important step for performance. Some things I have in mind:

We still have a single bind group for everything, as in setBindGroup( 0, ... ); the next step would be to have multiple groups.
RFC: WebGPURenderer prototype single uniform buffer update / pass #27388
Related: WebGPURenderer: Increase performance #26673

In this sense, the current status is still that of #27134: sharedUniformGroup() partially works.

mrdoob · 2024-05-22T10:19:59Z

@RenaudRohlinger

> Also, I still much prefer handling this per scene, as I initially proposed, since it shouldn't change the renderer API much, unlike such a structure. On top of that, @mrdoob seems to like the idea of a static scene:
> #26876 (comment)

100%

I'm not sure adding renderer.renderBundle() is the way to go...

I still prefer the "static graph" approach:

const group = new Group();
group.static = true;

const mesh1 = new Mesh( geometry, material );
group.add( mesh1 );

const instances1 = new InstancedMesh( geometry, material );
group.add( instances1 );

We can make it so the renderer only traverses a static group when group.needsUpdate is set to true.

That way the developer is able to update the bundle data when needed.
We can then use matrixWorldNeedsUpdate in the children to control what needs to be updated.
As well as material.needsUpdate = true.

group.needsUpdate = true; // forces the renderer to traverse the children and update internal bundle

mesh1.position.x = Math.random();
mesh1.matrixWorldNeedsUpdate = true; // recomputes child matrices

instances1.setMatrixAt( index, matrix );
instances1.instanceMatrix.needsUpdate = true;

API wise, it would be mostly adding the properties static and needsUpdate to Group.

We can continue serializing the scene graph as usual while letting the developer "flatten" parts of the graph at render time.
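
A minimal sketch of the traversal rule described above, using stand-in classes rather than the real three.js `Group` and renderer: the flattened draw list of a static group is rebuilt only when `needsUpdate` is set. `Node`, `StaticGroupCache` and `getDrawList` are hypothetical names.

```javascript
// Minimal stand-in for a scene-graph node; not the three.js Group class.
class Node {

  constructor( name ) {

    this.name = name;
    this.children = [];
    this.static = false;
    this.needsUpdate = false;

  }

  add( child ) {

    this.children.push( child );
    return this;

  }

}

// Hypothetical renderer-side cache: one flattened draw list per static group.
class StaticGroupCache {

  constructor() {

    this.lists = new WeakMap();
    this.traversals = 0; // counts how often a static group was actually walked

  }

  getDrawList( group ) {

    let list = this.lists.get( group );

    if ( list === undefined || group.needsUpdate === true ) {

      list = [];
      this._collect( group, list );
      this.lists.set( group, list );
      this.traversals ++;
      group.needsUpdate = false; // consumed, like other needsUpdate flags

    }

    return list;

  }

  // Depth-first walk that gathers leaf nodes (the renderable objects in this sketch).
  _collect( node, list ) {

    for ( const child of node.children ) {

      if ( child.children.length === 0 ) list.push( child );
      else this._collect( child, list );

    }

  }

}
```

In this sketch, adding a child without setting `needsUpdate` leaves the cached list stale. That is the trade-off of the proposal: the developer decides when the bundle data is refreshed.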

@gkjohnson Maybe this approach could also be used for BatchedMesh? Even replace it? 🤔

nkallen · 2024-05-22T10:32:59Z

Re-using Group this way is a nice API approach. It may run into some tension with the way to get maximum performance...

I do think we would typically want to flatten transforms and other uniforms into one flat array for the whole Bundle. For the best performance, the client code would want to do something like

group.setMatrixAt(group.indexOf(mesh1), matrix);

Along those lines, my hope is to push nearly everything projectObject must do per-frame onto the GPU, including visibility testing, layer testing, frustum culling, etc. So I do think you want to do things like

group.setVisibleAt(group.indexOf(mesh1), false);

(Doing a full traverse just to update one matrix or visibility flag is probably undesirable)

Exposing this behavior into group could work but it also might make the API a bit bulky
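
For illustration, a small sketch of the flat layout I have in mind, in plain JS with no renderer involved: one typed array holds every matrix, an index lookup avoids any traversal, and a dirty range tracks what would need re-uploading. `FlatTransforms` and its methods are hypothetical names, not an existing three.js API.

```javascript
// Hypothetical flat storage: one Float32Array holding a 4x4 matrix per object.
class FlatTransforms {

  constructor( objects ) {

    this.indices = new Map( objects.map( ( object, i ) => [ object, i ] ) );
    this.matrices = new Float32Array( objects.length * 16 );
    this.visible = new Uint8Array( objects.length ).fill( 1 );
    this.dirtyMin = Infinity; // smallest matrix index changed since the last upload
    this.dirtyMax = - 1; // largest matrix index changed since the last upload

  }

  indexOf( object ) {

    const index = this.indices.get( object );
    return index === undefined ? - 1 : index;

  }

  setMatrixAt( index, elements ) {

    this.matrices.set( elements, index * 16 );
    this.dirtyMin = Math.min( this.dirtyMin, index );
    this.dirtyMax = Math.max( this.dirtyMax, index );

  }

  setVisibleAt( index, visible ) {

    this.visible[ index ] = visible ? 1 : 0;

  }

  // Returns the slice of matrices that would need uploading, then resets the range.
  takeDirtyRange() {

    if ( this.dirtyMax < 0 ) return null;

    const range = this.matrices.subarray( this.dirtyMin * 16, ( this.dirtyMax + 1 ) * 16 );
    this.dirtyMin = Infinity;
    this.dirtyMax = - 1;
    return range;

  }

}
```

Updating one object is then O(1) on the CPU, and the upload covers only the changed span instead of every matrix in the bundle.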

sunag · 2024-05-22T18:33:52Z

I think it would be better to deal with group.static in another PR, since most of the work is related to the way the renderer deals with bindings. The implementation of this PR is currently related to command optimization, and could be modified to work internally on a similar principle, if ( object.static === true ) this._renderBundle( group, camera ), during rendering, once the bindings are updated to respect the correct binding groups, now including static ones.

gkjohnson · 2024-05-23T00:03:53Z

> @gkjohnson Maybe this approach could also be used for BatchedMesh? Even replace it? 🤔

My impression is that Batching and RenderBundles are different techniques and it's valuable to use both together. Here's a quick overview of my understanding:

Batching / instancing are useful techniques for reducing draw calls and changes to graphics context state.
RenderBundles are a WebGPU-specific concept designed to avoid the overhead of validating and marshaling values from JavaScript to native code. Basically, a series of state commands is "recorded" on the native side so it can be prevalidated and easily "replayed" without the JS-to-native overhead.

RenderBundles won't save any draw calls, though. If you record a set of commands that issues 1000 draw calls then 1000 draw calls will still be issued via native code (though faster). So batching saves draw calls, render bundles save JS overhead, and they should be used together.

Sources:

Since I'm here, I'll say I think a more transparent API like Group.static or StaticGroup that implicitly uses RenderBundles would be best. Maybe something like a needsUpdate flag is needed to indicate that a render bundle update should happen (i.e. matrices, materials, etc. have changed in the children).

> group.setMatrixAt(group.indexOf(mesh1), matrix);
> group.setVisibleAt(group.indexOf(mesh1), false);

Explicit calls like this shouldn't be necessary I don't think. These things should be implicitly determined based on the hierarchy state when generating the render bundle, no?

RenaudRohlinger · 2024-05-23T01:53:00Z

It seems that we're all in agreement regarding the concepts and direction that the bundle and static techniques should take.

> the implementation of this PR is currently related to command optimization, and could be modified to work internally on a similar principle, if ( object.static === true ) this._renderBundle( group, camera ), during rendering, once the bindings are updated to respect the correct binding groups, now including static ones.

As @sunag explained, the newly introduced RenderBundle support is an internal feature, currently exposed to the user for advanced usage. It is on its way to being automatically handled internally in the render pipeline with the Group.static in the _renderScene() API, as mentioned by @mrdoob. I will be working on a new PR following this one to incorporate these changes.

> group.setMatrixAt(group.indexOf(mesh1), matrix);
> group.setVisibleAt(group.indexOf(mesh1), false);
>
> Explicit calls like this shouldn't be necessary I don't think. These things should be implicitly determined based on the hierarchy state when generating the render bundle, no?

With this RenderBundle PR, it is still possible to update most buffers and uniforms per mesh dynamically, as demonstrated in the live example, where all matrices are dynamically updated.

Another performance optimization to consider is the writeBuffer cost involved with each uniform/buffer update, which will be partially addressed by the single uniform buffer update #27388 PR. /cc @nkallen @gkjohnson

Here's how I see the implementation unfolding:

Part 1 (basic bundle + static pipeline):

renderer._renderBundle(): RenderBundle support and reduce JS overhead
Group.static v1: Automatic internal render bundling using the new RenderBundle interface.
Group.static v2: Improved management of writeBuffer (related RFC: WebGPURenderer prototype single uniform buffer update / pass #27388 I believe /cc @aardgoose)

Part 2 (more advanced features):

Group.static v3: Add a global UBO pipeline (UBO for the camera and scene elements such as fog, etc.) to update only a single UBO per frame, (instead of pre-calculating the modelViewMatrix on the CPU per mesh per frame) and multiply camera matrices by model matrices directly on the GPU instead of on the CPU. (needs RFC: WebGPURenderer prototype single uniform buffer update / pass #27388)
RenderBundle v2 + Group.static v4: Support for indirect draw calls ([WebGPU] drawIndirect and drawIndexedIndirect #28389) in order to handle frustum culling via a compute shader and to allow a "dynamic" draw call count while maintaining the bundle's static state.

RenaudRohlinger · 2024-05-23T04:56:11Z

I added renderBundle.needsUpdate that will regenerate the bundle in the next render call and also renamed renderer.renderBundle() to renderer._renderBundle().

This PR should now be ready for review @sunag 😊

nkallen · 2024-05-23T09:48:32Z

> group.setMatrixAt(group.indexOf(mesh1), matrix);
> group.setVisibleAt(group.indexOf(mesh1), false);
>
> Explicit calls like this shouldn't be necessary I don't think. These things should be implicitly determined based on the hierarchy state when generating the render bundle, no?

I am thinking ahead a bit to Part 2 ("more advanced features"), which is our use case.

We have a 3D modeling/editor program. The scene is primarily static (most objects have unchanging geometry and transforms). The camera is moving frequently as the user navigates the scene. But when the user initiates an editing operation, a very small subset of items have their visibility flags and transforms changed as a result of user edits. These are changing per frame (e.g., onpointermove).

So we do want to be able to change these attributes per frame (transforms and visibility), without rebuilding the render bundle, without traversing all items to update a very few matrixWorlds, and without re-uploading all buffers. The existing "live example" mentioned above does allow transforms to be updated per frame, but at the cost of iterating through every item in the bundle and spending additional CPU and memory bandwidth, erasing a good portion of the gains from the optimization. (Correct me if I'm mistaken?)

We additionally want to render the scene from multiple camera angles (multiple viewports). So for a given user edit, we will issue (for example) four render calls with four cameras. Thus our hope is to get this use case supported in such a way as to remove anything O(n) on the CPU from the per-frame part of the render pipeline.

I'm not sure if this clarifies anything, but we would want an API vaguely like the following.

const bundle = buildBundle();

onUserEdit(edit => {
    updateUniforms(bundle, edit);
    setNeedsRender(allViewports)
} );

for (const viewport of allViewports)
   viewport.orbitControls.onMove(() => setNeedsRender([viewport]));

sunag · 2024-05-23T23:07:38Z

> This PR should now be ready for review @sunag 😊...
> Group.static v1: Automatic internal render bundling using the new RenderBundle interface.

@RenaudRohlinger I'm reviewing and did this second part, Group.static v1, before merging. I think it will be necessary for the example to be in compliance with the next steps.

sunag · 2024-05-23T23:32:27Z

> We can continue serializing the scene graph as usual while letting the developer "flatten" parts of the graph at render time.

@mrdoob Let's adopt this strategy, the current PR will be part of this as @RenaudRohlinger commented.

@RenaudRohlinger After this step, I think it would be interesting to include the management of bindings by group; today we are dealing with all bindings in the same group.

three.js/examples/jsm/renderers/webgpu/WebGPUBackend.js

Line 809 in 2f55b35

passEncoderGPU.setBindGroup( 0, bindGroupGPU );

sunag · 2024-05-24T00:07:55Z

> Group.static v2: Improved management of writeBuffer (related #27388 I believe /cc @aardgoose)

About Group.static v2 and bind groups: the idea of having multiple groups would be exactly to separate the bind groups and share them between the materials; in this case we could have a group just for the camera.

For example in this case https://webgpu.github.io/webgpu-samples/?sample=renderBundles

I think the implementation of #27388 should be after that.

RenaudRohlinger · 2024-05-24T01:00:32Z

Awesome thanks @sunag!

> the idea of having multiple groups would be exactly to separate the bind groups and share them between the materials; in this case we could have a group just for the camera.

If I understand correctly, there will be a shared frame buffer (passEncoderGPU.setBindGroup(0, bindingsData.frameBindGroup)) that is distinct from the buffers of the render list (passEncoderGPU.setBindGroup(1, bindingsData.group)). We could use this frameBindGroup for the camera matrices and the scene fog, while also allowing the user to add custom and vital extra data, such as a global float timer.

All the global scene-related data would be bound to index 0, and all the per-object level data and shaders would be bound to index 1 (global stuff bound to 0 in the shaders).

This way, we could render the scene from multiple camera angles in a split-view manner, for example, without having to update anything except that specific buffer.

Or do you have something different in mind?
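
To illustrate the split-view idea, here is a rough sketch against the raw WebGPU API rather than the renderer internals. It assumes the bundle was recorded with a shared bind group at index 0 backed by `cameraBuffer` (a render bundle does not inherit bind groups from the pass, so the group has to be set inside the bundle). It also assumes the color target was cleared elsewhere, and it omits the depth attachment for brevity. `renderViews` and the shape of `views` are hypothetical.

```javascript
// `views` is a hypothetical array of { cameraData, viewport: [ x, y, width, height ], colorView }.
// `cameraBuffer` backs the shared bind group (index 0) that the bundle was recorded with.
function renderViews( device, bundle, cameraBuffer, views ) {

  for ( const view of views ) {

    // Only the shared camera data changes between views.
    device.queue.writeBuffer( cameraBuffer, 0, view.cameraData );

    const commandEncoder = device.createCommandEncoder();
    const pass = commandEncoder.beginRenderPass( {
      colorAttachments: [ {
        view: view.colorView,
        loadOp: 'load', // keep what the other views already drew
        storeOp: 'store'
      } ]
    } );

    pass.setViewport( ...view.viewport, 0, 1 );
    pass.executeBundles( [ bundle ] );
    pass.end();

    // One submit per view, so each pass sees the camera data written just before it.
    device.queue.submit( [ commandEncoder.finish() ] );

  }

}
```

Submitting once per view matters here: if all passes went into a single submit, every view would read the last camera data written. Separate camera buffers and bundles per view would be the alternative.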

sunag · 2024-05-24T04:25:53Z

I have this in mind:

The first step is to make the UniformGroupNode generate the groups in NodeBuilder at buffer level, so that they can be accessed through a function like renderObject.getBindingGroups(). Backend functions like createBindings() should also be group-oriented.

Each uniformGroup( 'name' ) will be an individual group, for example if we have:

// sharedUniformGroup( 'frame' ) // global timer .. let's ignore it for now

sharedUniformGroup( 'camera' )
sharedUniformGroup( 'render' ) // material, fog, toneMapping, etc
uniformGroup( 'object' ) // default group, object matrices, etc
uniformGroup( 'custom' ) // user defined group

Will be:

setBindGroup( 0, bindingsData.groups.camera ) // bindingsData.groups[ 0 ]
setBindGroup( 1, bindingsData.groups.render ) // bindingsData.groups[ 1 ]
setBindGroup( 2, bindingsData.groups.object ) // bindingsData.groups[ 2 ]
setBindGroup( 3, bindingsData.groups.custom ) // bindingsData.groups[ 3 ]

We'll probably have to sort them.

Now any Node can be stored in an individual group at buffer level, as the code part would already be ready here.

Defining a uniform in a group would be very simple; currently, e.g.:

import { uniform, renderGroup } ...

// the buffer will be updated only once per render call
const myGlobalPosition = uniform( new THREE.Vector3( 0, 100, 0 ) ).setGroup( renderGroup );

// or

const customGroup = sharedUniformGroup( 'myGroup' );
const myGlobalPosition = uniform( new THREE.Vector3( 0, 100, 0 ) ).setGroup( customGroup );

// custom groups must be updated using `.needsUpdate`
customGroup.needsUpdate = true;

--

After this process we will be able to detect the patterns and share them between materials once shared is true. It would be better to do it in a separate PR. We can use a hash library for this, through the input nodes per group.

In each draw(), the renderer will compare whether the previous group is the same as the current one, and will only update if it is different. A material's bind group would be updated only once per render, for example.
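
The comparison in draw() could be sketched roughly as below. `BindGroupState` is a hypothetical helper, not existing renderer code. The only real API it relies on is `setBindGroup( index, bindGroup )` on a WebGPU pass or bundle encoder.

```javascript
// Wraps a pass or bundle encoder and skips setBindGroup calls that would not change state.
class BindGroupState {

  constructor( encoder ) {

    this.encoder = encoder;
    this.current = []; // last bind group set at each index

  }

  // Returns true if a setBindGroup call was actually issued.
  set( index, bindGroup ) {

    if ( this.current[ index ] === bindGroup ) return false;

    this.encoder.setBindGroup( index, bindGroup );
    this.current[ index ] = bindGroup;
    return true;

  }

  // Forget the tracked state, e.g. after executeBundles() has cleared the pass's bind groups.
  reset() {

    this.current.length = 0;

  }

}
```

With shared camera and render groups at indices 0 and 1, drawing three objects would then issue five `setBindGroup` calls instead of nine: the two shared groups once, plus one object group per draw.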

RenaudRohlinger added 15 commits May 2, 2024 20:12

wip

b5e90f1

add demo

9581a48

add gpu metrics

601829e

Merge remote-tracking branch 'upstream/dev' into utsubo/feat/render-b…

f0a0f7d

…undles

fix bundeType condition

4d45eeb

cleanup

f980c88

refactor and cleanup

a108cc2

support postprocess and multisample

cbd2952

update

6e7c6be

cache and pbr on bundle example

da2c578

wip static mode

8416940

update

4117374

update

e35068c

Merge branch 'dev' into utsubo/feat/render-bundles

36450f6

revert shared

78554d0

github-advanced-security bot found potential problems May 12, 2024

RenaudRohlinger added 2 commits May 12, 2024 23:02

ci

03974d6

circular dep

3c0547b

RenaudRohlinger added 3 commits May 13, 2024 08:53

move the logic to the renderContext

ad881a6

add screenshot for ci

e114f30

cleanup

453c9d9

Merge remote-tracking branch 'upstream/dev' into utsubo/feat/render-b…

0aa652c

…undles

RenaudRohlinger mentioned this pull request May 16, 2024

[WebGPU] drawIndirect and drawIndexedIndirect #28389

Open

merge with dev

a58f07b

RenaudRohlinger added 2 commits May 19, 2024 23:34

TODO: Need to handle FBO too

fa28f29

cleanup

84ed55d

RenaudRohlinger changed the title ~~WebGPURenderer: RenderBundle and static mode~~ WebGPURenderer: RenderBundle May 19, 2024

RenaudRohlinger added 3 commits May 19, 2024 23:52

more cleanup

cc65c9c

fix deepscan

9b53b73

fix framebuffer

9d26c2b

RenaudRohlinger marked this pull request as ready for review May 20, 2024 02:40

RenaudRohlinger added 3 commits May 20, 2024 11:41

update example

978fb04

update scene too

ab55f96

reuse correct scene for update matrices

61b6697

RenaudRohlinger added 3 commits May 23, 2024 13:35

merge with upstream

b832ce7

introduce renderBundle.needsUpdate and rename to private _renderBundle()

b65317d

cleanup

e19de5f

RenaudRohlinger added 2 commits May 23, 2024 14:30

improve example

8a193ef

fix capsule constructor

74aca82

remove confusing gui in example

1a46e3d

Adding RenderBundles and Group.static

777d2ee

sunag merged commit 82b78e7 into mrdoob:dev May 23, 2024
11 checks passed
