WebGLRenderer: Tight morph target packing. #27768

wizgrav · 2024-02-19T00:26:36Z

While optimizing morph target performance for this demo I noticed that positions and normals carry an unused component.

This PR changes the morph target texture from rgba -> single float to enable fetching only the xyz components for positions and normals while for color it still stores and fetches the full rgba set.

It is functionally identical as before but gives 25% better fps (23 vs 18) on my laptops 3060 in this example which I also include in the PR. If you don't want it I can take it out but it would be nice to have something to stress test morph targets.

github-actions · 2024-02-19T00:28:46Z

📦 Bundle size

Full ESM build, minified and gzipped.

Filesize `dev`	Filesize PR	Diff
676.4 kB (168.3 kB)	676.8 kB (168.3 kB)	+374 B

🌳 Bundle size after tree-shaking

Minimal build including a renderer, camera, empty scene, and dependencies.

Filesize `dev`	Filesize PR	Diff
457.3 kB (111 kB)	457.7 kB (111.1 kB)	+376 B

Mugen87 · 2024-02-19T08:59:13Z

It seems the webgl_loader_mmd example fails with this PR. Do you mind verifying?

Mugen87 · 2024-02-19T09:00:58Z

One example that demonstrates instanced morph targets is sufficient. Do you mind modifying webgl_instancing_morph instead of adding a new demo?

Mugen87 · 2024-02-19T09:06:26Z

src/renderers/shaders/ShaderChunk/morphtarget_pars_vertex.glsl.js

 			int y = texelIndex / morphTargetsTextureSize.x;
 			int x = texelIndex - y * morphTargetsTextureSize.x;

 			ivec3 morphUV = ivec3( x, y, morphTargetIndex );
-			return texelFetch( morphTargetsTexture, morphUV, 0 );


Are multiple fetches really faster than a single one? I'm afraid this needs to be tested one more than one device. Especially mobile devices could show a performance regression here.

As it is currently with rgba format it will always fetch 4 floats so fetching 3 instead should always be faster especially on mobiles where bandwidth is precious. I'll replace the horse example with the faces one and do some more testing but I only have one android phone with an older snapdragon and the quest 3 which all show similar improvements

Alternatively, you could also add a slider to webgl_instancing_morph that controls the number of instances.

I would prefer to have the faces instead because they are using more influences and also morph the normals so its a more complete case than the horses.

wizgrav · 2024-02-19T09:51:14Z

It seems the webgl_loader_mmd example fails with this PR. Do you mind verifying?

I saw that it's failing when checking the screenshot with "Diff wrong in 0.4% of pixels in file: webgl_loader_mmd" but this is irrelevant to any of the files in this PR. I've seen it happening in the past and usually triggering the tests again with a commit fixes it but I've done it once and it didn't. I've checked the files on my repo and they look identical on the eye, maybe its a transparency issue? I'm a bit lost on this one, any help would be appreciated

Mugen87 · 2024-02-19T10:30:48Z

src/renderers/shaders/ShaderChunk/morphtarget_pars_vertex.glsl.js

@@ -20,12 +20,34 @@ export default /* glsl */`

 		vec4 getMorph( const in int vertexIndex, const in int morphTargetIndex, const in int offset ) {

-			int texelIndex = vertexIndex * MORPHTARGETS_TEXTURE_STRIDE + offset;
+			int texelIndex = vertexIndex * MORPHTARGETS_TEXTURE_STRIDE + 3 * offset;


Why do we need to multiply the offset with 3 now?

To keep in line with what we previously had on the shader chunks that call getMorph. Now the position is at offset 0, normals at offset 3 and colors at offset 6. Alternatively we can modify the other shader chunks

To avoid any misunderstanding, you want me to change the other shader chunks to eliminate the multiplication with 3? At some point we will probably have to revisit that part anyway if/when we add additional morphing attributes like uv. When that time comes more extensive changes on how we handle morphing will be needed because now it's a bit fixed like if there's a color morph but not normals the latter will be empty space in the texture. This is how it used to be even before this PR but we can make it more flexible when more attributes are introduced

Let's keep the current version of this PR 👍 .

Mugen87 · 2024-02-19T10:36:22Z

I've checked the files on my repo and they look identical on the eye, maybe its a transparency issue? I'm a bit lost on this one, any help would be appreciated

I've downloaded your branch and tested locally. I can clearly see an artifact at Miku's skirt. You can also see this when going to webgl_loader_mmd_pose and then set the 笑い morph target influence to 1. Notice the black triangle on the left bottom side of the skirt.

wizgrav · 2024-02-19T21:19:30Z

@Mugen87 that artifact was caused by an incorrect calculation of the texture width when it exceeded the hardware maximum. Its fixed now, I did quite a bit of testing and it seems solid. I've also added some sliders on the example to tweak the instance count and set it up to start conservative so its much more useful now for testing

Mugen87

The implementation looks good now! The only issue is that performance testing isn't really possible since there is no example that allows to compare the existing and new approach. Especially since the reference example has been updated with this PR. Given the advantage in bandwidth cost I still think it's worth to given this new approach a try.

wizgrav · 2024-02-20T10:10:16Z

@Mugen87 I've set up the example with the current dev to compare

Current Dev - RGBA format - Fetching 4 floats

This PR - Red format- Fetching 3 floats

Mugen87 · 2024-02-20T10:19:59Z

On a Pixel 4a with latest Chrome I see this with the default grid settings:

Current Dev: 23 FPS
This PR: 20 FPS

On an iPad (8.Gen) with Safari with max grid settings:

Current Dev: 24 FPS
This PR: 23 FPS

So I can't confirm based on both devices that the new implementation is in general faster. Fetching a single RGBA texel seems to be faster than fetching 3 Red texels on the above devices.

Performance regressions on mobile devices.

Mugen87 · 2024-02-20T10:21:30Z

Sry, I have to revert my approval. When can't merge the PR if we see performance degradations. This needs more testing especially since most devices outside there are mobile.

wizgrav · 2024-02-20T10:48:38Z

Sry, I have to revert my approval. When can't merge the PR if we see performance degradations. This needs more testing especially since most devices outside there are mobile.

No worries that's why I set up the examples. At least on my phone with an adreno 640 I see consistent improvement on all grid size eg on 3 x 3 grid it's 34 vs 31 fps

I believe you run into thermal throttling issues but yeah we can keep it in testing for a bit. If you want we can at least include this example instead of the horse one as its way more useful for testing

Mugen87 · 2024-02-20T10:51:11Z

If you want we can at least include this example instead of the horse one as its way more useful for testing

That would be great!

mrdoob · 2024-02-20T12:28:21Z

Pixel 8 Pro

dev	pr

wizgrav · 2024-02-20T12:55:56Z

@Mugen87 @mrdoob yeah your results seem to be consistent with what I get as well but I made a shocking discovery. I tried the examples on a Quest 3 where current dev gives 40 fps while this PR gives almost 90 with the default grid.

This can't be attributed to bandwidth savings alone there seems to be an issue with fetching rgba floats there and if that's the case it probably affects skinned meshes as well as the bone matrices are also stored as rgba. @cabanier can you verify? I will investigate further

wizgrav · 2024-02-20T18:18:21Z

@Mugen87 I did a lot of testing including stress testing skinned meshes and it seems that the quest exhibits this behaviour only when used with instanced morph targets with the current dev. The only thing that comes to my mind is that because we are fetching the weights from a single channel texture and then the morph attributes from rgba ones, this triggers some issue in that platform/driver. With this PR where both textures are single channel it seems that it doesn't

In any case, on all my devices I can only see improvement and I think @mrdoob also has a similar experience from what he posted. How should I handle it? Should I clean this PR and leave only the example, set up the cases live and open an issue for people to test. Or should I leave this PR open and somehow invite people to share here? Please advise

Mugen87 · 2024-02-20T18:59:53Z

Or should I leave this PR open and somehow invite people to share here?

Let's do this. It would be good if a few more people could test both links (ideally with mobile devices) and report performance data. If the majority of devices show an improvement, I'm okay with approving the PR.

CodyJasonBennett · 2024-02-20T19:41:38Z

Tested on Pixel 3a and iPhone 13. Can benchmark on Quest 2-3 later.

Show results:

Pixel 3a, Chrome 121, Android 12

Dev	PR

iPhone 13, iOS 17.3.1

Dev	PR

cabanier · 2024-02-21T04:54:56Z

I tried the examples on a Quest 3 where current dev gives 40 fps while this PR gives almost 90 with the default grid.

That is quite the improvement!

This can't be attributed to bandwidth savings alone there seems to be an issue with fetching rgba floats there and if that's the case it probably affects skinned meshes as well as the bone matrices are also stored as rgba.

I'm out of town this week with no access to a device. Can you describe the change you made? I can forward to our team that works on the low level graphics stack to get their opinion.

wizgrav · 2024-02-21T09:09:11Z

@cabanier

The context is that we are instancing meshes that do morph target animations. Each instance has its own set of weights for the morph targets.

In the current dev we fetch some floats from a single channel texture and fill an array with the weights for the morph targets, then we iterate that array and if an element is > 0 we fetch one or more rgba texels from an array texture to get the attributes for the morphing(position/normal/color). shaderchunk

In this PR we still fetch the weights as before but we convert the attribute array texture to single channel as well and fetch the xyz components for position/normal individually to save bandwidth because in the rgba case we have one component which is always set to zero. shaderchunk

I tested the examples on the Quest 3 and the speedup(>100%) is way more than what we could theoretically achieve which leads me to believe that the setup of the current dev causes some issue with the stack/shader compilation which degrades performance

You can use the links in this comment for reference #27768 (comment)

cabanier · 2024-02-23T00:53:51Z

examples/webgl_instancing_morph.html

-				light.castShadow = true;
+				const renderer = new THREE.WebGLRenderer( {
+					antialias: true,
+					powerPreference: 'high-performance'


Did you mean to set this to 'high-performance'?

cabanier · 2024-02-23T00:56:01Z

We're still looking into why this speedup might happen but someone noticed that you added high-performance to the WebGLRenderer.
Is it still faster if you disable that feature?

wizgrav · 2024-02-23T10:39:05Z

We're still looking into why this speedup might happen but someone noticed that you added high-performance to the WebGLRenderer. Is it still faster if you disable that feature?

I tried removing it on both the current dev and this PR, I even set it to "low-power" but it didn't make a difference. From my understanding this hint applies to dual gpu setups and instructs the browser to select the better one for the webgl context. Like eg in laptops where you have an intel alongside an nvidia. You use it in your stack to enable optimizations?

From my limited knowledge on graphic stacks, I suspect that the slowdown in the current dev happens because different texture formats are used and the stack has to do some reconfiguration on the fly to switch the indexing from one format to the other and this is expensive. But again my knowledge is limited.

cabanier · 2024-02-24T17:47:53Z

From my limited knowledge on graphic stacks, I suspect that the slowdown in the current dev happens because different texture formats are used and the stack has to do some reconfiguration on the fly to switch the indexing from one format to the other and this is expensive. But again my knowledge is limited.

FYI, we're still looking into why this is making such a huge difference. We are suspecting that this could have a more efficient caching strategy but haven't proved it yet.

RenaudRohlinger · 2024-03-24T08:46:02Z

FYI I implemented this PR in the WebGPURenderer and did run some test in both WebGPU and WebGL Backends and using Tight packing via Red Channel seems to increase by 5 to 10% the GPU usage.

#27984

wizgrav · 2024-03-24T11:58:21Z

FYI I implemented this PR in the WebGPURenderer and did run some test in both WebGPU and WebGL Backends and using Tight packing via Red Channel seems to increase by 5 to 10% the GPU usage.

#27984

This is for apple silicon from what I can see in your PR, there are probably alignment issues there but other architectures behave differently. On intel/nvidia/amd and some mobile chipsets we see a good speedup. On the Q3 specifically, and maybe snapdragons in general, we see a major regression when we mix single and multi channel float textures, as the current dev does, in the vertex shader. I'm still waiting for some insight by meta and @rcabanier on this

tight morph target packing

83d7f36

remove redundant assignment

9d48b7a

Mugen87 reviewed Feb 19, 2024

View reviewed changes

Mugen87 changed the title ~~Tight morph target packing~~ WebGLRenderer: Tight morph target packing. Feb 19, 2024

wizgrav added 3 commits February 19, 2024 13:31

updated example with variable instance count

28e55a0

update screenshot

eef9f15

fix width calculation when > maxTextureSize

bda507e

Mugen87 previously approved these changes Feb 20, 2024

View reviewed changes

Mugen87 added this to the r162 milestone Feb 20, 2024

cabanier reviewed Feb 23, 2024

View reviewed changes

mrdoob modified the milestones: r162, r163 Feb 29, 2024

This was referenced Mar 20, 2024

InstancedMesh: Add clone and dispose method for morphTexture #27952

Merged

WebGPURenderer: Tight morph target packing #27984

Closed

mrdoob modified the milestones: r163, r164 Mar 29, 2024

mrdoob modified the milestones: r164, r165 Apr 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WebGLRenderer: Tight morph target packing. #27768

WebGLRenderer: Tight morph target packing. #27768

wizgrav commented Feb 19, 2024 •

edited

github-actions bot commented Feb 19, 2024 •

edited

Mugen87 commented Feb 19, 2024

Mugen87 commented Feb 19, 2024 •

edited

Mugen87 Feb 19, 2024 •

edited

wizgrav Feb 19, 2024 •

edited

Mugen87 Feb 19, 2024

wizgrav Feb 19, 2024 •

edited

wizgrav commented Feb 19, 2024

Mugen87 Feb 19, 2024 •

edited

wizgrav Feb 19, 2024 •

edited

wizgrav Feb 19, 2024

Mugen87 Feb 19, 2024

Mugen87 commented Feb 19, 2024 •

edited

wizgrav commented Feb 19, 2024 •

edited

Mugen87 left a comment •

edited

wizgrav commented Feb 20, 2024

Mugen87 commented Feb 20, 2024 •

edited

Mugen87 commented Feb 20, 2024 •

edited

wizgrav commented Feb 20, 2024

Mugen87 commented Feb 20, 2024

mrdoob commented Feb 20, 2024

wizgrav commented Feb 20, 2024 •

edited by mrdoob

wizgrav commented Feb 20, 2024 •

edited

Mugen87 commented Feb 20, 2024 •

edited

CodyJasonBennett commented Feb 20, 2024 •

edited

cabanier commented Feb 21, 2024

wizgrav commented Feb 21, 2024 •

edited

cabanier Feb 23, 2024

cabanier commented Feb 23, 2024

wizgrav commented Feb 23, 2024

cabanier commented Feb 24, 2024

RenaudRohlinger commented Mar 24, 2024

wizgrav commented Mar 24, 2024 •

edited

WebGLRenderer: Tight morph target packing. #27768

Are you sure you want to change the base?

WebGLRenderer: Tight morph target packing. #27768

Conversation

wizgrav commented Feb 19, 2024 • edited

github-actions bot commented Feb 19, 2024 • edited

📦 Bundle size

🌳 Bundle size after tree-shaking

Mugen87 commented Feb 19, 2024

Mugen87 commented Feb 19, 2024 • edited

Mugen87 Feb 19, 2024 • edited

Choose a reason for hiding this comment

wizgrav Feb 19, 2024 • edited

Choose a reason for hiding this comment

Mugen87 Feb 19, 2024

Choose a reason for hiding this comment

wizgrav Feb 19, 2024 • edited

Choose a reason for hiding this comment

wizgrav commented Feb 19, 2024

Mugen87 Feb 19, 2024 • edited

Choose a reason for hiding this comment

wizgrav Feb 19, 2024 • edited

Choose a reason for hiding this comment

wizgrav Feb 19, 2024

Choose a reason for hiding this comment

Mugen87 Feb 19, 2024

Choose a reason for hiding this comment

Mugen87 commented Feb 19, 2024 • edited

wizgrav commented Feb 19, 2024 • edited

Mugen87 left a comment • edited

Choose a reason for hiding this comment

wizgrav commented Feb 20, 2024

Mugen87 commented Feb 20, 2024 • edited

Mugen87 commented Feb 20, 2024 • edited

wizgrav commented Feb 20, 2024

Mugen87 commented Feb 20, 2024

mrdoob commented Feb 20, 2024

wizgrav commented Feb 20, 2024 • edited by mrdoob

wizgrav commented Feb 20, 2024 • edited

Mugen87 commented Feb 20, 2024 • edited

CodyJasonBennett commented Feb 20, 2024 • edited

cabanier commented Feb 21, 2024

wizgrav commented Feb 21, 2024 • edited

cabanier Feb 23, 2024

Choose a reason for hiding this comment

cabanier commented Feb 23, 2024

wizgrav commented Feb 23, 2024

cabanier commented Feb 24, 2024

RenaudRohlinger commented Mar 24, 2024

wizgrav commented Mar 24, 2024 • edited

wizgrav commented Feb 19, 2024 •

edited

github-actions bot commented Feb 19, 2024 •

edited

Mugen87 commented Feb 19, 2024 •

edited

Mugen87 Feb 19, 2024 •

edited

wizgrav Feb 19, 2024 •

edited

wizgrav Feb 19, 2024 •

edited

Mugen87 Feb 19, 2024 •

edited

wizgrav Feb 19, 2024 •

edited

Mugen87 commented Feb 19, 2024 •

edited

wizgrav commented Feb 19, 2024 •

edited

Mugen87 left a comment •

edited

Mugen87 commented Feb 20, 2024 •

edited

Mugen87 commented Feb 20, 2024 •

edited

wizgrav commented Feb 20, 2024 •

edited by mrdoob

wizgrav commented Feb 20, 2024 •

edited

Mugen87 commented Feb 20, 2024 •

edited

CodyJasonBennett commented Feb 20, 2024 •

edited

wizgrav commented Feb 21, 2024 •

edited

wizgrav commented Mar 24, 2024 •

edited