Push constant proposal #4612

shaoboyan · 2024-04-30T03:07:16Z

Push constants proposal. Related issue #75

github-actions · 2024-04-30T03:12:09Z

Previews, as seen when this build job started (f15e36a):
WebGPU _{webgpu.idl | Explainer | Correspondence Reference}
WGSL _{grammar.js | wgsl.lalr.txt}

magcius · 2024-04-30T03:17:35Z

proposals/push-constants.md

+dictionary GPUPipelineLayoutDescriptor
+         : GPUObjectDescriptorBase {
+    required sequence<GPUBindGroupLayout> bindGroupLayouts;
+    uint32_t pushConstantsSize = 0;


It should be explicit what size this is (byte size? number of 32-bit parameters?)

Maybe pushConstantBytes?

Change to immediateDataRangeByteSize :)

magcius · 2024-04-30T03:19:02Z

proposals/push-constants.md

+    uint32_t pushConstantsSize = 0;
+};
+```
+two pipeline layouts are defined to be “compatible for push constants” if they were created with identical push constant size. It means push constants values can share between pipeline layouts that are compatible for push constants.


I assume it's a compile error for a module to use push constants outside of its defined size range?

My thoughts is that we follow out-of-bounds access here.

We can validate that the shader module uses no more than the push constants allowed by the module so we should be fine. Also dynamic indexing of push constants is not possible so in the shader we know that all accesses are in bounds.

magcius · 2024-04-30T03:19:41Z

proposals/push-constants.md

+| --- | --- | --- | --- | --- | --- | --- |
+| `var<push_constant>` | Immutable | [Module](https://www.w3.org/TR/WGSL/#module-scope) | [Concrete](https://www.w3.org/TR/WGSL/#type-concrete) [constructible](https://www.w3.org/TR/WGSL/#constructible) [host-shareable](https://www.w3.org/TR/WGSL/#host-shareable) excludes [array types](https://www.w3.org/TR/WGSL/#array-types) and [structure types](https://www.w3.org/TR/WGSL/#struct-types) contains array members | Disallowed	| | Yes. [uniform buffer](https://www.w3.org/TR/WGSL/#uniform-buffer) |
+
+NOTE: Each [entry point](https://www.w3.org/TR/WGSL/#entry-point) can statically use at most one push constant variable.


Why, exactly?

To make implementation a simpler. So we don't need to introduce @offset in wgsl to describe multiple push constant variable.

If we want to enforce a single push constant interface we should consider an alternate design of making it a parameter to the entry point (identified via an attribute). If you did consider it, what did you see as the draw backs?

I personally would prefer push constants to be like bindings, and not tied to a specific entry point. I tend to keep my bindings/push constants in a separate file I can reuse across passes like so.

HLSL seems to have requirements around having push constants be structs anyways gfx-rs/wgpu#5571

Was my strawman in #75 (comment) considered at all? If we assume 4-byte elements and alignment, I think this could become a lot easier.

@magcius in that proposal, where are @parameter variable declared? At the global scope of as function arguments? It seems very similar to specifying an offset, except that it stays in "words" instead of bytes. It'd be nice to support smaller data types in push constants eventually so IMHO it's still be better to act on bytes.

If we want to enforce a single push constant interface we should consider an alternate design of making it a parameter to the entry point (identified via an attribute). If you did consider it, what did you see as the draw backs?

I agree it is an optional design. Personally, I'm slightly prefer current address space mode proposal because:

It defines a global scope variable (like uniform), so that the helper functions could use it directly. ( instead of accepting attributes as parameter, if we pass push constant variable as entry point function parameter)

From API aspect, pushConstantBytes is related with PipelineLayout rather than ShaderModule. Take it as a special binding resource is natural.

magcius · 2024-04-30T03:20:20Z

proposals/push-constants.md

+
+```javascript
+interface GPUCommandEncoder {
+        void setPushConstantU32(uint32_t offset, uint32_t value);


What units is offset in? If bytes, are there any restrictions? If I do setPushConstantU32(1, 0x12345678);, will it overwrite bytes 1, 2, 3, 4 and leave byte 0 alone? Should there be alignment restrictions?

(Same as bufferOffset )
It is in bytes, and it should be multiple of 4 bytes. And the alignment should follow this(https://gpuweb.github.io/gpuweb/wgsl/#alignment-and-size). So it means a the pushconstantSize = variable sizes + padding sizes.

What about f16? Should it just be set via a packed shared buffer?

F16 should be set via packed shared buffer, or setPushConstantU32 so that we don't change the bits.

magcius · 2024-04-30T03:21:16Z

proposals/push-constants.md

+        void setPushConstantU32(uint32_t offset, uint32_t value);
+        void setPushConstantI32(uint32_t offset, int32_t value);
+        void setPushConstantF32(uin32_t offset, float32_t value);
+        void setPushConstants(uint32_t offset, uint32_t count, ArrayBuffer data);


This should probably take a AllowSharedBufferSource

I think it could.

alan-baker · 2024-04-30T13:03:24Z

proposals/push-constants.md

+
+| Address space | Sharing among invocations | Default access mode | Notes |
+| --- | --- | --- | --- |
+| `push_constant` | Invocations in [shader stage](https://www.w3.org/TR/WGSL/#shader-stages) | [read](https://www.w3.org/TR/WGSL/#access-read) | For [uniform buffer](https://www.w3.org/TR/WGSL/#uniform-buffer) variables exclude [array types](https://www.w3.org/TR/WGSL/#array-types) variable or [structure types](https://www.w3.org/TR/WGSL/#struct-types) variable contains [array types](https://www.w3.org/TR/WGSL/#array-types) attributes |


Is the uniform buffer reference a copy paste error?

Separately, are arrays excluded to avoid Vulkan's default uniform access requirements? It will seem arbitrary to users unfamiliar with Vulkan. Is this a better resolution than requiring non-uniform indexing (probably)? Is it better than having a uniformity requirement?

Is the uniform buffer reference a copy paste error?

No, I think uniform buffer variable defined in wgsl aligns with what I want with push constant. I think push constant variables should be host-shareable constructible type, and must satisfy the address space layout constraints.

Separately, are arrays excluded to avoid Vulkan's default uniform access requirements?

Not only Vulkan, but also DX12. DX12 says Arrays are not permitted in constant buffers that get mapped onto root constants since dynamic indexing in the root signature space is not supported (https://learn.microsoft.com/en-us/windows/win32/direct3d12/using-constants-directly-in-the-root-signature)

alan-baker · 2024-04-30T13:05:35Z

proposals/push-constants.md

+| --- | --- | --- | --- | --- | --- | --- |
+| `var<push_constant>` | Immutable | [Module](https://www.w3.org/TR/WGSL/#module-scope) | [Concrete](https://www.w3.org/TR/WGSL/#type-concrete) [constructible](https://www.w3.org/TR/WGSL/#constructible) [host-shareable](https://www.w3.org/TR/WGSL/#host-shareable) excludes [array types](https://www.w3.org/TR/WGSL/#array-types) and [structure types](https://www.w3.org/TR/WGSL/#struct-types) contains array members | Disallowed	| | Yes. [uniform buffer](https://www.w3.org/TR/WGSL/#uniform-buffer) |
+
+NOTE: Each [entry point](https://www.w3.org/TR/WGSL/#entry-point) can statically use at most one push constant variable.


If we want to enforce a single push constant interface we should consider an alternate design of making it a parameter to the entry point (identified via an attribute). If you did consider it, what did you see as the draw backs?

alan-baker · 2024-04-30T13:06:54Z

proposals/push-constants.md

+
+```javascript
+interface GPUCommandEncoder {
+        void setPushConstantU32(uint32_t offset, uint32_t value);


What about f16? Should it just be set via a packed shared buffer?

alan-baker · 2024-04-30T13:10:51Z

proposals/push-constants.md

+Four new functions in `GPUCommandEncoder`.
+
+```javascript
+interface GPUCommandEncoder {


It feels like these commands should either take a pipeline or be on render/compute encoders.

On render/compute encoders. And that's why I tend to add these APIs in GPUCommandEncoder.

As it stands wouldn't all pipelines need the same push constants. That's more what I was getting at of needing to associate the command with a particular pipeline.

GPUCommandEncoder is the top-level encoder. If you want to add these methods to render/compute encoders you need to add them to GPUBindingCommandsMixin (assuming they be used in render bundles).

We could associate push constant values with a particular pipeline by passing them as part of setPipeline() (kind of parallel to overrides in createPipeline()), but since IIUC setPushConstant() is often going to be called multiple times per setPipeline(), I think it makes more sense to make it parallel setBindGroup(): there are separate states for the current {pipeline, bind groups, push constants} and we capture all the current states each time we hit a draw/dispatch call.

As it stands wouldn't all pipelines need the same push constants. That's more what I was getting at of needing to associate the command with a particular pipeline.

Yes. My expected use case is:

passEncoder.setPipeline(). passEncoder.setBindGroup(); passEncoder.setPushConstant(); ... // setBindGroup(); // setPushConstant();

If you want to add these methods to render/compute encoders you need to add them to GPUBindingCommandsMixin (assuming they be used in render bundles).

Thanks! That's the place I expected.

proposals/push-constants.md

kainino0x · 2024-05-08T23:38:43Z

proposals/push-constants.md

+        void setPushConstantU32(uint32_t offset, uint32_t value);
+        void setPushConstantI32(uint32_t offset, int32_t value);
+        void setPushConstantF32(uin32_t offset, float32_t value);
+        void setPushConstants(uint32_t offset, uint32_t count, AllowSharedBufferSource data);


Kelsey points out we would want to add a dataOffset and dataCount (like other entry points that take BufferSources) so we can point directly into a larger array buffer instead of copying it into a new ArrayBuffer).

Updated the API shape to void setImmediateDataRange(uint32_t rangeOffset, AllowSharedBufferSource data, optional dataOffset, optional size);

mwyrzykowski · 2024-05-08T23:38:58Z

proposals/push-constants.md

+        void setPushConstantU32(uint32_t offset, uint32_t value);
+        void setPushConstantI32(uint32_t offset, int32_t value);
+        void setPushConstantF32(uin32_t offset, float32_t value);
+        void setPushConstants(uint32_t offset, uint32_t count, AllowSharedBufferSource data);


In general this looks good, but I think we only need the last signature taking an AllowSharedBufferSource and not the other overloads (U32, I32, F32, etc).

Yes. But tend to keep these U32, I32, F32 for convenience , especially we tend to support small amount of data uploading.

mwyrzykowski · 2024-05-08T23:47:27Z

I think this will have impact on the already complex matrix of GPUSupportedLimits.

Have we considered making setVertexBuffer take an ArrayBufferSource? Then we don't need to change the limits.

I didn't take a look at the D3D or Vulkan backends yet so maybe that is the motivation? On Metal, this is as simple as calling setVertexBytes: / setFragmentBytes:

litherum · 2024-05-09T01:00:57Z

Assuming that push constants would be implemented using D3D12's root constants...

If I'm not mistaken, D3D12 has a fixed (per-device?) amount of space in the root signature. In shipping WebGPU, the WebGPU implementation gets to decide how to use that space.

For example, if the WebGPU Pipeline Layout includes 2 textures, the WebGPU implementation can place those 2 texture descriptors directly inside the D3D12 root signature. Or, alternatively, the WebGPU implementation can choose to put both texture descriptors into a descriptor table, and make the root signature point to the descriptor table. The WebGPU implementation has freedom here to use this fixed root signature space as it desires.

Okay, now let's add in support for push constants. Push constants would also live directly within the fixed root signature space. But what if all the root signature space is already occupied (e.g. by those texture descriptors from the above paragraph)?

Even if WebGPU only allows a single push constant, that doesn't actually solve the problem. The root signature space may be 100% occupied by descriptors, leaving no space at all for push constants.

There are a variety of potential solutions to this problem:

Make push constants space allocation be best-effort. If there isn't space within the root signature, fail to set the push constant. Different devices will fail in different situations.
Tell every implementation that it must leave X bytes of extra space in the root signature for push constants, where "X" is a literal value spelled out in the spec. This is unfortunate because it pessimizes - the app might not need all X bytes, and the remaining bytes might be better spent on descriptors
Have the app request how much push constant space it needs, within the WebGPU Pipeline Layout, and add platform-agnostic validation rules to make sure the app doesn't ask for too much space. Inventing such validation rules will be tricky, because the rules would have to balance the freedom of WebGPU implementations to place descriptors freely, with the desire to allow applications to request as much push constant memory as they want/need. Another reason I don't like this option is that, if the application requests too much push constant space and pipeline layout creation fails, that doesn't mean they can't access data of the requested size, but it instead just means they can't access data of the requested size as push constants.
(My favorite option): Let push constants be an implementation detail, rather than an explicit part of the WebGPU API, and treat the root signature space as just another part of the pipeline layout.
- As @mwyrzykowski suggests, the group could add a new method to GPUBindingCommandsMixin (as a sibling to setBindGroup()) that accepts immediate data (in the form of an ArrayBuffer) rather than a BindGroup. (Or, maybe, alternatively, this could be an overload of createBindGroup() that takes immediate data.)
- The group could even add a maxBindingSize field to GPUBufferBindingLayout, which would allow the the WebGPU pipeline creation routine can know whether the data is small enough that it can guarantee that the data fits into push constant space.
- Limits accounting doesn't change at all, which is appealing

litherum · 2024-05-09T01:13:33Z

Previously conducted performance analysis of push constants:

Kangz · 2024-05-14T22:14:29Z

GPU Web WG 2024-05-08 Pacific-time

SY: Added a new address space called “push_constant”. Similar to uniform but:
- Each entry point can only use one push_constant binding, for easier layout.
- Cannot index arrays in push_constant space.
SY: Question of how large the space can be. We need to reserve some space for the implementation. Limit is small on backends (e.g. root descriptor in d3d12)
SY: Also question of how to upload push constants. Proposed to add where setBindGroup is available (render pass, compute pass, and render bundle encoders).
SY: Improved performance over uniform buffer
- Investigation : Push Constants #75 (comment)
KG: Mozilla needs some time to review this
SY: Alan is asking about lifting the restriction for one push_constant per entry point.
MM: Isn’t the amount of available space dependent on the number of bind groups (which also take space in the root descriptor)?
- SY: Yes, d3d12 64 DWORDs. Even a small limit is useful, though prefer to make it larger
- MM: If you add accounting that bind groups + push constants is under some limit (think this is a good idea) then the mapping from bind groups to root constants needs to be described in our spec. Restricts implementations ability to hoist push constants in/out of tables.
- SY: Would you mind leaving a comment on the PR?
- MM: Sure
- MW: Is there a reason we are designing this proposal instead of setVertexBuffer with dynamic data or similar?
- MM: 3 or 4 years ago we discussed this; we ended up on the trail of whether push constants was actually faster than putting the data in a buffer. Why can’t it be polyfilled by the application. Not sure if performance was proven to be better. Then the discussion veered off into whether, if we’re going to specify push constants, should we say there are no limits on push constants, and the implementation would only use backend push constants if it was small enough. Otherwise silently promote to a buffer.
- MW: Do think the performance is better (at least in metal backend for small sizes). Question is why introduce new comment instead of allowing attaching an ArrayBuffer as a vertex buffer.
  - Other thoughts:
    - I mean we could have either:
    - void setPushConstants(uint32_t offset, uint32_t count, AllowSharedBufferSource data);
    - or
    - setVertexBuffer(GPUIndex32 slot, AllowSharedBufferSource data, optional GPUSize64 offset, optional GPUSize64 size);
    - The latter seems simpler so we don't have to rework the GPUSupportedLimits as this will impact those. The backend internally can decide whether or not to create a buffer or use push constants.
    - There are also implications where if this is supported by GPURenderBundle, then we would likely create a buffer in any case
- MM: In SPIR-V, shader needs to know whether it’s reading from a push constant. Would need to know at compile time where it’s pulling the data from.
- KG: For why we have setPushConstantU32/I32/F32 instead of just the AllowSharedBufferSource overload, still can be debated.
KG: (This issue here is M2, so making this M2 also)
KG: We will take a closer look.

shaoboyan · 2024-05-15T02:35:51Z

Let push constants be an implementation detail, rather than an explicit part of the WebGPU API, and treat the root signature space as just another part of the pipeline layout.

Just point out possible implementation issues I observed:

On Metal this works fine since it is super align with metal API SetVertexBytes() and so on. But it might have some problem for Vulkan and D3D12. Both Vulkan and D3D12 defines push constants or root constants in pipelineLayout. When calling SetPushConstants(), we need to ensure the pipeline has such push constants range. I think this implementation hints that we need to allocate remain root space as root constants and we need to always keep a internal binding point for each bind group to handle the case that "User setPushConstants() but we don't have enough space, so fallback to a buffer".

mwyrzykowski · 2024-05-15T04:26:21Z

I think this implementation hints that we need to allocate remain root space as root constants and we need to always keep a internal binding point for each bind group to handle the case that "User setPushConstants() but we don't have enough space, so fallback to a buffer".

It sounds like D3D and Vulkan can be optimized to not use a buffer when the push constants are small enough and there is space. With Metal, we always need to reserve a buffer slot for the push constants. Though it would be possible to map them all into one buffer slot.

Abstracting this concept seems to further justify that the term 'push constants' does not necessarily need to exist in WebGPU. Rather, the D3D and Vulkan backends can implement small, dynamic setBindGroup calls into push constants and on Metal, we would use set[Vertex/Fragment]Bytes unless the data is large, then we would make a buffer

JMS55 · 2024-05-15T04:33:44Z

If you tie push constants to specifically render pipelines, then you can't use them for compute shaders (and later, other types of pipelines), which would be unfortunate.

litherum · 2024-05-15T04:39:00Z

@JMS55

Yeah, people in this thread are using a variety of different terminology. Above, when people say "setVertexBytes()" or "setFragmentBytes()," I believe they are referring to the Metal functions of the same name, as an example of a function that takes immediate data (rather than a method taking resources or bind groups). Even in Metal, there's a way to supply immediate data to compute pipelines.

litherum · 2024-05-15T05:23:57Z

tl;dr: Thinking deeper, I think the only way it can work in Vulkan is either to just not add the feature at all, or add an explicit API for it.

I think the biggest unanswered question here regarding "let push constants be an implementation detail" is this: Consider such a program:

let renderPassEncoder = commandEncoder.beginRenderPass(...);
renderPassEncoder.setPipeline(...);
renderPassEncoder.setImmediateData(new Int32Array([42])); // Unclear how this will interact with bind groups, but let's assume that problem is surmountable...
renderPassEncoder.draw(...);
renderPassEncoder.setImmediateData(new Int32Array(/* a very long array */));
renderPassEncoder.draw(...);

The question is: What should the code in the shader be to handle both of these cases? In the first case, we'd want to promote the 42 to a push constant (probably? AFAICT it hasn't been demonstrated whether push constants are actually consistently better to use cross-platform), but in the second case, the data is too big to fit in push constants. Both situations would (probably) need to be supported by a single shader - setPipeline() is only called once in the above code sample.

In SPIR-V, the text of the shader is actually different depending on whether or not you're accessing a push constant or not. For example, in GLSL (which I'm using here just as a readable form of SPIR-V), you'd either have this:

layout(push_constant) uniform Constants {
	int pushConstantData;
};

Or this:

layout(set = 0, binding = 0) uniform Constants {
	int uboData;
};

If we operate under the constraint that no amount of recompilation is allowed to happen while commands are being recorded into the render pass - not even supplying specialization constants - then the only way this can work is if both blocks are present in the shader, and each access to the data is guarded by a flag - in software - to determine whether the data was bound to pushConstantData or uboData. Adding this selection in software probably defeats the whole purpose of trying to use push constants in the first place. It's probably a dealbreaker. (But its performance should probably be measured nevertheless.)

Assuming it is in fact a dealbreaker, it seems like the only way this could ever work (again, assuming that the committee does want to make it work, which I'm still not convinced has been sufficiently justified) would be to come up with a system where the implementation can guarantee that the immediate data being set by the app will fit into push constant space. This is why I suggested a maxBindingSize field within the pipeline layout. Such a field could allow the implementation to know that the data provided by the application will fit into push constants.

But there's even an additional problem - what about the following program:

let renderPassEncoder = commandEncoder.beginRenderPass(...);
renderPassEncoder.setPipeline(...);
renderPassEncoder.setImmediateData(new Int32Array([42]));
renderPassEncoder.draw(...);
renderPassEncoder.setBindGroup(/* clobbering the immediate data above */);
renderPassEncoder.draw(...);

How should this work, assuming no amount of recompilation is acceptable? Imagine the bind group contains a buffer whose contents is populated by a previous compute shader - the CPU has never seen this data, and therefore such data cannot be hoisted to push constants. There would have to be some way, at the API level, to distinguish the immediate data from the bind group data, so that the shader wouldn't have to handle both cases in software at runtime.

I think the only way this could work is to add a new type of descriptor in the bind group layout. That way, an entry in the bind group layout could either refer to a buffer, or refer to immediate data, but not both. Any data set by immediates for one draw call would be incapable of being clobbered by a GPU-populated buffer for a subsequent draw call - thereby allowing the shader compiler to know that it's safe to only emit the layout(push_constant) block above.

So then that begs the question: What should this new type of descriptor in the bind group layout be named? Perhaps it should be named ... "push constant"...

magcius · 2024-05-15T05:47:19Z

This is turning into a bit of a bikeshed opportunity, but I'll add my two cents to the jar:

I do not want shader recompilation at command buffer runtime, after pipelines have been created. This would be a very unfortunate thing to require implementations to do, after we spent the effort to put PSOs into the spec. Expensive synchronous recompilations cause stalls that we are trying to avoid in WebGPU.
The big value I see in push constants is boot-strapping indices into other buffers or bind groups; this is where I would expect to see them come in the most handy.
Just because D3D12 offers an ability to embed some maximum number of root constants, does not mean that going all the way up to that number is necessarily free. My proposal was limited to 4 u32's for a reason.
Making push constants transparent to the application would be extra difficult to support uses that mix both bind groups and push constants. See below for more details.
Push constants might be especially relevant when talking about newer tools like "bindless", where shaders index arrays of resources like textures or buffers.

To give a bit of an unprompted history lesson and awareness of how these features are implemented in practice...

The simplified AMD interpretation of push constants (not applicable to all drivers and hardware, just giving one mental model): The hardware has some general-purpose 32-bit scalar registers (SGPRs); these registers are active storage for threads, but also can be initialized with user constants set up by the hardware that launches threads. Some of these registers are used by the driver itself for internal purposes (e.g. the "base draw offset" feature from Vulkan would get passed by an SGPR), some of them are used by the driver to implement binding models (D3D12 CBVs might get mapped directly to an SGPR), and some of them are used to store memory addresses to binding tables (e.g. if you have a lot of textures, the driver will allocate a block of memory which has the texture descriptors, and store a pointer to the table in an SGPR; these are D3D12's "descriptor tables").

The D3D12 idea was that all root signature parameters (root constants, CBVs, descriptor tables) would map to SGPRs; however, as the driver sometimes needs some for its own purposes, this isn't a 1:1 mapping, and sometimes the driver has to jettison some of the parameters to a memory block and do some finagling to rearrange all of the binding updates. Note that since the usage of an SGPR is arbitrary (it can contain a memory address or a value, just depending on how it's used), the usage needs to be determined at root signature time; in practice, the AMD compiler effectively reserves a few driver SGPRs that work across all platforms.

So, in order to make "transparent push constants" work, we can't simply replace the first N bytes of a uniform buffer of a bind group with immediate data; using it with a bind group will require a pointer in the SGPRs, while using it with push constants will require the values in SGPRs. Instead, we'll need syntax that lets us optimistically compile the first N push constants to SGPRs, and then the rest get jettisoned to a memory buffer. This is enabled a bit more by the extra pipeline layout field so we know up-front what will fit, but it still seems more complicated than just having a small fixed-size limit in the specification. We can always expand to support bigger sizes through emulation later.

litherum · 2024-05-15T08:11:47Z

The big value I see in push constants is boot-strapping indices into other buffers or bind groups

shaders index arrays of resources like textures or buffers.

I think we have to be careful here - if the biggest use case is simply "make some indices available to a shader" then a simple API convenience method would be sufficient, without changing the binding model or adding push constants at all.

litherum · 2024-05-16T23:48:14Z

(I just realized that we already have that simple API convenience method: GPUQueue.writeBuffer(). I think the burden of proof needs to be "why push constants in addition to writeBuffer()?"

shaoboyan · 2024-05-17T06:34:51Z

I think the burden of proof needs to be "why push constants in addition to writeBuffer()?"

A quick thoughts is that developer don't need to call CreateBuffer() and manage that buffer to hold these small amount of data.

magcius · 2024-05-17T06:52:03Z

I think the burden of proof needs to be "why push constants in addition to writeBuffer()?"

writeBuffer cannot be used while inside of a render pass, as it executes at the queue level

litherum · 2024-05-17T18:07:12Z

Aha! You’re right.

So I guess the alternative would be “something like writeBuffer but works in a render pass”

magcius · 2024-05-17T19:46:53Z

There is no way to support writeBuffer-but-it-works-in-a-render-pass easily in D3D12 and Vulkan. You can support a limited amount of data being sent immediately to the shader through D3D12 root constants and Vulkan push constants, but that limit is fairly small, and it requires special shader changes.

One alternative to push constants on Vulkan is dynamic uniform buffer offsets, but those have heavy limitations on alignment that can be detrimental.

If you go down the end of this road, you end up at OpenGL uniforms/uniform buffers, and D3D11 constant buffers, both of which have numerous, well-documented pitfalls and performance hazards. The easiest and safest way to use D3D11 constant buffers were with the DISCARD/NOOVERWRITE flags which effectively turned them into our dynamic uniform buffers. They were replaced in D3D12/Vulkan.

if the biggest use case is simply "make some indices available to a shader" then a simple API convenience method would be sufficient, without changing the binding model or adding push constants at all.

What kind of features are you thinking of?

mwyrzykowski · 2024-05-17T20:37:59Z

proposals/push-constants.md

+dictionary GPUPipelineLayoutDescriptor
+         : GPUObjectDescriptorBase {
+    required sequence<GPUBindGroupLayout> bindGroupLayouts;
+    uint32_t pushConstantsByte = 0;


what is pushConstantsByte? Should this be pushConstantsSize?

Ah, maybe pushConstantsBytes? The previous comments is that we don't know the pushConstantSize is defined as bytes or number.

Changing to immediateDataRangeByteSize :)

mwyrzykowski · 2024-05-17T20:39:10Z

proposals/push-constants.md

@@ -0,0 +1,73 @@
+# PushConstants


Should we really use the term 'push constants' which seems like a non-universal, backend specific name as opposed to something like 'immediate data'?

mwyrzykowski · 2024-05-17T20:42:57Z

There is no way to support writeBuffer-but-it-works-in-a-render-pass easily in D3D12 and Vulkan. You can support a limited amount of data being sent immediately to the shader through D3D12 root constants and Vulkan push constants, but that limit is fairly small, and it requires special shader changes.

We could also make setBindGroup take an ArrayBuffer and document that the corresponding pipeline's immediate / push constant size in bytes must be >= size in bytes of the given ArrayBuffer or ArrayBuffers bound using this immediate mode.

I didn't check the D3D specifics but that seems like it would work in Vulkan and Metal

shaoboyan · 2024-05-20T03:19:42Z

We could also make setBindGroup take an ArrayBuffer and document that the corresponding pipeline's immediate / push constant size in bytes must be >= size in bytes of the given ArrayBuffer or ArrayBuffers bound using this immediate mode.

My understanding is that the benefit of this is we don't introduce new API name?
Because this seems identical to void setPushConstants(uint32_t offset, uint32_t count, ArrayBuffer data);.

Kangz · 2024-05-20T17:02:51Z

Recapping a bunch of the discussion above (hopefully the non-controversial points):

We see the need to pass in a few per-draw indices in the shader to bootstrap access to other resources. Dynamic uniform buffers are the current solution but a bit too expensive (due to the alignment requirement) and annoying to use.
The constraint on no pipeline recompilation means that something has to be declared at pipeline compilation time, likely in the pipeline layout.
Push constant as a name is a Vulkan-ism. We'll need to bikeshed the name at some point.

The "immediate" terminology @litherum used is nice because it is related to what the developer intends to do (immediately set this piece of data) (also similar to immediates in assembly instructions like "move") instead of a potential hardware mechanism (push constants).

IMHO it would be better to have immediates set directly instead of via an aggregate command like setBindGroup because there are likely many cases (alluded to by @magcius) where no other resource needs to change, just information as to where to index in these resources. We are also fairly constrained by Vulkan because the WebGPU spec uses the 4 guaranteed Vulkan descriptor set slots, so we can't really use a new one just for immediates.

@mwyrzykowski isn't your implementation of WebGPU using set*Bytes already? In Dawn we needed to pass some resource sizes, viewport bounds and a few other things in there. It would be possible to have a buffer slot used for both the developer's immediates and the implementation's immediates (by concatenating all of them).

mwyrzykowski · 2024-05-20T17:14:57Z

@mwyrzykowski isn't your implementation of WebGPU using set*Bytes already? In Dawn we needed to pass some resource sizes, viewport bounds and a few other things in there. It would be possible to have a buffer slot used for both the developer's immediates and the implementation's immediates (by concatenating all of them).

Yes we already call setBytes/setVertexBytes/setFragmentBytes. This is a good point, we could reuse the slot assuming this is limited to a certain number of bytes (like 1024 or less). So then no impact to the limits 👍

I agree the term 'immediate' is very much preferred over 'push constants'.

litherum · 2024-05-20T18:13:48Z

The constraint on no pipeline recompilation means that something has to be declared at pipeline compilation time, likely in the pipeline layout.

I don't think this has been proven yet. This would be true if:

Performance was the motivation for push constants, rather than convenience, and
Dynamically selecting between push constants and buffers in software in the shader was demonstrated to be too slow.

litherum · 2024-05-20T18:22:04Z

There is no way to support writeBuffer-but-it-works-in-a-render-pass easily in D3D12 and Vulkan.

if the biggest use case is simply "make some indices available to a shader" then a simple API convenience method would be sufficient, without changing the binding model or adding push constants at all.

What kind of features are you thinking of?

You're right that, from within a render pass, it's impossible* to update a buffer in a well-defined way (where "well-defined" means: a set of known draw calls from early in the render pass don't see the update, and a set of draw calls from later in the render pass do see the update).

(*For some definition of "impossible.")

Therefore, if the goal was just convenience, it would have to work like this:

let renderPassEncoder = commandEncoder.beginRenderPass(...);
renderPassEncoder.setPipeline(...);
let newBindGroup = device.createBindGroup(..., new Int32Array([42]), other, resources, etc); // immediate data gets either put into temp buffer storage (which IIRC all implementations already use) and/or push constants
renderPassEncoder.setBindGroup(0, newBindGroup); // gets turned into both vkCmdBindDescriptorSets() and vkCmdPushConstants()
renderPassEncoder.draw(...);
let secondNewBindGroup = device.copyBindGroupWithUpdate(newBindGroup, new Int32Array([43])); // only update the immediate data
renderPassEncoder.setBindGroup(0, secondNewBindGroup); // gets turned into _just_ vkCmdPushConstants(), because the previously-bound Vulkan descriptor set is still correct
renderPassEncoder.draw(...);

(Note that the second call to setBindGroup() only ends up setting push constants, which solves @Kangz's concern "there are likely many cases ... where no other resource needs to change.")

Kangz · 2024-05-21T15:31:45Z

I'm not sure that copyBindGroupWithUpdate is that much more convenient than setImmediateU32(0, 42) TBH. Also won't you need to make placeholder value in bindgroup creation in most cases since indices into that bindgroup are not known? (or specify it is defaulted to 0)

litherum · 2024-05-29T20:53:35Z

I'm not sure that copyBindGroupWithUpdate is that much more convenient than setImmediateU32(0, 42) TBH.

Sorry, I don't think I was being clear. From where I stand, it looks like there are these possible paths forward:

(The null hypothesis) Implementations make no changes. Don't use push constants.
Don't make any change to the WebGPU spec, but implementations opportunistically and silently promote specific calling patterns to use push constants under-the-hood. Implementations compile shaders with both codepaths (push constants and traditional buffers) with a software flag to switch between them at runtime.
Same as (2), but add additional "immediate" methods to WebGPU, with their semantics defined using the existing binding model (and therefore can always be implemented on top of traditional buffers), but whose semantics make it clear that the provided data would be good candidates to (silently!) hoist to push constants.
Same as (3), but make the "immediate" methods only work with a new kind of binding layout type specifically for immediate data. If the data provided is too large, silently fall back to using buffers under the hood. Shaders still have to be compiled with both runtime-switchable code paths, but the API implementation gets a little simpler (by essentially treating buffers and immediate data as namespaced separately - rather than both ultimately being backed by buffers).
Same as (4), but limit the size of immediate data so that it's guaranteed to fit into push constants. Shaders only need to be compiled with a single path. Add a bunch of validation rules to guarantee the immediate data can coexist with the rest of the pipeline layout.

If performance is the goal (e.g. performance with push constants is shown to be consistently better than buffers on a variety of vendors' cards), and it's demonstrated that a software flag to switch between both codepaths is too slow, then option 5 is the only acceptable option.

However, if convenience is the goal, or a software codepath flag is fast enough, options 2-4 are better than options 0 or 5.

magcius · 2024-05-29T21:31:25Z

Don't make any change to the WebGPU spec, but implementations opportunistically and silently promote specific calling patterns to use push constants under-the-hood.

If e.g. Chrome does this but WebKit does not, this means that performance can vary rapidly between vendor, leading to a real issue with performance portability. Making portable web applications that run acceptably across vendors is already hard enough.

While I don't think there's any spec guarantee that createShaderModule/createRenderPipeline only compiles a single shader, I think it's a reasonable assumption. Having web vendors spend precious startup time compiling multiple variations of the same shader "just in case" seems silly when I could instead tell the API what I want.

From that perspective, options 2, 3, and 4 are unattractive to me as a web developer and would be more inconvenient to me than options 0 and 5, because now I have multiple ways of doing something and the performance characteristics can change not just based on the GPU vendor, but also the browser vendor as well.

litherum · 2024-05-29T21:43:27Z

If e.g. Chrome does this but WebKit does not, this means that performance can vary rapidly between vendor, leading to a real issue with performance portability.

This seems to assume that performance is the motivation for push constants, which I'm not sure is true. But, if it is, you are correct, so long as the spec doesn't describe what the hoisting heuristic is (which the spec could do!)

I don't think there's any spec guarantee that createShaderModule/createRenderPipeline only compiles a single shader

None of the options I listed involve a single call to createShaderModule()/createRenderPipeline() compiling multiple shaders. I'm describing a single shader being compiled, which contains both layout(push_constant) uniform Constants1 { stuff } constants1 and layout(set = 0, binding = 0) uniform Constants2 { stuff } constants2 with accesses to stuff being guarded with a software flag.

the performance characteristics can change not just based on the GPU vendor, but also the browser vendor as well.

I think this is reductive. The amount of push constants space in Vulkan will be different than the amount of root constant space in D3D12 (which will be different than any potential driver optimizations of setVertexBytes() in Metal would use). If the spec limits WebGPU push constants to only fit into the smaller of the two sizes, implementations would be leaving either performance, or memory, on the table. I understand the desire for portable performance, but that also needs to be weighed against going as fast as the underlying API allows for.

shaoboyan · 2024-05-29T23:49:49Z

This seems to assume that performance is the motivation for push constants, which I'm not sure is true

I'll run my RootConstant Sample on various vendors GPUs to see the results.

Sorry, I don't think I was being clear. From where I stand, it looks like there are these possible paths forward:

I'm still concern about the implementation burdens we take. Especially on Vulkan backend we don't have extra descriptor set slot for implicit uploading buffer. And a switch-branch in shader has concerns too.

I think this feature doesn't want to encourage developers to uploading large data. That's why I'm not a fan of 3, 4. I don't want to provide a method that hints people "We can upload large amount of data in render/compute passes". To avoid confusing developers about "which uploading approach should I choose ?", a LIMIT helps hint "Immediate data" is only for small amount of data, even we do a silently fallback in implementation in final state (For some large limit size).

shaoboyan requested review from Kangz, kainino0x, kdashg and bjjones April 30, 2024 03:07

shaoboyan mentioned this pull request Apr 30, 2024

Investigation : Push Constants #75

Open

magcius reviewed Apr 30, 2024

View reviewed changes

alan-baker reviewed Apr 30, 2024

View reviewed changes

dneto0 reviewed Apr 30, 2024

View reviewed changes

proposals/push-constants.md Outdated Show resolved Hide resolved

kdashg added the wgsl WebGPU Shading Language Issues label Apr 30, 2024

kainino0x added api WebGPU API proposal labels Apr 30, 2024

shaoboyan force-pushed the pushconstants branch from 35d9772 to dff461f Compare May 7, 2024 00:57

shaoboyan requested a review from toji May 7, 2024 08:09

mwyrzykowski self-requested a review May 8, 2024 23:29

kdashg added this to the Milestone 2 milestone May 8, 2024

kainino0x reviewed May 8, 2024

View reviewed changes

mwyrzykowski reviewed May 8, 2024

View reviewed changes

mwyrzykowski reviewed May 17, 2024

View reviewed changes

shaoboyan added 5 commits May 30, 2024 10:53

Push constant proposal

4070be3

Address comments.

815f74a

Address some comments

b6873f3

Fix setPushConstant() by moving into GPUBindingCommandsMixin interface

d4ae72b

Push Constant -> Immediate Data

f15e36a

shaoboyan force-pushed the pushconstants branch from fc36660 to f15e36a Compare May 30, 2024 03:10

Push constant proposal #4612

Are you sure you want to change the base?

Push constant proposal #4612

Conversation

shaoboyan commented Apr 30, 2024

github-actions bot commented Apr 30, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shaoboyan Apr 30, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shaoboyan May 6, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shaoboyan May 6, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mwyrzykowski commented May 8, 2024

litherum commented May 9, 2024 • edited

litherum commented May 9, 2024

Kangz commented May 14, 2024

shaoboyan commented May 15, 2024

mwyrzykowski commented May 15, 2024

JMS55 commented May 15, 2024

litherum commented May 15, 2024

litherum commented May 15, 2024

magcius commented May 15, 2024

litherum commented May 15, 2024 • edited

litherum commented May 16, 2024

shaoboyan commented May 17, 2024 • edited

magcius commented May 17, 2024

litherum commented May 17, 2024

magcius commented May 17, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mwyrzykowski commented May 17, 2024

shaoboyan commented May 20, 2024

Kangz commented May 20, 2024

mwyrzykowski commented May 20, 2024

litherum commented May 20, 2024

litherum commented May 20, 2024 • edited

Kangz commented May 21, 2024

litherum commented May 29, 2024

magcius commented May 29, 2024

litherum commented May 29, 2024 • edited

shaoboyan commented May 29, 2024

github-actions bot commented Apr 30, 2024 •

edited

shaoboyan Apr 30, 2024 •

edited

shaoboyan May 6, 2024 •

edited

shaoboyan May 6, 2024 •

edited

litherum commented May 9, 2024 •

edited

litherum commented May 15, 2024 •

edited

shaoboyan commented May 17, 2024 •

edited

magcius commented May 17, 2024 •

edited

litherum commented May 20, 2024 •

edited

litherum commented May 29, 2024 •

edited