Speed up zero initialization of workgroup memory #4592

raphlinus · 2023-10-27T18:15:49Z

This is related to #4591. When forcing spv::ZeroInitializeWorkgroupMemoryMode::Polyfill in device_from_raw(), we observe very slow (but correct!) behavior for zeroing the workgroup shared array: all the work is done on one thread. It would be better to distribute this. In this case the array size and workgroup size match, so having each invocation zero one array element would be simple and efficient.

zerooooooo.zip

Repro case is the same as the linked bug, but changing line 1307 of vulkan/adapter.rs to Polyfill.


cwfitzgerald · 2023-10-27T18:25:09Z

For reference, the zero-init code:

SPIRV: https://github.com/gfx-rs/wgpu/blob/trunk/naga/src/back/spv/writer.rs#L1327
MSL: https://github.com/gfx-rs/wgpu/blob/trunk/naga/src/back/msl/writer.rs#L4441-L4549
HLSL: https://github.com/gfx-rs/wgpu/blob/trunk/naga/src/back/hlsl/writer.rs#L1280-L1305
GLSL: https://github.com/gfx-rs/wgpu/blob/trunk/naga/src/back/glsl/mod.rs#L1688-L1718

I think the lowest-effort fix is this: for top-level arrays, have each invocation use its local index to initialize the matching array element, mask off invocations whose index is beyond the array length, and unroll the loop at compile time for arrays longer than the workgroup size (the invocation count).
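To make the arithmetic concrete, here is a minimal sketch of how a backend could split one top-level array into unconditional writes plus one guarded write. The names `ArrayInitPlan` and `plan_array_init` are hypothetical and are not part of naga; this only illustrates the idea above and is not the actual writer code.

```rust
/// How one top-level workgroup array could be zeroed by a whole workgroup.
/// Hypothetical type for illustration only; not part of naga.
#[derive(Debug, PartialEq, Eq)]
pub struct ArrayInitPlan {
    /// Unconditional writes per invocation:
    /// `array[local_index + k * workgroup_size]` for `k` in `0..full_passes`.
    pub full_passes: u32,
    /// Number of invocations that perform one extra, guarded write
    /// (`if local_index < remainder`). Zero means no guard is needed.
    pub remainder: u32,
}

/// Split `array_len` elements across `workgroup_size` invocations.
/// Hypothetical helper for illustration only; not part of naga.
pub fn plan_array_init(array_len: u32, workgroup_size: u32) -> ArrayInitPlan {
    assert!(workgroup_size > 0, "workgroup size must be non-zero");
    ArrayInitPlan {
        full_passes: array_len / workgroup_size,
        remainder: array_len % workgroup_size,
    }
}
```

For a 652-element array and a workgroup of 256 invocations, this gives two unconditional writes per invocation and a guarded third write for the first 140 invocations. When the array length equals the workgroup size, as in the original report, it gives exactly one unconditional write and no guard.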

cwfitzgerald · 2023-10-27T18:36:52Z

To be clear, I think the init should look like this:

var<workgroup> array1: array<u32, 652>;
var<workgroup> array2: array<u32, 256>;
var<workgroup> array3: array<u32, 45>;
var<workgroup> non_array: u32;

@compute @workgroup_size(16, 16)
fn main(@builtin(local_index) local_index: u32) {
    // All unconditional array init
    // Do loop at compile time, just generate multiple writes for long arrays
    array1[local_index] = <zero init>;
    array1[local_index + 256] = <zero init>;
    array2[local_index] = <zero init>;
    if local_index < 140 {
        // The last 140 elements of array1 (652 - 2 * 256) need a guard
        array1[local_index + 512] = <zero init>;
        if local_index < 45 {
            array3[local_index] = <zero init>;
            if local_index < 1 {
                non_array = <zero init>;
            }
        }
    }
    workgroupBarrier();
}
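
As a sanity check on the thresholds above (140, 45, 1), the same write pattern can be replayed on the CPU. This is only a sketch: `WriteCounts` and `simulate_workgroup_init` are made-up names, and it counts writes per element instead of storing zeros, so that a missed or doubled element would show up.

```rust
/// Invocations in a `@workgroup_size(16, 16)` workgroup.
const WORKGROUP_SIZE: usize = 16 * 16;

/// How many times each workgroup variable (or element) was written.
/// Hypothetical type for illustration only.
#[derive(Debug)]
pub struct WriteCounts {
    pub array1: Vec<u32>,
    pub array2: Vec<u32>,
    pub array3: Vec<u32>,
    pub non_array: u32,
}

/// Replay the proposed init sequence once per invocation, sequentially.
pub fn simulate_workgroup_init() -> WriteCounts {
    let mut counts = WriteCounts {
        array1: vec![0; 652],
        array2: vec![0; 256],
        array3: vec![0; 45],
        non_array: 0,
    };
    for local_index in 0..WORKGROUP_SIZE {
        // Unconditional writes: every invocation takes part.
        counts.array1[local_index] += 1;
        counts.array1[local_index + 256] += 1;
        counts.array2[local_index] += 1;
        // Guarded writes, nested from the largest remainder to the smallest.
        if local_index < 140 {
            counts.array1[local_index + 512] += 1;
            if local_index < 45 {
                counts.array3[local_index] += 1;
                if local_index < 1 {
                    counts.non_array += 1;
                }
            }
        }
    }
    counts
}
```

If the plan is right, every count comes out as exactly 1: no element is skipped, none is written twice, and no index goes out of bounds. Nesting the guards from the largest remainder to the smallest works because each smaller bound is implied by the one enclosing it.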

teoxoy mentioned this issue Nov 2, 2023

Spread out workgroup variable initialization across workgroup #4469

Closed

This was referenced Apr 8, 2024

Allow configuring whether workgroup memory is zero initialised #5508

Merged

Fix zero initialization of workgroup memory gfx-rs/naga#2259

Merged

Improve the polyfill for workgroup variable zero initialization #5521

Draft
