Speed up zero initialization of workgroup memory #4592

Open
raphlinus opened this issue Oct 27, 2023 · 2 comments · May be fixed by #5521
Labels
api: dx11 Issues with DX11 or DXGI api: dx12 Issues with DX12 or DXGI api: gles Issues with GLES or WebGL api: metal Issues with Metal api: vulkan Issues with Vulkan area: naga back-end Outputs of naga shader conversion area: performance How fast things go lang: GLSL OpenGL Shading Language lang: HLSL D3D Shading Language lang: Metal Metal Shading Language lang: SPIR-V Vulkan's Shading Language naga Shader Translator type: enhancement New feature or request

Comments

@raphlinus
Contributor

This is related to #4591. When forcing spv::ZeroInitializeWorkgroupMemoryMode::Polyfill in device_from_raw(), we observe very slow (but correct!) zeroing of the workgroup shared array: all the work is done on one thread. It would be better to distribute this. In this case the array size and workgroup size match, so having each invocation zero one array element would be simple and efficient.
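
To put rough numbers on the difference, here is a minimal Rust sketch of the critical path in each strategy. The function names are hypothetical and not part of naga or wgpu. The model only counts stores executed by the busiest invocation and ignores everything else (memory coalescing, barriers, occupancy).

```rust
/// Hypothetical helper: stores on the critical path when a single invocation
/// zeroes the whole array (the behavior reported above for the polyfill).
fn single_invocation_stores(array_len: u32) -> u32 {
    array_len
}

/// Hypothetical helper: stores on the critical path when element `i` is
/// handled by invocation `i % workgroup_size` (the distribution proposed above).
fn distributed_stores(array_len: u32, workgroup_size: u32) -> u32 {
    assert!(workgroup_size > 0, "workgroup size must be non-zero");
    // Ceiling division: the busiest invocation handles this many elements.
    (array_len + workgroup_size - 1) / workgroup_size
}
```

Under this model, an array whose length equals the workgroup size costs one store per invocation when distributed, versus one store per element on a single invocation.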

zerooooooo.zip

Repro case is the same as the linked bug, but changing line 1307 of vulkan/adapter.rs to Polyfill.

@cwfitzgerald
Member

For reference, the zero-init code:

SPIRV: https://github.com/gfx-rs/wgpu/blob/trunk/naga/src/back/spv/writer.rs#L1327
MSL: https://github.com/gfx-rs/wgpu/blob/trunk/naga/src/back/msl/writer.rs#L4441-L4549
HLSL: https://github.com/gfx-rs/wgpu/blob/trunk/naga/src/back/hlsl/writer.rs#L1280-L1305
GLSL: https://github.com/gfx-rs/wgpu/blob/trunk/naga/src/back/glsl/mod.rs#L1688-L1718

I think the lowest-effort approach is this: for top-level arrays, have each invocation use its local index to initialize one element of the array, mask off invocations whose index is beyond the array length, and unroll a loop at compile time for arrays longer than the workgroup size.
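
One way to read that description is as a small per-array plan that a back end could compute at shader-compile time. The Rust sketch below is only an illustration of the arithmetic; `ZeroInitPlan` and `plan_zero_init` are hypothetical names and do not exist in naga.

```rust
/// Hypothetical description of the stores to emit for one top-level array.
#[derive(Debug, PartialEq, Eq)]
struct ZeroInitPlan {
    /// Number of unguarded stores, one per full stride:
    /// `arr[local_index + k * workgroup_size]` for `k` in `0..full_strides`.
    full_strides: u32,
    /// If non-zero, one extra store at offset `full_strides * workgroup_size`,
    /// guarded by `local_index < tail_len`.
    tail_len: u32,
}

/// Splits an array of `array_len` elements into full strides plus a masked tail.
fn plan_zero_init(array_len: u32, workgroup_size: u32) -> ZeroInitPlan {
    assert!(workgroup_size > 0, "workgroup size must be non-zero");
    ZeroInitPlan {
        full_strides: array_len / workgroup_size,
        tail_len: array_len % workgroup_size,
    }
}
```

For a 652-element array and a workgroup of 256 invocations this gives two unguarded stores and a tail of 140 elements. Arrays shorter than the workgroup get no unguarded stores and only the masked tail.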

@cwfitzgerald cwfitzgerald added api: dx12 Issues with DX12 or DXGI api: metal Issues with Metal api: gles Issues with GLES or WebGL api: dx11 Issues with DX11 or DXGI api: vulkan Issues with Vulkan area: naga back-end Outputs of naga shader conversion naga Shader Translator lang: SPIR-V Vulkan's Shading Language lang: GLSL OpenGL Shading Language lang: Metal Metal Shading Language lang: HLSL D3D Shading Language type: enhancement New feature or request area: performance How fast things go labels Oct 27, 2023
@cwfitzgerald
Member

cwfitzgerald commented Oct 27, 2023

To be clear, I think the init should look like this:

var<workgroup> array1: array<u32, 652>;
var<workgroup> array2: array<u32, 256>;
var<workgroup> array3: array<u32, 45>;
var<workgroup> non_array: u32;

@compute @workgroup_size(16, 16)
fn main(@builtin(local_invocation_index) local_index: u32) {
    // Unconditional part of the array init.
    // The loop is unrolled at compile time: long arrays just get multiple writes.
    array1[local_index] = <zero init>;
    array1[local_index + 256] = <zero init>;
    array2[local_index] = <zero init>;
    if local_index < 140 {
        // The tail of array1 (elements 512..652) goes inside the guard
        array1[local_index + 512] = <zero init>;
        if local_index < 45 {
            array3[local_index] = <zero init>;
            if local_index < 1 {
                non_array = <zero init>;
            }
        }
    }
    workgroupBarrier();
}
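
As a sanity check on the layout above, the following Rust sketch simulates every invocation of the workgroup and counts how often each element is written. It is a model of the proposal, not naga code. `simulate_zero_init` is a hypothetical name, and a scalar such as `non_array` is treated as an array of length 1.

```rust
/// Hypothetical simulation of the proposed layout. `lens` holds the element
/// count of each workgroup variable (a scalar counts as 1). Returns, for each
/// variable, how many times each element was written.
fn simulate_zero_init(lens: &[u32], workgroup_size: u32) -> Vec<Vec<u32>> {
    assert!(workgroup_size > 0, "workgroup size must be non-zero");
    let mut writes: Vec<Vec<u32>> = lens.iter().map(|&n| vec![0; n as usize]).collect();

    // (variable index, tail length, tail offset), ordered largest tail first
    // so the guards can nest as in the example.
    let mut tails: Vec<(usize, u32, u32)> = lens
        .iter()
        .enumerate()
        .map(|(v, &n)| (v, n % workgroup_size, n - n % workgroup_size))
        .filter(|&(_, tail, _)| tail > 0)
        .collect();
    tails.sort_by(|a, b| b.1.cmp(&a.1));

    for local_index in 0..workgroup_size {
        // Unconditional stores: one per full stride of each variable.
        for (v, &n) in lens.iter().enumerate() {
            for k in 0..n / workgroup_size {
                writes[v][(local_index + k * workgroup_size) as usize] += 1;
            }
        }
        // Nested guards: once one guard fails, every inner guard fails too.
        for &(v, tail, offset) in &tails {
            if local_index >= tail {
                break;
            }
            writes[v][(offset + local_index) as usize] += 1;
        }
    }
    writes
}
```

Calling `simulate_zero_init(&[652, 256, 45, 1], 256)` models the four variables in the example. Every count in the result being 1 means each element is zeroed exactly once, with no gaps and no redundant stores.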
