PrefixSum and PostfixSum not working #28

Open
xaphier opened this issue Jun 26, 2019 · 3 comments

@xaphier

xaphier commented Jun 26, 2019

Even in a very simple DX11 compute shader, AmdDxExtShaderIntrinsics_WavePrefixSum and AmdDxExtShaderIntrinsics_WavePostfixSum produce wrong results. I tried different hardware (RX 480, WX7100 and WX9100), and all of it gives completely wrong results. I also tried different driver versions, and I always check that the AGS_DX11_EXTENSION_INTRINSIC_WAVE_REDUCE extension is supported.
Below is a simple shader that uses AmdDxExtShaderIntrinsics_WavePrefixSum, which produces bogus values. Commenting out the USE_WAVE_PREFIX_SUM define switches it to building the prefix sum manually with AmdDxExtShaderIntrinsics_SwizzleU and AmdDxExtShaderIntrinsics_ReadlaneU, which produces the correct values.

#include "ags_shader_intrinsics_dx11.hlsl"

RWBuffer<uint> dst : register(u0);

#define USE_WAVE_PREFIX_SUM

#define MAKE_MASK(XOR, OR, AND) (((XOR) << 10) | ((OR) << 5) | (AND))

[numthreads(8, 8, 1)]
void main(uint3 groupId : SV_GroupID,
          uint3 dispatchThreadId : SV_DispatchThreadID,
          uint3 groupThreadId : SV_GroupThreadID,
          uint groupIndex : SV_GroupIndex) {
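    // Base output offset for this thread group (8 * 8 = 64 threads per group).
    uint groupIdx = groupId.x * 8 * 8;
    // Input value: the thread's flattened index within the group.
    uint v0 = groupIndex;

#ifndef USE_WAVE_PREFIX_SUM
    // Manual scan: each step pulls a partial sum from a lower lane via swizzle.
    uint sum = v0;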
    uint value = 0;
    value = AmdDxExtShaderIntrinsics_SwizzleU(sum, AmdDxExtShaderIntrinsicsSwizzle_SwapX1);
    sum += (groupIndex & 1) == 0 ? 0 : value;
    value = AmdDxExtShaderIntrinsics_SwizzleU(sum, MAKE_MASK(0x00, 0x01, 0x1C));
    sum += (groupIndex & 2) == 0 ? 0 : value;
    value = AmdDxExtShaderIntrinsics_SwizzleU(sum, MAKE_MASK(0x00, 0x03, 0x18));
    sum += (groupIndex & 4) == 0 ? 0 : value;
    value = AmdDxExtShaderIntrinsics_SwizzleU(sum, MAKE_MASK(0x00, 0x07, 0x10));
    sum += (groupIndex & 8) == 0 ? 0 : value;
    value = AmdDxExtShaderIntrinsics_SwizzleU(sum, MAKE_MASK(0x00, 0x0F, 0x00));
    sum += (groupIndex & 16) == 0 ? 0 : value;
    value = AmdDxExtShaderIntrinsics_ReadlaneU(sum, 31);
    sum += (groupIndex & 32) == 0 ? 0 : value;
#else
    uint sum = AmdDxExtShaderIntrinsics_WavePrefixSum(v0);
#endif

    dst[groupIdx + groupIndex] = sum;
}
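
To make it easier to see what the manual path computes, here is a small CPU model of it in C++. It is only a sketch under assumptions: the swizzle is modelled as `((lane & and) | or) ^ xor` within each group of 32 lanes, AmdDxExtShaderIntrinsicsSwizzle_SwapX1 is assumed to be 0x041F (xor = 1, and = 0x1F), and the names `makeMask`, `swizzle` and `manualScan` are made up for this illustration, not part of AGS.

```cpp
#include <array>
#include <cstdint>

constexpr uint32_t kWaveSize = 64;
using Wave = std::array<uint32_t, kWaveSize>;

// Same bit packing as MAKE_MASK in the shader.
constexpr uint32_t makeMask(uint32_t xorMask, uint32_t orMask, uint32_t andMask) {
    return (xorMask << 10) | (orMask << 5) | andMask;
}

// Assumed swizzle model: the source lane is computed from the low 5 bits
// of the lane id, so data never crosses a 32-lane boundary.
Wave swizzle(const Wave& src, uint32_t mask) {
    const uint32_t andMask = mask & 0x1F;
    const uint32_t orMask = (mask >> 5) & 0x1F;
    const uint32_t xorMask = (mask >> 10) & 0x1F;
    Wave out{};
    for (uint32_t lane = 0; lane < kWaveSize; ++lane) {
        const uint32_t local = (((lane & 0x1F) & andMask) | orMask) ^ xorMask;
        out[lane] = src[(lane & 0x20) | local];
    }
    return out;
}

// Mirrors the #ifndef USE_WAVE_PREFIX_SUM branch of the shader.
Wave manualScan(const Wave& input) {
    const uint32_t masks[5] = {
        makeMask(0x01, 0x00, 0x1F),  // assumed value of SwapX1
        makeMask(0x00, 0x01, 0x1C),
        makeMask(0x00, 0x03, 0x18),
        makeMask(0x00, 0x07, 0x10),
        makeMask(0x00, 0x0F, 0x00),
    };
    Wave sum = input;
    for (uint32_t step = 0; step < 5; ++step) {
        const Wave value = swizzle(sum, masks[step]);
        for (uint32_t lane = 0; lane < kWaveSize; ++lane) {
            // Only lanes in the upper half of each block add the partial sum.
            if (lane & (1u << step)) sum[lane] += value[lane];
        }
    }
    // Readlane(sum, 31): the upper 32 lanes add the total of the lower 32.
    const uint32_t lowerTotal = sum[31];
    for (uint32_t lane = 32; lane < kWaveSize; ++lane) sum[lane] += lowerTotal;
    return sum;
}
```

In this model, with `v0 = groupIndex` as in the shader, lane `i` ends up with `0 + 1 + ... + i`, so the manual path behaves like an inclusive sum (each lane's own value is included).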
@gareththomasamd
Contributor

Thanks for reporting. We'll investigate!

@xaphier
Author

xaphier commented Jun 28, 2019

CodeXL shows the code generated for the AmdDxExtShaderIntrinsics_WavePrefixSum:

V_ADD_U32	v5 vcc v5 v5 row_shr:1
S_NOP
V_ADD_U32	v5 vcc v5 v5 row_shr:2
S_NOP
V_ADD_U32	v5 vcc v5 v5 row_shr:4
S_NOP
V_ADD_U32	v5 vcc v5 v5 row_shr:8
S_NOP
V_ADD_U32	v5 vcc v5 v5 row_bcast:15 row_mask:0xa
S_NOP
V_ADD_U32	v5 vcc v5 v5 row_bcast:31 row_mask:0xc
S_NOP
V_MOV_B32	v5 v5 wave_shr:1 bound_ctrl:0

But according to https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
it should look more like the example in https://github.com/ROCm-Developer-Tools/LLVM-AMDGPU-Assembler-Extra/blob/master/examples/gfx8/dpp_reduce.s

v_add_u32 v1, v0, v0 row_shr:1 bound_ctrl:0
v_add_u32 v1, v0, v1 row_shr:2 bound_ctrl:0
v_add_u32 v1, v0, v1 row_shr:3 bound_ctrl:0
s_nop 0 // Nop required for data hazard in SP
s_nop 0 // Nop required for data hazard in SP
v_add_u32 v1, v1, v1 row_shr:4 bank_mask:0xe
s_nop 0 // Nop required for data hazard in SP
s_nop 0 // Nop required for data hazard in SP
v_add_u32 v1, v1, v1 row_shr:8 bank_mask:0xc
s_nop 0 // Nop required for data hazard in SP
s_nop 0 // Nop required for data hazard in SP
v_add_u32 v1, v1, v1 row_bcast:15 row_mask:0xa
s_nop 0 // Nop required for data hazard in SP
s_nop 0 // Nop required for data hazard in SP
v_add_u32 v1, v1, v1 row_bcast:31 row_mask:0xc
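
For reference, the sequence above can be modelled on the CPU to check what it is supposed to produce. This is a rough C++ sketch based on my reading of the DPP modifiers, not on hardware behaviour: `row_shr:n` reads from the lane `n` positions lower within the same 16-lane row (out-of-range reads give 0), `bank_mask` and `row_mask` select which 4-lane banks and 16-lane rows are written, `row_bcast:15` adds the last lane of each row to the next row, and `row_bcast:31` adds lane 31 to rows 2 and 3. All function names are hypothetical, and data hazards (the `s_nop` lines) are not modelled.

```cpp
#include <array>
#include <cstdint>

constexpr int kLanes = 64;
using Vgpr = std::array<uint32_t, kLanes>;

// A lane is written only if both its row bit and its bank bit are set.
bool laneEnabled(int lane, uint32_t rowMask, uint32_t bankMask) {
    const int row = lane / 16;
    const int bank = (lane % 16) / 4;
    return ((rowMask >> row) & 1u) && ((bankMask >> bank) & 1u);
}

// Models: v_add_u32 dst, src0, src1 row_shr:shift [bank_mask:...]
// Out-of-range source lanes read as 0 (the bound_ctrl:0 behaviour assumed here).
void addRowShr(Vgpr& dst, const Vgpr& src0, const Vgpr& src1, int shift,
               uint32_t bankMask = 0xF) {
    Vgpr result = dst;  // disabled lanes keep their old value
    for (int lane = 0; lane < kLanes; ++lane) {
        if (!laneEnabled(lane, 0xF, bankMask)) continue;
        const uint32_t shifted = (lane % 16 >= shift) ? src0[lane - shift] : 0u;
        result[lane] = shifted + src1[lane];
    }
    dst = result;
}

// Models: v_add_u32 v, v, v row_bcast:15 row_mask:...
void addRowBcast15(Vgpr& v, uint32_t rowMask) {
    const Vgpr old = v;
    for (int lane = 16; lane < kLanes; ++lane) {
        const int row = lane / 16;
        if ((rowMask >> row) & 1u) v[lane] = old[row * 16 - 1] + old[lane];
    }
}

// Models: v_add_u32 v, v, v row_bcast:31 row_mask:...
void addRowBcast31(Vgpr& v, uint32_t rowMask) {
    const Vgpr old = v;
    for (int lane = 32; lane < kLanes; ++lane) {
        const int row = lane / 16;
        if ((rowMask >> row) & 1u) v[lane] = old[31] + old[lane];
    }
}

// Replays the listed instruction sequence step by step.
Vgpr dppInclusiveScan(const Vgpr& v0) {
    Vgpr v1{};
    addRowShr(v1, v0, v0, 1);
    addRowShr(v1, v0, v1, 2);
    addRowShr(v1, v0, v1, 3);
    addRowShr(v1, v1, v1, 4, 0xE);
    addRowShr(v1, v1, v1, 8, 0xC);
    addRowBcast15(v1, 0xA);
    addRowBcast31(v1, 0xC);
    return v1;
}
```

Under these assumptions the model yields an inclusive running sum across all 64 lanes, which is what I would expect the first part of the intrinsic to compute before any final shift.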

@xaphier
Author

xaphier commented Jan 1, 2020

I was able to find what is wrong in AGS and created a pull request with the fix.
