Skip to content
This repository has been archived by the owner on May 9, 2024. It is now read-only.

Add shared memory support for L0 path #116

Open
kurapov-peter opened this issue Nov 18, 2022 · 16 comments · May be fixed by #390
Open

Add shared memory support for L0 path #116

kurapov-peter opened this issue Nov 18, 2022 · 16 comments · May be fixed by #390
Assignees

Comments

@kurapov-peter
Copy link
Contributor

kurapov-peter commented Nov 18, 2022

We need to add shared memory support for Intel GPU using L0 backend.

  1. Create an equivalent of the Nvidia GpuSharedMemoryTest.cpp for Intel GPU via L0
  2. Activate SM on Backend / Execution Engine.
  3. Check hardcoded parameters for each GPU architecture
@lmontigny
Copy link
Contributor

Created branch: 116-add-shared-memory-support-for-l0-path

@lmontigny lmontigny linked a pull request Apr 17, 2023 that will close this issue
@lmontigny
Copy link
Contributor

lmontigny commented May 2, 2023

Moved to pakurapo/smem-2 branch, need to change the is_l0 flag to true in several place to avoid CPU fallback (and use GPU via L0). Otherwise reported address space casting issue

Clarify strange address space casting:
%dest_slot_adr_0 = addrspacecast i8 addrspace(3)* %2 to i32 addrspace(21875)*

@lmontigny
Copy link
Contributor

lmontigny commented May 3, 2023

Large address space issue related to the traits object?
solved using

    if (slot_bytes == sizeof(int32_t)) {
      // return traits.smemPointerType(llvm::Type::getInt32Ty(context));
      return llvm::Type::getInt32PtrTy(context, /*address_space=*/3)

generate:
%dest_slot_adr_0 = bitcast i8 addrspace(3)* %2 to i32 addrspace(3)*

Current issue:

/GpuSharedMemoryTestIntel.INFO
 2023-05-03T04:39:46.972076 I 1772468 0 0 L0Mgr.cpp:75 Discovered 1 driver(s) for L0 platform.
 2023-05-03T04:39:47.003210 I 1772468 0 0 L0Mgr.cpp:221 L0 module build log: error: IGC SPIRV Consumer does    not support the following opcode: 65534
 ^@

@lmontigny
Copy link
Contributor

L0Mgr issue, it seems to be an invalid opcode as 65535 is not reserved by any vendor:
https://github.com/KhronosGroup/SPIRV-Headers/blob/main/include/spirv/spir-v.xml#L152

<ids type="opcode" start="6656" end="65535" comment="Opcode range reservable for future use by vendors"/>

Wrong data is read as an opcode earlier in the IGC SPIR-V process

@lmontigny
Copy link
Contributor

Searching the invalid opcode, see spir-v disassembly

@lmontigny
Copy link
Contributor

Invalid casting in the write_projection_int64 from RuntimeFunctions.cpp

/usr/bin/llvm-as-14: after.linking.before.insert_declaration.spirv.ll:1712:16: error: invalid cast opcode for cast from 'i8 addrspace(4)*' to 'i64*'
  %6 = bitcast i8 addrspace(4)* %0 to i64*, !ResultSetReductionCodegen.cpp !10

Need to create a custom method in genx.ll with the appropriate casting to addrspace(4)

@lmontigny
Copy link
Contributor

Added write_projection_int64 in genx.ll with the casting to addrspacecast(4)
%6 = bitcast i8 addrspace(4)* %0 to i64 addrspace(4)*

write_projection_int32 is required in genx.ll as well.

Now same casting issue is appearing for agg_sum_double_skip_val, this function needs to be added in genx.ll as well. (with the casting addrspace(4)).

@lmontigny
Copy link
Contributor

Fixed several function and added implementation in genx.ll

Need to fix this one:

/usr/bin/llvm-as-14: after.linking.spirv.ll:1978:13: error: '@reduce_from_smem_to_gmem' defined with type 'void (i64 addrspace(4)*, i64 addrspace(4)*, i32)*' but expected 'void (i64 addrspace(4)*, i64 addrspace(3)*, i32)*'
  call void @reduce_from_smem_to_gmem(i64 addrspace(4)* %output_buffer_ptr, i64 addrspace(3)* %smem_input_buffer_ptr, i32 14952)

@lmontigny
Copy link
Contributor

Now on the shared_memory related issue:

$ /usr/bin/llvm-as-14 after.linking.spirv.ll
/usr/bin/llvm-as-14: after.linking.spirv.ll:1258:45: error: use of undefined value '@slm.buf.i64'
  %res = bitcast [1024 x i64] addrspace(3)* @slm.buf.i64 to i64 addrspace(3)*

Current genx.ll file:

; TODO(Petr): this should be dynamically sized depending on the arch, but at least 64KB
@slm.buf.i64 = internal local_unnamed_addr addrspace(3) global [1024 x i64] zeroinitializer, align 8

; https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpControlBarrier
declare void @__spirv_ControlBarrier(i32 %execution_scope, i32 %memory_scope, i32 %memory_semantics)

; L0 zeroes out the memory by default, so an empty implementation can work in simple cases
define i64 addrspace(3)* @init_shared_mem(i64 addrspace(4)* %agg_init_val, i32 noundef %groups_buffer_size) {
    %res = bitcast [1024 x i64] addrspace(3)* @slm.buf.i64 to i64 addrspace(3)*
    ret i64 addrspace(3)* null

@lmontigny
Copy link
Contributor

Need to copy the global variables from genx.ll to the IR
Done following main...pakurapo/slm-copying-example

@lmontigny
Copy link
Contributor

Initializer is not copied, need to add it separately

$ /usr/bin/llvm-as-14 after.linking.spirv.ll
/usr/bin/llvm-as-14: after.linking.spirv.ll:25:76: error: expected value token
@slm.buf.i64 = internal local_unnamed_addr addrspace(3) global [1024 x i64], align 8

Missing zeroinitializer at the end

@lmontigny
Copy link
Contributor

Solved init issue using

 // Initialize shared memory buffer
  const auto slm_buffer = module_->getNamedGlobal("slm.buf.i64");
  CHECK(slm_buffer);
  llvm::ArrayType* ArrayTy_0 =
      llvm::ArrayType::get(llvm::IntegerType::get(module_->getContext(), 64), 1024);
  llvm::ConstantAggregateZero* const_array_2 =
      llvm::ConstantAggregateZero::get(ArrayTy_0);
  slm_buffer->setInitializer(const_array_2)

@lmontigny
Copy link
Contributor

lmontigny commented May 24, 2023

By running the test ./Tests/GpuSharedMemoryTestIntel:

hdk_log/./Tests/GpuSharedMemoryTestIntel =

 2023-05-24T05:52:15.323533 I 21622 0 0 L0Mgr.cpp:75 Discovered 1 driver(s) for L0 platform.
 2023-05-24T05:52:15.366145 I 21622 0 0 L0Mgr.cpp:221 L0 module build log: error: IGC SPIRV Consumer does not   support the following opcode: 65534

Need to rebase and/or clean the current branch with the Intel GPU tests

@lmontigny
Copy link
Contributor

lmontigny commented May 25, 2023

Current state [2023-05-25]:

  • Rebased with origin/main bc154e46cafcfbd73892bd96cbb5079df8319690
  • Now using lmontigny/smem-3 branch
  • See draft PR Shared Memory for Intel GPU - Debugging, WIP, from lmontigny/smem-3 branch #507
  • genx.ll is clean, with the required custom methods using the correct addrspace(4) casting.
  • No casting issue with /usr/bin/llvm-as-14 after.linking.spirv.l anymore
  • Compilation HDK with Intel GPU Test ok
  • At runtime, ./Tests/GpuSharedMemoryTestIntel gives:
$ ./Tests/GpuSharedMemoryTestIntel
[==========] Running 7 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 7 tests from SingleColumn
[ RUN      ] SingleColumn.VariableEntries_CountQuery_4B_Group
Adding global slm.buf.i64
unknown file: Failure
C++ exception with description "L0 error: error occurred when building module, see build log for details" thrown in the test body.
[  FAILED  ] SingleColumn.VariableEntries_CountQuery_4B_Group (90 ms)
...
[----------] Global test environment tear-down
[==========] 7 tests from 1 test suite ran. (481 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 7 tests, listed below:
[  FAILED  ] SingleColumn.VariableEntries_CountQuery_4B_Group
[  FAILED  ] SingleColumn.VariableEntries_CountQuery_8B_Group
[  FAILED  ] SingleColumn.VariableSteps_FixedEntries_1
[  FAILED  ] SingleColumn.VariableSteps_FixedEntries_2
[  FAILED  ] SingleColumn.VariableSteps_FixedEntries_3
[  FAILED  ] SingleColumn.VariableSteps_FixedEntries_4
[  FAILED  ] SingleColumn.VariableNumBuffers

 7 FAILED TESTS
  • L0 error from hdk_log shows:
-  2023-05-25T10:11:17.637143 I 565758 0 0 L0Mgr.cpp:75            Discovered 1 driver(s) for L0 platform.
 2023-05-25T10:11:17.690471 I 565758 0 0 L0Mgr.cpp:221 L0        module build log: error: IGC SPIRV Consumer does not support    the following opcode: 65534
  • Unclear where the unsupported opcode comes from..

@lmontigny
Copy link
Contributor

Internal recommendations using IR

A couple of suggestions how to proceed here

I don't think that looking into LLVM IR would help here, better look at SPIR-V
I've just tried to do IR -> SPIR-V -> IR roundtrip with the version of SPIR-V translator we use at intel/llvm - this should be more or less fresh KhronosGroup/SPIRV-LLVM-Translator version
I've encountered no problems with that, which suggests that SPIR-V consumer you are using probably has some bug comparing to the upstream SPIR-V translator

  • as I can see, SPIR-V produced from this IR doesn't use any extensions and therefore you can try use spirv-val from https://github.com/KhronosGroup/SPIRV-Tools to see if SPIR-V itself is valid. Perhaps it is generated incorrectly in the first place. I mentioned extensions here, because we have a known issue with a lot of Intel SPIR-V extensions not being known to SPIRV-Tools repo
  • you can try insert some debug prints to the SPIR-V consumer so it prints every instruction it has parsed. That way you will at least localize which particular instruction confuses the translator and what are the instructions which precede it. I guess you can also use a debugger for that, if that's easier for you
  • you can try https://llvm.org/docs/Bugpoint.html to get minimized LLVM IR which reproduces the problem
  • you can also try contact someone from IGC - they may know their SPIR-V consumer better and therefore suggest something else as well. XXX could be a good candidate here

Work in progress to debug our SPIR-V

@kurapov-peter
Copy link
Contributor Author

The support is merged. Will keep the issue open until we have enabled the tests in CI.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants