Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMDGPU] Dynamic Shared/LDS #6

Open
1 task
DiamondLovesYou opened this issue Apr 18, 2021 · 0 comments
Open
1 task

[AMDGPU] Dynamic Shared/LDS #6

DiamondLovesYou opened this issue Apr 18, 2021 · 0 comments
Assignees

Comments

@DiamondLovesYou
Copy link
Member

Just found this:

__device__
inline
void* __get_dynamicgroupbaseptr()
{
    // Get group segment base pointer.
    return (char*)__local_to_generic((void*)__to_local(__llvm_amdgcn_groupstaticsize()));
}

In HIP's source.

llvm.amdgcn.groupstaticsize returns an i32, which as you'd expect returns the size of the statically allocated WG storage (ie via a global in the LLVM IR), with any dynamic LDS placed after (and thus we see the i32 -> i8* cast).

As it is now, Geobacter/Rust prevents statics from being generic, which prevents e.g. GEMM kernels from being generic over the type of the matrix element (without severe performance degradation). But this Rust limitation can be avoided by allowing kernels to allocate some LDS dynamically. Geobacter's shared source will then ensure that the host and device sides match.

TODO:

  • Add a type and memory safe way to allocate host side and use device side.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant