-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getValidWorkDiv()
and isValidWorkDiv()
is broken
#2222
Comments
suggestion for the interface change: template<
typename TAcc,
typename TDev,
typename TGridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>,
typename TThreadElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>>
ALPAKA_FN_HOST auto getValidWorkDiv(
[[maybe_unused]] TDev const& dev,
[[maybe_unused]] TGridElemExtent const& gridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
[[maybe_unused]] TThreadElemExtent const& threadElemExtents = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
[[maybe_unused]] bool blockThreadMustDivideGridThreadExtent = true,
[[maybe_unused]] GridBlockExtentSubDivRestrictions gridBlockExtentSubDivRestrictions
= GridBlockExtentSubDivRestrictions::Unrestricted)
-> WorkDivMembers<Dim<TGridElemExtent>, Idx<TGridElemExtent>> to template<
typename TAcc,
typename TDev,
typaname TKernel,
typename TGridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>,
typename TThreadElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>>
ALPAKA_FN_HOST auto getValidWorkDiv(
[[maybe_unused]] TDev const& dev,
[[maybe_unused]] TGridElemExtent const& gridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
[[maybe_unused]] TThreadElemExtent const& threadElemExtents = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
TKernel const& kernel,
[[maybe_unused]] bool blockThreadMustDivideGridThreadExtent = true,
[[maybe_unused]] GridBlockExtentSubDivRestrictions gridBlockExtentSubDivRestrictions
= GridBlockExtentSubDivRestrictions::Unrestricted)
-> decltype(createTaskKernel<>(...)) Instead of returning a workdivision, we can return the kernel task directly because it makes no sense to pass the kernel and return a work division which is then used to create a task kernel. |
In the VC today, we agreed to overwork the function like @psychocoderHPC suggest. The API break is okay, because the function does not work and because of this, nobody is using it in production. |
After starting the first try to fix it I realized that fixing this issue would take very long because it would affect all accelerators and many traits because we merged in the implementation the kernel function signature already with the accelerator and work division. The fix will be complex and is touching a lot therefore it will not be ready for version 1.1.0. Fixing the problem will likely require 2 weeks of focused work. I will still fix it but assume it will not be ready soon. |
I doubt this argument. I am certain someone uses the API AND doesn't know it's broken :D But feel free to change it anyway. I think work division creation and task enqueuing should be separable. You can of course provide a combined API, but I may want to look at the computed workdiv before I schedule the kernel. So a 2-step API should be possible. I would name the 1-step API differently, and put the kernel sooner in the parameters, because it's a mandatory argument. Also, you will need the queue instead of the device. I think the function can return void then: ALPAKA_FN_HOST void enqueueWithValidWorkDiv(
TDev const& queue,
TKernel const& kernel,
TGridElemExtent const& gridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
TThreadElemExtent const& threadElemExtents = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
bool blockThreadMustDivideGridThreadExtent = true,
GridBlockExtentSubDivRestrictions gridBlockExtentSubDivRestrictions
= GridBlockExtentSubDivRestrictions::Unrestricted); You may also name it |
Only in examples or also in production code? |
Probably examples. |
Simeon what do you mean by the production code? Sorry for my ignorance :) |
Both functions using the device properties (
TApi::getDeviceProperties()
) to check if a work division is correct for the device.In the case of CUDA and ROCm this is not valid because
TApi::funcGetAttributes(&funcAttrs, kernelName)
must be used.If a kernel is using a high amount of registers the number of threads per block can be less than the maximal number of threads a device can handle.
The debug message shown for any kernel execution using
TApi::funcGetAttributes(&funcAttrs, kernelName)
and show therefore the correct restrictions.To fix this bug it is required to change the interface of
getValidWorkDiv()
because the kernel is required for the query.Two of my students found the issue during porting LULESH to alpaka:
Kernel 1 can use 1024 threads:
Kernel 2 used to many registers and can therefor run only with 512 threads
The text was updated successfully, but these errors were encountered: