Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getValidWorkDiv() and isValidWorkDiv() is broken #2222

Open
psychocoderHPC opened this issue Jan 15, 2024 · 7 comments
Open

getValidWorkDiv() and isValidWorkDiv() is broken #2222

psychocoderHPC opened this issue Jan 15, 2024 · 7 comments
Assignees
Labels

Comments

@psychocoderHPC
Copy link
Member

Both functions using the device properties (TApi::getDeviceProperties()) to check if a work division is correct for the device.
In the case of CUDA and ROCm this is not valid because TApi::funcGetAttributes(&funcAttrs, kernelName) must be used.
If a kernel is using a high amount of registers the number of threads per block can be less than the maximal number of threads a device can handle.
The debug message shown for any kernel execution using TApi::funcGetAttributes(&funcAttrs, kernelName) and show therefore the correct restrictions.

To fix this bug it is required to change the interface of getValidWorkDiv() because the kernel is required for the query.

Two of my students found the issue during porting LULESH to alpaka:

Kernel 1 can use 1024 threads:

Screenshot from 2024-01-12 20-33-29

Kernel 2 used to many registers and can therefor run only with 512 threads

Screenshot from 2024-01-12 20-45-19

@psychocoderHPC
Copy link
Member Author

suggestion for the interface change:

    template<
        typename TAcc,
        typename TDev,
        typename TGridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>,
        typename TThreadElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>>
    ALPAKA_FN_HOST auto getValidWorkDiv(
        [[maybe_unused]] TDev const& dev,
        [[maybe_unused]] TGridElemExtent const& gridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
        [[maybe_unused]] TThreadElemExtent const& threadElemExtents = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
        [[maybe_unused]] bool blockThreadMustDivideGridThreadExtent = true,
        [[maybe_unused]] GridBlockExtentSubDivRestrictions gridBlockExtentSubDivRestrictions
        = GridBlockExtentSubDivRestrictions::Unrestricted)
        -> WorkDivMembers<Dim<TGridElemExtent>, Idx<TGridElemExtent>>

to

    template<
        typename TAcc,
        typename TDev,
        typaname TKernel,
        typename TGridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>,
        typename TThreadElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>>
    ALPAKA_FN_HOST auto getValidWorkDiv(
        [[maybe_unused]] TDev const& dev,
        [[maybe_unused]] TGridElemExtent const& gridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
        [[maybe_unused]] TThreadElemExtent const& threadElemExtents = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
        TKernel const& kernel,
        [[maybe_unused]] bool blockThreadMustDivideGridThreadExtent = true,
        [[maybe_unused]] GridBlockExtentSubDivRestrictions gridBlockExtentSubDivRestrictions
        = GridBlockExtentSubDivRestrictions::Unrestricted)
        -> decltype(createTaskKernel<>(...))

Instead of returning a workdivision, we can return the kernel task directly because it makes no sense to pass the kernel and return a work division which is then used to create a task kernel.

@SimeonEhrig
Copy link
Member

In the VC today, we agreed to overwork the function like @psychocoderHPC suggest. The API break is okay, because the function does not work and because of this, nobody is using it in production.

@psychocoderHPC
Copy link
Member Author

After starting the first try to fix it I realized that fixing this issue would take very long because it would affect all accelerators and many traits because we merged in the implementation the kernel function signature already with the accelerator and work division.

The fix will be complex and is touching a lot therefore it will not be ready for version 1.1.0.

Fixing the problem will likely require 2 weeks of focused work. I will still fix it but assume it will not be ready soon.

@bernhardmgruber
Copy link
Member

In the VC today, we agreed to overwork the function like @psychocoderHPC suggest. The API break is okay, because the function does not work and because of this, nobody is using it in production.

I doubt this argument. I am certain someone uses the API AND doesn't know it's broken :D But feel free to change it anyway.

I think work division creation and task enqueuing should be separable. You can of course provide a combined API, but I may want to look at the computed workdiv before I schedule the kernel. So a 2-step API should be possible.

I would name the 1-step API differently, and put the kernel sooner in the parameters, because it's a mandatory argument. Also, you will need the queue instead of the device. I think the function can return void then:

    ALPAKA_FN_HOST void enqueueWithValidWorkDiv(
        TDev const& queue,
        TKernel const& kernel,
        TGridElemExtent const& gridElemExtent = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
        TThreadElemExtent const& threadElemExtents = Vec<Dim<TAcc>, Idx<TAcc>>::ones(),
        bool blockThreadMustDivideGridThreadExtent = true,
        GridBlockExtentSubDivRestrictions gridBlockExtentSubDivRestrictions
        = GridBlockExtentSubDivRestrictions::Unrestricted);

You may also name it enqueueWithAdjustedWorkDiv, to communicate that the workdiv is not validated, but actually changed to be valid.

@SimeonEhrig
Copy link
Member

I doubt this argument. I am certain someone uses the API AND doesn't know it's broken :D But feel free to change it anyway.

Only in examples or also in production code?

@bernhardmgruber
Copy link
Member

I doubt this argument. I am certain someone uses the API AND doesn't know it's broken :D But feel free to change it anyway.

Only in examples or also in production code?

Probably examples.

@mehmetyusufoglu
Copy link
Contributor

I doubt this argument. I am certain someone uses the API AND doesn't know it's broken :D But feel free to change it anyway.

Only in examples or also in production code?

Simeon what do you mean by the production code? Sorry for my ignorance :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants