
Using gridDim.x in reduction (max/min/sum) cuda kernels #789

Open
coreylowman opened this issue May 19, 2023 · 0 comments
Labels
expert (Requires advanced knowledge of dfdx), gpu (Related to GPU support), optimization

Comments


coreylowman commented May 19, 2023

Often, it is recommended to loop over items in a CUDA kernel like this:

```cuda
for (unsigned int i = tid; i < n; i += blockDim.x * gridDim.x) {
   ...
}
```
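For context, `tid` in that loop is the global thread index, so a complete grid-strided elementwise kernel typically looks something like the following (an illustrative sketch with a made-up `scale` kernel, not one of dfdx's actual kernels):

```cuda
extern "C" __global__ void scale(float *data, float alpha, const unsigned int n) {
    // Global thread index: each thread starts at its own offset and strides by
    // the total number of threads launched in the grid.
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (unsigned int i = tid; i < n; i += blockDim.x * gridDim.x) {
        data[i] *= alpha;
    }
}
```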

This was recently done for most kernels in #787

However, this is not currently implemented for reductions.

Currently, each chunk to be reduced is spread across thread blocks, so the threads in a single block may be working on one or more chunks. How does this work with grid striding?

Additionally, how would shared memory be used?
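For reference, the textbook way to combine the two (a minimal sketch of a sum reduction over a whole buffer, not dfdx's chunked layout and not a proposed implementation): each thread grid-strides over the input and accumulates into a register, the block then tree-reduces those per-thread partials in shared memory, and thread 0 writes one partial per block, which a second kernel or the host combines.

```cuda
extern "C" __global__ void sum_all(const float *input, float *block_sums, const unsigned int n) {
    // Dynamic shared memory: one float per thread in the block
    // (launch with blockDim.x * sizeof(float) bytes, blockDim.x a power of two).
    extern __shared__ float smem[];

    unsigned int tid = threadIdx.x;
    unsigned int stride = blockDim.x * gridDim.x;

    // Grid-stride phase: fold many input elements into one per-thread register.
    float acc = 0.0f;
    for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        acc += input[i];
    }

    // Shared-memory phase: tree-reduce the blockDim.x partials of this block.
    smem[tid] = acc;
    __syncthreads();
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            smem[tid] += smem[tid + s];
        }
        __syncthreads();
    }

    // One partial result per block; a second launch (or atomicAdd) combines them.
    if (tid == 0) {
        block_sums[blockIdx.x] = smem[0];
    }
}
```

The open question above is how to map this onto dfdx's chunked reductions, where the elements belonging to one output value may span only part of a block or several blocks; the sketch sidesteps that by reducing the entire buffer to a single value.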

coreylowman added the gpu, optimization, and expert labels on Jul 11, 2023