Overhead of first reduction with CUDA backend #1558
Comments
This is expected behavior. The reducers need memory to function, so the first reducer call triggers allocations and initialization of the internal memory pools (device, device-zeroed, and pinned pools). You should be able to see this happening if you profile with something like Nsight Systems.
Thank you for the detailed explanation and suggestions @MrBurmark! The
On a slight tangent, I was thinking a bit more about this the other day, and I was wondering whether hiding initial overheads like this is a use case that justifies providing methods such as, Thank you again for all your help.
You can't use that policy with
vs
Thanks @MrBurmark -- it's always complicated with The example you provided is exactly what I was looking for!
@gzagaris, several years ago we considered adding RAJA::initialize and RAJA::finalize methods; Kokkos does this, for example. We didn't see a strong need at the time and thought RAJA would be more flexible without them. We will reconsider and let you know.
Sounds good, thanks @rhornung67!
Hi everyone,
We've observed that the execution time of the first reduction is notably high (in some cases, slower than sequential). However, subsequent reductions do not exhibit this behavior, suggesting that there might be some overhead (perhaps internal initialization?) with the first reduction that is invoked by the application.
At your convenience, can you confirm whether this is expected behavior with the current implementation and elaborate a bit on what is happening?
Steps To Reproduce
This produces the following output:
Please let me know if you need any additional information. Thank you for all your time and help.