std-indices: Capture in offload-friendly way #156
Previously, the std-indices kernels captured by reference. In an offload scenario, capture by value is generally preferable: if a captured reference points to host stack memory, offloaded kernels will encounter illegal memory accesses. Compilers also generally cannot remedy this with the transformations they use to make data GPU-accessible, because those only work for heap memory.
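To illustrate the stack problem, here is a minimal sketch assuming a hypothetical free function `scale` with a `mul`-style body (`b[i] = scalar * c[i]`). It is not BabelStream code. The parameter `scalar` and the local pointers `pb`/`pc` live on the host stack, so a `[&]` closure would carry their host addresses into the kernel, whereas `[=]` copies their values into the closure. It uses sequential `std::for_each` so it runs anywhere; an offload build would instead pass an execution policy such as `std::execution::par_unseq`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical mul-style kernel: b[i] = scalar * c[i] (not BabelStream code).
void scale(std::vector<double>& b, const std::vector<double>& c, double scalar) {
  assert(b.size() == c.size());

  // Index range the kernel iterates over: 0 .. n-1.
  std::vector<std::size_t> idx(b.size());
  std::iota(idx.begin(), idx.end(), std::size_t{0});

  // Raw pointers to the vectors' heap buffers. The pointer variables
  // themselves live on the host stack; the memory they point to does not.
  double* pb = b.data();
  const double* pc = c.data();

  // Offload-unsafe variant (shown for contrast, not used):
  //   [&](std::size_t i) { pb[i] = scalar * pc[i]; }
  // The closure would hold references to `scalar`, `pb` and `pc`,
  // i.e. host stack addresses that a device kernel cannot dereference.

  // Offload-friendly variant: `scalar`, `pb` and `pc` are copied by value,
  // so the kernel only dereferences the heap buffers.
  std::for_each(idx.begin(), idx.end(),
                [=](std::size_t i) { pb[i] = scalar * pc[i]; });
}
```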
The current code works in an offload scenario only because the stream class itself is allocated on the heap (which compilers can then make GPU-accessible), so the kernels can reach the `std::vector` objects holding the data through it. This relies on an implementation detail and may be brittle if the architecture ever changes, or if someone wishes to reuse BabelStream code in a different context.
This PR therefore attempts to make things more robust by directly capturing data pointers by value.
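A minimal sketch of the before/after capture pattern, assuming a hypothetical `Stream<T>` class with `std::vector` members `a` and `c` and a `copy` kernel. The names and structure are illustrative, not the actual std-indices implementation. `copy_by_reference` mirrors the old `[&]` style, where the kernel reaches the data through `this`; `copy_by_value` copies the vectors' data pointers into the closure, so the kernel only touches the heap buffers. Sequential `std::for_each` stands in for the parallel, offloaded call to keep the sketch runnable without a parallel backend.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a BabelStream-style stream class (not the real one).
template <class T>
struct Stream {
  std::vector<T> a, c;
  std::vector<std::size_t> range;  // index range the kernels iterate over

  explicit Stream(std::size_t n) : a(n), c(n, T{0}), range(n) {
    std::iota(a.begin(), a.end(), T{1});                     // a = 1, 2, ..., n
    std::iota(range.begin(), range.end(), std::size_t{0});   // 0 .. n-1
  }

  // Before: [&] implicitly captures `this`; the kernel reads this->a and
  // writes this->c, so it only works if the Stream object itself is
  // accessible from the device (e.g. because it was heap-allocated).
  void copy_by_reference() {
    std::for_each(range.begin(), range.end(),
                  [&](std::size_t i) { c[i] = a[i]; });
  }

  // After: the data pointers are copied into the closure by value. The
  // kernel dereferences only the vectors' heap buffers, never `this`.
  void copy_by_value() {
    assert(a.size() == c.size());
    const T* pa = a.data();
    T* pc = c.data();
    std::for_each(range.begin(), range.end(),
                  [pa, pc](std::size_t i) { pc[i] = pa[i]; });
  }
};
```

Both variants produce the same result on the host. The difference appears only when the closure runs on a device: if a `Stream` instance were itself created on the host stack, only the `copy_by_value` style would still be safe to offload.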
On an Intel iGPU, I see no substantial performance difference between the two versions in an offload scenario.