
Why does the ordering of work items seem to make a big difference in performance for ElementWiseKernel? #647

Answered by 11Kclarke
11Kclarke asked this question in Q&A

Hey,
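It makes sense to me now. I'm going to leave my explanation in case it helps someone else. Work items don't each get their own instruction decoder: the items that execute together in lockstep (a warp on NVIDIA, a wavefront on AMD, a sub-group of the work-group in OpenCL terms) share a single instruction stream. So within a kernel, branching is costly unless every item in that group follows the same branch; when they diverge, the hardware runs each branch in turn with the inactive items masked off. Because my input was ordered, items in the same group were likely to follow the same branch. That saves far more time than spreading the expensive branches evenly across work groups.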
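Here is a rough sketch of the effect. It assumes a deliberately simplified cost model: a warp pays for every distinct branch any of its lanes takes, and nothing else matters. Real GPUs also have masking overhead, reconvergence, and memory effects. The function `divergence_cost`, the warp size of 32, and the "cheap"/"expensive" branch costs are all hypothetical, chosen for illustration.

```python
# Simplified SIMT cost model (an assumption for illustration, not a real GPU
# measurement): lanes in the same warp/wavefront share one instruction stream,
# so a warp pays for every distinct branch that any of its lanes takes.
def divergence_cost(branches, branch_cost, warp_size=32):
    total = 0
    for start in range(0, len(branches), warp_size):
        warp = branches[start:start + warp_size]
        # Each distinct branch present in the warp is executed serially,
        # with the lanes that did not take it masked off.
        total += sum(branch_cost[b] for b in set(warp))
    return total

# Hypothetical workload: each work item takes a "cheap" or "expensive" branch.
branch_cost = {"cheap": 1, "expensive": 10}
items = ["cheap", "expensive"] * 512  # unordered (interleaved) input, 1024 items

interleaved_cost = divergence_cost(items, branch_cost)
sorted_cost = divergence_cost(sorted(items), branch_cost)
print(f"interleaved: {interleaved_cost}, sorted: {sorted_cost}")
# prints "interleaved: 352, sorted: 176"
```

With interleaved input, every one of the 32 warps contains both branches and pays 1 + 10. After sorting, 16 warps run only the cheap branch and 16 run only the expensive one, so the total is halved even though the amount of work per item is unchanged.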
