Looping over high-dimensional arrays #438
Seems to follow the same iteration order as:
One thing to note is that your … So I am not surprised that …
Say what now? This isn't split up automatically?
Well, I thought I had documented that clearly, but I can't seem to find it... Take a look at: … The `workgroupsize` is also a tuple, in which you give the dimensions of the workgroup.
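To make the tuple form concrete, here is a small pure-Julia sketch of how an `ndrange` is tiled by a tuple `workgroupsize`. The helper `ngroups` is hypothetical and not part of KernelAbstractions. It assumes that a partially filled workgroup at the edge of the range still counts as a whole group, which is why it rounds up with `cld`.

```julia
# Hypothetical helper (not a KernelAbstractions function): number of workgroups
# needed along each dimension when `ndrange` is tiled by a tuple `workgroupsize`.
# Partial groups at the edges are rounded up with `cld`.
ngroups(ndrange::NTuple{N,Int}, workgroupsize::NTuple{N,Int}) where {N} =
    cld.(ndrange, workgroupsize)

ndrange = (256, 128, 4)

# 64 work-items per group, all laid out along the first dimension:
g1 = ngroups(ndrange, (64, 1, 1))   # (4, 128, 4)

# The same 64 work-items spread over two dimensions as a 16×4 tile:
g2 = ngroups(ndrange, (16, 4, 1))   # (16, 32, 4)

# A range that is not a multiple of the workgroup size leaves a partial group:
g3 = ngroups((100, 3), (64, 1))     # (2, 3)
```

Both `(64, 1, 1)` and `(16, 4, 1)` hold 64 work-items per group; the tuple only decides how those items are arranged across the dimensions.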
Ok. That explains everything.
Understood @vchuravy! Thanks for clarifying.
A bit more on this: it looks like if we compute the required workgroup size as `workgroupsize = ntuple(j->j==argmax(size(R)) ? 64 : 1, length(size(R)))` and then pass it as the kernel argument, the resulting kernels are much slower. On the other hand, hardcoding it to …

Edit: Actually, I have seen that the macro that generates the kernel sometimes fails to produce the expected result of …
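The expression above puts 64 work-items along the largest dimension of `R` and 1 along every other dimension. The pure-Julia sketch below evaluates it for an example array and, for comparison, a hypothetical alternative that puts the 64 work-items along the first dimension, which is the contiguous one in Julia's column-major layout. Both helper names (`wg_largest`, `wg_first`) are made up for illustration. The sketch passes `Val(N)` instead of `length(size(R))` so that the tuple length follows from the array type; whether that has any bearing on the slowdown reported here is an untested assumption.

```julia
# Expression from the comment: 64 work-items along the largest dimension of R,
# 1 along every other one. Val(N) fixes the tuple length at compile time
# (N == ndims(R) == length(size(R))).
wg_largest(R::AbstractArray{T,N}) where {T,N} =
    ntuple(j -> j == argmax(size(R)) ? 64 : 1, Val(N))

# Hypothetical alternative: 64 work-items along dimension 1, the contiguous
# dimension of a column-major Julia array.
wg_first(R::AbstractArray{T,N}) where {T,N} =
    ntuple(j -> j == 1 ? 64 : 1, Val(N))

R = zeros(Float32, 128, 512, 8)
a = wg_largest(R)   # (1, 64, 1)
b = wg_first(R)     # (64, 1, 1)
```

With the largest-dimension rule, consecutive work-items in a group are `size(R, 1)` elements apart in memory whenever the largest dimension is not the first one, whereas the `wg_first` layout keeps them adjacent.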
Together with @weymouth we are trying to create a kernel that loops over an n-dimensional array and applies a function to each element. While we can certainly do so, the speedup we observe when comparing `@kernel` ("KA") and non-`@kernel` ("serial") implementations varies greatly depending on the array slice we want to access. This is probably related to Julia being column-major, but the difference is striking, and KA does not perform as well as the serial version. Here is a simple MWE that demonstrates this. It has been run with `julia -t 1` to force a single thread and compare the KA and serial implementations. There is also an additional GPU test for comparison, which shows the same issue. The timings are:
Is there something wrong in the MWE? Could this be done differently? It would be nice to learn about this. Thanks!
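Since the slowdown depends on which slice is accessed, it may help to recall how Julia lays out arrays: they are column-major, so the first index varies fastest in memory, and `CartesianIndices` iterates in that same order. The pure-Julia sketch below (no KernelAbstractions involved) checks this on a small matrix and uses `strides` to show how far apart consecutive elements are for slices taken along different dimensions of a 3-D array. The array sizes and the `sum_colmajor` helper are arbitrary choices for illustration.

```julia
# A 3×4 matrix filled column by column: linear (memory) order is 1, 2, ..., 12.
A = reshape(collect(1:12), 3, 4)

mem_order  = vec(A)                                    # storage order
iter_order = [A[I] for I in vec(CartesianIndices(A))]  # CartesianIndices order

# Cache-friendly loop nest for column-major storage: first index innermost.
function sum_colmajor(M::AbstractMatrix)
    s = zero(eltype(M))
    for j in axes(M, 2), i in axes(M, 1)   # i varies fastest
        s += M[i, j]
    end
    return s
end

# Distance in memory (in elements) between consecutive entries of 1-D slices
# taken along each dimension of a 64×64×64 array.
B = zeros(Float64, 64, 64, 64)
s1 = strides(view(B, :, 1, 1))   # (1,)     contiguous
s2 = strides(view(B, 1, :, 1))   # (64,)
s3 = strides(view(B, 1, 1, :))   # (4096,)
```

A slice along the first dimension reads consecutive memory, while slices along later dimensions jump by the product of the preceding sizes, which is one plausible contributor to the slice-dependent speedup described above.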