KernelAbstractions support #147
Conversation
Ah, I guess while I'm here, I'll briefly explain the differences with CUDA syntactically:
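For example, here is a minimal kernel written both ways (the `scale!` kernels below are made-up illustrations, not code from this PR, and assume the current CUDA.jl / KernelAbstractions APIs):

```julia
using CUDA
using KernelAbstractions

# CUDA.jl style: compute the global index by hand and bounds-check it.
function cuda_scale!(a, s)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        @inbounds a[i] *= s
    end
    return nothing
end

# KernelAbstractions style: @index and ndrange replace the manual
# index arithmetic and the bounds check.
@kernel function ka_scale!(a, s)
    i = @index(Global, Linear)
    @inbounds a[i] *= s
end

a = CUDA.ones(Float32, 1024)

# CUDA launch: explicit threads/blocks.
@cuda threads=256 blocks=cld(length(a), 256) cuda_scale!(a, 2f0)

# KA launch: the backend comes from the array itself, so the same
# kernel runs on any supported device.
backend = get_backend(a)
ka_scale!(backend, 256)(a, 2f0, ndrange = length(a))
KernelAbstractions.synchronize(backend)
```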
The tricky thing about this PR was removing the CUDA dependency outside of the kernels. There is still one call in zygote.jl I gotta figure out: https://github.com/leios/Molly.jl/blob/KA_support/src/zygote.jl#L698
Great work so far. Making the code compatible with generic array types is a nice change, and having the kernels work on different devices would be a selling point of the package. I would be interested to see the performance of the kernels compared to the CUDA versions, and whether it plays nicely with Enzyme. Good luck with the runtime errors.
I think I can finish this up today or early next week (emphasis on think), but to quickly answer the questions:
Great. Not urgent, but how well does KernelAbstractions.jl deal with warp-level code, e.g.
That's a good question. We can probably expose the APIs available from CUDA, but I am not sure how AMDGPU deals with these. We would also just need to figure out what that corresponds to on parallel CPU. I think these are the tools we need: https://rocm.docs.amd.com/projects/rocPRIM/en/latest/warp_ops/index.html

Ah, as an important note (that I somehow failed to mention before), an advantage of KA is that it also provides a parallel CPU implementation, so the kernels can be written once and executed everywhere. I didn't do that in this PR because it brings up design questions related to Molly internals.
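To make the "written once, executed everywhere" point concrete, the same KA kernel also runs on plain CPU arrays (a sketch against the current KA API; `ka_axpy!` is a made-up name):

```julia
using KernelAbstractions

@kernel function ka_axpy!(y, a, x)
    i = @index(Global, Linear)
    @inbounds y[i] += a * x[i]
end

# Plain Arrays on the CPU backend -- no GPU required.
x = rand(Float32, 1024)
y = zeros(Float32, 1024)
ka_axpy!(CPU(), 64)(y, 2f0, x, ndrange = length(y))
KernelAbstractions.synchronize(CPU())
```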
Yeah, we can discuss that after this PR. I would be okay with switching if there were no performance hit. Judging from the discussion on the linked PR, there is currently no warp support in KA. It may be necessary to leave that CUDA kernel in and have a separate KA kernel for other backends until warp support comes to KA.
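For reference, the kind of warp-level pattern in question looks something like this (a toy CUDA.jl sum reduction, not Molly's actual kernel):

```julia
using CUDA

# Each warp reduces 32 values via register shuffles, with no shared
# memory; this is the kind of code with no portable KA equivalent yet.
function warp_sum_kernel!(out, a)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    val = i <= length(a) ? a[i] : 0f0
    offset = 16
    while offset > 0
        val += shfl_down_sync(0xffffffff, val, offset)
        offset >>= 1
    end
    # The first lane of each warp accumulates its partial sum.
    if laneid() == 1
        CUDA.@atomic out[1] += val
    end
    return nothing
end

a = CUDA.ones(Float32, 1024)
out = CUDA.zeros(Float32, 1)
@cuda threads=256 blocks=cld(length(a), 256) warp_sum_kernel!(out, a)
```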
Ok, so a couple of quick notes here:
That is okay.
I wouldn't worry about this. Currently I only merge stuff that I am able to maintain, or where I think I can skill up to the point of maintaining it. The changes here seem reasonable and worth merging once any errors and performance regressions are fixed. There is a wider question about whether KernelAbstractions.jl will continue to be maintained compared to CUDA.jl, but it seems to have decent traction now.
Yeah, the plan is for KA to be used even within GPUArrays, so it's not going anywhere anytime soon. Speaking of which, the "correct" course of action for KA in Molly would be to get KA into GPUArrays first and then use that to implement any missing features at the GPUArrays level. Would it be better for me to split this PR, then? Maybe one PR doing the generic array stuff and then another with the KA support?
I would try to get this PR working as is. Only if that becomes difficult would it be worth splitting out and merging the generic array support first. If KA is here for the long haul then there is a benefit to switching the kernels even if only CUDA works currently, because then, when changes happen elsewhere, AMDGPU will work without any changes required in Molly.
The KernelAbstractions branch now compiles, so I thought I would put forward a quick draft PR while I figure out all the runtime bugs.
Notes:
- `CUDA` has been removed and replaced with `KernelAbstractions` and `GPUArrays`. As an important note here, GPUArrays is not strictly necessary except to replicate the behavior of the boolean GPU flag (i.e. `isa(a, AbstractGPUArray)`).
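For example, the backend-agnostic check that stands in for the old boolean flag is just the following (`is_on_gpu` is a hypothetical name for illustration, not Molly API):

```julia
using GPUArrays: AbstractGPUArray

# True for CuArray, ROCArray, etc.; false for plain Arrays.
is_on_gpu(a::AbstractArray) = a isa AbstractGPUArray
```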