AMD GPU Support via an extension for AMDGPU
#3475
base: main
Conversation
Work is based off CliMA#2949
Fixed to 0.9.1 since this is the only version I've installed and worked with for this patch.
Co-authored-by: Navid C. Constantinou <navidcy@users.noreply.github.com>
I'll try to convert this to an extension. I'll do it in a single commit so that it's easily revertible. How does that sound @fluidnumerics-joe?
I'm game to try. Should we modify the baroclinic adjustment problem, or is there another benchmark you have in mind?
I think it makes sense to keep going with the baroclinic adjustment case! To change the free surface you'll use free_surface = SplitExplicitFreeSurface(grid) as a keyword argument in the model constructor. I think the default parameters for it make sense, but @simone-silvestri can confirm. We can also try with. Btw, if you paste the baroclinic adjustment script you are working with, we can also check to make sure it's GPU compatible and possibly help simplify it further.
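A rough sketch of that suggestion follows. Only free_surface = SplitExplicitFreeSurface(grid) comes from this thread; the grid dimensions, buoyancy, and tracer choices below are illustrative placeholders, not the actual benchmark script.

```julia
using Oceananigans

# Hypothetical grid; sizes and extents here are placeholders, not the
# benchmark's actual configuration.
grid = LatitudeLongitudeGrid(GPU(); size = (48, 48, 8),
                             longitude = (-10, 10), latitude = (-60, -40),
                             z = (-1000, 0))

# Swap the default free surface for the split-explicit one, as suggested:
free_surface = SplitExplicitFreeSurface(grid)

model = HydrostaticFreeSurfaceModel(; grid, free_surface,
                                    buoyancy = BuoyancyTracer(),
                                    tracers = :b)
```

Passing the grid lets the free surface pick default substepping parameters from the grid spacing, which is why the defaults are expected to be reasonable here.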
I'll get profiling results posted soon.
Nice! Yeah, since is commented out, I think this whole script will run on GPUs! The animation at the end will, I think, be generated on the CPU by default. You can also omit that (unless you want a pretty movie).
Just want to confirm some final steps with @navidcy and @glwagner here to wrap up this PR. At the moment, I believe we just need to put in a method that throws an error for
I think we need to have an AMD-enabled CI?
Is this something that is handled on the MIT side? The only way I can help is through system procurement (we're a Supermicro reseller), or through an allocation on our systems.
We could ask the Julia Lab; they have some AMD GPUs dedicated to CI there. Not sure if it is possible to use them.
Yeah, it's something the Oceananigans dev team should sort out! :)
# MOVE THIS IN EXTENSION
# if child_architecture == ROCmGPU()
#     device_id = isnothing(devices) ? node_rank % length(AMDGPU.devices()) : devices[node_rank+1]
#     AMDGPU.device!(device_id)
# end
Will this work? You can also use the switch_device! method here instead of explicitly calling CUDA.device! and AMDGPU.device!.
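A minimal sketch of that suggestion, assuming switch_device! dispatches on the architecture type (so it forwards to CUDA.device! or AMDGPU.device! as appropriate); the helper name here is illustrative, not from the PR:

```julia
# Vendor-agnostic replacement for the commented-out block above.
# `switch_device!(arch, id)` is assumed to dispatch to CUDA.device! or
# AMDGPU.device! depending on the child architecture's type.
function assign_device!(child_architecture, node_rank, devices, ndevices)
    device_id = isnothing(devices) ? node_rank % ndevices : devices[node_rank + 1]
    switch_device!(child_architecture, device_id)
    return device_id
end
```

This keeps the MPI rank-to-device mapping logic in one place, with only the dispatch target differing per vendor.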
src/Grids/zeros_and_ones.jl (Outdated)

@@ -4,7 +4,7 @@ using Oceananigans.Architectures: CPU, GPU, AbstractArchitecture
import Base: zeros

zeros(FT, ::CPU, N...) = zeros(FT, N...)
-zeros(FT, ::GPU, N...) = CUDA.zeros(FT, N...)
+zeros(FT, ::CUDAGPU, N...) = CUDA.zeros(FT, N...)
It would be nice to have KernelAbstractions' allocation method here instead of having to put AMD's zeros in an extension.
I'd need to think a bit on this given that Julia is still fairly new to me; I'm not seeing how we would pull the backend architecture required as input for the KernelAbstractions.zeros API in this context.
We have a device function in Architectures.jl that grabs the KA backend, so this would probably suffice:

zeros(FT, arch::AbstractArchitecture, N...) = KA.zeros(device(arch), FT, N...)
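For reference, KernelAbstractions' allocator can be exercised on the CPU backend alone, independent of Oceananigans (a sketch):

```julia
using KernelAbstractions
const KA = KernelAbstractions

# KA.zeros(backend, FT, dims...) allocates a zeroed array on whichever
# backend is passed: KernelAbstractions.CPU() here, but CUDABackend() or
# ROCBackend() work the same way. This is what makes the one-line `zeros`
# definition above vendor-agnostic.
a = KA.zeros(KA.CPU(), Float64, 4, 4)
```

On the CPU backend this produces the same result as Base.zeros(Float64, 4, 4), so the dispatch-on-architecture definition degrades gracefully.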
Added this change in aa6a3aa.
The only note here is the following warning thrown during precompilation:
1 dependency had output during precompilation:
┌ Oceananigans
│ WARNING: using KernelAbstractions.GPU in module Grids conflicts with an existing identifier.
│ WARNING: using KernelAbstractions.CPU in module Grids conflicts with an existing identifier.
└
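Those warnings arise because KernelAbstractions exports its own CPU and GPU types, which collide with the identifiers already defined in Oceananigans.Architectures. One way to avoid the clash (a sketch, not necessarily what the commit does) is to import the package qualified rather than with a bare `using`:

```julia
# Qualified import: KernelAbstractions' CPU/GPU types no longer shadow
# Oceananigans.Architectures.CPU and GPU inside the Grids module.
import KernelAbstractions as KA

zeros(FT, arch::AbstractArchitecture, N...) = KA.zeros(device(arch), FT, N...)
```

With the qualified form, the only names brought into scope are those explicitly prefixed with KA, so no identifier conflict warning is emitted.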
ext/OceananigansAMDGPUExt/Grids.jl (Outdated)

module Grids

using AMDGPU
using ..Architectures: ROCmGPU

import Base: zeros

zeros(FT, ::ROCmGPU, N...) = AMDGPU.zeros(FT, N...)

end # module
We can remove this whole extension by using KernelAbstractions instead.
@navidcy - what do you think here? You had put some effort into moving the AMDGPU bits into the extension. Would you also be in favor of using KernelAbstractions instead? @vchuravy do you foresee any potential limitations for future development, should we take the approach of using KernelAbstractions?
I suspect that @simone-silvestri is hinting at using KernelAbstractions in all other places too; correct me if I'm wrong here.
I would welcome contributions to KA to add any missing functionality.
Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to the Pawsey Supercomputing Centre - I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix). If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.
Just some comments at this point:
It appears the CliMA fork
@simone-silvestri can you please help with this? I agree it's critical to get this PR merged ASAP; it's already getting stale. I think we should contact the Satori folks first, directly or via @christophernhill. @Sbozzolo might be able to help if there are AMD machines on the Caltech cluster.
@fluidnumerics-joe let us know if you want help resolving conflicts.
I think the Project.toml and Manifest.toml would be best addressed on your side. I'll take a look at the
It'd be best if you guys can take a look at the conflicts in
@simone-silvestri can you help? I think he's talking about
This PR replaces #3468 - editing is allowed by maintainers.