AMD GPU Support via an extension for AMDGPU #3475

Open · wants to merge 46 commits into base: main
Conversation

@fluidnumerics-joe (Author)

This PR replaces #3468; editing by maintainers is allowed.

fluidnumerics-joe and others added 21 commits February 8, 2024 21:14

Fixed to 0.9.1 since this is the only version I've installed and worked with for this patch.
Co-authored-by: Navid C. Constantinou <navidcy@users.noreply.github.com>
@navidcy added the GPU 👾 Where Oceananigans gets its powers from label on Feb 14, 2024
src/Architectures.jl (review thread; outdated, resolved)
@navidcy (Collaborator) commented Feb 14, 2024

I'll try to convert this to an extension. I'll do it in a single commit so that it's easily revertible. How does that sound @fluidnumerics-joe?

@fluidnumerics-joe (Author)

I'm game to try. Should we modify the baroclinic adjustment problem or is there another benchmark you have in mind?

@glwagner (Member) commented Feb 17, 2024

> I'm game to try. Should we modify the baroclinic adjustment problem or is there another benchmark you have in mind?

I think it makes sense to keep going with the baroclinic adjustment case!

To change the free surface you'll use

free_surface = SplitExplicitFreeSurface(grid)

as a keyword argument in the model constructor. I think the default parameters for it make sense but @simone-silvestri can confirm.

We can also try with ExplicitFreeSurface() which is even simpler, but in that case we'll have to modify gravitational_acceleration and the time step to get something that can complete in a reasonable amount of time.
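
For concreteness, a minimal sketch of how the keyword slots into the model constructor (the grid parameters below are illustrative, not taken from the benchmark script):

using Oceananigans
using Oceananigans.Units

arch = GPU()  # with this PR, an AMD device via the AMDGPU extension
grid = RectilinearGrid(arch, size = (64, 64, 16),
                       extent = (1000kilometers, 1000kilometers, 1kilometer))

# Pass the free surface as a keyword to the hydrostatic model constructor:
model = HydrostaticFreeSurfaceModel(; grid, free_surface = SplitExplicitFreeSurface(grid))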

Btw if you paste the baroclinic adjustment script you are working with we can also check to make sure it's GPU compatible and possibly help simplify it further.

@fluidnumerics-joe (Author)

SplitExplicitFreeSurface works well here. For reference, the script I'm using is here: https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/main/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl

I'll get profiling results posted soon.

@glwagner (Member)

> SplitExplicitFreeSurface works well here. For reference, the script I'm using is here: https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/main/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl
>
> I'll get profiling results posted soon.

Nice! Yeah, since

https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/9a0c6fa5e3400949d0bb14b3f22b033b64f2d124/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl#L85

is commented out, I think this whole script will run on GPUs! I think the animation at the end will be generated on the CPU by default; you can omit that too (unless you want a pretty movie).

@fluidnumerics-joe (Author)

Just want to confirm some final steps with @navidcy and @glwagner here to wrap up this PR. At the moment, I believe we just need to add a validate_free_surface method that throws an error when the architecture is the AMD GPU and the free surface is the implicit free surface. I'm working on putting this in through the extension (I believe this is the correct spot) and testing it out. Is there anything else you want to see to get this merged into main?
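
Something along these lines, perhaps (a sketch only; the actual argument order and wording of validate_free_surface may differ):

# Reject the implicit free surface on AMD GPUs until the underlying
# solver issue is understood:
validate_free_surface(::ROCmGPU, ::ImplicitFreeSurface) =
    throw(ArgumentError("ImplicitFreeSurface is not yet supported on AMD GPUs; " *
                        "use SplitExplicitFreeSurface or ExplicitFreeSurface instead."))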

@navidcy (Collaborator) commented Feb 27, 2024

I think we need to have an AMD-enabled CI?

@fluidnumerics-joe (Author)

> I think we need to have an AMD-enabled CI?

Is this something that is handled on the MIT side? The only way I can help is through system procurement (we're a Supermicro reseller) or through an allocation on our systems.

@simone-silvestri (Collaborator)

We could ask the Julia Lab; they have some AMD GPUs dedicated to CI there. Not sure if it's possible to use them.

@navidcy (Collaborator) commented Feb 27, 2024

> Is this something that is handled on the MIT side?

Yeah, it’s something the Oceananigans dev team should sort out! :)

Comment on lines +242 to +246
# MOVE THIS IN EXTENSION
# if child_architecture == ROCmGPU()
# device_id = isnothing(devices) ? node_rank % length(AMDGPU.devices()) : devices[node_rank+1]
# AMDGPU.device!(device_id)
# end
Collaborator:

Will this work? You can also use the switch_device! method here instead of explicitly calling CUDA.device! and AMDGPU.device!
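
For reference, the backend-agnostic version being suggested might look roughly like this (ndevices is a hypothetical helper; the exact switch_device! signature may differ):

# Pick a node-local device for this rank without referencing CUDA or
# AMDGPU directly:
if !(child_architecture isa CPU)
    device_id = isnothing(devices) ? node_rank % ndevices(child_architecture) :
                                     devices[node_rank + 1]
    switch_device!(child_architecture, device_id)
end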

@@ -4,7 +4,7 @@ using Oceananigans.Architectures: CPU, GPU, AbstractArchitecture
import Base: zeros

zeros(FT, ::CPU, N...) = zeros(FT, N...)
zeros(FT, ::GPU, N...) = CUDA.zeros(FT, N...)
zeros(FT, ::CUDAGPU, N...) = CUDA.zeros(FT, N...)
Collaborator:

It would be nice to have KernelAbstractions' allocation method here instead of having to put AMDGPU's zeros in an extension.

see:
https://github.com/JuliaGPU/KernelAbstractions.jl/blob/a85cf4958958aa29ed47bf20e532c1e040dc0433/examples/matmul.jl#L30

Author:

I'd need to think a bit on this given that Julia is still fairly new to me - I'm not seeing how we would pull the backend architecture required as input for the KernelAbstractions.zeros API in this context.

Collaborator:

We have a device function in Architectures.jl that grabs the KA backend, so this would probably suffice:

zeros(FT, arch::AbstractArchitecture, N...) = KA.zeros(device(arch), FT, N...)
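
For illustration, the per-architecture mapping that makes this one-liner work could look like the following (these exact definitions are a guess at what Architectures.jl provides; the ROCm line is what this PR would add):

using KernelAbstractions
using CUDA: CUDABackend
using AMDGPU: ROCBackend

# Map each Oceananigans architecture to its KernelAbstractions backend:
device(::CPU)     = KernelAbstractions.CPU()
device(::CUDAGPU) = CUDABackend()
device(::ROCmGPU) = ROCBackend()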

Author:

Added this change in aa6a3aa

The only note is the following warning thrown during precompilation:

  1 dependency had output during precompilation:
┌ Oceananigans
│  WARNING: using KernelAbstractions.GPU in module Grids conflicts with an existing identifier.
│  WARNING: using KernelAbstractions.CPU in module Grids conflicts with an existing identifier.
└  
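
If the clash comes from a blanket using KernelAbstractions alongside Oceananigans' own CPU and GPU types, one way to silence it (a sketch, not necessarily what was committed) is to bring the package in under an alias and qualify every use:

# Qualify KernelAbstractions so its CPU/GPU types never shadow
# Oceananigans' architectures:
import KernelAbstractions
const KA = KernelAbstractions

zeros(FT, arch::AbstractArchitecture, N...) = KA.zeros(device(arch), FT, N...)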

Comment on lines 1 to 10
module Grids

using AMDGPU
using ..Architectures: ROCmGPU

import Base: zeros

zeros(FT, ::ROCmGPU, N...) = AMDGPU.zeros(FT, N...)

end # module
Collaborator:

We can remove this whole extension by using KernelAbstractions instead.

Author:

@navidcy - what do you think here? You had put in some effort to move the AMDGPU bits into the extension. Would you also be in favor of using KernelAbstractions instead? @vchuravy, do you foresee any potential limitations for future development should we take the KernelAbstractions approach?

I suspect that @simone-silvestri is hinting at using KernelAbstractions in all the other places too; correct me if I'm wrong here.

Collaborator:

I would welcome contributions to KA to add any missing functionality.

@fluidnumerics-joe (Author)

> > Is this something that is handled on the MIT side?
>
> Yeah, it’s something the Oceananigans dev team should sort out! :)

Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to Pawsey Supercomputing Centre - I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix).

If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.

@fluidnumerics-joe (Author) commented Mar 14, 2024

Just some comments at this point:

  • At this point, we have the HydrostaticFreeSurfaceModel working with the split-explicit free surface. It would be great to find some time later on to figure out what was going on with the implicit free surface on AMD GPUs (is the issue isolated to that architecture?) and get it resolved.
  • Moving everything over to KernelAbstractions would constitute a rather large change, something I think @glwagner expressed an interest in avoiding. I'd vote in favor of pushing that change off to future PRs.
  • I'm wrapping up a profiling report that includes MI210 and A100 GPU performance; the report will include some recommendations should we be interested in performance improvements on GPU hardware (AMD and Nvidia). This kind of work could also constitute PRs further down the road.
  • The main outstanding issue seems to be that we need a platform for testing on AMD GPUs.

It appears the CliMA fork's Project.toml and Manifest.toml have diverged; I'll take a look to see if I can fix them.

@glwagner (Member)

> > > Is this something that is handled on the MIT side?
> >
> > Yeah, it’s something the Oceananigans dev team should sort out! :)
>
> Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to Pawsey Supercomputing Centre - I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix).
>
> If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.

@simone-silvestri can you please help with this? I agree it's critical to get this PR merged ASAP; it's already getting stale. I think we should contact the Satori folks first, directly or via @christophernhill. @Sbozzolo might be able to help if there are AMD machines on the Caltech cluster.

@glwagner (Member)

@fluidnumerics-joe let us know if you want help resolving conflicts

@fluidnumerics-joe (Author)

> @fluidnumerics-joe let us know if you want help resolving conflicts

I think the Project.toml and Manifest.toml conflicts would be best addressed on your side. I'll take a look at the src/Architectures.jl conflict.

@glwagner (Member) commented Mar 14, 2024

> > @fluidnumerics-joe let us know if you want help resolving conflicts
>
> I think the Project.toml and Manifest.toml conflicts would be best addressed on your side. I'll take a look at the src/Architectures.jl conflict.

Manifest.toml is fixed by deleting it and regenerating with instantiate (we don't edit it manually). Project.toml is likely just a version situation; I can definitely fix that. The conflicts have to be fixed in one commit, so I can do it.
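
For anyone following along, the standard Pkg workflow for regenerating a stale manifest is roughly:

# After deleting the old Manifest.toml at the repository root:
using Pkg
Pkg.activate(".")    # activate the Oceananigans project environment
Pkg.instantiate()    # resolve dependencies and write a fresh Manifest.toml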

@fluidnumerics-joe (Author)

It'd be best if you guys take a look at the conflicts in src/Architectures.jl; it's not clear to me how some of the decisions would affect other parts of the code - quite a bit has changed that you are probably more aware of. I'm happy to test the code once the conflicts are resolved.

@glwagner (Member)

@simone-silvestri can you help? I think he's talking about on_architecture.

Labels: extensions 🧬 · GPU 👾 Where Oceananigans gets its powers from

5 participants