AMD GPU Support via an extension for AMDGPU #3475

Open · wants to merge 46 commits into base: main
Conversation

@fluidnumerics-joe (Author)

This PR replaces #3468; editing by maintainers is allowed.

fluidnumerics-joe and others added 21 commits February 8, 2024 21:14

Fixed to 0.9.1 since this is the only version I've installed and worked with for this patch.
Co-authored-by: Navid C. Constantinou <navidcy@users.noreply.github.com>
@navidcy added the GPU 👾 Where Oceananigans gets its powers from label on Feb 14, 2024
src/Architectures.jl (review thread; outdated, resolved)
@navidcy (Collaborator) commented Feb 14, 2024

I'll try to convert this to an extension. I'll do it in a single commit so that it's easily revertible. How does that sound @fluidnumerics-joe?

@fluidnumerics-joe (Author)

I'm game to try. Should we modify the baroclinic adjustment problem or is there another benchmark you have in mind?

@glwagner (Member) commented Feb 17, 2024

> I'm game to try. Should we modify the baroclinic adjustment problem or is there another benchmark you have in mind?

I think it makes sense to keep going with the baroclinic adjustment case!

To change the free surface you'll use

free_surface = SplitExplicitFreeSurface(grid)

as a keyword argument in the model constructor. I think the default parameters for it make sense but @simone-silvestri can confirm.

We can also try with ExplicitFreeSurface() which is even simpler, but in that case we'll have to modify gravitational_acceleration and the time step to get something that can complete in a reasonable amount of time.
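
For concreteness, a minimal sketch of how the keyword slots into the model constructor (the grid parameters below are illustrative, not taken from the benchmark script):

using Oceananigans
using Oceananigans.Units

arch = GPU()  # with this PR, an AMD device via the AMDGPU extension
grid = RectilinearGrid(arch, size = (64, 64, 16),
                       extent = (1000kilometers, 1000kilometers, 1kilometer))

# Pass the free surface as a keyword to the hydrostatic model constructor:
model = HydrostaticFreeSurfaceModel(; grid, free_surface = SplitExplicitFreeSurface(grid))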

Btw if you paste the baroclinic adjustment script you are working with we can also check to make sure it's GPU compatible and possibly help simplify it further.

@fluidnumerics-joe (Author)

SplitExplicitFreeSurface works well here. For reference, the script I'm using is here: https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/main/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl

I'll get profiling results posted soon.

@glwagner (Member)

> SplitExplicitFreeSurface works well here. For reference, the script I'm using is here: https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/main/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl
>
> I'll get profiling results posted soon.

Nice! Yeah, since

https://github.com/FluidNumerics/oceananigans-on-amd-gpus/blob/9a0c6fa5e3400949d0bb14b3f22b033b64f2d124/benchmarks/baroclinic_adjustment/baroclinic_adjustment.jl#L85

is commented out, I think this whole script will run on GPUs! I think the animation at the end will be generated on the CPU by default; you can omit that too (unless you want a pretty movie).

@fluidnumerics-joe (Author)

Just want to confirm some final steps with @navidcy and @glwagner here to wrap up this PR. At the moment, I believe we just need to add a validate_free_surface method that throws an error when the architecture is the AMD GPU and the free surface is the implicit free surface. I'm working on putting this in through the extension (I believe this is the correct spot) and testing it out. Is there anything else you want to see to get this merged into main?
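
Something along these lines, perhaps (a sketch only; the actual argument order and wording of validate_free_surface may differ):

# Reject the implicit free surface on AMD GPUs until the underlying
# solver issue is understood:
validate_free_surface(::ROCmGPU, ::ImplicitFreeSurface) =
    throw(ArgumentError("ImplicitFreeSurface is not yet supported on AMD GPUs; " *
                        "use SplitExplicitFreeSurface or ExplicitFreeSurface instead."))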

@navidcy (Collaborator) commented Feb 27, 2024

I think we need to have an AMD-enabled CI?

@fluidnumerics-joe (Author)

> I think we need to have an AMD-enabled CI?

Is this something that is handled on the MIT side? The only way I can help is through system procurement (we're a Supermicro reseller) or through an allocation on our systems.

@simone-silvestri (Collaborator)

We could ask the Julia Lab; they have some AMD GPUs dedicated to CI there. Not sure if it's possible to use them.

@navidcy (Collaborator) commented Feb 27, 2024

> Is this something that is handled on the MIT side?

Yeah, it’s something the Oceananigans dev team should sort out! :)

Comment on lines +242 to +246
# MOVE THIS IN EXTENSION
# if child_architecture == ROCmGPU()
# device_id = isnothing(devices) ? node_rank % length(AMDGPU.devices()) : devices[node_rank+1]
# AMDGPU.device!(device_id)
# end
Collaborator:

Will this work? You can also use the switch_device! method here instead of explicitly calling CUDA.device! and AMDGPU.device!
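
For reference, the backend-agnostic version being suggested might look roughly like this (ndevices is a hypothetical helper; the exact switch_device! signature may differ):

# Pick a node-local device for this rank without referencing CUDA or
# AMDGPU directly:
if !(child_architecture isa CPU)
    device_id = isnothing(devices) ? node_rank % ndevices(child_architecture) :
                                     devices[node_rank + 1]
    switch_device!(child_architecture, device_id)
end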

@@ -4,7 +4,7 @@ using Oceananigans.Architectures: CPU, GPU, AbstractArchitecture
import Base: zeros

zeros(FT, ::CPU, N...) = zeros(FT, N...)
zeros(FT, ::GPU, N...) = CUDA.zeros(FT, N...)
zeros(FT, ::CUDAGPU, N...) = CUDA.zeros(FT, N...)
Collaborator:

It would be nice to have KernelAbstractions' allocation method here instead of having to put AMDGPU's zeros in an extension.

see:
https://github.com/JuliaGPU/KernelAbstractions.jl/blob/a85cf4958958aa29ed47bf20e532c1e040dc0433/examples/matmul.jl#L30

Author:

I'd need to think a bit on this given that Julia is still fairly new to me - I'm not seeing how we would pull the backend architecture required as input for the KernelAbstractions.zeros API in this context.

Collaborator:

We have a device function in Architectures.jl that grabs the KA backend, so this would probably suffice:

zeros(FT, arch::AbstractArchitecture, N...) = KA.zeros(device(arch), FT, N...)
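
For illustration, the per-architecture mapping that makes this one-liner work could look like the following (these exact definitions are a guess at what Architectures.jl provides; the ROCm line is what this PR would add):

using KernelAbstractions
using CUDA: CUDABackend
using AMDGPU: ROCBackend

# Map each Oceananigans architecture to its KernelAbstractions backend:
device(::CPU)     = KernelAbstractions.CPU()
device(::CUDAGPU) = CUDABackend()
device(::ROCmGPU) = ROCBackend()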

Author:

Added this change in aa6a3aa

The only note is the following warning thrown during precompilation:

  1 dependency had output during precompilation:
┌ Oceananigans
│  WARNING: using KernelAbstractions.GPU in module Grids conflicts with an existing identifier.
│  WARNING: using KernelAbstractions.CPU in module Grids conflicts with an existing identifier.
└  
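
If the clash comes from a blanket using KernelAbstractions alongside Oceananigans' own CPU and GPU types, one way to silence it (a sketch, not necessarily what was committed) is to bring the package in under an alias and qualify every use:

# Qualify KernelAbstractions so its CPU/GPU types never shadow
# Oceananigans' architectures:
import KernelAbstractions
const KA = KernelAbstractions

zeros(FT, arch::AbstractArchitecture, N...) = KA.zeros(device(arch), FT, N...)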

Comment on lines 1 to 10
module Grids

using AMDGPU
using ..Architectures: ROCmGPU

import Base: zeros

zeros(FT, ::ROCmGPU, N...) = AMDGPU.zeros(FT, N...)

end # module
Collaborator:

We can remove this whole extension by using KernelAbstractions instead.

Author:

@navidcy - what do you think here? You had put in some effort to move the AMDGPU bits into the extension. Would you also be in favor of using KernelAbstractions instead? @vchuravy, do you foresee any potential limitations for future development should we take the KernelAbstractions approach?

I suspect that @simone-silvestri is hinting at using KernelAbstractions in all the other places too; correct me if I'm wrong here.

Collaborator:

I would welcome contributions to KA to add any missing functionality.

@fluidnumerics-joe (Author)

> > Is this something that is handled on the MIT side?
>
> Yeah, it’s something the Oceananigans dev team should sort out! :)

Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to Pawsey Supercomputing Centre - I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix).

If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.

@fluidnumerics-joe (Author) commented Mar 14, 2024

Just some comments at this point:

  • At this point, we have the HydrostaticFreeSurfaceModel working with the split-explicit free surface. It would be great to find some time later on to figure out what was going on with the implicit free surface on AMD GPUs (is the issue isolated to that architecture?) and get it resolved.
  • Moving everything over to KernelAbstractions would constitute a rather large change, something I think @glwagner expressed an interest in avoiding. I'd vote in favor of pushing that change off to future PRs.
  • I'm wrapping up a profiling report that includes MI210 and A100 GPU performance; the report will include some recommendations should we be interested in performance improvements on GPU hardware (AMD and Nvidia). This kind of work could also constitute PRs further down the road.
  • The main outstanding issue seems to be that we need a platform for testing on AMD GPUs.

It appears the CliMA fork's Project.toml and Manifest.toml have diverged; I'll take a look to see if I can fix them.

@glwagner (Member)

> > > Is this something that is handled on the MIT side?
> >
> > Yeah, it’s something the Oceananigans dev team should sort out! :)
>
> Curious to know if there's any movement on getting this resolved. I can offer some help in getting an allocation request in to Pawsey Supercomputing Centre - I mentioned to @navidcy that I have a solution for doing CI on systems with job schedulers (like Pawsey's Setonix).
>
> If existing hardware systems at MIT are not available for this, I can also help with procurement, if needed. If you go this route, I can look into providing some time on our systems to get testing rolling.

@simone-silvestri can you please help with this? I agree it's critical to get this PR merged ASAP; it's already getting stale. I think we should contact the Satori folks first, directly or via @christophernhill. @Sbozzolo might be able to help if there are AMD machines on the Caltech cluster.

@glwagner (Member)

@fluidnumerics-joe let us know if you want help resolving conflicts

@fluidnumerics-joe (Author)

> @fluidnumerics-joe let us know if you want help resolving conflicts

I think the Project.toml and Manifest.toml conflicts would be best addressed on your side. I'll take a look at the src/Architectures.jl conflict.

@glwagner (Member) commented Mar 14, 2024

> > @fluidnumerics-joe let us know if you want help resolving conflicts
>
> I think the Project.toml and Manifest.toml conflicts would be best addressed on your side. I'll take a look at the src/Architectures.jl conflict.

Manifest.toml is fixed by deleting it and regenerating with instantiate (we don't edit it manually). Project.toml is likely just a version situation; I can definitely fix that. The conflicts have to be fixed in one commit, so I can do it.
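
For anyone following along, the standard Pkg workflow for regenerating a stale manifest is roughly:

# After deleting the old Manifest.toml at the repository root:
using Pkg
Pkg.activate(".")    # activate the Oceananigans project environment
Pkg.instantiate()    # resolve dependencies and write a fresh Manifest.toml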

@fluidnumerics-joe (Author)

It'd be best if you guys take a look at the conflicts in src/Architectures.jl; it's not clear to me how some of the decisions would affect other parts of the code - quite a bit has changed that you are probably more aware of. I'm happy to test the code once the conflicts are resolved.

@glwagner (Member)

@simone-silvestri can you help? I think he's talking about on_architecture.

Labels: extensions 🧬 · GPU 👾 Where Oceananigans gets its powers from

5 participants