Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup constructing a ConformalCubedSphereGrid on GPU #3579

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

navidcy
Copy link
Collaborator

@navidcy navidcy commented May 4, 2024

This PR removes a bunch of scalar operations introduced by #3575 on the GPU and instead performs operations on CPU and converts to CuArrays afterwards.

There are two issues that don't allow us to create the grid natively on the GPU but rather force us to create the grid on CPU and then convert it.

  1. The metrics for the OrthogonalSphericalShellGrid use scalar operations; see the functions that are called at

    fill_metric_halo_regions_x!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy)
    fill_metric_halo_regions_y!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy)
    fill_metric_halo_corner_regions!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy)

  2. When we create the ConformalCubedSphreGrid, to fill the metric and coordinate horizontal halos properly we use the same functionality that we use to fill halos for prognostic variables. To do that, we create Fields with the coordinate and metric values, call fill_halo_regions! on these fields, and then copy back the data from the Fields to the metrics. Copying the data from the metrics to the fields and vice-versa requires scalar operations, e.g., see

    for region in 1:number_of_regions($(grid))
    getregion($(Symbol(field₁)), region).data .= getregion($(grid), region).$(Symbol(field₁))
    getregion($(Symbol(field₂)), region).data .= getregion($(grid), region).$(Symbol(field₂))
    end
    if $(horizontal_topology) == FullyConnected
    fill_halo_regions!(($(Symbol(field₁)), $(Symbol(field₂))); signed = false)
    end
    for region in 1:number_of_regions($(grid))
    getregion($(grid), region).$(Symbol(field₁)) .= getregion($(Symbol(field₁)), region).data
    getregion($(grid), region).$(Symbol(field₂)) .= getregion($(Symbol(field₂)), region).data
    end

cc @siddharthabishnu

@navidcy navidcy added GPU 👾 Where Oceananigans gets its powers from cubed sphere 🧊🌎 labels May 4, 2024
@navidcy
Copy link
Collaborator Author

navidcy commented May 4, 2024

before this PR (on main):

julia> using Oceananigans, BenchmarkTools; using Oceananigans.Grids: conformal_cubed_sphere_panel

julia> @btime grid = conformal_cubed_sphere_panel(GPU(), size=(64, 64, 2), z=(0, 1), halo=(4, 4, 2), topology=(FullyConnected, FullyConnected, Bounded));
  665.220 ms (4637448 allocations: 139.26 MiB)

after this PR:

julia> using Oceananigans, BenchmarkTools; using Oceananigans.Grids: conformal_cubed_sphere_panel

julia> @btime grid = conformal_cubed_sphere_panel(GPU(), size=(64, 64, 2), z=(0, 1), halo=(4, 4, 2), topology=(FullyConnected, FullyConnected, Bounded));
  193.087 ms (4329325 allocations: 133.34 MiB)

@navidcy
Copy link
Collaborator Author

navidcy commented May 4, 2024

There are quite a few CUDA.@allowscalars in the ConformalCubedSphereGrid, e.g., here (and in other places):

for (field, LX, LY) in zip(fields, LXs, LYs)
expr = quote
$(Symbol(field)) = Field{$(Symbol(LX)), $(Symbol(LY)), Nothing}($(grid))
CUDA.@allowscalar begin
for region in 1:number_of_regions($(grid))
getregion($(Symbol(field)), region).data .= getregion($(grid), region).$(Symbol(field))
end
end
if $(horizontal_topology) == FullyConnected
fill_halo_regions!($(Symbol(field)))
end
CUDA.@allowscalar begin
for region in 1:number_of_regions($(grid))
getregion($(grid), region).$(Symbol(field)) .= getregion($(Symbol(field)), region).data
end
end
end # quote

Need to eliminate those also.

@navidcy
Copy link
Collaborator Author

navidcy commented May 4, 2024

So, I tried to have the ConformalCubedSphereGrid generated on CPU and convert to GPU at the end... Here:

# now convert to proper architecture
archs = tuple(fill(arch, number_of_regions(grid))...)
new_regional_objects = on_architecture.(archs, grid.region_grids.regional_objects)
grid.region_grids.regional_objects = new_regional_objects

But I'm having a bit trouble... it's something MultiRegion-related I think.

@simone-silvestri could you have a look?

julia> grid = ConformalCubedSphereGrid(GPU(), panel_size = (3, 3, 1), z = (0, 1), radius = 1)
ERROR: MethodError: Cannot `convert` an object of type 
  OrthogonalSphericalShellGrid{Float64,FullyConnected,FullyConnected,Bounded,OffsetArrays.OffsetArray{Float64,2,CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}},OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64},Int64}},Float64,NamedTuple{(, , :rotation),Tuple{Tuple{Float64,Float64},Tuple{Float64,Float64},Rotations.RotXY{Float64}}},GPU} to an object of type 
  OrthogonalSphericalShellGrid{Float64,FullyConnected,FullyConnected,Bounded,OffsetArrays.OffsetArray{Float64,2,Matrix{Float64}},OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64},Int64}},Float64,NamedTuple{(, , :rotation),Tuple{Tuple{Float64,Float64},Tuple{Float64,Float64},Rotations.RotXY{Float64}}},CPU}

Closest candidates are:
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:84

Stacktrace:
 [1] cvt1
   @ ./essentials.jl:468 [inlined]
 [2] macro expansion
   @ ./ntuple.jl:72 [inlined]
 [3] ntuple
   @ ./ntuple.jl:69 [inlined]
 [4] convert(::Type{…}, x::Tuple{…})
   @ Base ./essentials.jl:470
 [5] setproperty!(x::MultiRegionObject{…}, f::Symbol, v::Tuple{…})
   @ Base ./Base.jl:40
 [6] (ConformalCubedSphereGrid)(arch::GPU, FT::Type; panel_size::Tuple{…}, z::Tuple{…}, horizontal_direction_halo::Int64, z_halo::Int64, horizontal_topology::Type, z_topology::Type, radius::Int64, partition::CubedSpherePartition{…}, devices::Nothing)
   @ Oceananigans.MultiRegion ~/OC2.jl/src/MultiRegion/multi_region_cubed_sphere_grid.jl:331
 [7] top-level scope
   @ REPL[2]:1
 [8] top-level scope
   @ ~/.julia/packages/CUDA/02Uw6/src/initialization.jl:206
Some type information was truncated. Use `show(err)` to see complete types.

The error stems from:

grid.region_grids.regional_objects = new_regional_objects

@glwagner
Copy link
Member

glwagner commented May 4, 2024

Can you outline the challenges of making this native GPU instead of CPU for future developers?

@simone-silvestri
Copy link
Collaborator

Hmmm, I think the grid is immutable so you cannot assign a new type to the grid but you need to ribuild it using the MultiRegionGrid internal constructor


new_regional_objects = on_architecture.(archs, grid.region_grids.regional_objects)

grid.region_grids.regional_objects = new_regional_objects
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here you need to use this constructor

 grid = MultiRegionGrid{FT, region_topology...}(arch,
                                                   partition,
                                                   connectivity,
                                                   new_region_grids,
                                                   devices)

Comment on lines 327 to 330
# now convert to proper architecture
archs = tuple(fill(arch, number_of_regions(grid))...)

new_regional_objects = on_architecture.(archs, grid.region_grids.regional_objects)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure there is a on_architecture method for a MultiRegionObject but we should add it so that you can directly write

new_region_grids = on_architecture(arch, region_grids)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, we should probably have an on_architecture method for a MultiRegionGrid directly, so we can avoid this and just write

return on_architecture(arch, grid)

@simone-silvestri
Copy link
Collaborator

Apparently there is already a method for MultiRegionGrid

function on_architecture(::CPU, mrg::MultiRegionGrid{FT, TX, TY, TZ}) where {FT, TX, TY, TZ}

but it's only for CPU.
For GPU we have to assume how many gpus we are using.
or we could add something like

on_architecture(::GPU, grid::MultiRegionGrid; devices = nothing)

and use the assign_devices function

function assign_devices(p::AbstractPartition, dev::Number)

to decide where each region lives

@navidcy
Copy link
Collaborator Author

navidcy commented May 5, 2024

Can you outline the challenges of making this native GPU instead of CPU for future developers?

Good point! Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cubed sphere 🧊🌎 GPU 👾 Where Oceananigans gets its powers from
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants