Speedup constructing a `ConformalCubedSphereGrid` on GPU #3579

navidcy · 2024-05-04T07:29:02Z

This PR removes a bunch of scalar operations introduced by #3575 on the GPU and instead performs operations on CPU and converts to CuArrays afterwards.

There are two issues that don't allow us to create the grid natively on the GPU but rather force us to create the grid on CPU and then convert it.

The metrics for the OrthogonalSphericalShellGrid use scalar operations; see the functions that are called at

Oceananigans.jl/src/Grids/orthogonal_spherical_shell_grid.jl

Lines 780 to 782 in 4c853b9

    
           fill_metric_halo_regions_x!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy) 
        
           fill_metric_halo_regions_y!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy) 
        
           fill_metric_halo_corner_regions!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy)

When we create the ConformalCubedSphreGrid, to fill the metric and coordinate horizontal halos properly we use the same functionality that we use to fill halos for prognostic variables. To do that, we create Fields with the coordinate and metric values, call fill_halo_regions! on these fields, and then copy back the data from the Fields to the metrics. Copying the data from the metrics to the fields and vice-versa requires scalar operations, e.g., see

Oceananigans.jl/src/MultiRegion/cubed_sphere_grid.jl

Lines 280 to 292 in 7f12be3

    
           for region in 1:number_of_regions($(grid)) 
        
               getregion($(Symbol(field₁)), region).data .= getregion($(grid), region).$(Symbol(field₁)) 
        
               getregion($(Symbol(field₂)), region).data .= getregion($(grid), region).$(Symbol(field₂)) 
        
           end 
        
           if $(horizontal_topology) == FullyConnected 
        
               fill_halo_regions!(($(Symbol(field₁)), $(Symbol(field₂))); signed = false) 
        
           end 
        
           for region in 1:number_of_regions($(grid)) 
        
               getregion($(grid), region).$(Symbol(field₁)) .= getregion($(Symbol(field₁)), region).data 
        
               getregion($(grid), region).$(Symbol(field₂)) .= getregion($(Symbol(field₂)), region).data 
        
           end

cc @siddharthabishnu

navidcy · 2024-05-04T08:19:23Z

before this PR (on main):

julia> using Oceananigans, BenchmarkTools; using Oceananigans.Grids: conformal_cubed_sphere_panel

julia> @btime grid = conformal_cubed_sphere_panel(GPU(), size=(64, 64, 2), z=(0, 1), halo=(4, 4, 2), topology=(FullyConnected, FullyConnected, Bounded));
  665.220 ms (4637448 allocations: 139.26 MiB)

after this PR:

julia> using Oceananigans, BenchmarkTools; using Oceananigans.Grids: conformal_cubed_sphere_panel

julia> @btime grid = conformal_cubed_sphere_panel(GPU(), size=(64, 64, 2), z=(0, 1), halo=(4, 4, 2), topology=(FullyConnected, FullyConnected, Bounded));
  193.087 ms (4329325 allocations: 133.34 MiB)

navidcy · 2024-05-04T08:51:31Z

There are quite a few CUDA.@allowscalars in the ConformalCubedSphereGrid, e.g., here (and in other places):

Oceananigans.jl/src/MultiRegion/multi_region_cubed_sphere_grid.jl

Lines 246 to 265 in 2752616

    
           for (field, LX, LY) in zip(fields, LXs, LYs) 
        
               expr = quote 
        
                   $(Symbol(field)) = Field{$(Symbol(LX)), $(Symbol(LY)), Nothing}($(grid)) 
        
                   CUDA.@allowscalar begin 
        
                       for region in 1:number_of_regions($(grid)) 
        
                           getregion($(Symbol(field)), region).data .= getregion($(grid), region).$(Symbol(field)) 
        
                       end 
        
                   end 
        
                   if $(horizontal_topology) == FullyConnected 
        
                       fill_halo_regions!($(Symbol(field))) 
        
                   end 
        
                   CUDA.@allowscalar begin 
        
                       for region in 1:number_of_regions($(grid)) 
        
                           getregion($(grid), region).$(Symbol(field)) .= getregion($(Symbol(field)), region).data 
        
                       end 
        
                   end 
        
               end # quote

Need to eliminate those also.

navidcy · 2024-05-04T10:45:10Z

So, I tried to have the ConformalCubedSphereGrid generated on CPU and convert to GPU at the end... Here:

Oceananigans.jl/src/MultiRegion/multi_region_cubed_sphere_grid.jl

Lines 327 to 332 in 9e04110

    
           # now convert to proper architecture 
        
           archs = tuple(fill(arch, number_of_regions(grid))...) 
        
           new_regional_objects = on_architecture.(archs, grid.region_grids.regional_objects) 
        
           grid.region_grids.regional_objects = new_regional_objects

But I'm having a bit trouble... it's something MultiRegion-related I think.

@simone-silvestri could you have a look?

julia> grid = ConformalCubedSphereGrid(GPU(), panel_size = (3, 3, 1), z = (0, 1), radius = 1)
ERROR: MethodError: Cannot `convert` an object of type 
  OrthogonalSphericalShellGrid{Float64,FullyConnected,FullyConnected,Bounded,OffsetArrays.OffsetArray{Float64,2,CUDA.CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}},OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64},Int64}},Float64,NamedTuple{(:ξ, :η, :rotation),Tuple{Tuple{Float64,Float64},Tuple{Float64,Float64},Rotations.RotXY{Float64}}},GPU} to an object of type 
  OrthogonalSphericalShellGrid{Float64,FullyConnected,FullyConnected,Bounded,OffsetArrays.OffsetArray{Float64,2,Matrix{Float64}},OffsetArrays.OffsetArray{Float64,1,StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64},Int64}},Float64,NamedTuple{(:ξ, :η, :rotation),Tuple{Tuple{Float64,Float64},Tuple{Float64,Float64},Rotations.RotXY{Float64}}},CPU}

Closest candidates are:
  convert(::Type{T}, ::T) where T
   @ Base Base.jl:84

Stacktrace:
 [1] cvt1
   @ ./essentials.jl:468 [inlined]
 [2] macro expansion
   @ ./ntuple.jl:72 [inlined]
 [3] ntuple
   @ ./ntuple.jl:69 [inlined]
 [4] convert(::Type{…}, x::Tuple{…})
   @ Base ./essentials.jl:470
 [5] setproperty!(x::MultiRegionObject{…}, f::Symbol, v::Tuple{…})
   @ Base ./Base.jl:40
 [6] (ConformalCubedSphereGrid)(arch::GPU, FT::Type; panel_size::Tuple{…}, z::Tuple{…}, horizontal_direction_halo::Int64, z_halo::Int64, horizontal_topology::Type, z_topology::Type, radius::Int64, partition::CubedSpherePartition{…}, devices::Nothing)
   @ Oceananigans.MultiRegion ~/OC2.jl/src/MultiRegion/multi_region_cubed_sphere_grid.jl:331
 [7] top-level scope
   @ REPL[2]:1
 [8] top-level scope
   @ ~/.julia/packages/CUDA/02Uw6/src/initialization.jl:206
Some type information was truncated. Use `show(err)` to see complete types.

The error stems from:

Oceananigans.jl/src/MultiRegion/multi_region_cubed_sphere_grid.jl

Line 332 in 9e04110

grid.region_grids.regional_objects = new_regional_objects

glwagner · 2024-05-04T22:52:48Z

Can you outline the challenges of making this native GPU instead of CPU for future developers?

simone-silvestri · 2024-05-04T23:32:11Z

Hmmm, I think the grid is immutable so you cannot assign a new type to the grid but you need to ribuild it using the MultiRegionGrid internal constructor

simone-silvestri · 2024-05-04T23:34:08Z

src/MultiRegion/multi_region_cubed_sphere_grid.jl

+
+    new_regional_objects = on_architecture.(archs, grid.region_grids.regional_objects)
+
+    grid.region_grids.regional_objects = new_regional_objects


here you need to use this constructor

grid = MultiRegionGrid{FT, region_topology...}(arch, partition, connectivity, new_region_grids, devices)

simone-silvestri · 2024-05-04T23:35:39Z

src/MultiRegion/multi_region_cubed_sphere_grid.jl

+    # now convert to proper architecture
+    archs = tuple(fill(arch, number_of_regions(grid))...)
+
+    new_regional_objects = on_architecture.(archs, grid.region_grids.regional_objects)


I am not sure there is a on_architecture method for a MultiRegionObject but we should add it so that you can directly write

new_region_grids = on_architecture(arch, region_grids)

also, we should probably have an on_architecture method for a MultiRegionGrid directly, so we can avoid this and just write

return on_architecture(arch, grid)

simone-silvestri · 2024-05-04T23:40:09Z

Apparently there is already a method for MultiRegionGrid

Oceananigans.jl/src/MultiRegion/multi_region_grid.jl

Line 233 in 3cf9581

    
           function on_architecture(::CPU, mrg::MultiRegionGrid{FT, TX, TY, TZ}) where {FT, TX, TY, TZ}

but it's only for CPU.
For GPU we have to assume how many gpus we are using.
or we could add something like

on_architecture(::GPU, grid::MultiRegionGrid; devices = nothing)

and use the assign_devices function

Oceananigans.jl/src/MultiRegion/multi_region_utils.jl

Line 26 in 3cf9581

function assign_devices(p::AbstractPartition, dev::Number)

to decide where each region lives

navidcy · 2024-05-05T07:29:39Z

Can you outline the challenges of making this native GPU instead of CPU for future developers?

Good point! Done

assert cuda-enabled gpu exists

d6e3776

navidcy added GPU 👾 Where Oceananigans gets its powers from cubed sphere 🧊🌎 labels May 4, 2024

navidcy added 2 commits May 4, 2024 10:49

minor simplification

8b02ace

remove scalar operations on GPU

2752616

navidcy requested review from siddharthabishnu and glwagner May 4, 2024 08:03

navidcy added 2 commits May 4, 2024 13:23

better comment

2d0d3c8

remove scalar operations

9e04110

navidcy requested a review from simone-silvestri May 4, 2024 10:51

simone-silvestri reviewed May 4, 2024

View reviewed changes

navidcy added 2 commits May 5, 2024 10:23

bit of code cleanup

4c853b9

file rename

7f12be3

navidcy and others added 3 commits May 5, 2024 10:29

Merge branch 'main' into ncc-sb/gpu-conformal-cubed-sphere-faster

88c44b4

Bug fix

56f4081

Merge main into gpu-conformal-cubed-sphere-faster

aa1e3ee

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speedup constructing a `ConformalCubedSphereGrid` on GPU #3579

Speedup constructing a `ConformalCubedSphereGrid` on GPU #3579

navidcy commented May 4, 2024 •

edited

navidcy commented May 4, 2024 •

edited

navidcy commented May 4, 2024

navidcy commented May 4, 2024

glwagner commented May 4, 2024

simone-silvestri commented May 4, 2024

simone-silvestri May 4, 2024 •

edited

simone-silvestri May 4, 2024

simone-silvestri May 4, 2024

simone-silvestri commented May 4, 2024

navidcy commented May 5, 2024

	fill_metric_halo_regions_x!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy)
	fill_metric_halo_regions_y!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy)
	fill_metric_halo_corner_regions!(metric, LX(), LY(), TX(), TY(), Nx, Ny, Hx, Hy)

	for region in 1:number_of_regions($(grid))
	getregion($(Symbol(field₁)), region).data .= getregion($(grid), region).$(Symbol(field₁))
	getregion($(Symbol(field₂)), region).data .= getregion($(grid), region).$(Symbol(field₂))
	end

	if $(horizontal_topology) == FullyConnected
	fill_halo_regions!(($(Symbol(field₁)), $(Symbol(field₂))); signed = false)
	end

	for region in 1:number_of_regions($(grid))
	getregion($(grid), region).$(Symbol(field₁)) .= getregion($(Symbol(field₁)), region).data
	getregion($(grid), region).$(Symbol(field₂)) .= getregion($(Symbol(field₂)), region).data
	end


		new_regional_objects = on_architecture.(archs, grid.region_grids.regional_objects)

		grid.region_grids.regional_objects = new_regional_objects

Speedup constructing a ConformalCubedSphereGrid on GPU #3579

Are you sure you want to change the base?

Speedup constructing a ConformalCubedSphereGrid on GPU #3579

Conversation

navidcy commented May 4, 2024 • edited

navidcy commented May 4, 2024 • edited

navidcy commented May 4, 2024

navidcy commented May 4, 2024

glwagner commented May 4, 2024

simone-silvestri commented May 4, 2024

simone-silvestri May 4, 2024 • edited

Choose a reason for hiding this comment

simone-silvestri May 4, 2024

Choose a reason for hiding this comment

simone-silvestri May 4, 2024

Choose a reason for hiding this comment

simone-silvestri commented May 4, 2024

navidcy commented May 5, 2024

Speedup constructing a `ConformalCubedSphereGrid` on GPU #3579

Speedup constructing a `ConformalCubedSphereGrid` on GPU #3579

navidcy commented May 4, 2024 •

edited

navidcy commented May 4, 2024 •

edited

simone-silvestri May 4, 2024 •

edited