Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocations for SummaryCallback with custom integrators #1877

Open
DanielDoehring opened this issue Mar 18, 2024 · 5 comments
Open

Allocations for SummaryCallback with custom integrators #1877

DanielDoehring opened this issue Mar 18, 2024 · 5 comments
Labels
performance We are greedy

Comments

@DanielDoehring
Copy link
Contributor

DanielDoehring commented Mar 18, 2024

We observe allocations for the custom integrators, see

@trixi_testset "elixir_hypdiff_nonperiodic.jl" begin
@test_trixi_include(joinpath(EXAMPLES_DIR, "elixir_hypdiff_nonperiodic.jl"),
l2=[0.8799744480157664, 0.8535008397034816, 0.7851383019164209],
linf=[1.0771947577311836, 1.9143913544309838, 2.149549109115789],
tspan=(0.0, 0.1),
coverage_override=(polydeg = 3,)) # Prevent long compile time in CI
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
let
t = sol.t[end]
u_ode = sol.u[end]
du_ode = similar(u_ode)
@test (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 15000

@trixi_testset "elixir_advection_timeintegration.jl with parsani_ketcheson_deconinck_erk94" begin
@test_trixi_include(joinpath(EXAMPLES_DIR, "elixir_advection_timeintegration.jl"),
l2=[2.4976673477385313e-5],
linf=[0.0005534166916640881],
ode_algorithm=Trixi.ParsaniKetchesonDeconinck3Sstar94())
# Ensure that we do not have excessive memory allocations
# (e.g., from type instabilities)
let
t = sol.t[end]
u_ode = sol.u[end]
du_ode = similar(u_ode)
@test (@allocated Trixi.rhs!(du_ode, u_ode, semi, t)) < 15000

and in the current PR #1871

https://github.com/trixi-framework/Trixi.jl/pull/1871/files#diff-4e3f4096536ca3da518b2f02b6f9197bc83ed9a7aec477b329951667dc1f830c

I observe that the allocations seem to be caused by the SummaryCallback.

For instance, for the example tree_1d_dgsem/elixir_hypdiff_nonperiodic.jl

the alloc check

t = sol.t[end]
u_ode = sol.u[end]
du_ode = similar(u_ode)
@allocated Trixi.rhs!(du_ode, u_ode, semi, t) # 7520

gives 7520 (as for the PERK2 test).

If one supplies the callbacks

callbacks = CallbackSet(#summary_callback, 
                        steady_state_callback,
                        analysis_callback, alive_callback,
                        save_solution,
                        stepsize_callback)

instead of

callbacks = CallbackSet(summary_callback, 
                        steady_state_callback,
                        analysis_callback, alive_callback,
                        save_solution,
                        stepsize_callback)

one obtains 0.

@DanielDoehring DanielDoehring added the performance We are greedy label Mar 18, 2024
@DanielDoehring
Copy link
Contributor Author

Introducing a @trixi_timeit timer() around the callbacks

        @trixi_timeit timer() "Step-Callbacks" begin
            # handle callbacks
            if callbacks isa CallbackSet
                foreach(callbacks.discrete_callbacks) do cb
                    if cb.condition(integrator.u, integrator.t, integrator)
                        cb.affect!(integrator)
                    end
                    return nothing
                end
            end
        end

one obtains for the example mentioned above:

 ────────────────────────────────────────────────────────────────────────────────────────
                Trixi.jl                        Time                    Allocations      
                                       ───────────────────────   ────────────────────────
           Tot / % measured:               6.04ms /  97.8%            347KiB /  97.3%    

 Section                       ncalls     time    %tot     avg     alloc    %tot      avg
 ────────────────────────────────────────────────────────────────────────────────────────
 main loop                          1   5.02ms   85.0%  5.02ms    303KiB   89.7%   303KiB
   Step-Callbacks                 544   2.77ms   46.8%  5.09μs    294KiB   87.1%     554B
     I/O                            6   1.61ms   27.2%   268μs   54.3KiB   16.0%  9.04KiB
       save solution                6   1.60ms   27.1%   266μs   43.8KiB   13.0%  7.30KiB
       ~I/O~                        6   7.98μs    0.1%  1.33μs   10.4KiB    3.1%  1.74KiB
       get element variables        6    226ns    0.0%  37.7ns     0.00B    0.0%    0.00B
       save mesh                    6    167ns    0.0%  27.8ns     0.00B    0.0%    0.00B
       get node variables           6    111ns    0.0%  18.5ns     0.00B    0.0%    0.00B
     ~Step-Callbacks~             544    639μs   10.8%  1.18μs    146KiB   43.1%     275B
     analyze solution               6    504μs    8.5%  84.0μs   94.2KiB   27.9%  15.7KiB
     calculate dt                 544   15.8μs    0.3%  29.0ns     0.00B    0.0%    0.00B
   rhs!                         2.72k   1.77ms   30.0%   651ns   6.61KiB    2.0%    2.49B
     ~rhs!~                     2.72k    982μs   16.6%   361ns   6.61KiB    2.0%    2.49B
     volume integral            2.72k    174μs    2.9%  64.0ns     0.00B    0.0%    0.00B
     source terms               2.72k    114μs    1.9%  41.9ns     0.00B    0.0%    0.00B
     boundary flux              2.72k    108μs    1.8%  39.7ns     0.00B    0.0%    0.00B
     interface flux             2.72k   82.2μs    1.4%  30.2ns     0.00B    0.0%    0.00B
     prolong2interfaces         2.72k   71.7μs    1.2%  26.4ns     0.00B    0.0%    0.00B
     prolong2boundaries         2.72k   67.2μs    1.1%  24.7ns     0.00B    0.0%    0.00B
     surface integral           2.72k   62.4μs    1.1%  22.9ns     0.00B    0.0%    0.00B
     Jacobian                   2.72k   61.9μs    1.0%  22.7ns     0.00B    0.0%    0.00B
     reset ∂u/∂t                2.72k   48.0μs    0.8%  17.6ns     0.00B    0.0%    0.00B
   ~main loop~                      1    324μs    5.5%   324μs   2.20KiB    0.7%  2.20KiB
   Runge-Kutta step             2.72k    155μs    2.6%  57.1ns     0.00B    0.0%    0.00B
 I/O                                2    783μs   13.3%   392μs   19.2KiB    5.7%  9.60KiB
   ~I/O~                            2    466μs    7.9%   233μs   11.9KiB    3.5%  5.95KiB
   save solution                    1    317μs    5.4%   317μs   7.30KiB    2.2%  7.30KiB
   get element variables            1    268ns    0.0%   268ns     0.00B    0.0%    0.00B
   save mesh                        1   28.0ns    0.0%  28.0ns     0.00B    0.0%    0.00B
   get node variables               1   18.0ns    0.0%  18.0ns     0.00B    0.0%    0.00B
 analyze solution                   1    105μs    1.8%   105μs   15.8KiB    4.7%  15.8KiB
 calculate dt                       1    302ns    0.0%   302ns     0.00B    0.0%    0.00B
 ────────────────────────────────────────────────────────────────────────────────────────

Any ideas why we have this for the custom integrators, but not for the ones from OrdinaryDiffEq.jl @ranocha ?

@ranocha
Copy link
Member

ranocha commented Mar 18, 2024

Can you check with @code_warntype or Cthulhu.jl whether there are any type instabilities?

@DanielDoehring
Copy link
Contributor Author

DanielDoehring commented Mar 20, 2024

Hm, output for

@code_warntype summary_callback.condition(integrator.u, integrator.t, integrator)

@code_warntype summary_callback.affect!(integrator)

is all blue.

But from adding some additional @trixi_timeit timer() in the callback sections I suppose that there is a type-instability in
cb.condition(integrator.u, integrator.t, integrator) which we take from OrdinaryDiffEq.jl, right?

@ranocha
Copy link
Member

ranocha commented Mar 20, 2024

Can you go one level higher with Cthulhu.jl and check where the type instability is introduced?

@DanielDoehring
Copy link
Contributor Author

Hm, unfortunately this gives also no new information compared to @code_warntype.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance We are greedy
Projects
None yet
Development

No branches or pull requests

2 participants