Added thermal fluid MTK and JuliaSimCompiler benchmark. #939

Open · wants to merge 13 commits into master

Conversation

chriselrod

These tests depend on a couple proprietary repositories:

  • JuliaSimCompilerRuntime.jl
  • JuliaSimCompiler.jl

@thazhemadam, can you set it up so we can run with these dependencies?
Does it take more than adding

  JULIA_PKG_SERVER: https://internal.juliahub.com/

to the env in the appropriate workflows?
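
(For reference, the local equivalent is just setting that environment variable before resolving; a minimal sketch, assuming access to the internal package server:)

julia> ENV["JULIA_PKG_SERVER"] = "https://internal.juliahub.com/";

julia> using Pkg; Pkg.Registry.update(); Pkg.instantiate()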

Additionally, it currently requires the JuliaSimCompiler#cbackendmultifunuse branch, but that will likely be merged shortly.
I would also suggest this branch of XSteam to fix precompilation: hzgzh/XSteam.jl#2

It times and plots (a rough sketch of the harness follows the list):

  • structural_simplify (plus, for JuliaSimCompiler, the IRSystem conversion)
  • defining the ODEProblem plus a first call to f! (i.e., compile time)
  • runtime (via @belapsed)
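
A rough sketch of what those three timings cover (the setup and variable names here are illustrative, not the actual benchmark script):

using ModelingToolkit, JuliaSimCompiler, OrdinaryDiffEq, BenchmarkTools

# 1. Simplification: structural_simplify, plus the IRSystem conversion
#    when going through JuliaSimCompiler.
t_mtk  = @elapsed sys_mtk  = structural_simplify(sys)
t_jsir = @elapsed sys_jsir = structural_simplify(IRSystem(sys))

# 2. "ODEProblem + f!": problem construction plus the first call to the
#    RHS, which captures the compile time of the generated function.
t_compile = @elapsed begin
    prob = ODEProblem(sys_jsir, u0, (0.0, 1.0), p)
    prob.f(du, u0, p, 0.0)
end

# 3. Steady-state runtime of the compiled RHS.
t_runtime = @belapsed $(prob.f)($du, $u0, $p, 0.0)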

[plot: thermalfluid benchmark results]

Compile times

structural_simplify is much faster when using JuliaSimCompiler than MTK, and this is by far the slowest step of model building.
Waiting over 10 minutes with MTK vs a minute with JuliaSimCompiler has a substantial impact on productivity and iterative model development.

In terms of compiling the simplified model, the Julia backend is substantially slower to compile than the C backend, which in turn is comparable to directly emitting LLVM IR.
Regardless, this time is fairly inconsequential compared to structural_simplify.

Runtimes

This code example uses a lot of registered functions.
I suspect these hurt the performance of the C backend, since they are handled as function pointers passed in as arguments to the C function.
There are also many calls to elementary functions. The LLVM backend uses the variants that LLVM links in, equivalent to calling @llvm.pow via llvmcall from Julia, for example, instead of the implementations that ship with Julia.
On this computer, the @llvm versions tend to be faster in microbenchmarks, so I'd have to look into why the LLVM backend falls behind the Julia backend in runtime performance here. A sketch of the two pow flavors follows.
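
For reference, the two flavors being compared look roughly like this (a microbenchmark sketch, not code from the PR; llvm_pow and julia_pow are illustrative names):

# Calling the LLVM intrinsic directly, as the LLVM backend effectively does:
llvm_pow(a::Float64, b::Float64) =
    ccall("llvm.pow.f64", llvmcall, Float64, (Float64, Float64), a, b)

# Base's pure-Julia implementation, as the Julia backends use:
julia_pow(a::Float64, b::Float64) = a^b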

The Julia backend of MTK produces slower code than the Julia or LLVM backends of JuliaSimCompiler, but faster code than the C backend here (due to the registered functions).

I could put some of these comments in the document. I didn't mention times explicitly, since they may differ between the server and my desktop.

Note that this takes a long time to build: the largest structural_simplify took MTK more than 11 minutes!

We may want to cut off MTK above a certain number of states.
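
A minimal sketch of what such a cutoff could look like in the benchmark loop (MTK_STATE_CUTOFF and n_states are hypothetical names, not from this PR):

const MTK_STATE_CUTOFF = 1600  # hypothetical threshold, to be tuned
if n_states <= MTK_STATE_CUTOFF  # only run MTK's simplify on smaller models
    t_mtk = @elapsed structural_simplify(sys)
end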

@thazhemadam
Member

@thazhemadam, can you set it up so we can run with these dependencies?
Does it take more than adding
JULIA_PKG_SERVER: https://internal.juliahub.com/
to the env in the appropriate workflows?

The JuliaHubRegistry can now be added for benchmarks that require it, as of #940.
Setting it up is as simple as adding the path of the respective benchmark(s) to the JULIAHUBREGISTRY_BENCHMARK_TARGETS array here; nothing additional is required.

JULIAHUBREGISTRY_BENCHMARK_TARGETS=()
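
For this PR, that would presumably be:

JULIAHUBREGISTRY_BENCHMARK_TARGETS=("benchmarks/ModelingToolkit")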

@chriselrod
Author

Does it require released versions of the packages?

@chriselrod
Author

chriselrod commented May 9, 2024

My desktop is one of the antiques that features AVX-512 downclocking. Intel hasn't had that issue since Ice Lake, and AMD never did.

The Julia code isn't vectorized at all.
The LLVM code is only barely vectorized, but that's enough to trigger a fairly steep downclock:
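
(foreachf isn't defined in the thread; here is a plausible definition, so the measurements below are self-contained. @pstats is from LinuxPerf.jl.)

using LinuxPerf  # provides @pstats

# Assumed helper: call f n times so per-call costs dominate the profile;
# Base.donotdelete keeps the result from being optimized away.
function foreachf(f::F, n::Int, args::Vararg{Any,K}) where {F,K}
    for _ in 1:n
        Base.donotdelete(f(args...))
    end
end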

julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads),(iTLB-load-misses,iTLB-loads),(cache-misses,cache-references)" begin
           foreachf(f_l, 100_000, du, u0, p, 0.0)
       end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles               2.85e+09   33.4%  #  3.8 cycles per ns
┌ instructions             3.82e+09   33.4%  #  1.3 insns per cycle
│ branch-instructions      4.58e+07   33.4%  #  1.2% of insns
└ branch-misses            2.97e+05   33.4%  #  0.6% of branch insns
┌ task-clock               7.51e+08  100.0%  # 750.5 ms
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              0.00e+00  100.0%
┌ L1-dcache-load-misses    2.73e+08   16.7%  # 33.3% of dcache loads
│ L1-dcache-loads          8.19e+08   16.7%
└ L1-icache-load-misses    3.76e+08   16.7%
┌ dTLB-load-misses         5.30e+03   16.7%  #  0.0% of dTLB loads
└ dTLB-loads               8.44e+08   16.7%
┌ iTLB-load-misses         3.64e+04   33.3%  # 26.2% of iTLB loads
└ iTLB-loads               1.39e+05   33.3%
┌ cache-misses             9.63e+04   33.3%  # 19.9% of cache refs
└ cache-references         4.83e+05   33.3%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

julia> @pstats "cpu-cycles,(instructions,branch-instructions,branch-misses),(task-clock,context-switches,cpu-migrations,page-faults),(L1-dcache-load-misses,L1-dcache-loads,L1-icache-load-misses),(dTLB-load-misses,dTLB-loads),(iTLB-load-misses,iTLB-loads),(cache-misses,cache-references)" begin
           foreachf(f_j, 100_000, du, u0, p, 0.0)
       end
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
╶ cpu-cycles               2.71e+09   33.3%  #  4.5 cycles per ns
┌ instructions             5.37e+09   33.4%  #  2.0 insns per cycle
│ branch-instructions      4.60e+08   33.4%  #  8.6% of insns
└ branch-misses            4.76e+06   33.4%  #  1.0% of branch insns
┌ task-clock               6.02e+08  100.0%  # 601.9 ms
│ context-switches         0.00e+00  100.0%
│ cpu-migrations           0.00e+00  100.0%
└ page-faults              0.00e+00  100.0%
┌ L1-dcache-load-misses    1.49e+08   16.7%  #  8.4% of dcache loads
│ L1-dcache-loads          1.77e+09   16.7%
└ L1-icache-load-misses    5.62e+08   16.7%
┌ dTLB-load-misses         0.00e+00   16.6%  #  0.0% of dTLB loads
└ dTLB-loads               1.73e+09   16.6%
┌ iTLB-load-misses         1.46e+04   33.2%  #  0.9% of iTLB loads
└ iTLB-loads               1.71e+06   33.2%
┌ cache-misses             1.67e+03   33.2%  #  6.2% of cache refs
└ cache-references         2.67e+04   33.2%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The Julia version executes 40% more instructions, but also ran at roughly 50% more instructions per clock cycle; the LLVM version's low IPC is another issue that may require looking into.

The LLVM backend had about 2.7k L1d cache misses per call vs 1.5k for the Julia backend (per-call arithmetic below).
It's odd that they should be so different.
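
The per-call arithmetic, from the two @pstats runs above:

julia> 2.73e8 / 100_000  # LLVM backend: L1d misses per call
2730.0

julia> 1.49e8 / 100_000  # Julia backend: L1d misses per call
1490.0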

I'll try to make some improvements (that should be applicable to all backends).

Note that a lot of linear algebra code will use AVX-512, so in practice this CPU is going to be downclocked anyway. Newer CPUs do not have this problem (and use less naive clock-speed algorithms anyway).

@thazhemadam
Member

Does it require released versions of the packages?

Yes, it will.

@ChrisRackauckas
Member

What's the status here?

@chriselrod
Author

chriselrod commented May 12, 2024

What's the status here?

I made one PR that really improved the LLVM backend's performance here.
I was going to try another idea to see if I can improve things further.

Then we can cut a release.
We need a release to be able to actually run the example with the LLVM and C backends.

@chriselrod
Author

Current status using the latest JuliaSimCompiler master:
[plot: thermalfluid benchmark results]

@chriselrod
Author

chriselrod commented May 12, 2024

For the last set (3200 equations), the ODEProblem + f! compile times of the Julia backends of MTK and JuliaSimCompiler were 38.9s and 43.7s, respectively, while the LLVM backend took 2.5s.
The LLVM backend's runtime is also now better: 4.8e-6 for LLVM vs 5.5e-6 for Julia (MTK was 7.8e-6).

In other words, with JuliaSimCompiler the LLVM backend was 17.5x faster to compile than the Julia backend, and ran 1.145x faster.
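
Those ratios follow directly from the numbers above:

julia> 43.7 / 2.5        # Julia backend vs LLVM backend compile time
17.48

julia> 5.5e-6 / 4.8e-6   # Julia backend vs LLVM backend runtime
1.1458333333333333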
Maybe I should write the Julia Expr from the Julia backend to a file as an example for the compiler team to look at.

The structural_simplify(JuliaSimCompiler.IRSystem(sys)) took almost 50s, though, so that is the vast majority of the compile time.

Not sure what the relative priorities are, but I'd argue that we shouldn't worry too much about Symbolics until we're serious about ditching it (but if anyone in the open source community wants to contribute to improving it, that'd still be nice).
It leaves way more performance on the table than the other parts of the code, but that's precisely because it's so much more awkward to improve.

I have a couple more ideas that should improve the ODEProblem + f! compile times and f! runtimes that I'd be happy to work on this week.

@ChrisRackauckas
Member

I meant the status to getting this merged, i.e. the devops part 😅

@chriselrod
Author

I meant the status to getting this merged, i.e. the devops part 😅

That requires a JuliaSimCompiler release.

@chriselrod
Author

[plot: thermalfluid benchmark results]
On an M1 Mac mini.
This needs the latest commit to avoid a stack overflow error.

@chriselrod
Author

chriselrod commented May 14, 2024

Also, to give any readers an idea of just how bad working with Symbolics.jl is...

julia> irsys = @time @eval IRSystem(testbench);
 40.631866 seconds (92.43 M allocations: 4.919 GiB, 2.90% gc time, 0.00% compilation time)

julia> sys_jsir = @time @eval structural_simplify(irsys);
  2.939211 seconds (19.91 M allocations: 1.412 GiB, 7.14% gc time, 0.00% compilation time)

IRSystem + structural_simplify is way faster than MTK's structural_simplify alone, but when running both steps, the conversion to an IRSystem takes almost all the time.

@chriselrod
Author

chriselrod commented May 21, 2024

It is annoying that Buildkite reports green whether or not the build failed.
@thazhemadam, do you have any idea why JuliaSimCompilerRuntime failed to install?

┌ Info: Instantiating
└   folder = "benchmarks/ModelingToolkit"
  Activating project at `/cache/build/exclusive-amdci1-0/julialang/scimlbenchmarks-dot-jl/benchmarks/ModelingToolkit`
    Updating registry at `/cache/julia-buildkite-plugin/depots/5b300254-1738-4989-ae0a-f4d2d937f953/registries/General.toml`
    Updating registry at `/cache/julia-buildkite-plugin/depots/5b300254-1738-4989-ae0a-f4d2d937f953/registries/JuliaComputingRegistry.toml`
    Updating registry at `/cache/julia-buildkite-plugin/depots/5b300254-1738-4989-ae0a-f4d2d937f953/registries/JuliaHubRegistry.toml`
ERROR: expected package `JuliaSimCompilerRuntime [9cbdfd5a]` to be registered
Stacktrace:
  [1] pkgerror(msg::String)
    @ Pkg.Types /cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/share/julia/stdlib/v1.10/Pkg/src/Types.jl:70
  [2] check_registered
    @ /cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1288 [inlined]
  [3] up(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}, level::Pkg.Types.UpgradeLevel; skip_writing_project::Bool, preserve::Nothing)
    @ Pkg.Operations /cache/julia-buildkite-plugin/julia_installs/bin/linux/x64/1.10/julia-1.10-latest-linux-x86_64/share/julia/stdlib/v1.10/Pkg/src/Operations.jl:1537
  [4] up(ctx::Pkg.Types.Context, pkgs::Vector{Pkg.Types.PackageSpec}; level::Pkg.Types.UpgradeLevel, mode::Pkg.Types.PackageMode, preserve::Nothing, update_registry::Bool, skip_writing_project::Bool, kwargs::@Kwargs{})

https://buildkite.com/julialang/scimlbenchmarks-dot-jl/builds/2366#018f9794-f0cf-475e-b63a-6bab9d60fd51/376-385
JuliaSimCompilerRuntime is registered in the JuliaHub registry.

Or at least, it is registered with the internal package server, so that CI using GitHub Actions can install it by setting JULIA_PKG_SERVER.
Does something different need to be done?

@chriselrod chriselrod closed this May 23, 2024
@chriselrod chriselrod reopened this May 23, 2024
@chriselrod
Author

Buildkite isn't running?

@chriselrod chriselrod closed this May 23, 2024
@chriselrod chriselrod reopened this May 23, 2024