Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic precompilation workload #1016

Draft
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

rikhuijzer
Copy link
Member

@rikhuijzer rikhuijzer commented Apr 18, 2023

This PR suggests adding a basic precompile workload now that caching works for inferred (since 1.8) and binary (since 1.9) code.

For people unfamiliar with SnoopPrecompile, the idea is to compile code during the precompilation phase which can then be used during later phases. More specifically, precompilation runs only when installing the package while later phases run every time that you restart Julia. In other words, the benefit of adding workloads to the precompilation phase is that more code will be compiled at an earlier point, which will then reduce the Time To First X (TTFX).

Below, the benchmark shows that this PR will increase the precompilation time by 13 - 2 = 11 seconds while it will reduce the TTFX for a basic workload by 10 - 2 = 8 seconds. Over time, it is expected that these benefits will add up if more and more package maintainers precompile larger parts of their codebase.

Benchmark

With Julia 1.9.0-rc2.

Before this PR

$ julia --project

(MLJ) pkg> precompile
Precompiling environment...
  1 dependency successfully precompiled in 2 seconds. 91 already precompiled.

julia> @time using MLJ
  1.725931 seconds (2.83 M allocations: 188.124 MiB, 3.06% gc time, 50.73% compilation time: 88% of which was recompilation)

julia> @time include("src/precompile.jl")
 10.005304 seconds (25.71 M allocations: 1.648 GiB, 4.45% gc time, 99.05% compilation time: <1% of which was recompilation)

After this PR

$ julia --project

(MLJ) pkg> precompile
Precompiling environment...
  1 dependency successfully precompiled in 13 seconds. 91 already precompiled.

julia> @time using MLJ
  2.004735 seconds (3.48 M allocations: 226.616 MiB, 3.51% gc time, 45.24% compilation time: 89% of which was recompilation)

julia> @time include("src/precompile.jl")
  3.062700 seconds (6.84 M allocations: 448.673 MiB, 3.79% gc time, 97.12% compilation time)

@codecov-commenter
Copy link

Codecov Report

Merging #1016 (2144ad7) into dev (ec999a1) will decrease coverage by 11.28%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##              dev    #1016       +/-   ##
===========================================
- Coverage   42.85%   31.57%   -11.28%     
===========================================
  Files           1        2        +1     
  Lines          28       38       +10     
===========================================
  Hits           12       12               
- Misses         16       26       +10     
Impacted Files Coverage Δ
src/precompile.jl 0.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@rikhuijzer
Copy link
Member Author

Let's wait with this PR until https://github.com/JuliaLang/Precompiler.jl is published. They are currently discussing naming in issue 2.

Copy link
Member

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool. Thanks for taking this kind of improvement.!

My only question is whether it might be better to pitch in lower, ie, do this for the lower level packages first.

See my annotations to see what belongs to what.

If we're happy just pitching high, then another useful resource might be MLJTestIntegration.jl, which runs a battery of MLJ workflows on specified models. We could just run them on one or two.

schema(data)
y, X = unpack(data, ==(:MedV))
models(matching(X, y))
doc("DecisionTreeClassifier", pkg="DecisionTree")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 5 and 6 test MLJModels.jl functionality; MLJModels is independent of MLJBase

@@ -0,0 +1,32 @@
let
data = load_boston()
schema(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schema lives in ScientificTypes.jl

@@ -0,0 +1,32 @@
let
data = load_boston()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from MLJBase.jl


regressor = machine(A(), X, y)
evaluate!(regressor; resampling=CV(; nfolds=2), measure=rms)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rest could go in MLJBase.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants