
Optimizing algorithms code, refactoring acceleration directions #55

Merged
merged 8 commits on Nov 25, 2021

Conversation

lostella
Member

@lostella lostella commented Nov 23, 2021

This PR:

  1. Optimizes the algorithm implementations through various fixes (partially addressing "Further allocation savings in FBS, PANOC" #21)
  2. Refactors acceleration directions for the algorithms that support them (the line-search ones)

Because of 2, this PR is slightly breaking.

Optimizations

Changes include:

  • Avoiding temporary allocations in favor of pre-allocated buffers (arrays) in the state
  • Removing occurrences of variables changing type within their scope
  • Making sure all type parameters are concrete (mainly by avoiding unions)
  • Fixing type inference issues (and adding checks for them in tests)

Such changes result in a significant reduction in allocations, as the benchmarks show, and consequently in runtime. The effect is more apparent in some algorithms than others, and definitely more apparent on small problems than large ones (where the overhead is less significant).
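To illustrate the first point, here is a minimal sketch (not the package's actual code; the function names are illustrative) of the buffer-reuse pattern: per-iteration temporaries get replaced by a buffer pre-allocated once in the state and overwritten in place.

```julia
# Allocating version: creates a fresh array on every call.
step(x, gamma, grad) = x .- gamma .* grad

# In-place version: writes into a buffer pre-allocated once in the state.
function step!(z, x, gamma, grad)
    @. z = x - gamma * grad    # fused broadcast, no temporaries
    return z
end

x = randn(1000); grad = randn(1000)
z = similar(x)                 # allocated once, reused across iterations
step!(z, x, 0.1, grad)
```

The in-place version performs zero allocations per iteration, which is exactly the overhead that dominates on small problems.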

Refactoring

Acceleration strategies now have different "styles". Each style says something about how the underlying object should be used. For example:

  • NesterovStyle denotes sequences of extrapolation coefficients
  • QuasiNewtonStyle denotes linear operators that can be updated

Wherever needed, algorithms can dispatch on the style and use the underlying object accordingly to compute acceleration directions.

A good example of how this is used is DRLS, which supports both of the above styles.
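A rough sketch of the trait mechanism (type and function names here are illustrative, not necessarily the package's actual API): each acceleration type declares its style, and an algorithm branches on the style rather than on the concrete type.

```julia
abstract type AccelerationStyle end
struct NesterovStyle <: AccelerationStyle end
struct QuasiNewtonStyle <: AccelerationStyle end

struct NesterovSequence end        # yields extrapolation coefficients
struct QuasiNewtonOperator end     # an updatable linear operator

acceleration_style(::NesterovSequence) = NesterovStyle()
acceleration_style(::QuasiNewtonOperator) = QuasiNewtonStyle()

# An algorithm like DRLS dispatches on the style, not the concrete type:
direction(accel, res) = direction(acceleration_style(accel), accel, res)
direction(::NesterovStyle, accel, res) = res       # extrapolation step
direction(::QuasiNewtonStyle, accel, res) = -res   # apply the operator
```

This is the standard "Holy trait" pattern in Julia: adding a new acceleration type only requires defining its style, with no changes to the algorithms.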

A (good) side effect of this change is that one no longer needs to provide the size of the decision variable at "configuration time": one specifies L-BFGS with memory 5 as LBFGS(5) instead of LBFGS(x, 5) (where x was usually the initial iterate, used here only to allocate buffers of the appropriate size).
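A sketch of why this works (with hypothetical names, not the package's actual internals): the acceleration object now stores only its parameters, and buffers are allocated lazily once an iterate, with its size and element type, is available.

```julia
struct LBFGS
    memory::Int                 # only parameters at configuration time
end

struct LBFGSState{T}
    s::Matrix{T}                # buffers sized from the iterate
end

# Buffers are created when the algorithm first sees the iterate x.
initialize(accel::LBFGS, x) = LBFGSState(similar(x, length(x), accel.memory))

accel = LBFGS(5)                # no iterate needed anymore
state = initialize(accel, randn(100))
```

Deferring allocation to `initialize` also means the same configured object can be reused across problems of different sizes.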

Other changes

The default maximum number of iterations for SFISTA was increased to 10,000, in line with the other proximal gradient implementations.

@codecov

codecov bot commented Nov 23, 2021

Codecov Report

Merging #55 (e3181da) into master (2a04316) will increase coverage by 0.33%.
The diff coverage is 96.00%.

❗ Current head e3181da differs from pull request most recent head ba692d1. Consider uploading reports for the commit ba692d1 to get more accurate results

@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   88.65%   88.99%   +0.33%     
==========================================
  Files          20       21       +1     
  Lines         802      854      +52     
==========================================
+ Hits          711      760      +49     
- Misses         91       94       +3     
Impacted Files Coverage Δ
src/ProximalAlgorithms.jl 100.00% <ø> (ø)
src/algorithms/douglas_rachford.jl 95.65% <ø> (ø)
src/algorithms/primal_dual.jl 63.55% <ø> (ø)
src/accel/traits.jl 75.00% <75.00%> (ø)
src/algorithms/zerofpr.jl 92.00% <86.36%> (-3.72%) ⬇️
src/accel/nesterov.jl 90.90% <90.90%> (+2.02%) ⬆️
src/algorithms/panoc.jl 95.65% <97.05%> (-0.86%) ⬇️
src/algorithms/drls.jl 95.00% <97.29%> (+0.17%) ⬆️
src/accel/anderson.jl 100.00% <100.00%> (+6.66%) ⬆️
src/accel/broyden.jl 100.00% <100.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a04316...ba692d1. Read the comment docs.

@lostella
Member Author

There are some type inference errors on Julia 1.5; I'll look into those.

@lostella
Member Author

lostella commented Nov 23, 2021

y::Tx = similar(x)
r::Tx = similar(x)
z::Tx = similar(x)
res::Tx = similar(x)
Collaborator

Is this completely safe? Sometimes similar returns NaNs

Member Author

Should be safe, as long as the array is initialized before it's used; otherwise there's a problem in Julia Base!

The thing is that similar is faster than zero or copy (I can get numbers on how much faster). Actually, I should check whether there are other places where similar could be used!

Collaborator

check this out:

julia> x = zeros(10); any([any(isnan.(similar(x))) for i = 1:1000])
true

Collaborator

I just hate bugs involving 'NaN's. 😆 Is there really a significant speed up?

Member Author

I’ll check, I’m just obsessing a bit over getting rid of any superfluous operation 😅

Member Author

@lostella lostella Nov 24, 2021

For n = 1000 and using Float64, similar is about 4x faster than zero (I'm looking at the median). The time ratio gets larger as n increases (which makes sense)

julia> n = 1000; x = randn(n);

julia> @benchmark zero($x)
BenchmarkTools.Trial: 10000 samples with 198 evaluations.
 Range (min … max):  424.631 ns … 42.357 μs  ┊ GC (min … max):  0.00% … 97.30%
 Time  (median):       1.062 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.132 μs ±  5.725 μs  ┊ GC (mean ± σ):  50.82% ± 17.80%

  ▅█                                                           ▁
  ██▇▅▅▇▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▃▃▄▄▅▅▅▅▄▅▅▅▆▆▇█▇▇▆▇▇ █
  425 ns        Histogram: log(frequency) by time      33.9 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

julia> @benchmark similar($x)
BenchmarkTools.Trial: 5065 samples with 883 evaluations.
 Range (min … max):  155.805 ns … 10.300 μs  ┊ GC (min … max):  0.00% … 96.60%
 Time  (median):     283.980 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.112 μs ±  1.958 μs  ┊ GC (mean ± σ):  74.49% ± 34.68%

  █▇▁                                ▂▂▁▁▁                     ▁
  ███▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▆▆▆▄▄████████▇▇▇█▇█▇██▇▇█▇▇▇▇▇▆ █
  156 ns        Histogram: log(frequency) by time      7.48 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

x_d::Tx = zero(x)
xbar_prev::Tx = similar(x)
d::Tx = similar(x)
x_d::Tx = similar(x)
Collaborator

same as before

Collaborator

@nantonel nantonel left a comment

Just some very minor comments, feel free to discard them!

  • Terminology could be kept consistent by using *Acceleration on the acceleration types
  • I prefer using zero over similar: similar sometimes gives NaNs. I'm sure this doesn't happen here, since you write those vectors in place afterwards. However, sometimes you later change the code assuming, for example, that the vector is full of zeros, and get some annoying bugs.

nantonel
nantonel previously approved these changes Nov 24, 2021
@lostella
Member Author

@nantonel I decided to double down on turning zero into similar: while I understand it may be a source of headaches, it's probably the tests' job to surface any issue with uninitialized arrays. And if an array doesn't get properly initialized, it might even be better to have some NaN or Inf spread around and fail a test than to silently have an algorithm apparently "work" (but likely not work as intended).
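For what it's worth, the failure mode being argued for here can be seen in a small sketch: a NaN leaking out of an uninitialized buffer poisons every value it touches, so it trips any downstream check loudly instead of silently looking like a valid result.

```julia
# Stand-in for a `similar`-allocated buffer that is (incorrectly) read
# before being initialized: NaN propagates through arithmetic.
buf = fill(NaN, 3)
y = 2.0 .* buf .+ 1.0
any(isnan, y)        # true — the bug surfaces in any test that checks y
```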

One more approval and I’ll merge this! Thanks for the review!

Collaborator

@nantonel nantonel left a comment

Sure! As I told you mine were only very minor suggestions 😉

@lostella lostella merged commit 60d4958 into JuliaFirstOrder:master Nov 25, 2021
@lostella lostella deleted the algo-improvements-20211108 branch November 25, 2021 16:38