Optimizing algorithms code, refactoring acceleration directions #55
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   88.65%   88.99%   +0.33%
==========================================
  Files          20       21       +1
  Lines         802      854      +52
==========================================
+ Hits          711      760      +49
- Misses         91       94       +3
```
Continue to review full report at Codecov.
There are some type inference errors on Julia 1.5, I'll look into those.
```julia
y::Tx = similar(x)
r::Tx = similar(x)
z::Tx = similar(x)
res::Tx = similar(x)
```
Is this completely safe? Sometimes `similar` returns NaNs.
Should be safe, as long as the array is initialized before it's used; otherwise there's a problem in Julia Base!

The thing is that `similar` is faster than `zero` or `copy` (I can get numbers on how much faster). Actually, I should check if there are other places where `similar` could be used!
check this out:

```julia
julia> x = zeros(10); any([any(isnan.(similar(x))) for i = 1:1000])
true
```
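The safe usage pattern being discussed can be sketched as follows (an illustrative example, not the package's actual code): a buffer obtained from `similar` holds arbitrary memory, so it must be fully written before it is ever read.

```julia
x = randn(10)

unsafe_buf = similar(x)   # contents are undefined: may contain NaNs or garbage
safe_buf = similar(x)
safe_buf .= 2 .* x        # fully overwritten before any read, so this is safe

@assert !any(isnan, safe_buf)
```

Reading `unsafe_buf` before writing to it would be the bug the reviewer is worried about; the pattern is only sound when every element is written first.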
I just hate bugs involving NaNs. 😆 Is there really a significant speedup?
I’ll check, I’m just obsessing a bit over getting rid of any superfluous operation 😅
For `n = 1000` and using `Float64`, `similar` is about 4x faster than `zero` (I'm looking at the median). The time ratio gets larger as `n` increases (which makes sense):
```julia
julia> n = 1000; x = randn(n);

julia> @benchmark zero($x)
BenchmarkTools.Trial: 10000 samples with 198 evaluations.
 Range (min … max):  424.631 ns … 42.357 μs  ┊ GC (min … max):  0.00% … 97.30%
 Time  (median):       1.062 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.132 μs ±  5.725 μs  ┊ GC (mean ± σ):  50.82% ± 17.80%

  ▅█                                                           ▁
  ██▇▅▅▇▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▃▃▄▄▅▅▅▅▄▅▅▅▆▆▇█▇▇▆▇▇ █
  425 ns        Histogram: log(frequency) by time       33.9 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.

julia> @benchmark similar($x)
BenchmarkTools.Trial: 5065 samples with 883 evaluations.
 Range (min … max):  155.805 ns … 10.300 μs  ┊ GC (min … max):  0.00% … 96.60%
 Time  (median):     283.980 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.112 μs ±  1.958 μs  ┊ GC (mean ± σ):  74.49% ± 34.68%

  █▇▁                        ▂▂▁▁▁                             ▁
  ███▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▆▆▆▄▄████████▇▇▇█▇█▇██▇▇█▇▇▇▇▇▆ █
  156 ns        Histogram: log(frequency) by time       7.48 μs <

 Memory estimate: 7.94 KiB, allocs estimate: 1.
```
```julia
x_d::Tx = zero(x)
xbar_prev::Tx = similar(x)
d::Tx = similar(x)
x_d::Tx = similar(x)
```
same as before
Just some very minor comments, feel free to discard them!

- Terminology could be kept consistent by using `*Acceleration` on the acceleration types.
- I prefer using `zero` instead of `similar`, since `similar` sometimes gives NaNs. I'm sure this doesn't happen here, as you write those vectors in place afterwards. However, sometimes you later change the code assuming, for example, that the vector is full of zeros, and get some annoying bugs.
@nantonel I decided to double down on turning

One more approval and I'll merge this! Thanks for the review!
Sure! As I told you mine were only very minor suggestions 😉
This PR:
Because of 2, this PR is slightly breaking.
Optimizations
Changes include:
Such changes result in a significant reduction in allocations, as one can see from the benchmark, and consequently in runtime. This is more apparent in some algorithms than in others, and definitely more apparent on small problems than on large ones (where the overhead is less significant).
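As a sketch of the kind of change involved (illustrative only, not the PR's actual diff), an allocating update inside an iteration loop can be replaced with an in-place one that reuses a preallocated buffer:

```julia
# Allocating version: creates a fresh vector on every call.
step_alloc(x, g, γ) = x .- γ .* g

# In-place version: writes into a preallocated buffer, no per-call allocation.
function step!(out, x, g, γ)
    @. out = x - γ * g
    return out
end

x = randn(100); g = randn(100); buf = similar(x)
step!(buf, x, g, 0.5)
@assert buf ≈ step_alloc(x, g, 0.5)
```

Inside a loop running thousands of iterations, eliminating that one allocation per step is exactly where the benchmark gains on small problems come from.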
Refactoring
Acceleration strategies now have different "styles". Each style says something about how the underlying object should be used. For example:
Wherever needed, algorithms can dispatch on the style, and use the underlying object the way they need to compute acceleration directions.
A good example for understanding how this is used is DRLS, which supports both of the above styles.
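A minimal sketch of the trait-style dispatch described above (the names `AccelerationStyle`, `style`, and `direction` are hypothetical, not the package's actual API):

```julia
# Hypothetical trait types: each acceleration type declares its "style".
abstract type AccelerationStyle end
struct QuasiNewtonStyle <: AccelerationStyle end
struct NesterovStyle    <: AccelerationStyle end

struct LBFGSAcceleration;    mem::Int; end
struct NesterovAcceleration; end

style(::LBFGSAcceleration)    = QuasiNewtonStyle()
style(::NesterovAcceleration) = NesterovStyle()

# The algorithm dispatches on the style, not on the concrete acceleration type.
direction(acc) = direction(style(acc), acc)
direction(::QuasiNewtonStyle, acc) = "two-loop recursion with memory $(acc.mem)"
direction(::NesterovStyle,    acc) = "extrapolation step"

@assert direction(LBFGSAcceleration(5)) == "two-loop recursion with memory 5"
```

An algorithm like DRLS can then support both styles with one method per style, rather than one method per acceleration type.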
A (good) side effect of this story is that one does not need to provide the size of the decision variable at "configuration time": one specifies L-BFGS with memory 5 as `LBFGS(5)` instead of `LBFGS(x, 5)` (where `x` was usually the initial iterate, only used here to allocate buffers of the appropriate size).

Other changes
The default maximum number of iterations for SFISTA was increased to 10,000, like in the other proximal gradient implementations.
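The deferred-allocation pattern described under Refactoring can be sketched like this (hypothetical names; the package's actual types differ). The constructor stores only the configuration, and buffers are allocated lazily once the iterate, and hence its size and element type, is known:

```julia
# Hypothetical sketch: configuration carries no sizes.
struct LBFGS
    mem::Int
end

# State holds the buffers, sized from the iterate at initialization time.
struct LBFGSState{T}
    s::Vector{T}
    y::Vector{T}
end

init(acc::LBFGS, x) = LBFGSState(zero(x), zero(x))

acc = LBFGS(5)             # no decision variable needed at configuration time
state = init(acc, randn(10))
@assert length(state.s) == 10
```

This is what makes `LBFGS(5)` sufficient at configuration time, where previously `LBFGS(x, 5)` needed `x` only to size its internal buffers.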