Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function + two-argument method to reducers #35017

Closed
wants to merge 1 commit into from

Conversation

baggepinnen
Copy link
Contributor

@baggepinnen baggepinnen commented Mar 5, 2020

Motivation

The following style of code is very common in practice

sum(a.-b)

While it is easy to apply a function to all arguments without creating a temporary array and causing allocations, by means of the method sum(f,x), it is slightly more cumbersome to do so when the function takes two arguments. This PR adds such a method to all reduction functions (such as sum,maximum etc.) accepting two arrays, implemented simply by means of zipping the two arrays and calling the reducer(f,x) method.

sum(-, a, b)

Common combinations like sum((a.-b).^2) can easily be realized as sum(abs2 ∘ -, a, b).

mapreduce(-, +, a, b) does infact solve exactly this problem, but it is horribly slow compared to both alternatives above.

Some benchmarks

In the table below, "Eager" denotes sum(a .- b) for various lengths of vectors. "Functional" denotes sum(-, a, b). The middle colums are the timings and the "Reduction" column are the relative timings functional/eager meaning that values below 1 are faster for the method in this PR.
Naturally, the eager method allocates linear memory whereas the functional allocates constant 15.596 ms (2 allocations: 48 bytes)
As for the timings, the functional approach is about 2x faster for very small arrays and very large arrays, whereas results are more even for medium sized arrays.

Length Eager [ns] Functional [ns] Mapreduce [ns] Reduction
1 34.33 15.28 1023.0 0.445
3 35.83 18.98 1040.0 0.5296
10 43.78 28.28 1061.0 0.6458
30 87.04 53.33 1226.0 0.6128
100 126.9 142.5 1370.0 1.122
300 544.9 376.1 1990.0 0.6901
1000 994.4 1199.0 3232.0 1.206
3000 3260.0 3548.0 6793.0 1.088
10000 11520.0 11770.0 18980.0 1.022
30000 38410.0 35230.0 55850.0 0.9172
100000 142400.0 117200.0 193400.0 0.8232
300000 590500.0 380500.0 695400.0 0.6443
1.0e6 2.245e6 1.269e6 2.282e6 0.5653

timings

Benchmark code

lengths = [1,3,10,30,100,300,1000,3000,10_000,30_000,100_000,300_000,1_000_000]

res1 = map(lengths) do len
    a,b, = randn(len),randn(len)
    @benchmark sum($a .- $b)
end

res2 = map(lengths) do len
    a,b, = randn(len),randn(len)
    @benchmark sum(-, $a, $b)
end

res3 = map(lengths) do len
    a,b, = randn(len),randn(len)
    @benchmark mapreduce(-, +, $a, $b)
end

f1 = plot(lengths, memory.([res1 res2 res3]), lab=["Eager" "Functional" "Mapreduce"], title="Memory", yscale=:log10, xscale=:log10)
f2 = plot(lengths, time.([res1 res2 res3]), lab=["Eager" "Functional" "Mapreduce"], title="Time", yscale=:log10, xscale=:log10)
plot(f1,f2, legend=:topleft) |> display

using PrettyTables
times = time.([res1 res2 res3])
quotient = times[:,2]./times[:,1]
T = pretty_table(round.([lengths times quotient], sigdigits=4), ["Length", "Eager", "Functional", "Mapreduce", "Reduction"], tf=markdown)

@@ -651,6 +651,8 @@ for (fname, _fname, op) in [(:sum, :_sum, :add_sum), (:prod, :_prod,
# User-facing methods with keyword arguments
@inline ($fname)(a::AbstractArray; dims=:) = ($_fname)(a, dims)
@inline ($fname)(f, a::AbstractArray; dims=:) = ($_fname)(f, a, dims)
@inline ($fname)(f, a::AbstractArray, b::AbstractArray; dims=:) =
($_fname)(((a,b),)->f(a,b), zip(a,b), dims)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, why don't make it more general? sum(f, args...) as _sum(args->f(args...), zip(args...))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, why don't make it more general? sum(f, args...) as _sum(args->f(args...), zip(args...))

This could even replace the single argument method, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but I will benchmark that first to ensure there's not a performance hit for the zip.

@baggepinnen
Copy link
Contributor Author

Here's an imperfect regexp to see how common this is in practice rg 'sum\([^)]*?-[^(]*?\)'

@baggepinnen
Copy link
Contributor Author

Replacing the 1-arg method with a zip-only method caused quite a bit of regression on sum(abs2, x)
timings_1

@mbauman
Copy link
Sponsor Member

mbauman commented Mar 5, 2020

Just as an alternative, you can also do sum(Base.splat(-), zip(a, b)). This appears to be just as fast as the definition you've written in this PR:

julia> a,b = rand(10), rand(10);

julia> @btime sum(-, $a, $b)
  19.606 ns (2 allocations: 48 bytes)
-1.6367587009468854

julia> @btime sum(Base.splat(-), zip($a, $b))
  19.506 ns (2 allocations: 48 bytes)
-1.6367587009468854

@tkf
Copy link
Member

tkf commented Mar 5, 2020

FYI there is #31020 which is more composable IMHO.

@baggepinnen
Copy link
Contributor Author

Thanks for your comments.

@mbauman
Copy link
Sponsor Member

mbauman commented Mar 5, 2020

We've talked about reducers taking multiple arguments (and what it means) in the context of any and all in the past: #20181

@tkf
Copy link
Member

tkf commented Mar 5, 2020

The discussion on the syntax is in #19198. I think it's reasonable to take time to decide the syntax. Meanwhile, we can write sum(@~ a .- b) with @~ from LazyArrays.jl.

@baggepinnen
Copy link
Contributor Author

Closing this in favor of lazy broadcasting

@baggepinnen baggepinnen closed this Mar 6, 2020
@baggepinnen baggepinnen deleted the patch-2 branch May 4, 2022 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants