Add function + two-argument method to reducers #35017

baggepinnen · 2020-03-05T13:34:41Z

Motivation

The following style of code is very common in practice:

sum(a.-b)

While it is easy to apply a one-argument function to every element without creating a temporary array (and the allocations that come with it) by means of the method sum(f, x), it is slightly more cumbersome to do so when the function takes two arguments. This PR adds such a method to all reduction functions (such as sum, maximum, etc.), accepting two arrays and implemented simply by zipping the two arrays and calling the reducer(f, x) method.

sum(-, a, b)

Common combinations like sum((a.-b).^2) can easily be realized as sum(abs2 ∘ -, a, b).

mapreduce(-, +, a, b) does in fact solve exactly this problem, but it is horribly slow compared to both alternatives above.
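On a Julia version without this PR, the same zip-based reduction can be tried out with a small helper. The sketch below is an illustration only: the name sum2 is hypothetical (not part of Base), and it simply mirrors the one-line definition in the diff under review.

```julia
# Hypothetical helper, not part of Base: emulates the proposed sum(f, a, b)
# by zipping the two arrays and destructuring each pair.
sum2(f, a, b) = sum(((x, y),) -> f(x, y), zip(a, b))

a, b = randn(1000), randn(1000)

eager      = sum(a .- b)            # materializes a .- b as a temporary array
functional = sum2(-, a, b)          # no temporary array

sq_eager = sum((a .- b) .^ 2)
sq_func  = sum2(abs2 ∘ -, a, b)     # (abs2 ∘ -)(x, y) == abs2(x - y)
```

The two variants may accumulate in a different order, so floating-point results should be compared with ≈ rather than ==.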

Some benchmarks

In the table below, "Eager" denotes sum(a .- b), "Functional" denotes sum(-, a, b), and "Mapreduce" denotes mapreduce(-, +, a, b), each for vectors of various lengths. The middle columns are the timings, and the "Reduction" column is the relative timing functional/eager, meaning that values below 1 indicate that the method in this PR is faster.
Naturally, the eager method allocates memory linear in the vector length, whereas the functional method allocates a constant amount: 15.596 ms (2 allocations: 48 bytes)
As for the timings, the functional approach is about 2x faster for very small and very large arrays, whereas the results are more even for medium-sized arrays.

Length	Eager [ns]	Functional [ns]	Mapreduce [ns]	Reduction
1	34.33	15.28	1023.0	0.445
3	35.83	18.98	1040.0	0.5296
10	43.78	28.28	1061.0	0.6458
30	87.04	53.33	1226.0	0.6128
100	126.9	142.5	1370.0	1.122
300	544.9	376.1	1990.0	0.6901
1000	994.4	1199.0	3232.0	1.206
3000	3260.0	3548.0	6793.0	1.088
10000	11520.0	11770.0	18980.0	1.022
30000	38410.0	35230.0	55850.0	0.9172
100000	142400.0	117200.0	193400.0	0.8232
300000	590500.0	380500.0	695400.0	0.6443
1.0e6	2.245e6	1.269e6	2.282e6	0.5653

Benchmark code

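using BenchmarkTools, Plots  # provide @benchmark and plot, which are used below

lengths = [1, 3, 10, 30, 100, 300, 1000, 3000, 10_000, 30_000, 100_000, 300_000, 1_000_000]

# Eager: materializes a .- b before summing
res1 = map(lengths) do len
    a, b = randn(len), randn(len)
    @benchmark sum($a .- $b)
end

# Functional: requires the two-array method added in this PR
res2 = map(lengths) do len
    a, b = randn(len), randn(len)
    @benchmark sum(-, $a, $b)
end

# Mapreduce: the existing multi-array mapreduce method
res3 = map(lengths) do len
    a, b = randn(len), randn(len)
    @benchmark mapreduce(-, +, $a, $b)
end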

f1 = plot(lengths, memory.([res1 res2 res3]), lab=["Eager" "Functional" "Mapreduce"], title="Memory", yscale=:log10, xscale=:log10)
f2 = plot(lengths, time.([res1 res2 res3]), lab=["Eager" "Functional" "Mapreduce"], title="Time", yscale=:log10, xscale=:log10)
plot(f1,f2, legend=:topleft) |> display

using PrettyTables
times = time.([res1 res2 res3])
quotient = times[:,2]./times[:,1]
T = pretty_table(round.([lengths times quotient], sigdigits=4), ["Length", "Eager", "Functional", "Mapreduce", "Reduction"], tf=markdown)

johnnychen94 · 2020-03-05T13:56:16Z

base/reducedim.jl

@@ -651,6 +651,8 @@ for (fname, _fname, op) in [(:sum,     :_sum,     :add_sum), (:prod,    :_prod,
        # User-facing methods with keyword arguments
        @inline ($fname)(a::AbstractArray; dims=:) = ($_fname)(a, dims)
        @inline ($fname)(f, a::AbstractArray; dims=:) = ($_fname)(f, a, dims)
+        @inline ($fname)(f, a::AbstractArray, b::AbstractArray; dims=:) =
+            ($_fname)(((a,b),)->f(a,b), zip(a,b), dims)


If so, why not make it more general? sum(f, args...) as _sum(args->f(args...), zip(args...))

baggepinnen · Mar 5, 2020

Good point!

dkarrasch · Mar 5, 2020

Quoting johnnychen94: "If so, why not make it more general? sum(f, args...) as _sum(args->f(args...), zip(args...))"

This could even replace the single-argument method, right?

baggepinnen · Mar 5, 2020

It could, but I will benchmark that first to ensure there's no performance hit from the zip.
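The generalization suggested in this thread can be sketched outside of Base as follows. The name zipsum is hypothetical and chosen only for illustration; the body follows the args->f(args...) form from the review comment.

```julia
# Hypothetical name, not part of Base: reduces f over any number of
# iterables by zipping them and splatting each tuple into f.
zipsum(f, args...) = sum(t -> f(t...), zip(args...))

one   = zipsum(abs2, [1, 2, 3])             # one array: zip yields 1-tuples
two   = zipsum(-, [5, 7], [1, 2])           # two arrays: (5 - 1) + (7 - 2)
three = zipsum(+, [1, 2], [3, 4], [5, 6])   # three arrays

# zip stops at the shortest input, so mismatched lengths are silently
# truncated here, whereas [1, 2, 3] .- [1, 2] throws a DimensionMismatch.
truncated = zipsum(-, [1, 2, 3], [1, 2])
```

The silent truncation is one semantic difference from the broadcast form that a method in Base would have to settle.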

baggepinnen · 2020-03-05T20:07:32Z

Here's an imperfect regexp to see how common this is in practice: rg 'sum\([^)]*?-[^(]*?\)'

baggepinnen · 2020-03-05T20:31:59Z

Replacing the 1-arg method with a zip-only method caused quite a large regression on sum(abs2, x).

mbauman · 2020-03-05T20:49:49Z

Just as an alternative, you can also do sum(Base.splat(-), zip(a, b)). This appears to be just as fast as the definition you've written in this PR:

julia> a,b = rand(10), rand(10);

julia> @btime sum(-, $a, $b)
  19.606 ns (2 allocations: 48 bytes)
-1.6367587009468854

julia> @btime sum(Base.splat(-), zip($a, $b))
  19.506 ns (2 allocations: 48 bytes)
-1.6367587009468854

tkf · 2020-03-05T21:07:16Z

FYI there is #31020 which is more composable IMHO.

baggepinnen · 2020-03-05T21:21:41Z

Thanks for your comments.

sum(Base.splat(-), zip(a, b)) is not really any more convenient to write than sum(((a,b),)->a-b, zip(a,b)).
If sum(Broadcast.broadcasted(-, a, b)) was fast, that would be awesome. It would still require syntax for making the broadcast lazy. "RFC: Use @: to construct a broadcasted object" (#31088) seems to have stalled, and so does "Implement for f.(args...) syntax" (#31553)?
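For reference, the lazy-broadcast form mentioned here can be written today without any new syntax. The following is a minimal sketch using Base.Broadcast.broadcasted and Base.Broadcast.instantiate; it only shows that the result agrees with the eager version and makes no claim about speed, which is exactly what the comment above questions.

```julia
using Base.Broadcast: broadcasted, instantiate

a, b = rand(10), rand(10)

# broadcasted builds a lazy representation of a .- b; instantiate checks
# that the shapes are compatible and fills in the axes, without computing
# or allocating the result array.
lazy = instantiate(broadcasted(-, a, b))

s_lazy  = sum(lazy)      # reduces over the lazy object
s_eager = sum(a .- b)    # materializes the temporary array first
```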

mbauman · 2020-03-05T21:33:30Z

We've talked about reducers taking multiple arguments (and what it means) in the context of any and all in the past: #20181

tkf · 2020-03-05T22:25:18Z

It would still require syntax for making the broadcast lazy. "RFC: Use @: to construct a broadcasted object" (#31088) seems to have stalled, and so does "Implement for f.(args...) syntax" (#31553)?

The discussion on the syntax is in #19198. I think it's reasonable to take time to decide the syntax. Meanwhile, we can write sum(@~ a .- b) with @~ from LazyArrays.jl.

baggepinnen · 2020-03-06T00:17:40Z

Closing this in favor of lazy broadcasting

Add function + two-argument method to reducers

ad42225

johnnychen94 reviewed Mar 5, 2020

baggepinnen closed this Mar 6, 2020

baggepinnen deleted the patch-2 branch May 4, 2022 08:31
