reimplement `afoldl` using recursion #54478

nsajko · 2024-05-15T17:59:28Z

The new implementation is more elegant and flexible, doing away with the code duplication and extreme amounts of constant-hardcoding.

FTR, this PR is part of a series of changes which (re)implement many of the operations on tuples using a new recursive technique. The ultimate goal is to hopefully increase the value of the loop-vs-recurse cutoff (`Any32`, sometimes hardcoded `32`) for tuple operations.

Demanding type inference example:

```julia-repl
julia> isconcretetype(Core.Compiler.return_type(foldl, Tuple{typeof((l,r) -> l => r),Tuple{Vararg{Int,30}}}))
true
```

Replace the hack implementation with something more elegant and flexible.

FTR, this PR is part of a series of changes which (re)implement many of the operations on tuples using a new recursive technique. The ultimate goal is to hopefully increase the value of the loop-vs-recurse cutoff (`Any32`, sometimes hardcoded `32`) for tuple operations.

As-is, this creates a performance regression for tuples with length just above the cutoff, e.g., 32-40. This shouldn't matter once the cutoff value is increased, IMO.

Demanding type inference example:

```julia-repl
julia> isconcretetype(Core.Compiler.return_type(foldl, Tuple{typeof((l,r) -> l => r),Tuple{Vararg{Int,30}}}))
true
```
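To illustrate the contrast the description draws, here is a minimal sketch of a fold written once per arity versus one written recursively. The names `fold_hardcoded`, `fold_recursive`, and `pair` are hypothetical; this is neither the code in Base nor the code in this PR, only the general shape of the two styles.

```julia
# Hypothetical illustration only; not the implementation from Base or this PR.

# Hardcoded style: one method per arity, each spelling out the nesting by hand.
fold_hardcoded(op, a) = a
fold_hardcoded(op, a, b) = op(a, b)
fold_hardcoded(op, a, b, c) = op(op(a, b), c)
fold_hardcoded(op, a, b, c, d) = op(op(op(a, b), c), d)

# Recursive style: fold one element into the accumulator per call.
# Dispatch picks the two-argument method once no elements remain.
fold_recursive(op, acc) = acc
fold_recursive(op, acc, x, rest...) = fold_recursive(op, op(acc, x), rest...)

pair(l, r) = l => r

# Both styles produce the same left-nested result as `foldl` on a tuple.
same = fold_hardcoded(pair, 1, 2, 3, 4) == fold_recursive(pair, 1, 2, 3, 4) == foldl(pair, (1, 2, 3, 4))
println(same)
```

The hardcoded style needs a new method for every supported length, while the recursive style covers any length with two methods and leaves the unrolling to the compiler.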

KristofferC · 2024-05-15T18:05:58Z

I don't really see how the current one is a "hack implementation". It seems very straightforward compared to what replaces it here at least...

nsajko · 2024-05-15T18:23:09Z

> I don't really see how the current one is a "hack implementation".

Edited PR description to be more clear and less inflammatory.

> very straightforward compared to what replaces it here at least...

Perhaps, but note that the first half of the change is not fold-specific, it will hopefully be reused for other implementations. See, e.g., #54479.

vtjnash · 2024-05-15T18:29:43Z

> This shouldn't matter once the cutoff value is increased

This is going in the wrong direction. If the change is any good, it should perform better with a lower cutoff. Excessive unrolling as required by this PR is often a sign of an unreliable design.

nsajko · 2024-05-15T18:34:29Z

> Excessive unrolling as required by this PR

This replaces an implementation where the unrolling is literally hardcoded. So I don't see how it's fair to say that my implementation is the one that requires unrolling. And I can't find any regressions other than the mentioned one.

If the mentioned regression is really a problem, it'd be easy to just add another method for `Any32`, which would be similar or equal to the current `afoldl` implementation. EDIT: in the current state of the PR there should be no regressions.
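The fallback method mentioned here could look roughly like the following sketch. `LongTuple` and `fold_tuple` are hypothetical names, and the cutoff is 4 rather than 32 only to keep the type alias short; this is not the code in this PR.

```julia
# Hypothetical sketch: `LongTuple` plays the role of `Any32`, with a cutoff of 4.
const LongTuple{N} = Tuple{Any,Any,Any,Any,Vararg{Any,N}}

# Short tuples: recurse, consuming one element per call.
fold_tuple(op, acc, t::Tuple{}) = acc
fold_tuple(op, acc, t::Tuple) = fold_tuple(op, op(acc, first(t)), Base.tail(t))

# Tuples with at least 4 elements: dispatch prefers this more specific
# method, which uses a plain loop instead of recursion.
function fold_tuple(op, acc, t::LongTuple)
    for x in t
        acc = op(acc, x)
    end
    return acc
end

println(fold_tuple(+, 0, (1, 2, 3)))           # recursive path
println(fold_tuple(+, 0, (1, 2, 3, 4, 5, 6)))  # loop path
```

Because `LongTuple` is a subtype of `Tuple`, the loop method wins for long inputs without any explicit length check in the recursive methods.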

nsajko · 2024-05-15T18:35:47Z

> If the change is any good, it should perform better with a lower cutoff.

Also, I don't understand this. Surely increasing the cutoff value is a worthy goal of its own.

JeffBezanson · 2024-05-17T18:23:26Z

base/tuple.jl

```diff
+struct _TupleViewFront end
+struct _TupleViewTail end
+const _TupleView = Union{_TupleViewFront,_TupleViewTail}
+_tupleview_length_representation_impl(n::Int) = ((nothing for _ ∈ OneTo(n))...,)::Tuple{Vararg{Nothing}}
```

Is this really necessary? The decrement is done with arithmetic anyway, so would an integer just work?

nsajko · 2024-05-18

I believe it's necessary, but not sure. Will check later.

JeffBezanson · 2024-05-17T18:25:47Z

We have been through a couple implementations of this function and it has been risky to touch it in the past. This needs more motivation; what is the big benefit we are after?


nsajko · 2024-05-18T14:11:14Z

> This needs more motivation; what is the big benefit we are after?

My motivation, not sure if it's enough motivation for you devs, is to make it possible to increase the cutoffs where the tuple operations switch to looping, from the current limits of around thirty to a hundred or perhaps a few hundred. The current `afoldl` implementation approach would require a massive source code increase to accomplish this.

BTW I think the new commit could make this PR more palatable.

nsajko · 2024-05-18T17:42:20Z

The test that fails now is caused by some preexisting issue that was only triggered now, I think. With this PR, `f` allocates even though `g` doesn't:

```julia
function f()
    as = ntuple(_ -> rand(), Val(10))
    hypot(as...)
end

function g(as::Vararg{Float64,10})
    hypot(as...)
end
```

So giving the compiler extra information causes the compiler to generate worse code?
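One way to reproduce this comparison is to measure both functions with `@allocated` after a warm-up call. This is only a sketch: the helper names `measure_f` and `measure_g` are hypothetical, and the numbers depend on the Julia version and on whether this PR is applied, so no expected output is given.

```julia
function f()
    as = ntuple(_ -> rand(), Val(10))
    hypot(as...)
end

function g(as::Vararg{Float64,10})
    hypot(as...)
end

# Hypothetical helpers: measuring inside functions keeps global-scope
# overhead (untyped globals, dynamic dispatch) out of the result.
measure_f() = @allocated f()
measure_g(as) = @allocated g(as...)

as = ntuple(_ -> rand(), Val(10))

# Warm up first so that compilation is not counted as allocation.
measure_f()
measure_g(as)

println("f: ", measure_f(), " bytes")
println("g: ", measure_g(as), " bytes")
```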

KristofferC · 2024-05-20T17:13:48Z

> not sure if it's enough motivation for you devs, is to make it possible to increase the cutoffs where the tuple operations switch to looping from the current limits at around thirty-ish to a (few?) hundred

Okay, but why? Raising the cutoff has no inherent value, so there is something missing here. Is it because you think the unrolled code will be faster to compile or execute?

martinholters · 2024-05-21T09:47:24Z

I think there are three objectives when it comes to implementations of functions acting on tuples like `afoldl`:

1. low run-time
2. low compile-time
3. inference precision

Note that 2 and 3 should also be examined for non-concrete argument types.

Ideally, any new implementation should improve at least one of those while not doing worse on any of them. Otherwise, it's a matter of discussion to find suitable trade-offs. I'm unclear how the proposed change fares in this regard.
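As a rough sketch of how these objectives could be checked, the snippet below queries inference for a concrete and a non-concrete tuple type and times a first and a second call. It uses `foldl` as the function under test; `pair` and `inferred` are hypothetical helper names, and the timings are only a crude indication (a benchmarking package would give more reliable numbers).

```julia
pair(l, r) = l => r

# Inferred return types of `foldl` for a given tuple type
# (`Base.return_types` yields one entry per matching method).
inferred(tt) = Base.return_types(foldl, Tuple{typeof(pair), tt})

concrete_case = inferred(NTuple{30,Int})      # length known to inference
abstract_case = inferred(Tuple{Vararg{Int}})  # length unknown to inference

println("concrete input inferred concretely: ", all(isconcretetype, concrete_case))
println("non-concrete input inferred concretely: ", all(isconcretetype, abstract_case))

# Crude split of compile-time and run-time: the first call includes compilation.
t = ntuple(identity, Val(30))
first_call = @elapsed foldl(pair, t)
second_call = @elapsed foldl(pair, t)
println("first call: ", first_call, " s, second call: ", second_call, " s")
```

Running the same checks before and after a change gives a comparison along all three axes, including the non-concrete case.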

nsajko added the `domain:collections` (data structures holding multiple items, e.g. sets) and `domain:fold` (sum, maximum, reduce, foldl, etc.) labels May 15, 2024

nsajko marked this pull request as draft May 16, 2024 15:35

JeffBezanson reviewed May 17, 2024

Commit e38b0de: improvement

This improved design is much closer to the current behavior; it shouldn't introduce any regressions.

nsajko marked this pull request as ready for review May 18, 2024 14:06

nsajko marked this pull request as draft May 18, 2024 15:15

Commit 61b517d: update a test
