Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reimplement afoldl using recursion #54478

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

nsajko
Copy link
Contributor

@nsajko nsajko commented May 15, 2024

The new implementation is more elegant and flexible, doing away with the code duplication and extreme amounts of constant-hardcoding.

FTR, this PR is part of a series of changes which (re)implement many of the operations on tuples using a new recursive technique. The ultimate goal is to hopefully increase the value of the loop-vs-recurse cutoff (Any32, sometimes hardcoded 32) for tuple operations.

Demanding type inference example:

julia> isconcretetype(Core.Compiler.return_type(foldl, Tuple{typeof((l,r) -> l => r),Tuple{Vararg{Int,30}}}))
true

Replace the hack implementation with something more elegant and
flexible.

FTR, this PR is part of a series of changes which (re)implement many of
the operations on tuples using a new recursive technique. The ultimate
goal is to hopefully increase the value of the loop-vs-recurse cutoff
(`Any32`, sometimes hardcoded `32`) for tuple operations.

As-is, this creates a performance regression for tuples with length
just above the cutoff, e.g., 32-40. This shouldn't matter once the
cutoff value is increased, IMO.

Demanding type inference example:

```julia-repl
julia> isconcretetype(Core.Compiler.return_type(foldl, Tuple{typeof((l,r) -> l => r),Tuple{Vararg{Int,30}}}))
true
```
@nsajko nsajko added domain:collections Data structures holding multiple items, e.g. sets domain:fold sum, maximum, reduce, foldl, etc. labels May 15, 2024
@KristofferC
Copy link
Sponsor Member

I don't really see how the current one is a "hack implementation". It seems very straightforward compared to what replaces it here at least...

@nsajko
Copy link
Contributor Author

nsajko commented May 15, 2024

I don't really see how the current one is a "hack implementation".

Edited PR description to be more clear and less inflammatory.

very straightforward compared to what replaces it here at least...

Perhaps, but note that the first half of the change is not fold-specific, it will hopefully be reused for other implementations. See, e.g., #54479.

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 15, 2024

This shouldn't matter once the cutoff value is increased

This is going in the wrong direction. If the change is any good, it should perform better with a lower cutoff. Excessive unrolling as required by this PR is often a sign of an unreliable design

@nsajko
Copy link
Contributor Author

nsajko commented May 15, 2024

Excessive unrolling as required by this PR

This replaces an implementation where the unrolling is literally hardcoded. So I don't see how it's fair to say that my implementation is the one that requires unrolling. And I can't find any regressions other than the mentioned one.

If the mentioned regression is really a problem, it'd be easy to just add another method for Any32, which would be similar or equal to the current afoldl implementation. EDIT: in the current state of the PR there should be no regressions.

@nsajko
Copy link
Contributor Author

nsajko commented May 15, 2024

If the change is any good, it should perform better with a lower cutoff.

Also, I don't understand this. Surely increasing the cutoff value is a worthy goal of its own.

@nsajko nsajko marked this pull request as draft May 16, 2024 15:35
struct _TupleViewFront end
struct _TupleViewTail end
const _TupleView = Union{_TupleViewFront,_TupleViewTail}
_tupleview_length_representation_impl(n::Int) = ((nothing for _ ∈ OneTo(n))...,)::Tuple{Vararg{Nothing}}
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary? The decrement is done with arithmetic anyway, so would an integer just work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's necessary, but not sure. Will check later.

@JeffBezanson
Copy link
Sponsor Member

We have been through a couple implementations of this function and it has been risky to touch it in the past. This needs more motivation; what is the big benefit we are after?

This improved design is much closer to the current behavior, it
shouldn't introduce any regressions.
@nsajko nsajko marked this pull request as ready for review May 18, 2024 14:06
@nsajko
Copy link
Contributor Author

nsajko commented May 18, 2024

This needs more motivation; what is the big benefit we are after?

My motivation, not sure if it's enough motivation for you devs, is to make it possible to increase the cuttoffs where the tuple operations switch to looping from the current limits at around thirty-ish to a (few?) hundred. The current afoldl implementation approach would require a massive source code increase to accomplish this.

BTW I think the new commit could make this PR more palatable.

@nsajko nsajko marked this pull request as draft May 18, 2024 15:15
@nsajko
Copy link
Contributor Author

nsajko commented May 18, 2024

The test that fails now is caused by some preexisting issue that was only triggered now, I think. With this PR, f allocates even though g doesn't:

function f()
    as = ntuple(_ -> rand(), Val(10))
    hypot(as...)
end

function g(as::Vararg{Float64,10})
    hypot(as...)
end

So giving the compiler extra information causes the compiler to generate worse code?

@KristofferC
Copy link
Sponsor Member

not sure if it's enough motivation for you devs, is to make it possible to increase the cuttoffs where the tuple operations switch to looping from the current limits at around thirty-ish to a (few?) hundred

Okay but why? This has no inherent value so there is something missing here. Is it because you think the unrolled code will be faster to compile or execute.

@martinholters
Copy link
Member

I think there are three objectives when it comes to implementations of functions acting on tuples like afold:

  1. low run-time
  2. low compile-time
  3. inference precision.

Where it should be noted that 2. and 3. should also be examined for non-concrete argument types.

Ideally, any new implementation should improve at least one of those while not doing worse on any of them. Otherwise, it's a matter of discussion to find suitable trade-offs. I'm unclear how the proposed change fares in this regard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:collections Data structures holding multiple items, e.g. sets domain:fold sum, maximum, reduce, foldl, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants