
codegen: use compiler trampoline closing over method-instance #26565

Merged · 1 commit merged into master · Apr 4, 2018

Conversation

vtjnash
Sponsor Member

@vtjnash vtjnash commented Mar 21, 2018

Uses a trampoline to reduce the number of jump checks that happen during dynamic dispatch. It used to be done this way initially, but that got lost for a while after the function-types PR. This brings back the ability to call a method directly without going through the dispatch helper function.

The (over)estimated performance gain is pretty good for certain cases: 17.31 ns => 13.82 ns.

julia> using BenchmarkTools

julia> @noinline f(a) = a

julia> g = []

julia> @noinline function unstable_no_alloc(n::Int)
           for s = 1:n
               f(g)
           end
       end

julia> @btime unstable_no_alloc(100_000)

@ararslan added the labels "performance (Must go faster)" and "compiler:codegen (Generation of LLVM IR and native code)" on Mar 21, 2018
    uint8_t compile_traced; // if set will notify callback if this linfo is compiled
    jl_fptr_t fptr; // jlcall entry point with api specified by jlcall_api
    jl_fptr_t unspecialized_ducttape; // if template can't be compiled due to intrinsics, an un-inferred fptr may get stored here, jlcall_api = JL_API_GENERIC
Sponsor Member

Yay :)

Sponsor Member Author

Well, technically I just made generated functions more broken, but cared less about it. As long as we make sure none are included as part of the base compiler, though, I think it should not have any impact.

Member

So you're saying I have to abandon my plans for the new optimizer to have a dependency on Cxx.jl? Drat!

Member

I was just about to port codegen.cpp to LLVM.jl... /s

Contributor

Don't let that stop you, @maleadt 😃

@vtjnash force-pushed the jn/trampoline branch 3 times, most recently from a3fc225 to 0ad4b05 on March 23, 2018 15:16
@vtjnash
Sponsor Member Author

vtjnash commented Mar 26, 2018

@nanosoldier runbenchmarks(ALL, vs = ":master")

@vtjnash
Sponsor Member Author

vtjnash commented Mar 30, 2018

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@KristofferC
Sponsor Member

KristofferC commented Mar 30, 2018

This should have been a pure merge commit benchmark but the memory regression indicates that #26435 (comment) is active in one of the benchmarks but not the other. I have seen similar things happen multiple times before. I think something is off with nanosoldier... @ararslan

@vtjnash
Sponsor Member Author

vtjnash commented Mar 30, 2018

> pure merge commit benchmark

That's not exactly possible – there's nothing for this commit to merge into; it's already on top of master.

@KristofferC
Sponsor Member

KristofferC commented Mar 30, 2018

Was that a side remark, or did the point of my post not come across (that the baseline benchmark seems to be wrong)?

@ararslan
Member

Nanosoldier lists the specific commits it's using in the report. Looking through Nanosoldier.jl, I don't see any reason to believe that the commits are being misreported, but maybe there's something I'm missing.

@KristofferC
Sponsor Member

KristofferC commented Mar 31, 2018

My thinking is that after #26435 (comment) (which the baseline should for sure include), the benchmark ["broadcast", "typeargs", "(\"tuple\", 10)"] started to allocate. In this PR, there is an Inf regression of memory allocation in that very same benchmark. An Inf memory regression is only possible if the benchmark doesn't allocate in the baseline, but we know that it does. Therefore the baseline cannot be what we think it is. Am I missing something?

@vtjnash
Sponsor Member Author

vtjnash commented Apr 2, 2018

Maybe it's actually an issue with that PR? A more recent Nanosoldier run on a different PR reports an infinite memory improvement on the same benchmark: #26628

@KristofferC
Sponsor Member

I don't see how two PRs against master could result in one showing an infinite memory improvement (guaranteeing that the benchmark used to allocate before the PR) and the other an infinite memory regression (guaranteeing that it used to not allocate before the PR).

    Replaces jlcall_api with inspection of the invoke closure
    to see if it is a known entity
@vtjnash
Sponsor Member Author

vtjnash commented Apr 3, 2018

Let's just try that again. I've rebased so: @nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vtjnash
Sponsor Member Author

vtjnash commented Apr 4, 2018

I don't see any commonality between those two reports, so will merge soon.

@vtjnash vtjnash merged commit bce5bbe into master Apr 4, 2018
@vtjnash vtjnash deleted the jn/trampoline branch April 4, 2018 21:54
maleadt added a commit to JuliaGPU/CUDAnative.jl that referenced this pull request Apr 6, 2018
@StefanKarpinski
Sponsor Member

StefanKarpinski commented Apr 13, 2018

Unfortunately, this lovely feature is implicated as the worst offender in the #26767 compilation slow-down.

9 participants