-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Side traces #32
Comments
(One "magical" solution to give each call a separate root trace might be for |
Could not resist continuing this line of investigation a little bit. I made an empirical test case that repeatedly (100x) runs this same snippet of code: reduce(operator.add, 0, map(function(x) return x^2 end, range(10e6))) ... and uses the x86 Performance Monitoring Unit to measure how many instructions are executed per iteration. On each of the 100 runs the snippet is loaded with Code and results for 100 iterations is here: https://gist.github.com/lukego/5c64368bdfdcf7ffcc4dc067fc630182 Here is the summary as I see it:
Quite an undertaking to understand and optimize code with a tracing JIT! Solution will be to find a way to give each call a separate root trace. Then they should each take only 9 instructions like the first one. |
Proof-of-concept solution to issue luafun#32. Compiles the root trace for the foldl-loop separately for each calling function. This reduces excessive side tracing. For the specific test case from luafun#32 it resolves the problem perfectly and makes sure that each of the 100 traces has a nine-instruction loop. This solution is not perfect though: if a single function makes multiple calls to foldl then they will be compiled together. Could be nicer to find a way to compile each call-site separately. Just have to compute a better key to the foldl_impl table. For now this should be sufficient for evaluating whether the optimization is beneficial in practice. This change uses one very recent LuaJIT feature, the 'proto' field of debug information, so you need at least LuaJIT/LuaJIT@fb61f7c.
I am really happy to discover this project! I am also interested in writing code that specifically exploits the strengths of LuaJIT. I find your example on the README very powerful.
There is one optimization challenge on my mind that I think we share: how to prevent unwanted side-traces. I have written a little bit about this over at LuaJIT/LuaJIT#208 and had some success with optimizations - but not really nailed the issue closed.
Here is an illustration of how it seems to affect luafun. Consider this example program that is based on the README but with multiple operations:
The first call does indeed compile down to a tight loop of around 10 instructions. However, the second call compiles down to around 62 instructions - even though it is exactly the same code! The difference is that the second call is compiling to a side trace within the
reduce()
subroutine and on each iteration it re-enters the root trace and repeats all of the loop invariant code. So you end up with the first call compiling very tight while the subsequent ones are less efficient. There is a cumulative effect too: each call will create a new side-trace, and the side traces will be chained together, and each link in the chain will add around 9 more instructions.Just wondering: are you already aware of this issue? If not then you may be able to solve it in a neat way using the tips from @corsix in LuaJIT/LuaJIT#208. If you are already aware of this I would be very interested in your view e.g. if you already have some recommendations that you make to users to explain the best usage and generally set expectations.
Sadly it seems like the simplest workaround would be to avoid programming in a functional style :-) because the problem only arises when many different pieces of code are all executing the same loop statement (
for
,while
, etc) in the source code. If you would writefor ... end
every time then you have no problem because eachfor
bytecode will have a separate trace complex.The text was updated successfully, but these errors were encountered: