LLVMDoubleVisitor init is slow, added example as benchmark #1612
base: master
Conversation
Hi, I've run clang-format and found that the code formatting is good. Thanks for fixing the formatting.
@ichumuh thanks for writing a test case. What is the difference in the llvm init? Is the only difference that in the first case you give it a Jacobian that was computed using a cache, and in the second case it was computed without a cache? Or is there another difference? Why would it be 6x slower? Any ideas?
I'm only using a cache in the first test and no cache in the second because it's the faster way to calculate the Jacobian; it doesn't influence the llvm init. My goal was to highlight the similarity to #1592.
What I don't understand is that just by switching to use a cache to compute the Jacobian it somehow makes the LLVM init slower. It seems the LLVM init sees exactly the same expression in both cases, although in the first case some subexpressions are reused (have a higher reference count). We just need to debug this to see what is happening.
@certik, the first two timings are for one expression and the last two are for a different expression. The llvm timing shouldn't have had …
@ichumuh, can you run callgrind on just the llvmvisitor call and post the functions that take the most time?
I usually use …
Looking at the png, I don't think there's much we can do. You can try reducing the optimization level, but that would affect the runtime.
We're currently using essentially "-O3" here: `symengine/symengine/llvm_double.cpp`, line 294 (at 4ea4ba7).
We should probably use our option "opt_level" to set this accordingly. Perhaps also expose … But as Isuru points out, it will have an effect on runtime. (If one really wanted, one could probably build a two-tier backend to do some latency hiding by compiling an llvm callback in one thread and using a LambdaDoubleVisitor in the main thread until the compilation is done.)
Alternatively we can precompute and save the optimized LLVM IR, or perhaps even the compiled object code, so that one can precompute it once (that will take some time), but then the second time it can get loaded immediately.
This is already possible with the …
Reducing the optimization level does not significantly influence the runtime.
That's what I'm currently doing. So it seems like there is not much that can be done. Thanks for looking into it. You can close the PR if you'd like.
@ichumuh I'm just curious: if you change LLVMDoubleVisitor to LambdaDoubleVisitor, what do the timings look like then? (How many evaluations do you perform compared to init?) (Also: I wouldn't expect opt_level to help in its current state; I suspect we need to have it influence …)
@bjodah with …
But the resulting function takes too much time to evaluate. |
@ichumuh, can you try isuruf@ac7b9bb with the opt_level?
(Edit: test 2, without common subexpressions (without cache), now also uses llvm with the opt level.)
I was also trying to compare how much time …
There are two signatures for call: one takes two pointers, where the non-const pointer is the "output array" (you need to allocate it yourself, e.g. using malloc or by making a std::vector and passing its …
@bjodah Thanks.

…

with

…
Great. So it looks like opt_level=0 is a win for your use case here? (Am I reading your timings correctly?) I see the timings for call are in the sub-millisecond range. You said you do "hundreds" of evaluations for each init? Perhaps time a loop of a few hundred calls then? (It would also make the comparison with the LambdaRealDoubleVisitor fairer.)
Yes! Exposing …
- move expressions to separate visitor_expressions.h header
- add SYMENGINE_BENCHMARK_VISITORS macro to generate benchmarks for all expressions

add visitor_init benchmark to benchmark init() for
- LambdaRealDoubleVisitor
- LLVMDoubleVisitor
- LLVMFloatVisitor

add more expressions
- two large expressions based on symengine#1612
- one expression copied from the llvm tests
- LambdaRealDoubleVisitor
- LLVMDoubleVisitor
- LLVMFloatVisitor

refactor visitor benchmarks
- move expressions to visitor_expressions.h
- add SYMENGINE_BENCHMARK_VISITORS macro to generate benchmarks for all expressions

add more expressions
- two large expressions based on symengine#1612
- one expression with many intrinsic functions copied from test_lambda_double.cpp
Hi guys,

Similar to `diff` on expressions with common subexpressions, the llvm init is also very slow. Since you managed to make `diff` super fast for my use case, I was wondering if the same would be possible for the llvm init? The benchmark is an example, similar to `diff_cache`. My output is: …

Thanks!