Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarks against Symbolics.jl #1973

Open
certik opened this issue Sep 11, 2023 · 21 comments
Open

Benchmarks against Symbolics.jl #1973

certik opened this issue Sep 11, 2023 · 21 comments

Comments

@certik
Copy link
Contributor

certik commented Sep 11, 2023

I want to compare the speed of SymEngine and Symbolics.jl. I don't know if I am using Symbolics.jl correctly, so I'll document what I do here. I am using Julia 1.8 on Apple M1 Max.

julia> using BenchmarkTools, Symbolics

julia> vars = @variables a,b,c,d,e,f,g,h,i

julia> x = ((a+b+c+1)^20)
(1 + a + b + c)^20

julia> @benchmark y = expand(x)
BenchmarkTools.Trial: 21 samples with 1 evaluation.
 Range (min … max):  238.961 ms … 244.956 ms  ┊ GC (min … max): 2.01% … 3.99%
 Time  (median):     240.229 ms               ┊ GC (median):    2.24%
 Time  (mean ± σ):   241.298 ms ±   2.024 ms  ┊ GC (mean ± σ):  2.72% ± 0.84%

  ▃       █▃▃                                                    
  █▁▁▇▁▁▁▁███▁▇▁▁▁▁▁▁▇▁▇▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▇▁▇▁▁▁▁▇▇▁▁▁▁▁▁▇▇ ▁
  239 ms           Histogram: frequency by time          245 ms <

 Memory estimate: 113.84 MiB, allocs estimate: 447939.

I then compiled SymEngine (default cmake), and applied the following diff:

diff --git a/benchmarks/expand1.cpp b/benchmarks/expand1.cpp
index 4471152e..0c6a897f 100644
--- a/benchmarks/expand1.cpp
+++ b/benchmarks/expand1.cpp
@@ -30,11 +30,12 @@ int main(int argc, char *argv[])
     RCP<const Basic> y = symbol("y");
     RCP<const Basic> z = symbol("z");
     RCP<const Basic> w = symbol("w");
-    RCP<const Basic> i60 = integer(60);
+    RCP<const Basic> i1 = integer(1);
+    RCP<const Basic> i60 = integer(20);
 
     RCP<const Basic> e, r;
 
-    e = pow(add(add(add(x, y), z), w), i60);
+    e = pow(add(add(add(x, y), z), i1), i60);
 
     std::cout << "Expanding: " << *e << std::endl;
 

And run it:

$ ./expand1 
Expanding: (1 + x + y + z)**20
3ms
number of terms: 1770

So I am getting 3ms. I don't know if I am running both benchmarks correctly. It seems SymEngine is 80x faster, which seems that I probably do something wrong.

I asked the Julia community for help here: https://discourse.julialang.org/t/benchmarking-symbolics-jl-against-symengine-jl/103744.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

I also tried:

$ ./expand1
Expanding: (1 + x + y + z)**100
204ms
number of terms: 176850

But had to kill the Julia version after about a minute.

@rikardn
Copy link
Contributor

rikardn commented Sep 11, 2023

This is interesting! Are you thinking of comparing to SymEngine.jl or plain C++ symengine? Although outside of your scope here it would also be interesting to benchmark the various symengine wrappers to see how much overhead the various language wrappers add.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

Good idea, I was comparing against C++ because I know how to compile it correctly. I am using gmp, but I think Flint would be even faster. I just used the SymEngine.jl and got:

julia> using BenchmarkTools, SymEngine

julia> vars = @vars a,b,c,d,e,f,g,h,i
(a, b, c, d, e, f, g, h, i)

julia> x = ((a+b+c+1)^20)
(1 + a + b + c)^20

julia> @benchmark y = expand(x)
BenchmarkTools.Trial: 2595 samples with 1 evaluation.
 Range (min … max):  1.477 ms … 97.622 ms  ┊ GC (min … max): 0.00% … 0.11%
 Time  (median):     1.561 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.926 ms ±  5.876 ms  ┊ GC (mean ± σ):  0.02% ± 0.01%

             ▂ ▂▃▂▄▆▃▅▄▃▇▅▄▄▆▅▅▆▇▄▃▅▆▄▇█▅▅▄▅▃▇▃▅▄▁▁           
  ▁▁▃▃▅▆▄▆▅▆▇██████████████████████████████████████▇█▆▅▅▄▃▂▂ ▆
  1.48 ms        Histogram: frequency by time        1.64 ms <

 Memory estimate: 186.95 KiB, allocs estimate: 17407.

So I am getting 1.5ms compared to C++'s 3ms, but it's the same order of magnitude, so probably it's correct. I then tried:

julia> x = ((a+b+c+1)^100)
(1 + a + b + c)^100

julia> @benchmark y = expand(x)
BenchmarkTools.Trial: 20 samples with 1 evaluation.
 Range (min … max):  208.862 ms … 347.852 ms  ┊ GC (min … max): 0.00% … 9.87%
 Time  (median):     262.638 ms               ┊ GC (median):    0.05%
 Time  (mean ± σ):   257.937 ms ±  39.495 ms  ┊ GC (mean ± σ):  1.37% ± 3.09%

  █                      ▁▁                                      
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇██▄▄▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  209 ms           Histogram: frequency by time          348 ms <

 Memory estimate: 34.97 MiB, allocs estimate: 1899009.

Which is consistent with C++ as well (the fastest is 209ms, compared to 204ms in C++).

So probably my numbers above are correct.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

Here is another benchmark, which just benchmarks multiplying out two large expressions.

SymEngine:

julia> using BenchmarkTools, SymEngine

julia> vars = @vars a,b,c,d,e,f,g,h,i
(a, b, c, d, e, f, g, h, i)

julia> x = expand(((a+b+c+1)^20));

julia> y = expand(((a+b+c+1)^15));

julia> @benchmark z = expand(x*y)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  2.294 s … 5.234 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.764 s            ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.764 s ± 2.079 s  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                     █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.29 s        Histogram: frequency by time       5.23 s <

 Memory estimate: 91.75 MiB, allocs estimate: 6013109.

vs Julia:

julia> using BenchmarkTools, Symbolics

julia> vars = @variables a,b,c,d,e,f,g,h,i
9-element Vector{Num}:
 a
 b
 c
 d
  e
  f
 g
 h
 i

julia> x = expand(((a+b+c+1)^20));

julia> y = expand(((a+b+c+1)^15));

julia> @benchmark z = expand(x*y)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 7.566 s (1.70% GC) to evaluate,
 with a memory estimate of 2.02 GiB, over 4750560 allocations.

So Julia is now only "7.566 / 2.294 = 3.3" times slower, which is more reasonable.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

There is also an issue with running global scope things in Julia, that there might be some unnecessary boxing/unboxing. So I tried this for Julia:

julia> using BenchmarkTools, Symbolics

julia> function f()
         vars = @variables a,b,c,d,e,f,g,h,i
         x = expand(((a+b+c+1)^20));
         y = expand(((a+b+c+1)^15));
         z = expand(x*y)
       end
f (generic function with 1 method)

julia> @benchmark f()
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 7.945 s (1.45% GC) to evaluate,
 with a memory estimate of 2.20 GiB, over 6091435 allocations.

And SymEngine:

julia> using BenchmarkTools, SymEngine

julia> function f()
                vars = @vars a,b,c,d,e,f,g,h,i
                x = expand(((a+b+c+1)^20));
                y = expand(((a+b+c+1)^15));
                z = expand(x*y)
              end
f (generic function with 1 method)

julia> @benchmark f()
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  2.902 s …   3.016 s  ┊ GC (min … max): 0.01% … 0.00%
 Time  (median):     2.959 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.959 s ± 80.544 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                       █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.9 s          Histogram: frequency by time        3.02 s <

 Memory estimate: 92.02 MiB, allocs estimate: 6038323.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

I also tried Julia 1.9.3:

@benchmark f()
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 8.149 s (2.02% GC) to evaluate,
 with a memory estimate of 2.24 GiB, over 6822738 allocations.

@YingboMa
Copy link

Does SymEngine use a polynomial data structure to expand and convert it back to an unstructured expression tree or does it exclusively work on the unstructured expression tree?

@YingboMa
Copy link

Here is Nemo's performance:

julia> using Nemo, BenchmarkTools

julia> R, (a, b, c) = PolynomialRing(ZZ, [:a, :b, :c]);

julia> p = a+b+c+1;

julia> @btime $p^20;
  258.569 μs (25 allocations: 61.62 KiB)

@isuruf
Copy link
Member

isuruf commented Sep 11, 2023

Does SymEngine use a polynomial data structure to expand and convert it back to an unstructured expression tree or does it exclusively work on the unstructured expression tree?

It looks at the expression tree and expands the power of addition expressions faster.

For eg: instead of doing (a + b)^2 = a^2 + a*b + b*a + b^2 it does (a + b)^2 = a^2 + 2*a*b + b^2 and similar for higher powers.

@isuruf
Copy link
Member

isuruf commented Sep 11, 2023

However, it doesn't do the same optimizations that Flint/Nemo does.

For eg SymEngine does

(a^2 + a + 1)^2 = a^4 + a^2 + 1 + 2a^3 + 2a^2 + 2a

and simplifies the expression to

(a^2 + a + 1)^2 = a^4 + 3 a^2 + 1 + 2a^3 + 2a

@YingboMa
Copy link

Does it use the multinomial theorem to expand power of sums?

@isuruf
Copy link
Member

isuruf commented Sep 11, 2023

Yes

@YingboMa
Copy link

Ah, so that explains it. We don't have that specialization at all. For some polynomial operations, we do a round trip to multivariate polynomial representations and back. @shashi is more familiar with this process.

@isuruf
Copy link
Member

isuruf commented Sep 11, 2023

For some polynomial operations, we do a round trip to multivariate polynomial representations and back.

SymEngine does not have that at all.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

@YingboMa I recommend to focus on this benchmark: #1973 (comment) which makes it irrelevant how the power expansion is done. It benchmarks how quickly you can expand two arbitrary, long, expressions.

@isuruf
Copy link
Member

isuruf commented Sep 11, 2023

@certik, that benchmark has two expands with a power. So, not exactly comparable.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

@isuruf I don't follow. The benchmark does z = expand(x*y), where both "x" and "y" is already expanded (no matter how). And it benchmarks multiplying those two and expanding it. It does not expand with a power.

@isuruf
Copy link
Member

isuruf commented Sep 11, 2023

@certik, the benchmark time includes computing x and y which are powers.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

@isuruf I don't think it does. The following lines:

julia> x = expand(((a+b+c+1)^20));

julia> y = expand(((a+b+c+1)^15));

julia> @benchmark z = expand(x*y)

First expand the powers, so x and y contain expanded powers. I checked it by printing "x" and "y" and they are indeed expanded.

@isuruf
Copy link
Member

isuruf commented Sep 11, 2023

Ah, you are running the benchmark like that. Thanks.

@certik
Copy link
Contributor Author

certik commented Sep 11, 2023

Here is a differentiation benchmark. Symbolics.jl:

julia> using BenchmarkTools, Symbolics

julia> vars = @variables a,b,c,d,e,f,g,h,i,x

julia> function ff()
       expr = sin(sin(sin(sin(x))));
       for i = 1:10
           expr = Symbolics.derivative(expr, x)
       end
       end
ff (generic function with 1 method)

julia> @benchmark ff()
BenchmarkTools.Trial: 17 samples with 1 evaluation.
 Range (min … max):  289.565 ms … 299.344 ms  ┊ GC (min … max): 6.15% … 9.44%
 Time  (median):     295.178 ms               ┊ GC (median):    8.53%
 Time  (mean ± σ):   294.804 ms ±   2.812 ms  ┊ GC (mean ± σ):  8.28% ± 0.82%

  ▁      ▁      ▁ ▁  ▁  ▁  ▁       ▁ █ ▁      ▁ ▁▁   ▁     ▁  ▁  
  █▁▁▁▁▁▁█▁▁▁▁▁▁█▁█▁▁█▁▁█▁▁█▁▁▁▁▁▁▁█▁█▁█▁▁▁▁▁▁█▁██▁▁▁█▁▁▁▁▁█▁▁█ ▁
  290 ms           Histogram: frequency by time          299 ms <

 Memory estimate: 278.14 MiB, allocs estimate: 3843662.

SymEngine.jl

julia> using BenchmarkTools, SymEngine

julia> vars = @vars a,b,c,d,e,f,g,h,i,x

julia> expr = sin(sin(sin(sin(x))));

julia> @benchmark diff(expr, x, 10)
BenchmarkTools.Trial: 557 samples with 1 evaluation.
 Range (min … max):  6.661 ms … 62.706 ms  ┊ GC (min … max): 0.00% … 0.33%
 Time  (median):     8.474 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.982 ms ±  5.555 ms  ┊ GC (mean ± σ):  0.02% ± 0.03%

  ▁       ▄▃ ▁▃▁▁▄ ▃▁▂ ▁▁▁▂▂▁ █▄ ▃ ▃▅▃▃▅▁                     
  █▄▃▃▄▃▃▇███████████████████▇███████████▇▆▃▃▃▃▃▃▃▁▁▃▁▃▁▁▁▁▃ ▅
  6.66 ms        Histogram: frequency by time          11 ms <

 Memory estimate: 511.17 KiB, allocs estimate: 33272.

julia> function ff()
              expr = sin(sin(sin(sin(x))));
              for i = 1:10
                  expr = diff(expr, x)
              end
              end
ff (generic function with 1 method)

julia> @benchmark ff()
BenchmarkTools.Trial: 567 samples with 1 evaluation.
 Range (min … max):  6.522 ms … 62.355 ms  ┊ GC (min … max): 0.00% … 0.22%
 Time  (median):     8.326 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.820 ms ±  5.487 ms  ┊ GC (mean ± σ):  0.02% ± 0.02%

            ▁▆▅▄▃▃▁ ▂▁▄▁ ▃▆ ▁▄▃ ▆▁▁▇█▁▇  ▁▇ ▁ ▅▅▅▆            
  ▇▆▄▃▃▄▃▃▃▅███████▇████▇██████▇███████▇█████▇████▅▇▅▃▁▃▅▃▃▃ ▅
  6.52 ms        Histogram: frequency by time          10 ms <

 Memory estimate: 511.55 KiB, allocs estimate: 33292.

On this particular benchmark, SymEngine seems 35x faster. I am probably doing something wrong. If anyone can figure it out, that would be great. Once we have representative benchmarks that the Julia community agrees that they are fair, we'll add it to our benchmarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants