GSoC 2019 Application SHIKSHA RAWAT : Benchmarks and performance
SymPy, being a computer algebra system, offers great functionality and extensive support for symbolic computing. But speed is also an important feature, and at present it is difficult to figure out where SymPy's bottlenecks lie.
My work on SymPy during the summer would be to find these bottlenecks, create benchmarks for them, and then improve the overall performance of SymPy.
NAME: Shiksha Rawat
Email: shiksharawat01@gmail.com
Phone: +91 9893715599
University: Birla Institute Of Technology, Mesra, Ranchi, India
I am a second-year undergraduate student of Computer Science and Engineering at Birla Institute Of Technology, Mesra, Ranchi, India.
As a student of the computer science department, my primary interests have always been mathematics and computer science. I have two years of programming experience in Python and have developed several projects with it.
I am reasonably comfortable with Git and GitHub, having used them over the past months while contributing to SymPy and on other work.
GitHub Profile : https://github.com/shiksha11
Because of my keen interest in benchmarking and profiling code, I went through the issues labelled "Performance" and found https://github.com/sympy/sympy/issues/16249 intriguing. Below, I describe how I found the bottleneck behind the slow performance of Min/Max and how I tried to substitute it with code of better complexity.
Going through the comments and the part of the codebase involved, I deduced that performance suffers from the high complexity (O(n**2)) of the function _find_localzeros, which is called while computing the output. The complexity is high because the function compares every pair of arguments.
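To see why this is quadratic, the pairwise scan can be modelled with a toy sketch (the function below is my own illustration, not SymPy's actual _find_localzeros):

```python
# Toy model of the quadratic behaviour: every pair of arguments is
# compared once, so n arguments cost n*(n-1)/2 comparisons.
def pairwise_max(values):
    comparisons = 0
    survivors = set(values)          # assumes distinct values
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            comparisons += 1
            survivors.discard(min(a, b))   # drop the loser of each pair
    return survivors.pop(), comparisons

print(pairwise_max(list(range(50))))   # (49, 1225)
```

For 50 distinct values this performs 50*49/2 = 1225 pairwise comparisons, which is the quadratic growth described above.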
Input: finding the maximum value among 50 numbers.
The results I got after profiling are:
```
3553 function calls in 0.006 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
642 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
1 0.000 0.000 0.006 0.006 <ipython-input-58-f01740fe9027>:1(new1)
50 0.000 0.000 0.003 0.000 <ipython-input-58-f01740fe9027>:2(<genexpr>)
642 0.001 0.000 0.002 0.000 <ipython-input-58-f01740fe9027>:33(_is_connected)
1 0.001 0.001 0.006 0.006 <ipython-input-58-f01740fe9027>:6(_find_localzeros)
1 0.000 0.000 0.006 0.006 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 basic.py:121(__hash__)
1 0.000 0.000 0.000 0.000 basic.py:96(__new__)
1 0.000 0.000 0.003 0.003 libmpf.py:410(from_float)
1 0.000 0.000 0.000 0.000 libmpf.py:64(dps_to_prec)
1 0.000 0.000 0.000 0.000 numbers.py:1109(_hashable_content)
1 0.000 0.000 0.000 0.000 numbers.py:1367(__hash__)
48 0.000 0.000 0.000 0.000 numbers.py:2008(__new__)
25 0.000 0.000 0.000 0.000 numbers.py:2205(__hash__)
1 0.000 0.000 0.000 0.000 numbers.py:739(__hash__)
1 0.000 0.000 0.003 0.003 numbers.py:954(__new__)
49 0.000 0.000 0.003 0.000 sympify.py:76(sympify)
1 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x10993d6a8}
1 0.000 0.000 0.006 0.006 {built-in method builtins.exec}
642 0.001 0.000 0.001 0.000 {built-in method builtins.hasattr}
26 0.000 0.000 0.000 0.000 {built-in method builtins.hash}
1330 0.000 0.000 0.000 0.000 {built-in method builtins.id}
55 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
1 0.000 0.000 0.000 0.000 {built-in method builtins.max}
1 0.000 0.000 0.000 0.000 {built-in method builtins.round}
1 0.003 0.003 0.003 0.003 {built-in method gmpy2._mpmath_create}
1 0.000 0.000 0.000 0.000 {built-in method math.frexp}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
26 0.000 0.000 0.000 0.000 {method 'update' of 'set' objects}
```
Analyzing the results, I found that _is_connected is called a large number of times (642 here, for the pairwise comparisons), and this number increases manifold as the number of values passed as arguments grows.
Breaking the _find_localzeros function into subparts, I deduced that the number of comparisons can be reduced if the arguments are sorted first. This approach works only for numerical values, not symbolic ones. Since Python's built-in sort runs in O(n*log n), I used it to improve performance, which eliminates the _is_connected calls for numerical values entirely.
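The idea can be sketched as follows (a simplified illustration of the approach, not the actual patch; plain strings stand in for symbolic arguments):

```python
from numbers import Number

def reduce_max_args(args):
    # Split the arguments into numeric and symbolic parts, reduce the
    # numeric part with a single O(n log n) sort, and leave only the
    # survivors for the expensive pairwise comparison step.
    numeric = [a for a in args if isinstance(a, Number)]
    symbolic = [a for a in args if not isinstance(a, Number)]
    survivors = list(symbolic)
    if numeric:
        survivors.append(sorted(numeric)[-1])   # only the largest number survives
    # the pairwise _is_connected-style comparison would now run over
    # `survivors`, which contains at most one numeric value
    return survivors

print(reduce_max_args([3, 'x', 7, 'y', 5]))   # ['x', 'y', 7]
```

With purely numeric input the pairwise step has nothing left to do, which is why the _is_connected calls disappear in the second profile.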
I profiled again using the same input as in the above case and got the following result.
```
2680 function calls in 0.005 seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
20 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
50 0.000 0.000 0.000 0.000 <ipython-input-59-9afdc0d4cbe0>:16(<genexpr>)
1 0.000 0.000 0.005 0.005 <ipython-input-59-9afdc0d4cbe0>:3(new1)
50 0.000 0.000 0.000 0.000 <ipython-input-59-9afdc0d4cbe0>:4(<genexpr>)
1 0.000 0.000 0.005 0.005 <ipython-input-59-9afdc0d4cbe0>:7(_find_localzeros)
1 0.000 0.000 0.005 0.005 <string>:1(<module>)
17 0.000 0.000 0.000 0.000 basic.py:121(__hash__)
4 0.000 0.000 0.002 0.000 basic.py:583(is_comparable)
4 0.000 0.000 0.000 0.000 basic.py:617(<listcomp>)
4 0.000 0.000 0.000 0.000 basic.py:630(func)
5 0.000 0.000 0.000 0.000 basic.py:96(__new__)
94 0.000 0.000 0.000 0.000 boolalg.py:313(__nonzero__)
114 0.000 0.000 0.000 0.000 boolalg.py:376(__nonzero__)
208 0.000 0.000 0.000 0.000 boolalg.py:406(<lambda>)
4 0.000 0.000 0.000 0.000 evalf.py:1265(<lambda>)
4 0.000 0.000 0.000 0.000 evalf.py:1303(evalf)
4 0.000 0.000 0.000 0.000 evalf.py:1363(evalf)
4 0.001 0.000 0.002 0.000 expr.py:1743(as_real_imag)
12 0.000 0.000 0.000 0.000 libmpf.py:330(from_int)
1 0.000 0.000 0.000 0.000 libmpf.py:410(from_float)
12 0.000 0.000 0.001 0.000 libmpf.py:549(mpf_cmp)
12 0.000 0.000 0.001 0.000 libmpf.py:601(mpf_lt)
5 0.000 0.000 0.000 0.000 libmpf.py:64(dps_to_prec)
7 0.001 0.000 0.001 0.000 libmpf.py:677(mpf_add)
7 0.000 0.000 0.001 0.000 libmpf.py:772(mpf_sub)
4 0.000 0.000 0.000 0.000 numbers.py:1088(_new)
1 0.000 0.000 0.000 0.000 numbers.py:1109(_hashable_content)
4 0.000 0.000 0.000 0.000 numbers.py:1124(_as_mpf_val)
4 0.000 0.000 0.003 0.001 numbers.py:1333(__lt__)
17 0.000 0.000 0.000 0.000 numbers.py:1367(__hash__)
4 0.000 0.000 0.000 0.000 numbers.py:1745(__eq__)
8 0.000 0.000 0.000 0.000 numbers.py:1802(__lt__)
8 0.000 0.000 0.000 0.000 numbers.py:2001(_as_mpf_val)
48 0.000 0.000 0.000 0.000 numbers.py:2008(__new__)
4 0.000 0.000 0.000 0.000 numbers.py:2159(__eq__)
3 0.000 0.000 0.000 0.000 numbers.py:2169(__gt__)
201 0.000 0.000 0.001 0.000 numbers.py:2178(__lt__)
257 0.001 0.000 0.001 0.000 numbers.py:2205(__hash__)
4 0.000 0.000 0.000 0.000 numbers.py:2546(__nonzero__)
17 0.000 0.000 0.000 0.000 numbers.py:739(__hash__)
8 0.000 0.000 0.000 0.000 numbers.py:80(mpf_norm)
1 0.000 0.000 0.000 0.000 numbers.py:954(__new__)
4 0.000 0.000 0.000 0.000 sympify.py:13(__init__)
428 0.000 0.000 0.001 0.000 sympify.py:375(_sympify)
526 0.001 0.000 0.001 0.000 sympify.py:76(sympify)
5 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x10993d6a8}
1 0.000 0.000 0.000 0.000 {built-in method builtins.all}
1 0.000 0.000 0.005 0.005 {built-in method builtins.exec}
44 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
258 0.000 0.000 0.000 0.000 {built-in method builtins.hash}
87 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
3 0.000 0.000 0.000 0.000 {built-in method builtins.len}
13 0.000 0.000 0.000 0.000 {built-in method builtins.max}
4 0.000 0.000 0.000 0.000 {built-in method builtins.min}
5 0.000 0.000 0.000 0.000 {built-in method builtins.round}
1 0.000 0.000 0.004 0.004 {built-in method builtins.sorted}
13 0.000 0.000 0.000 0.000 {built-in method gmpy2._mpmath_create}
15 0.000 0.000 0.000 0.000 {built-in method gmpy2._mpmath_normalize}
7 0.000 0.000 0.000 0.000 {built-in method gmpy2.bit_length}
8 0.000 0.000 0.000 0.000 {built-in method gmpy2.mpz}
1 0.000 0.000 0.000 0.000 {built-in method math.frexp}
1 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
16 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
```
The profiling results show that in the substituted code not only has the number of function calls decreased substantially (from 3553 to 2680), but the time has also dropped from 0.006 to 0.005 seconds, and this is for only 50 values. The difference will be even greater as the number of arguments increases.
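The scaling claim can be checked with a pure-Python timing sketch (no SymPy involved; the helper names are my own):

```python
import random
import timeit

def pairwise_survivor(values):
    # Full pairwise pass, as in the quadratic approach: Theta(n**2).
    survivors = set(values)
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            survivors.discard(min(a, b))   # loser of each pair is dropped
    return survivors.pop()

def sorted_survivor(values):
    # Sort once and take the last element: O(n log n).
    return sorted(values)[-1]

for n in (50, 200, 800):
    data = random.sample(range(10**6), n)   # distinct random values
    assert pairwise_survivor(data) == sorted_survivor(data)
    tq = timeit.timeit(lambda: pairwise_survivor(data), number=5)
    ts = timeit.timeit(lambda: sorted_survivor(data), number=5)
    print(f"n={n:4d}: pairwise/sorted time ratio = {tq / ts:.0f}x")
```

The ratio between the two timings grows as n grows, which is exactly the behaviour the profiles above hint at.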
### BENCHMARKING

I benchmarked the current and the optimized code for an input of 50 numbers. The results are plotted in the graph linked below (x-axis: array size; y-axis: time in seconds):
https://drive.google.com/file/d/195SsV2QK1-T7pcR7woLHbywAphkF2dpZ/view?usp=sharing
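Such benchmarks can also be expressed in the airspeed-velocity (asv) style used by the sympy_benchmarks repository; the class below is my own sketch (class, method, and parameter names are assumptions), relying on asv's convention of discovering `time_*` methods and timing them once per value in `params`:

```python
class TimeMaxManyArgs:
    # asv runs time_* methods once per entry in `params`
    params = [10, 50, 200]          # number of arguments passed to Max
    param_names = ['n']

    def setup(self, n):
        # deferred import so the timed method measures only the Max call
        from sympy import Max, Integer
        self.Max = Max
        self.args = [Integer(i) for i in range(n)]

    def time_max_many_args(self, n):
        self.Max(*self.args)
```

Running `asv run` against the current and the patched commits would then produce comparison graphs like the one linked above.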
More or less, the project can be broken into three stages:
STAGE 1
Figuring out the main bottlenecks in SymPy: the best way to find these bottlenecks is to profile the part of the codebase involved in a particular issue and then analyze the results.
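A minimal recipe for this stage, using Python's cProfile and pstats on a stand-in workload (the function here is a placeholder for whatever SymPy call an issue points at):

```python
import cProfile
import io
import pstats

def workload():
    # stand-in for the SymPy computation under investigation
    return sum(i * i for i in range(10000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# sort by cumulative time and show the ten hottest entries
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(10)
print(out.getvalue())
```

Sorting by cumulative time surfaces the functions responsible for most of the runtime, which is how _is_connected showed up in the case study above.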
STAGE 2
Designing a substitute: a better, optimized approach has to be found, and its validity can be tested by profiling it and comparing the output with the profile of the current code, as I did in the case study above.
STAGE 3
Improving performance: if an appropriate substitute is found, making the required changes and running the Travis CI build.
I think the most important part of this project will be correctly identifying the patches or files in the codebase that slow computation down and then adding benchmarks for them, since this pinpoints the code that has to be substituted to improve performance.
I would like to break the process into smaller parts, which will give me better insight into the problems. Going through the issues tagged "Performance" in the SymPy repository, I found the following intriguing; I would like to benchmark and profile them and then find substitutes to improve performance, following the steps of the case study above:
- solve is very slow for linear equation system
- Sum of many variable is slow
- Printing expressions in exceptions can be expensive
- Matrix multiplication is slow
My semester will end on 1st May; hence I am not including the time before 1st May 2019 in the timeline. Listed below is the tentative timeline I would follow:
Pre Selection Period + Community Bonding Period
May 2 - May 27
The goal in this period would be:
- Thorough analysis of the present benchmarking repository of SymPy (https://github.com/sympy/sympy_benchmarks) and digging out the regressions that could be investigated further. I will add any issues worth investigating to the list I prepared above.
Since my work will always depend on the results of profiling and benchmarking and on the bottlenecks I find, the timeline below is tentative and may change according to those results.
May 27 - June 10 (Week 1, 2)
Profiling and benchmarking half of the issues in the list above, then analyzing the results to identify the bottlenecks. (All the steps described here would be carried out the way I described in the case study above.)
June 10 - June 24 (Week 3, 4) Adding new benchmarks to the SymPy benchmarks repository as needed, based on the results of the previous weeks, and starting to substitute the bottlenecks with better algorithms (where possible).
Phase 1 Evaluation
June 24 - July 8 (Week 5, 6) Profiling and benchmarking the other half of the issues in the list above and analyzing the results to identify the bottlenecks.
July 8 - July 22 (Week 7, 8) Adding new benchmarks as needed, based on the results of the previous weeks, and substituting the bottlenecks with better algorithms (where possible).
Phase 2 Evaluation
July 22 - August 19 (Week 9, 10, 11, 12) Having profiled and added the required benchmarks for as many regressions and issues as possible in the previous weeks, my main aim in this period would be substituting the bottlenecks.
August 19 - August 26 (Week 13) Completion of any backlog before marking the completion of the project.
Final Submission
I have no prior plans or commitments during this summer. My end-semester exams will end on 1st May, and the new semester starts around 15th July (but I can manage academics and GSoC work in parallel, and will compensate in advance during the community bonding period). Hence I can give about 50 hours per week throughout the whole program.
The following is a list of the merged/open pull requests I have created (in chronological order):
#15744 Mod function: made changes to give simplified output when the dividend is a multiple of the divisor. (Closed)
#15842 Printing: added the root_notation flag in latex.py and pretty.py, and updated the docstrings and test files accordingly. (Merged)
#15901 Printing: added the root_notation flag in mathml.py. (Merged)
#16509 Performance: made changes to improve computation time when the number of arguments to Min/Max is large. (Open)
REFERENCES
- https://stackoverflow.com/questions/5232052/what-are-the-different-code-benchmarking-techniques
- https://www.quora.com/What-is-the-time-complexity-of-the-Python-built-in-sorted-function