
Add content on performance (e.g. benchmarks, mention accelerators) #370

Open · rgommers opened this issue Nov 26, 2020 · 5 comments

Comments

@rgommers (Member)

Related to #308 (comment) (connect content to "key features" on front page).

Adding content on benchmarks and accelerators (i.e. Cython, Numba, Pythran, Transonic) was also just suggested in https://mail.python.org/pipermail/numpy-discussion/2020-November/081248.html

@melissawm (Member)

Do you have pointers on what those benchmarks should look like? Is there a preferred set of problems to test the code on? See, for example, https://julialang.org/benchmarks/

@rgommers (Member, Author)

rgommers commented Nov 27, 2020

The idea of that Julia page is about right, I think: short, with only a plot and no code. The main things I'd change from it:

  • fewer languages (really it's C, Fortran, Julia, and R that matter)
  • keep pure Python and NumPy separate (see the sketch below)
  • add accelerators (Cython, Numba, Pythran, Transonic)
  • change it into a couple of sections, with one plot each (EDIT: I've got some reusable code for generating plots here). There should at least be a distinction between vectorizable problems and ones where NumPy isn't such a good fit

Off the top of my head, I'm not aware of an existing, widely used set of benchmarks we could adopt.
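To make the "keep pure Python and NumPy separate" point concrete, here is a minimal sketch of what one data point behind such a plot could look like; the function names and problem size are just illustrative, not a proposed benchmark suite:

```python
# Hedged sketch: time the same element-wise operation in pure Python and NumPy.
import timeit
import numpy as np

def pairwise_sum_python(a, b):
    # Pure Python: explicit loop over the elements.
    return [x + y for x, y in zip(a, b)]

def pairwise_sum_numpy(a, b):
    # NumPy: a single vectorized call.
    return a + b

n = 100_000
a_list, b_list = list(range(n)), list(range(n))
a_arr, b_arr = np.arange(n), np.arange(n)

t_py = timeit.timeit(lambda: pairwise_sum_python(a_list, b_list), number=100)
t_np = timeit.timeit(lambda: pairwise_sum_numpy(a_arr, b_arr), number=100)
print(f"pure Python: {t_py:.3f}s, NumPy: {t_np:.3f}s")
```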

@paugier
paugier commented Nov 27, 2020

Thanks @rgommers for opening this issue!

A few remarks:

  • I guess it's better to keep things reasonably simple on this page so that people can get a quick overview of what can be done. I wouldn't cover too many problems.

  • I think it is better to consider full problems from existing benchmark games (for example http://initialconditions.org/ or https://benchmarksgame-team.pages.debian.net/benchmarksgame/, code here) rather than only tiny micro-benchmarks (like those in https://julialang.org/benchmarks/), so that the code is seen in quasi-real-life situations (not just a few functions defined in a Jupyter notebook). It would be interesting to also mention aspects other than elapsed time, for example readability, file size, technical difficulty, coding time, maintainability, etc. Optimizing is always a balance.

  • One advantage of Python is that it's possible to go step by step from very simple implementations (sometimes not very efficient) to more complex (and more efficient) ones. It would be nice to be able to show that (see the sketch after this list).

  • I don't think it is necessary to compare Transonic and Pythran. By default Transonic uses Pythran, so in the end both tools give the same performance. Transonic just makes Pythran easier to use for real-life coding (except in Jupyter notebooks), with a Python API similar to Numba's, based on Python type annotations. Transonic can also use Numba and Cython as backends, but that's another story, and I don't think this page needs to go into such detail.

  • The N-Body problem would be a good example:

    • It's famous.
    • It was recently used in an article published in Nature Astronomy arguing against the use of Python.
    • One function is not vectorizable.
    • We already have implementations in C++, Fortran, and Julia.
    • We already have a very efficient implementation in Python (https://github.com/paugier/nbabel), and the result is spectacular (see the figure)!
  • It's also interesting to give at least one example using OpenMP.

  • This article https://onlinelibrary.wiley.com/iucr/doi/10.1107/S1600576719008471 is very interesting and serious. It should be cited.

  • It would be good to send two important messages about performance: (i) no premature optimization, and (ii) measure, don't guess. We can at least mention cProfile.

  • It would also be good to honestly present some limitations of this strategy for accelerating Python code.
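To illustrate the step-by-step point above on the N-Body problem, here is a hedged sketch (not taken from the nbabel repository, just the shape of the idea): a naive pure Python double loop over NumPy arrays, and then the exact same source accelerated by applying Numba's njit:

```python
# Illustrative only: naive O(n^2) pairwise gravitational accelerations.
import numpy as np
from numba import njit

def compute_accelerations(positions, masses):
    # Pure Python loops over a NumPy array; the inner update is not
    # vectorizable in an obvious way, which is the interesting case here.
    n = positions.shape[0]
    accelerations = np.zeros_like(positions)
    for i in range(n):
        for j in range(n):
            if i != j:
                d = positions[j] - positions[i]
                accelerations[i] += masses[j] * d / np.sum(d**2) ** 1.5
    return accelerations

# Step two: the same source code, compiled with Numba.
compute_accelerations_numba = njit(compute_accelerations)

positions = np.random.rand(64, 3)
masses = np.random.rand(64)
np.testing.assert_allclose(
    compute_accelerations(positions, masses),
    compute_accelerations_numba(positions, masses),
)
```

Since Transonic's API is similar (a decorator plus type annotations, as noted above), the same listing shape would also cover Pythran via Transonic.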

@rgommers (Member, Author)

> It would be interesting to also mention aspects other than elapsed time, for example readability, file size, technical difficulty, coding time, maintainability, etc. Optimizing is always a balance.

That's a good point, yes.

> It's also interesting to give at least one example using OpenMP.

I don't think I'd want to get into that, at least not on the same page, because then we'd also have to touch on other forms of parallelism (e.g. Dask, multiprocessing, asyncio).

> This article https://onlinelibrary.wiley.com/iucr/doi/10.1107/S1600576719008471 is very interesting and serious. It should be cited.

Thanks, I wasn't aware of this article. It's really well-written.

> It would be good to send two important messages about performance: (i) no premature optimization, and (ii) measure, don't guess. We can at least mention cProfile.

I think the page really should focus on performance rather than turning into a tutorial. So this can be anywhere from one line to one paragraph, but it should link elsewhere for things like profiling.
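As a sketch of how short that mention could be (illustrative only; the `workload` function here is a stand-in, not something from the page):

```python
# Illustrative: profile before optimizing ("measure, don't guess").
import cProfile
import numpy as np

def workload():
    a = np.random.rand(500, 500)
    return np.linalg.eigvals(a @ a.T)

# Prints per-function timings sorted by cumulative time.
cProfile.run("workload()", sort="cumulative")
```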

@rgommers (Member, Author)

Adding links to the recent Nature correspondence by @paugier et al.:
