Question: Why faster until 50k to 500k rows? #2

bscully27 · 2020-03-25T13:08:48Z

For most pandas functions, I expected numpy to outperform regardless of data size.
I'm curious about the technical details behind this observation. Any information would be appreciated.

Thanks!

yash-clear · 2020-12-28T20:14:32Z

@bscully27 > For most pandas functions, I expected numpy to outperform regardless of data size.

I'm curious about the technical details behind this observation. Any information would be appreciated.

Thanks!

Numpy is the fastest because it is C-compiled and stores data of same datatype (homogeneous arrays) and you get the benefits of principle of locality i.e., tendency of a processor to access the same set of memory locations repetitively over a short period of time. On the other hand pandas are flexible to store data of many datatypes which in turn decrease its performance.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Why faster until 50k to 500k rows? #2

Question: Why faster until 50k to 500k rows? #2

bscully27 commented Mar 25, 2020

yash-clear commented Dec 28, 2020

Question: Why faster until 50k to 500k rows? #2

Question: Why faster until 50k to 500k rows? #2

Comments

bscully27 commented Mar 25, 2020

yash-clear commented Dec 28, 2020