Koalas vs Pandas #2210

Open
psaraogi24 opened this issue Dec 3, 2021 · 3 comments
Labels
question Further information is requested

Comments

@psaraogi24

Hi, I recently started switching from pandas DataFrames to Koalas DataFrames.
But when I measured the execution time, I found that Koalas takes almost 6x as long as pandas.

I think I am missing something here. Can I get some help?

@psaraogi24
Author

Could I also get some sample functions where Koalas would perform better than pandas?

@stepanlavrinenkoteck001

Are you doing any type of sorting/ranking? Some of these operations can take longer because they are done across multiple partitions. A complex execution plan is another cause of slowdown. Check out this best practices page for some examples:
https://koalas.readthedocs.io/en/latest/user_guide/best_practices.html
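
To make the point about partitions concrete, here is a toy sketch in plain Python. It is not how Spark or Koalas implement sorting, and the names `sort_single` and `sort_partitioned` are made up for illustration. It only contrasts sorting data held in one place with sorting data split across partitions, where each partition is sorted locally and the sorted runs then have to be combined.

```python
import heapq
import random


def sort_single(values):
    """Sort data that lives in one place, as on a single machine."""
    return sorted(values)


def sort_partitioned(partitions):
    """Toy model of a global sort over partitioned data.

    Each partition is sorted on its own, then the sorted runs are merged.
    In a real cluster, combining the runs also means moving data between
    workers, which is an extra cost the single-machine sort never pays.
    """
    sorted_runs = [sorted(part) for part in partitions]
    return list(heapq.merge(*sorted_runs))


random.seed(0)
data = [random.randint(0, 1000) for _ in range(20)]
# Split the same data into 4 hypothetical partitions of 5 rows each.
partitions = [data[i:i + 5] for i in range(0, len(data), 5)]

# Both approaches give the same answer; the partitioned one has more steps.
assert sort_partitioned(partitions) == sort_single(data)
```

On small inputs the extra coordination dominates, which is one plausible reason a sort or rank can look slower in Koalas than in pandas.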

@itholic itholic added the question Further information is requested label Dec 9, 2021
@itholic
Contributor

itholic commented Dec 9, 2021

Thanks for trying Koalas :-)
It's hard to say flatly that Koalas is faster or slower than pandas for a specific function.
Performance depends on many factors, such as the amount of data, the size of the cluster, or how you are using the functions in context, as @stepanlavrinenkoteck001 mentioned.
For example, even the same function can perform differently depending on the amount of data.
In general, pandas is faster than Koalas when the data is small enough to fit on a single machine.
If you want a more detailed answer, could you share an example of what you are running where Koalas is 6x slower?
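
A report along the following lines would make the comparison reproducible. This is only a sketch under assumptions: the helper names `best_time` and `compare`, the column names, the row count, and the choice of a groupby are all hypothetical stand-ins for whatever the real workload is. `compare` needs pandas, Koalas, and a working Spark installation, so it is defined but not called here.

```python
import time


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(n_rows=1_000_000):
    """Time the same groupby in pandas and in Koalas (hypothetical workload)."""
    # Imported lazily so the timing helper works without these packages.
    import pandas as pd
    import databricks.koalas as ks

    pdf = pd.DataFrame({"key": [i % 10 for i in range(n_rows)],
                        "value": range(n_rows)})
    kdf = ks.from_pandas(pdf)

    pandas_s = best_time(lambda: pdf.groupby("key")["value"].sum())
    # Koalas builds a Spark plan lazily; to_pandas() forces it to run,
    # so the measurement covers the actual computation.
    koalas_s = best_time(
        lambda: kdf.groupby("key")["value"].sum().to_pandas())
    return pandas_s, koalas_s
```

Calling `compare()` with a few different values of `n_rows` would show how the gap changes with data size, which is the factor most likely to explain a 6x difference on a small dataset.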

3 participants