
Question about usage of cores and db-benchmark performance #3368

Open
sophia-wright-blue opened this issue Aug 9, 2023 · 3 comments

@sophia-wright-blue

Hello, I have a general question about whether DataFrames.jl uses all of the physical cores available on the machine when executing code (the way Polars does: https://www.pola.rs/). I'd greatly appreciate it if someone could share any resources or tips on improving the performance of DataFrames.jl.

It'd also be super helpful to get some feedback on whether there is any way to improve the performance of DataFrames.jl in the recently updated db-benchmark:

https://duckdblabs.github.io/db-benchmark/

Thank you!

@bkamins
Member

bkamins commented Aug 10, 2023

> I have a general question about whether DataFrames.jl uses all of the physical cores available on the machine when executing code

For the operations listed in https://dataframes.juliadata.org/stable/lib/functions/#Multithreading-support, DataFrames.jl uses as many threads as you start your Julia process with.
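As a rough illustration, here is a minimal Julia sketch, assuming DataFrames.jl 1.x is installed and Julia was started with several threads (for example `julia --threads=auto`, or with the `JULIA_NUM_THREADS` environment variable set). The column names, data sizes, and transformations are made up for illustration, and whether a particular call actually runs in parallel depends on the conditions described on the linked multithreading page.

```julia
using DataFrames

# Number of threads this Julia session was started with;
# DataFrames.jl cannot use more threads than this.
println("Threads available: ", Threads.nthreads())

# Hypothetical synthetic data: one million rows spread over 100 groups.
df = DataFrame(g = rand(1:100, 10^6), x = rand(10^6))

# `groupby` is one of the operations listed as multithreaded.
gdf = groupby(df, :g)

# `combine` on a GroupedDataFrame may run in parallel when several
# transformations are requested or a custom (non-standard-reduction)
# function is used; both conditions hold here. No opt-in code is needed.
res = combine(gdf,
              :x => (v -> sum(abs2, v)) => :x_sumsq,  # custom function
              :x => maximum => :x_max)                # second transformation

println(first(res, 5))
```

Running the same script with `--threads=1` and with `--threads=auto` is a simple way to see how much a given workload benefits on your own machine.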

> it'd also be super helpful to get some feedback on whether there is any way to improve the performance of DataFrames.jl

Yes. However, this has not been a top priority so far. Having said that:

  1. If someone is willing to work on this, I can explain what needs to be done and in which parts of the code.
  2. If you have a specific operation that you believe is slow, we can work on improving it specifically. Can you please point to a case where you have a performance problem?

Also note that in the benchmark you reference, DuckDB, not Polars, is generally the fastest solution, and we treat it as the reference.

@sophia-wright-blue
Author

Thank you for the detailed response, @bkamins. My question was based on a discussion with a colleague about the db-benchmark. I will look into the multithreading support and get back to you.

I'm not sure I have the bandwidth or expertise to help with the source code to improve the db-benchmark results, but it's a very popular benchmark that does influence which libraries people use, so it would be great to see Julia's performance improve. Thank you again!

@bkamins
Member

bkamins commented Aug 10, 2023

Help with the code is always welcome. However, as I commented above, even sharing real-life examples that are slow in practice would help.

The point is that this benchmark is run on a large multi-core server, while most people probably run their code on laptops or smaller servers that have different performance characteristics (and this is the target we want to optimize for in the first place).
