Question about usage of cores and db-benchmark performance #3368

sophia-wright-blue · 2023-08-09T23:12:14Z

Hello, I have a general question about whether DataFrames.jl uses all of the physical cores available on the machine when executing code (the way Polars does: https://www.pola.rs/). I'd greatly appreciate it if someone could share any resources or tips on improving the performance of DataFrames.jl.

It'd also be super helpful to get some feedback on whether there is any way to improve the performance of DataFrames.jl in the recently updated db-benchmark:

https://duckdblabs.github.io/db-benchmark/

thank you

bkamins · 2023-08-10T01:47:19Z

> I have a general question about whether DataFrames.jl uses all of the physical cores available on the machine when executing code

For the operations listed at https://dataframes.juliadata.org/stable/lib/functions/#Multithreading-support, DataFrames.jl uses as many threads as you start your Julia process with.
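Since the thread count is set when Julia starts, it can help to check what the current session actually has. The following is a minimal sketch using only Base Julia; `thread_report` is a hypothetical helper name made up for this example, and the launch commands in the comments assume a reasonably recent Julia version.

```julia
# Start Julia with more than one thread, for example:
#   julia --threads=auto script.jl        (the `auto` value needs Julia 1.7 or later)
#   JULIA_NUM_THREADS=8 julia script.jl
# The number of threads is set at startup.

# `thread_report` is a hypothetical helper, not part of DataFrames.jl or Base.
function thread_report()
    return (
        julia_threads = Threads.nthreads(),  # threads available to this session
        cpu_threads = Sys.CPU_THREADS,       # logical CPU threads, not physical cores
    )
end

report = thread_report()
println("Julia threads: ", report.julia_threads)
println("Logical CPU threads: ", report.cpu_threads)

if report.julia_threads == 1
    println("Single-threaded session: multithreaded operations will run on one thread.")
end
```

If the first number is 1, the operations in the list above will not run in parallel, no matter how many cores the machine has.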

> it'd also be super helpful to get some feedback on whether there is any way to improve the performance of DataFrames.jl

Yes. However, so far it has not been considered a top priority. Having said that:

- If someone is willing to work on this, I can explain what needs to be done and in which parts of the code.
- If you have a specific operation that you believe is slow for you, we can work on improving it specifically. Can you please indicate the case where you have a performance problem?

Also note that in the benchmarks you reference, DuckDB, not Polars, is generally the fastest solution, and we treat it as the reference benchmark.

sophia-wright-blue · 2023-08-10T15:21:01Z

thank you for the detailed response @bkamins - my question was based on a discussion with a colleague about the db-benchmark - I will look into the multi-threading support and get back

I'm not sure about my bandwidth or capability to help with the source code to improve on the db-benchmark, but it's a very popular benchmark that does influence the usage of libraries, so it'd be great to see the Julia performance improve - thank you again!

bkamins · 2023-08-10T19:10:36Z

Help with the code is always welcome. However, as I have commented, even sharing real-life examples that are slow in practice would help.

The point is that this benchmark is run on a large multi-core server, while people probably typically run their code on laptops or smaller servers, which have different performance characteristics (and this is the target we want to optimize for in the first place).

bkamins added the question label Aug 10, 2023
