
Evaluate using Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #818

Open
zamazan4ik opened this issue Nov 26, 2023 · 1 comment


@zamazan4ik

Hi!

Recently I performed many Profile-Guided Optimization (PGO) benchmarks on multiple projects, including many databases (PostgreSQL, ClickHouse, Redis, MongoDB, etc.); the results are available here, and the database-related results can be checked here. That's why I think it's worth trying to apply PGO to HeavyDB to improve its performance.

I can suggest the following steps:

  • Evaluate PGO's effect on HeavyDB (a minimal workflow sketch follows this list).
  • If PGO delivers better performance, add a note about it to HeavyDB's documentation so users and maintainers are aware of this optimization opportunity.
  • Integrate PGO into the build scripts, so users and maintainers can easily apply PGO to their own workloads.
  • Optimize the prebuilt HeavyDB binaries with PGO.
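
For reference, below is a minimal sketch of the classic two-phase PGO workflow with Clang. The source file name, profile directory, and training command are placeholders for illustration; the real integration would go through HeavyDB's build scripts instead.

```bash
# Minimal two-phase PGO workflow with Clang; file names and the
# training workload are placeholders, not HeavyDB's actual build.

# 1. Build with instrumentation; raw profiles are written to ./pgo-profiles.
clang++ -O2 -fprofile-generate=./pgo-profiles main.cpp -o app

# 2. Run a representative workload (e.g. the HeavyDB benchmarks) to
#    collect execution profiles.
./app

# 3. Merge the raw profiles into a single .profdata file.
llvm-profdata merge -output=app.profdata ./pgo-profiles

# 4. Rebuild with the profile driving inlining and code-layout decisions.
clang++ -O2 -fprofile-use=app.profdata main.cpp -o app
```

GCC supports the same workflow with its own -fprofile-generate/-fprofile-use flags (no llvm-profdata step is needed there).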

Here are some examples of how PGO is already integrated into other projects' build scripts:

Here are some examples of how PGO-related documentation could look in a project:

After PGO, I can suggest evaluating Post-Link Optimization (PLO) with LLVM BOLT as an additional optimization step.
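
For context, a typical BOLT pass looks roughly like the sketch below. The binary name and profile paths are placeholders, and the exact set of optimization flags varies between LLVM releases, so treat this as an outline rather than a recipe.

```bash
# Sketch of a Post-Link Optimization pass with LLVM BOLT;
# the binary name and paths are placeholders.

# 1. Link with relocations preserved so BOLT can rewrite the layout.
clang++ -O2 -Wl,--emit-relocs main.cpp -o app

# 2. Sample a representative run with perf (LBR-based branch sampling).
perf record -e cycles:u -j any,u -o perf.data -- ./app

# 3. Convert the perf profile into BOLT's input format.
perf2bolt -p perf.data -o perf.fdata ./app

# 4. Rewrite the binary with a profile-driven code layout.
llvm-bolt ./app -o ./app.bolt -data=perf.fdata \
  -reorder-blocks=ext-tsp -reorder-functions=hfsort
```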

Some BOLT results are listed below:

I am not familiar with HeavyDB (yet), but as a first step we could train PGO on the HeavyDB benchmarks and then compare HeavyDB's performance before and after PGO.

@cdessanti
Contributor

Hi @zamazan4ik,

I'm sorry for the late response; each time I tried to reply to this message, I was distracted by something else.

I have carefully reviewed your work, in particular the project you completed using ClickHouse, and found it very interesting. I just wanted to let you know that our database employs the GPU as an accelerator for operations that require high bandwidth and parallelism. Specifically, it handles data aggregations, filters, and joins, and it generates LLVM code to optimize these operations; this can also be done for CPU execution. As a result, the CPU is mainly responsible for coordination and memory management.

I haven't had the chance to use PGO on the project yet due to time constraints. However, if you're interested in running benchmarks with HeavyDB, I'd be happy to guide you through the process of setting up a development environment, including DDLs and data for standard or internal benchmarking purposes.

Let me know how I can assist you. Meanwhile, have a nice weekend.
Candido
