
Onboard to bencher, to start tracking benchmarks over time #1092

Open · joshka opened this issue May 11, 2024 · 4 comments
Labels: enhancement (New feature or request)

joshka (Member) commented May 11, 2024

Problem

Rather than having to run benchmarks ad hoc, it would be nice to see the results of our benchmarks over time (particularly in CI) so we can catch regressions more easily. https://bencher.dev/ seems to be a reasonable product for this, with a free hosted tier for open source projects.

Solution

Onboard to bencher.dev
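
For concreteness, here's a rough sketch of what the GitHub Actions side of this might look like, based on bencher's documented `bencher run` flow. The project slug, secret name, and `rust_criterion` adapter are assumptions for illustration and would need to match whatever we actually set up:

```yaml
# .github/workflows/bench.yml (hypothetical sketch, not a final config)
name: Continuous Benchmarking

on:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the bencher CLI
      - uses: bencherdev/bencher@main
      # Run the benchmarks and upload the results to bencher.dev.
      # "ratatui" (project slug) and BENCHER_API_TOKEN (repo secret) are
      # placeholders; the adapter assumes criterion-style benchmark output.
      - run: |
          bencher run \
            --project ratatui \
            --token '${{ secrets.BENCHER_API_TOKEN }}' \
            --branch main \
            --testbed ubuntu-latest \
            --adapter rust_criterion \
            "cargo bench"
```

If I'm reading their docs right, results then show up as a per-branch time series in the bencher.dev console, which is where the regression detection happens.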

Alternatives

Do our own thing and carry the maintenance burden of that.

Additional context

Anticipating the obvious argument against this, that CI is noisy: this is covered in their docs. Even a noisy signal clearly shows when performance has drastically changed (see https://bencher.dev/docs/explanation/continuous-benchmarking/ for more info).

1000 words:
[image]

joshka added the enhancement label May 11, 2024
EdJoPaTo (Member) commented

Keep in mind that benchmarks are likely biased toward the specific problem they encode and might not reflect real-world usage. A benchmark can miss code paths that user code exercises, which gives a false sense of confidence in the results.
This should not stop us from refactoring benchmarks. Having useful benchmarks is far more important than preserving long-running historical comparison graphs.

Also, benchmarks are highly dependent on their environment, such as the target triple. x64 has different strengths than the various ARM platforms. Apple silicon, for example, has instructions specifically for making code written for x64 easier to run, which is why Rosetta works so well on Apple ARM. The Windows ARM compatibility layer does not have this, which is why it's much slower. Even though both are ARM, they're hard to compare because of things like that.

As with all benchmarks, absolute numbers are only meaningful on the given target and cannot easily be compared; changes, on the other hand, can be. (Examples: M2 performance core to M2 performance core is comparable with absolute numbers. M2 performance core to M2 efficiency core is not. M2 to a random benchmark platform is not. M2 to Raspberry Pi is not. The change percentages, however, are comparable, at least for the same target triple and glibc version: a benchmark that goes from 100 ns to 120 ns on an M2 and from 950 ns to 1140 ns on a CI runner shows the same +20% regression despite very different absolute numbers.)

So this will only ever give a rough idea, and changing benchmark code will result in a new graph.
It should only ever be taken as a hint that "there might be something off".

joshka (Member, Author) commented May 11, 2024

Changes on the other hand can be compared

Yes. This is the entire point of bencher.

orhun (Sponsor Member) commented May 16, 2024

Looks cool! I'm curious what we need to do to move forward with it. Should we reach out to them, or can we just integrate it somehow?

joshka (Member, Author) commented May 16, 2024

Looks cool! I'm curious what we need to do to move forward with it. Should we reach out to them, or can we just integrate it somehow?

Read the docs, sign up, and do the tasks required.
