
Onboard to bencher, to start tracking benchmarks over time #1092

Open · joshka opened this issue May 11, 2024 · 4 comments
Labels: enhancement (New feature or request)

joshka (Member) commented May 11, 2024

Problem

Rather than having to run benchmarks ad hoc, it would be nice to see the results of our benchmarks over time (particularly in CI) so we can catch regressions more easily. https://bencher.dev/ seems to be a reasonable product for this, with a free hosted tier for open source projects.

Solution

Onboard to bencher.dev
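
For concreteness, here's a rough sketch of what the GitHub Actions side of this might look like, based on bencher's documented `bencher run` flow. The project slug, secret name, and `rust_criterion` adapter are assumptions for illustration and would need to match whatever we actually set up:

```yaml
# .github/workflows/bench.yml (hypothetical sketch, not a final config)
name: Continuous Benchmarking

on:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the bencher CLI
      - uses: bencherdev/bencher@main
      # Run the benchmarks and upload the results to bencher.dev.
      # "ratatui" (project slug) and BENCHER_API_TOKEN (repo secret) are
      # placeholders; the adapter assumes criterion-style benchmark output.
      - run: |
          bencher run \
            --project ratatui \
            --token '${{ secrets.BENCHER_API_TOKEN }}' \
            --branch main \
            --testbed ubuntu-latest \
            --adapter rust_criterion \
            "cargo bench"
```

If I'm reading their docs right, results then show up as a per-branch time series in the bencher.dev console, which is where the regression detection happens.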

Alternatives

Do our own thing and carry the maintenance burden of that.

Additional context

Anticipating the obvious argument against this, that CI is noisy: this is covered in their docs. Even a noisy signal clearly shows when performance has drastically changed (see https://bencher.dev/docs/explanation/continuous-benchmarking/ for more info).

1000 words:
[image]

joshka added the enhancement label May 11, 2024
EdJoPaTo (Member) commented

Keep in mind that benchmarks are likely biased toward the specific problem they encode and might not reflect real-world usage. A benchmark can miss code paths that user code exercises, which gives a false sense of confidence in the results.
This should not stop us from refactoring benchmarks. Having useful benchmarks is far more important than preserving long-running historical comparison graphs.

Also, benchmarks are highly dependent on their environment, such as the target triple. x64 has different strengths than the various ARM platforms. Apple silicon, for example, has instructions specifically for making code written for x64 easier to run, which is why Rosetta works so well on Apple ARM. The Windows ARM compatibility layer does not have this, which is why it's much slower. Even though both are ARM, they're hard to compare because of things like that.

As with all benchmarks, absolute numbers are only meaningful on the given target and cannot easily be compared; changes, on the other hand, can be. (Examples: M2 performance core to M2 performance core is comparable with absolute numbers. M2 performance core to M2 efficiency core is not. M2 to a random benchmark platform is not. M2 to Raspberry Pi is not. The change percentages, however, are comparable, at least for the same target triple and glibc version: a benchmark that goes from 100 ns to 120 ns on an M2 and from 950 ns to 1140 ns on a CI runner shows the same +20% regression despite very different absolute numbers.)

So this will only ever give a rough idea, and changing benchmark code will result in a new graph.
It should only ever be taken as a hint that "there might be something off".

joshka (Member, Author) commented May 11, 2024

Changes on the other hand can be compared

Yes. This is the entire point of bencher.

orhun (Sponsor Member) commented May 16, 2024

Looks cool! I'm curious what we need to do to move forward with it. Should we reach out to them, or can we just integrate it somehow?

joshka (Member, Author) commented May 16, 2024

Looks cool! I'm curious what we need to do to move forward with it. Should we reach out to them, or can we just integrate it somehow?

Read the docs, sign up, and do the tasks required.
