
Later benchmarks are notoriously slower, possibly due to CPU throttling #954

Open
donquixote opened this issue Dec 6, 2021 · 3 comments

@donquixote

Problem

I have a test case like this:

class MyBench {
  public function benchA1() {...}
  public function benchB1() {...}
  public function benchA2() {
    ... // Exact copy of benchA1().
  }
  public function benchB2() {
    ... // Exact copy of benchB1().
  }
}

When running all of them, the later (copied) version is consistently slower than the original. E.g. A2 will be slower than A1, and B2 slower than B1. I saw differences of more than 10%, and this was reproducible.

I assume this is because the processor heats up and starts to throttle by the time it reaches the later benchmark cases.

This can lead to misleading results when comparing the performance of different algorithms.

Solution 1: Sleep

We can use the --sleep parameter or the @Sleep annotation, hoping that the processor will cool down.
I tried --revs=1 --iterations=50 --sleep=10000; it seems to improve stability sometimes, but the problem does not fully go away.
Perhaps I need higher sleep values?

One problem with this is that a user can't tell whether the sleep value was "high enough", because usually there are no 1:1 copied benchmark methods to compare against.

Solution 2: Mix it up

In the past, I created a custom benchmark tool where I would break the strict ordering of benchmark cases.
E.g. I would run A, B, A, B, A, B instead of A, A, A, B, B, B.

While this makes a more "fair" comparison of A vs B in the current run, it does not help with differences to earlier runs, where the CPU might have been less heated.
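The interleaving idea is language-agnostic, so here is a minimal sketch in Python rather than PHP (`interleaved_schedule` is an illustrative name, not a PHPBench API): instead of exhausting all iterations of A before starting B, the scheduler alternates subjects on every iteration, so thermal drift affects both roughly equally.

```python
def interleaved_schedule(subjects, iterations):
    """Yield (subject, iteration) pairs in round-robin order:
    A, B, A, B, ... instead of A, A, ..., B, B, ...
    so no subject runs all its iterations while the CPU is coolest."""
    for i in range(iterations):
        for subject in subjects:
            yield (subject, i)

order = [s for s, _ in interleaved_schedule(["A", "B"], 3)]
print(order)  # ['A', 'B', 'A', 'B', 'A', 'B']
```

With this ordering, any gradual slowdown over the run is spread across all subjects instead of penalizing only the last ones.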

Solution 3: Pre-stress the processor

Heat up the processor so that all the benchmark cases suffer from the same throttling.

I don't know if a processor will reach a fixed throttling level, or if it will throttle more and more.
I assume it depends on many factors.
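A minimal sketch of the pre-stress idea (Python, illustrative only; `prestress` is not a PHPBench feature): busy-loop for a fixed wall-clock time before measuring, so every benchmark starts from a similar thermal state.

```python
import time

def prestress(seconds):
    """Run a busy loop for `seconds` of wall-clock time to heat the CPU
    before measuring, so later benchmarks are not uniquely penalized."""
    deadline = time.perf_counter() + seconds
    work = 0
    while time.perf_counter() < deadline:
        work += 1  # trivial arithmetic keeps the core fully busy
    return work

prestress(0.05)  # in practice this would run for seconds or minutes
```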

Solution 4: Measure current raw CPU speed?

A reference operation could be used to measure the current CPU speed.
A report could show benchmark times divided by the duration for the reference operation.

Problems with this:

  • The divided number has an arbitrary unit, because the reference operation itself is completely arbitrary.
  • There might be high variation in the reference operation measurements.
  • Resources used for the reference operation might be different than for the benchmark operation, so the division might distort the results.
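A rough sketch of the normalization idea (Python; `measure` and `normalized_score` are illustrative names): time a fixed reference operation at report time, then express benchmark results as a ratio against it, which partially cancels out whatever speed the CPU happens to be running at.

```python
import time

def measure(op, reps=10_000):
    """Average wall-clock time per call of `op` over `reps` calls."""
    start = time.perf_counter()
    for _ in range(reps):
        op()
    return (time.perf_counter() - start) / reps

def normalized_score(benchmark_seconds, reference_op):
    """Divide a measured benchmark time by the current cost of a fixed
    reference operation. The result is a unitless ratio."""
    return benchmark_seconds / measure(reference_op)

# e.g. express a 1.15ms result relative to summing 100 integers:
ratio = normalized_score(0.00115, lambda: sum(range(100)))
```

As the bullet list above notes, the ratio inherits the reference operation's own measurement noise, so it mitigates rather than eliminates the problem.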
dantleech (Member) commented Dec 6, 2021

I don't see this behavior (Linux), e.g. 4 different bcrypt benches:

    benchBcrypt1............................I9 ✔ Mo47.757ms (±1.23%)
    benchBcrypt2............................R1 I5 ✔ Mo49.465ms (±1.70%)
    benchBcrypt3............................R1 I9 ✔ Mo47.414ms (±0.91%)
    benchBcrypt4............................R1 I9 ✔ Mo47.466ms (±0.45%)

Which platform are you on?

Solution 2: Mix it up

Non-linear execution is planned, but I'm not sure when I'll get round to it as it's a pretty large refactoring.

Solution 3: Pre-stress the processor

@Warmup could help maybe?

@donquixote (Author)

Hi

I don't see this behavior (Linux), e.g. 4 different bcrypt benches:

Interesting.
Can I reproduce this somehow?
(I would share my own example to reproduce, but I feel it is too custom-ish)

Typical output for me:


+--------------+----------------------------------------------------+---------+---------+---------+---------+--------+----------+
| benchmark    | subject                                            | memory  | min     | max     | mode    | rstdev | stdev    |
+--------------+----------------------------------------------------+---------+---------+---------+---------+--------+----------+
| ClassesBench | benchParseTokensFull (PHPUnit\Framework\TestCase)  | 9.706mb | 1.128ms | 1.193ms | 1.145ms | ±1.37% | 15.872μs |
| ClassesBench | benchParseTokensFull1 (PHPUnit\Framework\TestCase) | 9.706mb | 1.145ms | 1.220ms | 1.168ms | ±1.91% | 22.564μs |
| ClassesBench | benchParseTokensFull2 (PHPUnit\Framework\TestCase) | 9.706mb | 1.182ms | 1.265ms | 1.215ms | ±1.76% | 21.526μs |
| ClassesBench | benchParseTokensFull3 (PHPUnit\Framework\TestCase) | 9.706mb | 1.195ms | 1.267ms | 1.222ms | ±1.49% | 18.375μs |
+--------------+----------------------------------------------------+---------+---------+---------+---------+--------+----------+

Which platform are you on?

Linux, PHP 7.4.
What else could I provide?

Non-linear execution is planned, but I'm not sure when I'll get round to it as it's a pretty large refactoring.

Cool. Although this only solves relative differences within the same run, not differences between runs.

@Warmup could help maybe?

Already using it. Perhaps I need a stronger warmup...

@dantleech
Copy link
Member

dantleech commented Dec 6, 2021

This is interesting. At least on Linux, CPU frequency scaling can perhaps be controlled: https://askubuntu.com/questions/523640/how-i-can-disable-cpu-frequency-scaling-and-set-the-system-to-performance

I also note that my CPUs are running at different frequencies (cpufreq-info), and it might be possible for PHPBench to limit the benchmarks to a single processor with taskset (though maybe that's less of an issue if dynamic scaling is turned off)
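A sketch of how this could be done manually on Linux with standard tools (availability varies by distro; `vendor/bin/phpbench` is an assumed install path, not confirmed in this thread):

```shell
# Pin all cores to the "performance" governor so the clock does not scale down
sudo cpupower frequency-set -g performance

# Inspect per-core frequencies before and after
cpufreq-info

# Run the benchmarks pinned to a single core (core 0 here)
taskset -c 0 vendor/bin/phpbench run
```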
