
Later benchmarks are notoriously slower, possibly due to CPU throttling #954

Open
donquixote opened this issue Dec 6, 2021 · 3 comments

@donquixote

Problem

I have a test case like this:

class MyBench {
  public function benchA1() {...}
  public function benchB1() {...}
  public function benchA2() {
    ... // Exact copy of benchA1().
  }
  public function benchB2() {
    ... // Exact copy of benchB1().
  }
}

When running all of them, the later (copied) version is consistently slower than the original. E.g. A2 will be slower than A1, and B2 slower than B1. I saw differences of more than 10%, and this was reproducible.

I assume this is because the processor heats up and starts to throttle by the time it reaches the later benchmark cases.

This can lead to misleading results when comparing the performance of different algorithms.

Solution 1: Sleep

We can use the --sleep parameter or the @Sleep annotation, hoping that the processor will cool down.
I tried --revs=1 --iterations=50 --sleep=10000; it seems to improve stability sometimes, but the problem does not fully go away.
Perhaps I need higher sleep values?

One problem with this is that a user can't tell whether the sleep value was "high enough", because usually there are no 1:1 copied benchmark methods to compare against.

Solution 2: Mix it up

In the past, I created a custom benchmark tool where I would break the strict ordering of benchmark cases.
E.g. I would run A, B, A, B, A, B instead of A, A, A, B, B, B.

While this makes a more "fair" comparison of A vs B in the current run, it does not help with differences to earlier runs, where the CPU might have been less heated.
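The interleaving idea is language-agnostic, so here is a minimal sketch in Python rather than PHP (`interleaved_schedule` is an illustrative name, not a PHPBench API): instead of exhausting all iterations of A before starting B, the scheduler alternates subjects on every iteration, so thermal drift affects both roughly equally.

```python
def interleaved_schedule(subjects, iterations):
    """Yield (subject, iteration) pairs in round-robin order:
    A, B, A, B, ... instead of A, A, ..., B, B, ...
    so no subject runs all its iterations while the CPU is coolest."""
    for i in range(iterations):
        for subject in subjects:
            yield (subject, i)

order = [s for s, _ in interleaved_schedule(["A", "B"], 3)]
print(order)  # ['A', 'B', 'A', 'B', 'A', 'B']
```

With this ordering, any gradual slowdown over the run is spread across all subjects instead of penalizing only the last ones.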

Solution 3: Pre-stress the processor

Heat up the processor so that all the benchmark cases suffer from the same throttling.

I don't know if a processor will reach a fixed throttling level, or if it will throttle more and more.
I assume it depends on many factors.
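A minimal sketch of the pre-stress idea (Python, illustrative only; `prestress` is not a PHPBench feature): busy-loop for a fixed wall-clock time before measuring, so every benchmark starts from a similar thermal state.

```python
import time

def prestress(seconds):
    """Run a busy loop for `seconds` of wall-clock time to heat the CPU
    before measuring, so later benchmarks are not uniquely penalized."""
    deadline = time.perf_counter() + seconds
    work = 0
    while time.perf_counter() < deadline:
        work += 1  # trivial arithmetic keeps the core fully busy
    return work

prestress(0.05)  # in practice this would run for seconds or minutes
```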

Solution 4: Measure current raw CPU speed?

A reference operation could be used to measure the current CPU speed.
A report could show benchmark times divided by the duration for the reference operation.

Problems with this:

  • The divided number has an arbitrary unit, because the reference operation itself is completely arbitrary.
  • There might be high variation in the reference operation measurements.
  • Resources used for the reference operation might be different than for the benchmark operation, so the division might distort the results.
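A rough sketch of the normalization idea (Python; `measure` and `normalized_score` are illustrative names): time a fixed reference operation at report time, then express benchmark results as a ratio against it, which partially cancels out whatever speed the CPU happens to be running at.

```python
import time

def measure(op, reps=10_000):
    """Average wall-clock time per call of `op` over `reps` calls."""
    start = time.perf_counter()
    for _ in range(reps):
        op()
    return (time.perf_counter() - start) / reps

def normalized_score(benchmark_seconds, reference_op):
    """Divide a measured benchmark time by the current cost of a fixed
    reference operation. The result is a unitless ratio."""
    return benchmark_seconds / measure(reference_op)

# e.g. express a 1.15ms result relative to summing 100 integers:
ratio = normalized_score(0.00115, lambda: sum(range(100)))
```

As the bullet list above notes, the ratio inherits the reference operation's own measurement noise, so it mitigates rather than eliminates the problem.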
dantleech (Member) commented Dec 6, 2021

I don't see this behavior (Linux), e.g. 4 different bcrypt benches:

    benchBcrypt1............................I9 ✔ Mo47.757ms (±1.23%)
    benchBcrypt2............................R1 I5 ✔ Mo49.465ms (±1.70%)
    benchBcrypt3............................R1 I9 ✔ Mo47.414ms (±0.91%)
    benchBcrypt4............................R1 I9 ✔ Mo47.466ms (±0.45%)

Which platform are you on?

Solution 2: Mix it up

Non-linear execution is planned, but I'm not sure when I'll get round to it as it's a pretty large refactoring.

Solution 3: Pre-stress the processor

@Warmup could help maybe?

@donquixote (Author)

Hi

I don't see this behavior (Linux), e.g. 4 different bcrypt benches:

Interesting.
Can I reproduce this somehow?
(I would share my own example to reproduce, but I feel it is too custom-ish)

Typical output for me:


+--------------+----------------------------------------------------+---------+---------+---------+---------+--------+----------+
| benchmark    | subject                                            | memory  | min     | max     | mode    | rstdev | stdev    |
+--------------+----------------------------------------------------+---------+---------+---------+---------+--------+----------+
| ClassesBench | benchParseTokensFull (PHPUnit\Framework\TestCase)  | 9.706mb | 1.128ms | 1.193ms | 1.145ms | ±1.37% | 15.872μs |
| ClassesBench | benchParseTokensFull1 (PHPUnit\Framework\TestCase) | 9.706mb | 1.145ms | 1.220ms | 1.168ms | ±1.91% | 22.564μs |
| ClassesBench | benchParseTokensFull2 (PHPUnit\Framework\TestCase) | 9.706mb | 1.182ms | 1.265ms | 1.215ms | ±1.76% | 21.526μs |
| ClassesBench | benchParseTokensFull3 (PHPUnit\Framework\TestCase) | 9.706mb | 1.195ms | 1.267ms | 1.222ms | ±1.49% | 18.375μs |
+--------------+----------------------------------------------------+---------+---------+---------+---------+--------+----------+

Which platform are you on?

Linux, PHP 7.4.
What else could I provide?

Non-linear execution is planned, but I'm not sure when I'll get round to it as it's a pretty large refactoring.

Cool. Although this only solves relative differences within the same run, not differences between runs.

@Warmup could help maybe?

Already using it. Perhaps I need a stronger warmup...

@dantleech
Copy link
Member

dantleech commented Dec 6, 2021

This is interesting. At least on Linux, CPU frequency scaling can perhaps be controlled: https://askubuntu.com/questions/523640/how-i-can-disable-cpu-frequency-scaling-and-set-the-system-to-performance

I also note that my CPUs are running at different frequencies (cpufreq-info), and it might be possible for PHPBench to limit the benchmarks to a single processor with taskset (though maybe that's less of an issue if dynamic scaling is turned off)
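A sketch of how this could be done manually on Linux with standard tools (availability varies by distro; `vendor/bin/phpbench` is an assumed install path, not confirmed in this thread):

```shell
# Pin all cores to the "performance" governor so the clock does not scale down
sudo cpupower frequency-set -g performance

# Inspect per-core frequencies before and after
cpufreq-info

# Run the benchmarks pinned to a single core (core 0 here)
taskset -c 0 vendor/bin/phpbench run
```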
