
@BeforeMethod() does not get called before each call? #896

Open
Ocramius opened this issue Jul 24, 2021 · 16 comments

Comments

@Ocramius
Contributor

I was working on laminas/laminas-servicemanager#93 today, and noticed that we still have the codebase sprinkled with blocks referencing #304

In fact, #304 has been closed as "old", but still applies today.

My assumption was that @BeforeMethods would be called before each call to the bench method: that doesn't seem to be the case, and it leads to benchmarking warmed-up caches too, which is a problem (especially if we're benchmarking said warmup).

In the following example, I would expect 100 calls to mySetup(), but only 10 occur:

/**
 * @BeforeMethods({"mySetup"})
 * @Revs(10)
 * @Iterations(10)
 */
class MyBench
{
    public function mySetup(): void
    {
        reset_all_the_things_here();
    }

    public function benchSomething(): void
    {
        // irrelevant
    }
}

What's the best way forward here? Is a change in @BeforeMethods viable? It would massively change the benchmark results for the bench suites I've worked on so far, but it would lead to more honest results.

@dantleech
Member

dantleech commented Jul 24, 2021

The problem (as you know) is that it introduces the call to a setUp from within the sample loop, which contaminates the time measurement. How would you offset it? You could run another benchmark for the clone and deduct it, but that benchmark also includes the overhead of microtime start/stop, X method calls to "setUp", and the overhead of the for loop. The error margin could be such that you end up with a negative result.

@BeforeMethods is called once per iteration (i.e. sample); the code is then repeated X times consecutively to determine the net time, and from that the average. Most of the time this is probably ok.
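That per-iteration structure can be sketched roughly as follows (a minimal illustration only, not PHPBench's actual execution template; runIteration and its signature are assumptions made for this sketch):

```php
<?php

// Minimal sketch (not PHPBench's real template) of how one iteration is
// sampled: before-methods run once, then the subject is repeated $revs
// times inside a single timed window.
function runIteration(object $benchmark, array $beforeMethods, string $subject, int $revs): float
{
    foreach ($beforeMethods as $beforeMethod) {
        $benchmark->$beforeMethod(); // once per iteration, NOT once per rev
    }

    $start = microtime(true);
    for ($i = 0; $i < $revs; $i++) {
        $benchmark->$subject();
    }
    $elapsed = microtime(true) - $start;

    return $elapsed / $revs; // net time averaged over the revolutions
}
```

With Revs(10) and Iterations(10), this is why the setup method runs 10 times rather than 100: it sits outside the revolution loop.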

Resetting the state in a way that doesn't reduce the percentage of time spent executing the code you are benchmarking is hard.

Possible solutions though:

  1. Don't care about it: include the call within the subject and accept that it's a constant overhead
  2. Benchmark the "overhead" in another benchmark and deduct it in a report (the suggestion in "Provide a way to reset state/caches at every loop" #304 is now possible on master - something like mode(partition['result_time_avg']) - mode(frame[benchmark_name='Foobar' and subject_name="setupSomething"]["result_time_avg"]) as time)
  3. Create a custom executor that runs benchmarks with calls to setup within the loop or similar (would need to base it on the remote executor with a custom template)

@Ocramius
Contributor Author

within the sample loop,

Shouldn't it be before measurements start?

In pseudo code:

loop {
    setup();
    start_measuring();
    method();
    collect_measurement();
}

@dantleech
Member

dantleech commented Jul 24, 2021

I don't think so - it further contaminates the loop. Think of setUp booting a DI container, loading fixtures, etc. Even with micro "setup" calls it introduces 3 new method calls, and microtime can only report integers at microsecond resolution (setting aside that we don't use hrtime).

Your example is similar to running PHPBench with revs=1 (plus the avoidable overhead of the for loop).

@Ocramius
Contributor Author

So the correct workaround would be to run everything with revs=1? 🤔

What about warmup laps? Does @BeforeMethods get called once a warmup is complete?

@dantleech
Member

dantleech commented Jul 26, 2021

So the correct workaround would be to run everything with revs=1?

Not really - the resolution would be too low. I assume you are measuring at microsecond resolution, and that is thrown off by the calls to microtime and the for loop. The idea of the loop is to fill the "sampling space" as much as possible with the work you want to benchmark, to reduce the cost of the measurement itself.

It would be interesting to see whether hrtime helps solve this problem to any extent. Otherwise the solution is probably, as best as possible, to benchmark the per-revolution setup cost and deduct it (a new executor).

Would be interesting to try both.
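As a rough illustration of why hrtime is interesting here: hrtime() (PHP >= 7.3) returns integer nanoseconds, whereas microtime(true) is a float with roughly microsecond grain. The surrounding code below is illustrative only, not PHPBench internals:

```php
<?php

// Compare the two clocks around a stand-in for the benchmarked code.
// hrtime(true) yields integer nanoseconds, so even a single revolution
// can produce a usable reading - the idea behind sampling inside the loop.
$start = hrtime(true);
$subjectResult = array_sum(range(1, 100)); // stand-in for the subject
$elapsedNs = hrtime(true) - $start;        // integer nanoseconds

$startMicro = microtime(true);
$subjectResult = array_sum(range(1, 100));
$elapsedSec = microtime(true) - $startMicro; // float seconds, ~microsecond grain
```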

What about warmup laps? Does @BeforeMethods get called once a warmup is complete?

BeforeMethods is called at the beginning of the iteration, before any call to the subject method:

// run before methods
foreach ($beforeMethods as $beforeMethod) {
    $benchmark->$beforeMethod($parameters);
}

It's worth considering whether you want to do heavy things like booting a DI container or preparing fixtures before every revolution. I think what we are talking about could be a new feature rather than a change to @BeforeMethods.

@scorgn

scorgn commented Nov 19, 2021

PHPUnit has the setUp method, which runs before every test. When you run a specific test, you want everything to be freshly set up each time it's run. Even if it's running through the same test multiple times with different data, you still want to make sure you're re-setting up the test before each run. If you couldn't undo changes from the last test run before each run, then there would be certain things you couldn't test properly, unless you were to include the setup logic inside each test itself.

PHPUnit also lets you run bootstrapping logic before any tests are run, so you can set up the application the way you need only once at the beginning of the tests.

It seems like PHPBench is missing that feature: being able to set up a benchmark before each run. Except in this case, including the setup logic inside the benchmark methods themselves also has additional drawbacks that wouldn't apply with PHPUnit.

Although maybe I'm misunderstanding the difference between revs and iterations and the solution really is to just have one rev with many iterations.

@Ocramius
Contributor Author

Yeah, the reason why I haven't closed this issue yet is that:

  1. we really need a way to "reset" the SUT across runs
  2. we need to exclude anything that is considered "warmup" from the measurement

Right now, I think most bench suites that I have (which use phpbench/phpbench) are flawed, because they assumed that @BeforeMethods would reset the SUT, which isn't the case. In fact, those tests would have radically different output if phpbench behaved differently (closer to my expectation in the OP).

If @BeforeMethods is not the solution (and changing it would be a BC break) we probably need a different concept that represents what we need, which is:

foreach ($repetition as $i) {
    reset_stuff();

    start_measuring();
    run_benchmark_method();
    stop_measuring();
}
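That pseudocode could be sketched in PHP as follows, keeping the reset outside the measured window by timing only the subject call with hrtime (names like samplePerRev are illustrative, not a PHPBench API):

```php
<?php

// Hedged sketch of the desired loop: reset the SUT before every
// revolution, but measure only the subject call itself.
function samplePerRev(callable $reset, callable $subject, int $revs): array
{
    $samples = [];
    for ($i = 0; $i < $revs; $i++) {
        $reset(); // deliberately NOT inside the timed window

        $start = hrtime(true);
        $subject();
        $samples[] = hrtime(true) - $start; // nanoseconds for this rev only
    }

    return $samples;
}
```

The open question from the discussion above is whether the per-rev timer reads themselves add enough noise to swamp the measurement.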

@dantleech
Member

dantleech commented Nov 19, 2021

My main concern (which may or may not be valid) is that it affects the information content of the sample.

start_measuring is non-zero and stop_measuring is non-zero; their margin of error may overlap with the thing you are benchmarking. The reason for the for loop is to try to ensure that the time measured is your thing and not the instrumentation. Also (and I need to check) the grain is a microsecond, so the longer the loop takes the better the accuracy.

The subject is reset via BeforeMethods before it starts the revolutions (the for loop), and the result is the Iteration.

You can run the loop with Revs=1 to achieve the result above (although the template could probably be improved for that case). It would also be a bit slower (a new process for each sample of 1 rev).

But in general, need to do some research to see what can be done or how correct my assumptions are.

@Ocramius
Contributor Author

Ocramius commented Nov 19, 2021

start_measuring is non-zero stop_measuring is non-zero,

Agreed, but right now, we're not measuring anything valuable either :D

(new process for each sample of 1 rev)

That would then lead to things like autoloading overhead (and accidental measurement thereof), no?

@dantleech
Member

dantleech commented Nov 19, 2021

Agreed, but right now, we're not measuring anything valuable either :D

Depends if you have state or not ?

That would then lead to things like autoloading overhead (and accidental measurement thereof), no?

@WarmUp can be employed for that

@Ocramius
Contributor Author

Depends if you have state or not ?

There's always some degree of state, so re-creating (or re-using, which is where I would design two separate benchmarks to see the difference) the SUT is vital.

@Ocramius
Contributor Author

Specifically, if you believe it is possible to achieve precise measurements as per current state, I would need some guidance on boesing/laminas-servicemanager@d6e7fe4 - perhaps this can be reduced to a documentation issue then?

@dantleech
Member

To be honest, I don't know. The original reasoning is that the grain is a microsecond, which is pretty useless at Revs=1 (it's 0 or 1 microsecond, compared with doing the same thing 1000 times to give you 0.655 microseconds). I don't know if that's correct now or was then, or how accurate we can be with hrtime at this level.
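The averaging argument in numbers (the 0.655 µs figure is illustrative, taken from the sentence above):

```php
<?php

// A 1 µs-grain timer cannot distinguish a 0.655 µs operation at revs=1
// (the reading is 0 or 1 µs), but 1000 consecutive revolutions sum to
// ~655 µs, and dividing back recovers ~0.655 µs per call.
$trueCostMicros = 0.655;
$revs = 1000;

$totalMicros = round($trueCostMicros * $revs); // what a µs-grain timer reports: 655
$perRev = $totalMicros / $revs;                // averaging recovers 0.655
```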

The current execution template won't remove the for loop at revs=1, but that might not make much difference.

I'll need to investigate and experiment (and this is pretty much the next priority).

@Ocramius
Contributor Author

You don't need to make this a priority: at this point, we're just pillaging this issue :D

It's just a thing to be aware of if you are tinkering with the design of phpbench/ in the near future. For now, clone $this->something is a sufficient workaround for me, although potentially unreliable and polluting the results.

@dantleech
Member

Well, it fits into general improvements around execution etc., and I genuinely want to be able to answer these questions 😅

@dantleech
Member

So I did an experiment:

#944

The outcome is that hrtime seems to remove the necessity of the loop at smaller scales. So we could add a new hrtime executor and provide an option to sample inside/outside the loop.

At that point, running with Revs=1 and a before method would work for you, I think. We could also add an option to decide where the "before/after" methods get called, so that Revs=1000 could call the setup methods for each sampling.

@dantleech dantleech added this to To do in 1.3 Nov 21, 2021
@dantleech dantleech removed this from To do in 1.3 Nov 21, 2021