Default to mean time, when for allocations #340

PallHaraldsson · 2023-10-22T17:12:13Z

No description provided.

PallHaraldsson · 2023-10-22T17:16:32Z

Note: the idea is taken from @chriselrod's PR:

using BenchmarkTools

# macros are too awkward to work with, so we use a function
# mean times are much better for benchmarking than minimum
# whenever you have a function that allocates
# (BMean is not defined in this snippet; it comes from the original PR)
function bmean(f)
  b = @benchmark $f()
  BMean(b)
end

chriselrod · 2023-10-22T17:27:53Z

In my opinion, when you're comparing codes with different amounts of allocations, mean is the most appropriate. GC produces a skewed distribution, and median misses those heavy tails.

If micro-optimizing a kernel, you should ideally have non-allocating code.

But many of us use BenchmarkTools for non-microbenchmarks.

PallHaraldsson · 2023-10-22T17:33:37Z

Yes, I accidentally used median, not mean. I meant to copy your idea as is. And I'm editing this blind: I've never written or changed a macro (let alone one that calls another macro...).

At least, if people want this change, with mean, it would be good to get that confirmed. Or to know if not, so I can stop looking into this.

I've preferred the median in some cases (though usually the min), but now I wonder whether I was wrong: when would the median be best? Is it only shown in @benchmark so you can notice when it differs from the mean?

chriselrod · 2023-10-22T21:41:20Z

> Yes, I accidentally used median, not mean. I meant to copy your idea as is. And I'm editing this blind: I've never written or changed a macro (let alone one that calls another macro...).
>
> At least, if people want this change, with mean, it would be good to get that confirmed. Or to know if not, so I can stop looking into this.
>
> I've preferred the median in some cases (though usually the min), but now I wonder whether I was wrong: when would the median be best? Is it only shown in @benchmark so you can notice when it differs from the mean?

I've encountered resistance to the suggestion before. Many argue minimum is the best, because all noise is positive, slowing you down from the ideal.
However, whenever heap allocations are involved, that's not really correct.

This is especially the case when you have a GC, but it is true even for malloc/free, though to a much smaller degree (free is normally extremely fast, but every now and then it can trigger some work to make the freed memory actually available outside of fast reuse caches).

Ignoring this cost, which sporadically results in extra time required, is wrong.

You probably don't want to measure things like page faults or (even worse) the OS deciding to randomly context switch you, which is why using the minimum is common advice.

I had a form of that matrix exp benchmark about a year ago for an important workload, where I optimized it for Array so that the minimum time matched the minimum time for SArray!
This was great because SArray took like 3 minutes longer to compile.
Unfortunately, the Array version in practice actually took several times longer to run, because while minimums matched, the full multithreaded application choked on GC...

chriselrod · 2023-10-22T21:52:38Z

A short summary:

minimum assumes all noise is positive. Wrong when you incur costs that periodically show up as time, as then you can have negative noise as in "paid less of that cost than average".
median minimizes absolute error. The centering parameter of a Laplace/double exponential distribution is the sample median (the logpdf of this distribution sums the absolute value of errors).
mean minimizes square error. The centering parameter of a Gaussian distribution is the sample mean (the logpdf of this distribution sums the squares; this is often intuitive as it minimizes distance, while the central limit theorem explains why this one tends to work so well in practice).
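
The last two points can be written out as minimization problems. This is a sketch in notation introduced here for illustration: $t_1, \dots, t_n$ are the observed times and $c$ is the candidate centre.

```latex
\[
\operatorname{median}(t_1,\dots,t_n) \in \arg\min_{c} \sum_{i=1}^{n} \lvert t_i - c \rvert ,
\qquad
\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i = \arg\min_{c} \sum_{i=1}^{n} (t_i - c)^2 .
\]
% Log-densities whose sums over i are maximized by those same choices of c:
\[
\log p_{\text{Laplace}}(t_i \mid c, b) = -\log(2b) - \frac{\lvert t_i - c \rvert}{b},
\qquad
\log p_{\text{Gauss}}(t_i \mid c, \sigma) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(t_i - c)^2}{2\sigma^2}.
\]
```

Maximizing the summed log-density over $c$ is the same as minimizing the sum of absolute errors (Laplace) or of squared errors (Gaussian), which is where the median and the mean come from.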

What we care about generally is the expected value of the time.
If we run things a million times, the duration will be around a million times the expected value.

The expected value is generally best estimated by the sample mean, although distributional assumptions can change that (in which case the estimates disagreeing would probably be a reflection of that assumption being bad).
Some distributions don't have means, e.g. the Cauchy distribution is so variable that sample means have the same variability as individual samples. I don't think we have that sort of extreme problem, although it can show up from things as benign as the ratio of two independent zero-mean Gaussian variables.
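
The "run it a million times" argument can be checked with a small simulation. This is only a sketch on synthetic data: the function `simulate_times` and the numbers in it (a 100 ns base cost, a 5000 ns pause on roughly 1 call in 100) are made up for illustration and are not measurements of any real code.

```julia
using Random, Statistics

# Synthetic per-call timings in nanoseconds (illustrative numbers only):
# every call costs `base`, and roughly a fraction `p` of calls also pays
# `pause`, standing in for a sporadic GC collection.
function simulate_times(rng, n; base = 100.0, pause = 5_000.0, p = 0.01)
    return [base + (rand(rng) < p ? pause : 0.0) for _ in 1:n]
end

rng = MersenneTwister(1234)
n = 1_000_000
times = simulate_times(rng, n)

total      = sum(times)            # what n back-to-back calls actually cost
est_min    = n * minimum(times)    # total predicted from the minimum
est_median = n * median(times)     # total predicted from the median
est_mean   = n * mean(times)       # total predicted from the mean

println("actual total:        ", total)
println("n * minimum:         ", est_min)
println("n * median:          ", est_median)
println("n * mean:            ", est_mean)
```

In this setup the minimum and the median both sit at the base cost and so underestimate the total, while `n * mean` reproduces it up to floating-point rounding.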

PallHaraldsson · 2023-10-23T17:13:28Z

This works now, in case people want to merge this as is. I'm not sure how to increase the code coverage. I guess it would need a test for the new (trivial) code path, and I sort of know how to write one, but not exactly how it fits into the CI, in case anyone wants to help with that.

gdalle · 2023-10-23T17:40:04Z

Thank you for the contribution!
I'm a maintainer of BT but I really don't have the expertise necessary to debate on benchmarking methods.
However, as far as benchmarking interfaces are concerned, I don't really like the idea of outputting different things depending on what the benchmarked code does. I'll try to see if this has popped up in issues before.

gdalle · 2023-10-23T17:43:45Z

See previous discussions:

PallHaraldsson · 2023-10-23T17:44:21Z

I believe it's well argued that the mean should be shown (when, and only when, the code allocates), which is why I made the PR. But I'm also hesitant to show only that, since it is a different number than the one previously shown and expected by many, which is why I also kept the old one. That makes it clear to users, also in the case where only that one number is (correctly?!) shown.

I tend to use @benchmark as well, but many users would not, or would think a single number is enough.

Most often code either allocates or it doesn't, and when it does, the amount is the same on every run. I think a varying number of allocations (because of different code paths) can and should be handled in a separate PR, if at all (it's not a very common case?).

Should you maybe merge to master and see if people object? It's always possible to revert, even after it lands in a stable version...

gdalle · 2023-10-23T17:54:09Z

I just remembered where the other lengthy discussion about this was:

Show mean time in @btime #258

gdalle · 2023-10-23T17:55:05Z

@KristofferC was firmly against the mean in @btime, and therefore asked that the new mean-focused macro be given a different name.

KristofferC · 2023-10-24T09:54:55Z

BenchmarkTools itself runs the GC explicitly (see gcscrub) so inferring things about the GC time consumption in a run is unlikely to be very useful. It's kind of random if the GC time will be included in the measurement or not. Also, changing the statistics based on whether an allocation happens is surprising and easily missed.

If you want to get the mean, use @benchmark or introduce a @bmean or something.
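
For reference, reading the mean (and the other estimates) off a `@benchmark` result might look like the sketch below. The workload `alloc_sum` is a made-up example; `minimum`, `median`, `mean`, `time`, `gctime`, `allocs` and `BenchmarkTools.prettytime` are existing BenchmarkTools/Statistics functions, and `seconds=1` only shortens the run.

```julia
using BenchmarkTools, Statistics

# Made-up allocating workload, used only to have something to measure.
alloc_sum(n) = sum(collect(1:n))

trial = @benchmark alloc_sum(1_000) seconds=1

t_min = minimum(trial)   # the estimate that @btime reports
t_med = median(trial)
t_avg = mean(trial)

# `time` and `gctime` are in nanoseconds; `allocs` is an allocation count.
for (name, est) in (("minimum", t_min), ("median", t_med), ("mean", t_avg))
    println(rpad(name, 8), BenchmarkTools.prettytime(time(est)),
            "  (gc: ", BenchmarkTools.prettytime(gctime(est)),
            ", allocs: ", allocs(est), ")")
end
```

Whether the mean and the minimum differ noticeably here depends on the machine and on whether any GC time lands inside the measured samples.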

chriselrod · 2023-10-24T15:31:29Z

I do support just adding @bmean.
No need to change how any existing stuff works.

> BenchmarkTools itself runs the GC explicitly (see gcscrub)

No idea how successful my solution is, but I do try to disable that:
https://github.com/chriselrod/ElrodConfig.jl/blob/a3f4303bea52489f1162d2a9a443c06fc90c4547/src/ElrodConfig.jl#L76
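
Independently of the linked config (whose contents are not reproduced or verified here), BenchmarkTools documents two parameters that control the explicit GC runs: `gctrial` (run the GC before a trial) and `gcsample` (run the GC before each sample). A minimal sketch of switching both off, with `work` as a made-up workload:

```julia
using BenchmarkTools

# Made-up allocating workload.
work() = sum(rand(1_000))

# Per benchmark: skip the explicit GC run before the trial and before each sample.
trial = @benchmark work() gctrial=false gcsample=false seconds=1

# Or change the defaults for benchmarks defined later in this session.
BenchmarkTools.DEFAULT_PARAMETERS.gctrial = false
BenchmarkTools.DEFAULT_PARAMETERS.gcsample = false
```

This is an assumption about how one could reduce the effect of gcscrub, not a claim that it removes every explicit GC call inside the package.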

PallHaraldsson · 2023-10-25T11:36:32Z

If we step back a bit, the purpose of benchmarking isn't usually the timing as a number in itself. It's "can I improve it?" or "have I reached optimal code?". So you may want to track it over time, but what I would really like is a macro that compares two versions of my code and tells me which is faster.

> I do support just adding @bmean.
> No need to change how any existing stuff works.

I think that's actually worse. It may be helpful to Elrod or other users that think (or know) that the mean is better (those users can use @benchmark), but most would be confused: should I use @btime or @bmean?

I want the best info for @btime, condensed into one line, whatever that may be.

If you have to show only one number I think we all agree the minimum is best, at least when not allocating.

Do we think it would be good to show one more number, and possibly in same format as when not allocating?

@btime sin(1)
  1.877 ns (mean is 5% slower; 0 allocations)

The max is always about an order of magnitude slower (it ranges from 9x to 14x). At first I dismissed it as useless info (why even show it?), but maybe it isn't: does it show the speed from a cold cache? Is that sometimes useful? At least it gets calculated into the mean. I'm not saying we should show it directly. The mean seems mostly immune to the max as an outlier, and the median even more so; I'm not sure we shouldn't be showing the median rather than the mean.

Do you like two numbers as timings, or the second as a fraction, as shown? Or do you insist on just one number?
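
To make the proposed one-line format concrete, here is a sketch of a formatter for it. `summary_line` is a hypothetical helper, not part of BenchmarkTools; it takes times in nanoseconds and always prints in ns, whereas a real implementation would presumably pick the unit automatically.

```julia
using Printf

# Hypothetical helper that renders the proposed single-line summary.
# `tmin` and `tmean` are times in nanoseconds.
function summary_line(tmin::Real, tmean::Real, allocs::Integer)
    pct = round(Int, 100 * (tmean - tmin) / tmin)   # how much slower the mean is, in percent
    noun = allocs == 1 ? "allocation" : "allocations"
    return @sprintf("%.3f ns (mean is %d%% slower; %d %s)", tmin, pct, allocs, noun)
end

println(summary_line(1.877, 1.971, 0))
```

The mean value of 1.971 ns used in the call is an illustrative input chosen so that the line matches the 5% shown in the example above.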

[One other possible PR I'm not going to make would be showing e.g. 0–20 allocations, when there's a range. It's for rather unusual situations anyway, but the logic in this PR assumes 0 or not...]

Default to median time, when for allocations

00d6fbe

PallHaraldsson marked this pull request as draft October 22, 2023 17:16

Show mean for allocations

85d787e

PallHaraldsson changed the title ~~Default to median time, when for allocations~~ Default to mean time, when for allocations Oct 22, 2023

Fix to mean

45438d2

PallHaraldsson marked this pull request as ready for review October 22, 2023 18:33

Fix to mean

8b3cbb6

Update test

c055dca

PallHaraldsson marked this pull request as draft October 25, 2023 11:36
