[ENHANCEMENT] PromQL: use Kahan summation for sum() #14074

bboreham · 2024-05-09T13:32:52Z

This can give a more precise result by keeping a separate running compensation value that accumulates the small rounding errors lost in each addition.
See https://en.wikipedia.org/wiki/Kahan_summation_algorithm
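
As a rough illustration of the idea (not necessarily the code in this PR), here is a minimal sketch of compensated summation using Neumaier's variant of the Kahan algorithm. The names `kahanAdd`, `naiveSum` and `compensatedSum` are hypothetical and chosen only for this example.

```go
package main

import (
	"fmt"
	"math"
)

// kahanAdd adds inc to the running sum and updates the compensation term c.
// Hypothetical helper for illustration (Neumaier's variant of Kahan summation).
func kahanAdd(inc, sum, c float64) (newSum, newC float64) {
	t := sum + inc
	switch {
	case math.IsInf(t, 0):
		// Once the sum has overflowed, the compensation carries no information.
		c = 0
	case math.Abs(sum) >= math.Abs(inc):
		// Low-order bits of inc were lost when forming t; recover them.
		c += (sum - t) + inc
	default:
		// Low-order bits of sum were lost when forming t; recover them.
		c += (inc - t) + sum
	}
	return t, c
}

// naiveSum adds the values left to right with no compensation.
func naiveSum(vals []float64) float64 {
	sum := 0.0
	for _, v := range vals {
		sum += v
	}
	return sum
}

// compensatedSum keeps a separate compensation value and applies it at the end.
func compensatedSum(vals []float64) float64 {
	sum, c := 0.0, 0.0
	for _, v := range vals {
		sum, c = kahanAdd(v, sum, c)
	}
	return sum + c
}

func main() {
	vals := []float64{1e100, 1, -1e100}
	fmt.Println(naiveSum(vals), compensatedSum(vals)) // prints "0 1"
}
```

The plain loop loses the `1` when it is absorbed into `1e100`, while the compensated loop carries it in `c` and restores it at the end. The cost is a few extra floating-point operations and a branch per sample.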

Possible improvement for #14052 (I won't call it a fix).

I'm re-using the floatMean field, which is otherwise unused for sum(); adding a new field would be clearer, but would cost 8 bytes per output value.
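
To make the 8-byte trade-off concrete, here is a small sketch comparing two hypothetical per-output state structs. The struct names and the `kahanCompensation` field are assumptions for illustration only; the real aggregation state in Prometheus has more fields than shown here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// groupReusingField is a hypothetical per-output aggregation state in which
// floatMean doubles as the Kahan compensation term when computing sum().
type groupReusingField struct {
	floatValue float64 // running sum
	floatMean  float64 // assumed to be needed by other aggregations; reused for sum()
}

// groupWithNewField is the clearer but larger alternative with a dedicated field.
type groupWithNewField struct {
	floatValue        float64
	floatMean         float64
	kahanCompensation float64 // hypothetical dedicated compensation field
}

// extraBytes reports how much larger the dedicated-field layout is.
func extraBytes() uintptr {
	return unsafe.Sizeof(groupWithNewField{}) - unsafe.Sizeof(groupReusingField{})
}

func main() {
	fmt.Println(extraBytes()) // prints "8"
}
```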

Benchmarks show a bigger slow-down than I expected.

goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/promql
cpu: Intel(R) Core(TM) i7-14700K
                                                                                                          │ before.txt  │             after.txt             │
                                                                                                          │   sec/op    │   sec/op     vs base              │
RangeQuery/expr=sum(a_hundred),steps=1-28                                                                   142.2µ ± 1%   144.2µ ± 0%  +1.41% (p=0.009 n=6)
RangeQuery/expr=sum(a_hundred),steps=100-28                                                                 447.5µ ± 2%   460.3µ ± 1%  +2.85% (p=0.002 n=6)
RangeQuery/expr=sum(a_hundred),steps=1000-28                                                                2.670m ± 2%   2.760m ± 1%  +3.37% (p=0.002 n=6)
RangeQuery/expr=sum_without_(l)(h_hundred),steps=1-28                                                       1.637m ± 2%   1.653m ± 1%       ~ (p=0.240 n=6)
RangeQuery/expr=sum_without_(l)(h_hundred),steps=100-28                                                     5.081m ± 2%   5.352m ± 1%  +5.33% (p=0.002 n=6)
RangeQuery/expr=sum_without_(l)(h_hundred),steps=1000-28                                                    33.30m ± 2%   35.05m ± 0%  +5.25% (p=0.002 n=6)
RangeQuery/expr=sum_without_(le)(h_hundred),steps=1-28                                                      1.676m ± 2%   1.691m ± 1%  +0.89% (p=0.026 n=6)
RangeQuery/expr=sum_without_(le)(h_hundred),steps=100-28                                                    5.315m ± 2%   5.503m ± 1%  +3.54% (p=0.002 n=6)
RangeQuery/expr=sum_without_(le)(h_hundred),steps=1000-28                                                   35.13m ± 1%   36.76m ± 1%  +4.62% (p=0.002 n=6)
RangeQuery/expr=sum_by_(l)(h_hundred),steps=1-28                                                            1.654m ± 1%   1.685m ± 1%  +1.84% (p=0.002 n=6)
RangeQuery/expr=sum_by_(l)(h_hundred),steps=100-28                                                          5.349m ± 2%   5.476m ± 0%  +2.39% (p=0.002 n=6)
RangeQuery/expr=sum_by_(l)(h_hundred),steps=1000-28                                                         34.97m ± 1%   36.87m ± 1%  +5.44% (p=0.002 n=6)
RangeQuery/expr=sum_by_(le)(h_hundred),steps=1-28                                                           1.618m ± 1%   1.647m ± 0%  +1.79% (p=0.002 n=6)
RangeQuery/expr=sum_by_(le)(h_hundred),steps=100-28                                                         5.154m ± 1%   5.379m ± 1%  +4.35% (p=0.002 n=6)
RangeQuery/expr=sum_by_(le)(h_hundred),steps=1000-28                                                        33.34m ± 2%   35.18m ± 1%  +5.51% (p=0.002 n=6)
RangeQuery/expr=sum_without_(l)(rate(a_hundred[1m])),steps=1-28                                             174.5µ ± 1%   176.0µ ± 1%       ~ (p=0.065 n=6)
RangeQuery/expr=sum_without_(l)(rate(a_hundred[1m])),steps=100-28                                           897.6µ ± 1%   911.0µ ± 1%  +1.49% (p=0.015 n=6)
RangeQuery/expr=sum_without_(l)(rate(a_hundred[1m])),steps=1000-28                                          6.615m ± 1%   6.690m ± 0%  +1.14% (p=0.002 n=6)
RangeQuery/expr=sum_without_(l)(rate(a_hundred[1m]))_/_sum_without_(l)(rate(b_hundred[1m])),steps=1-28      361.2µ ± 1%   360.3µ ± 1%       ~ (p=0.699 n=6)
RangeQuery/expr=sum_without_(l)(rate(a_hundred[1m]))_/_sum_without_(l)(rate(b_hundred[1m])),steps=100-28    1.848m ± 1%   1.859m ± 1%       ~ (p=0.132 n=6)
RangeQuery/expr=sum_without_(l)(rate(a_hundred[1m]))_/_sum_without_(l)(rate(b_hundred[1m])),steps=1000-28   13.46m ± 1%   13.61m ± 1%  +1.14% (p=0.041 n=6)
geomean                                                                                                     3.017m        3.095m       +2.58%

beorn7

Thank you very much.

One day, we should apply all the Kahan tricks to histograms, too…

[ENHANCEMENT] PromQL: use Kahan summation for sum()

ea82b49

This can give a more precise result, by keeping a separate running compensation value to accumulate small errors.
See https://en.wikipedia.org/wiki/Kahan_summation_algorithm

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

bboreham requested a review from roidelapluie as a code owner May 9, 2024 13:32

bboreham mentioned this pull request May 15, 2024

PromQL: Multiplication by 1 creates additional floating point errors (but should be a no-op) #14052

Open

beorn7 approved these changes May 15, 2024

beorn7 mentioned this pull request May 15, 2024

PromQL(histograms): Make more use of Kahan summation for native histograms #14105

Open
