
Use HDR histograms for calculating percentiles in thresholds and summary stats #763

na-- opened this issue Sep 12, 2018 · 14 comments

@na--
Member

na-- commented Sep 12, 2018

Currently the Trend-based threshold checks and end-of-test summary stats rely on saving all of the relevant metric values in memory. This is basically a large memory leak that can negatively affect long-running and/or HTTP-heavy load tests, as reported by a user on Slack.

For the moment, the best solution for fixing those issues without any loss of functionality, and with only a very tiny loss of precision, appears to be HdrHistogram. Here's an excerpt from the description on the project website:

A Histogram that supports recording and analyzing sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.
[ ... ]
HDR Histogram is designed for recoding histograms of value measurements in latency and performance sensitive applications. [ ... ] The HDR Histogram maintains a fixed cost in both space and time. A Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.

This is the original Java library by Gil Tene. He also has some great talks (this one and its earlier version, for example) that explain some of the common pitfalls when people measure latency and why he built HdrHistogram. They're also a very strong argument for why we should prioritize the arrival-rate based VU executor... 😄

This is an MIT-licensed Go implementation of HdrHistogram, though it seems to be dead - an archived repo with no recent commits and unresolved issues and PRs. So we may need to fork that repo and maintain it, or re-implement the algorithm ourselves.
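
For illustration, here's a minimal sketch of what recording Trend values into such a histogram could look like, using the API of the Go port linked above (github.com/codahale/hdrhistogram) - this is just an example of the library's usage, not k6 code:

package main

import (
	"fmt"
	"time"

	"github.com/codahale/hdrhistogram"
)

func main() {
	// One histogram per Trend metric: values between 1µs and 1h, tracked
	// with 3 significant digits. The memory footprint is fixed and depends
	// only on this range and precision, not on the number of samples.
	h := hdrhistogram.New(1, int64(time.Hour/time.Microsecond), 3)

	// Recording a sample is O(1) and allocation-free.
	for _, d := range []time.Duration{2 * time.Millisecond, 17 * time.Millisecond, 250 * time.Millisecond} {
		_ = h.RecordValue(int64(d / time.Microsecond))
	}

	// Percentiles for thresholds and the end-of-test summary.
	fmt.Printf("p(95)=%dµs max=%dµs count=%d\n", h.ValueAtQuantile(95), h.Max(), h.TotalCount())
}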

Another thing that HdrHistogram may help with is exposing summary stats in the teardown() function or outputting them in a JSON file at the end. This is something that a lot of users have requested - #647, #351, and somewhat #355.

Most of the difficulty there lies in exposing the raw data to the JS runtime (and with HdrHistogram we can expose its API), and especially in implementing the stats calculation in a distributed execution environment (the current Load Impact cloud or the future native k6 cluster execution). Having trend metrics backed by HdrHistogram should allow us to avoid the need to schlep all of the raw metrics data between k6 instances (or require an external DB) at the end of a distributed test...
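
As a rough sketch of how that last part could work (again just the library's API, not actual k6 code - mergeInstanceHistograms is a hypothetical helper), each instance would ship only its fixed-size histogram and the aggregating side would merge them before computing percentiles:

// mergeInstanceHistograms is a hypothetical aggregation helper: each k6
// instance sends its own fixed-size histogram, and they are folded into
// one, so percentiles can be computed without shipping raw samples around.
func mergeInstanceHistograms(instances []*hdrhistogram.Histogram) *hdrhistogram.Histogram {
	total := hdrhistogram.New(1, int64(time.Hour/time.Microsecond), 3)
	for _, h := range instances {
		// Merge returns how many samples were dropped because they fell
		// outside the target histogram's configured range.
		_ = total.Merge(h)
	}
	return total
}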

@na--
Member Author

na-- commented Sep 12, 2018

Just noticed that the go-metrics library proposed in #429 is a Go port of this Java metrics library, the original author of which was Coda Hale - the same person who authored the dead Go library for HdrHistogram linked above. Some further investigation of that metrics library (and the topic in general) may offer other benefits, so it may be worth it to deal with both this issue and #429 at the same time.

@na--
Member Author

na-- commented Jul 8, 2019

@mstoykov found an active fork of the original HDR histogram repo: https://github.com/kanosaki/hdrhistogram

@Sirozha1337
Contributor

I've looked at the HDR histogram and go-metrics libraries, but it seems that their histogram implementations can only store int64 values, which isn't suitable for Trend metrics.

@na--
Member Author

na-- commented Jul 10, 2019

Hm, that's a good point 😕 I think all of the internal k6 Trend metrics, like the http_req_* ones, actually start as time.Duration (which is basically an int64), but then they get converted to float64 when they are put in the stats.Sample struct.

It seems to me like the choices are:

  • convert the floats in the metric samples back to int64, losing some performance and accuracy in the process (see the sketch at the end of this comment)
  • refactor the k6 metrics so that the original time.Duration values aren't lost
  • figure out an alternative to HdrHistogram (previous discussion of why that would be difficult is here: Optimize memory consumption of Trend Metrics #1068 (comment))

And in all of these scenarios, but especially in the first two, it's not clear what we should do with custom user-defined Trend metrics. After all, the HdrHistogram is a good fit for the normal Trend metrics that k6 emits, since they are positive numbers that are usually close to 0. But a user may track something completely different and potentially unsuitable for an HdrHistogram to handle accurately...
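
Here's a minimal sketch of what that first option could involve (hypothetical helpers and package name, not existing k6 code), assuming Trend samples are float64 milliseconds and the histogram stores int64 microseconds:

package metricsink

import "math"

// toHistogramUnits converts a float64 Trend sample (milliseconds) into the
// int64 microseconds that an HDR histogram can store; the sub-microsecond
// part of the value is what gets lost in the process.
func toHistogramUnits(ms float64) int64 {
	return int64(math.Round(ms * 1000))
}

// fromHistogramUnits converts a stored histogram value back into the
// float64 milliseconds that thresholds and the summary currently use.
func fromHistogramUnits(us int64) float64 {
	return float64(us) / 1000
}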

@Sirozha1337
Contributor

What if we implement the HDR histogram only for time-based Trend metrics?
If the user sets an isTime parameter for their metric, then the HDR histogram would be used; otherwise we would keep the current method of collecting Trend metrics.

@na--
Member Author

na-- commented Jul 10, 2019

Yeah, that would probably work in most cases, though we don't have a guarantee that the times users choose would be close to 0. The alternative would probably be to expose the implementation a bit and add a separate parameter that specifies whether the sink for the Trend metric should be based on the HdrHistogram or not...

@Sirozha1337
Contributor

Or we could use the histogram with exponentially decaying samples from the go-metrics package, provided all the values are int64. That type of histogram doesn't depend on the range of the stored values, and it would be much easier to rewrite if we need to support float64.
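
A minimal sketch of that approach, using the rcrowley/go-metrics API (just an illustration of the library, not k6 code):

package main

import (
	"fmt"

	metrics "github.com/rcrowley/go-metrics"
)

func main() {
	// An exponentially decaying sample keeps a fixed-size reservoir
	// (here 1028 values, alpha 0.015) that stays statistically
	// representative of recent data, regardless of the value range.
	sample := metrics.NewExpDecaySample(1028, 0.015)
	hist := metrics.NewHistogram(sample)

	// Values still have to be int64, so float64 Trend samples would
	// need the same kind of unit conversion as with HdrHistogram.
	for _, v := range []int64{1200, 2400, 17000} {
		hist.Update(v)
	}

	fmt.Println("p95:", hist.Percentile(0.95))
}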

@na--
Member Author

na-- commented Jul 20, 2020

#1064 (comment) pointed to another approach that deserves some investigation before we start implementing things: https://github.com/tdunning/t-digest

Go versions: https://github.com/spenczar/tdigest, https://github.com/influxdata/tdigest
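
For reference, a rough sketch of the influxdata/tdigest package's API (an illustration under the assumption that it works as documented, not k6 code); unlike the int64-only histograms above, it accepts float64 values directly:

package main

import (
	"fmt"

	"github.com/influxdata/tdigest"
)

func main() {
	// A t-digest keeps a small, bounded set of centroids instead of the
	// raw samples, so memory stays fixed, and it works on float64 values,
	// so no unit conversion of Trend samples would be needed.
	td := tdigest.NewWithCompression(100)

	for _, v := range []float64{2.5, 17.93, 20.05} {
		td.Add(v, 1) // value and weight
	}

	fmt.Println("p99:", td.Quantile(0.99))
}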

@GavinRay97

I'm nobody in particular, but this would be really cool. I've been using HDR histograms to provide a consistent interface over different load-testing tools' output stats/metrics, to be able to chart them together coherently (e.g. k6 + wrk2 + autocannon, etc.).

You're able to get a really high degree of information density, and HDR Histogram has addWhileCorrectingForCoordinatedOmission and copyCorrectedForCoordinatedOmission, which are unique.

Currently, to do this, I have to write the stdout JSONL logs to a file, create a line reader, parse them, and then build up the histogram from the logs after the run:

Gist to avoid spamming the thread with unnecessary code:
https://gist.github.com/GavinRay97/b57094686f64ad4591c55eb7b9dd5cac

@atombender

Any movement on this? I'm sad that we don't get a complete histogram for timings. k6 produces a somewhat arbitrary set of percentiles (0, 50, 90, 95, 100; why not 99?), but that's not sufficient to draw a complete latency chart.

We've been using hdrhistogram-go for a project, and it seems mature enough to use in k6.

@Sirozha1337
Contributor

@atombender you can specify whichever percentiles you want.
Via command-line arguments:
k6 run --summary-trend-stats="p(42),p(99),p(99.9)" script.js
or via the options object:

export let options = {
  summaryTrendStats: ['p(42)', 'p(99)', 'p(99.9)']
};

@atombender

@Sirozha1337 That option causes k6 to print percentiles, but they don't end up in the file specified with --summary-export:

$ k6 run k6.js --summary-export summary.json --summary-trend-stats="p(42),p(99),p(99.9)"

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: ./tools/perfrunner/k6.js
     output: -

  scenarios: (100.00%) 1 scenario, 100 max VUs, 31s max duration (incl. graceful stop):
           * test: 200.00 iterations/s for 1s (maxVUs: 50-100, gracefulStop: 30s)


running (01.0s), 000/050 VUs, 201 complete and 0 interrupted iterations
test ✓ [======================================] 050/050 VUs  1s  200 iters/s

    ✓ status200

    checks.....................: 100.00% ✓ 201  ✗ 0
    data_received..............: 225 kB  220 kB/s
    data_sent..................: 122 kB  119 kB/s
    gradient_query_exec_time...: p(42)=1      p(99)=17      p(99.9)=18.6
    http_req_blocked...........: p(42)=4µs    p(99)=245µs   p(99.9)=568.4µs
    http_req_connecting........: p(42)=0s     p(99)=188µs   p(99.9)=213.8µs
    http_req_duration..........: p(42)=2.5ms  p(99)=17.93ms p(99.9)=20.05ms
    http_req_receiving.........: p(42)=37µs   p(99)=73µs    p(99.9)=82.8µs
    http_req_sending...........: p(42)=19µs   p(99)=72µs    p(99.9)=82.2µs
    http_req_tls_handshaking...: p(42)=0s     p(99)=0s      p(99.9)=0s
    http_req_waiting...........: p(42)=2.43ms p(99)=17.85ms p(99.9)=19.9ms
    http_reqs..................: 201     196.581631/s
    iteration_duration.........: p(42)=2.83ms p(99)=18.22ms p(99.9)=20.97ms
    iterations.................: 201     196.581631/s
    vus........................: 50      min=50 max=50
    vus_max....................: 50      min=50 max=50

$ grep "p\(" summary.json
            "p(90)": 16,
            "p(95)": 16
            "p(90)": 0.205,
            "p(95)": 0.215
            "p(90)": 0.156,
            "p(95)": 0.167
            "p(90)": 16.618,
            "p(95)": 17.174
            "p(90)": 0.055,
            "p(95)": 0.058
            "p(90)": 0.043,
            "p(95)": 0.047
            "p(90)": 0,
            "p(95)": 0
            "p(90)": 16.516,
            "p(95)": 17.128
            "p(90)": 17.058415,
            "p(95)": 17.508086

@imiric
Contributor

imiric commented Dec 21, 2020

@atombender That issue is being tracked in #1611, and a fix will likely land in v0.30.0, planned for mid-January.

No updates on this issue yet, as other things have taken higher priority. Most of the team is on vacation right now, but I'll discuss making this a priority for the upcoming releases.

@na--
Member Author

na-- commented Mar 4, 2022

The Go HDR histogram repo seems to have been moved and somewhat revived at https://github.com/HdrHistogram/hdrhistogram-go

However, it seems like it might be better to go with another library, https://github.com/openhistogram/circonusllhist

I haven't yet read the paper that compares it with other histogram implementations (incl. HDR histograms); I've just watched this YouTube presentation from the authors, but it definitely deserves some investigation.
