Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize memory consumption of Trend Metrics #1068

Closed
Sirozha1337 opened this issue Jun 29, 2019 · 7 comments
Closed

Optimize memory consumption of Trend Metrics #1068

Sirozha1337 opened this issue Jun 29, 2019 · 7 comments

Comments

@Sirozha1337
Copy link
Contributor

Sirozha1337 commented Jun 29, 2019

Running a test that's 12 hours long crashes the program around 6 hours in, the memory consumption is tremendous, it can eat 64gb RAM. While searching for a memory leak, I've noticed, that the engine stores all samples for Trend metrics in memory until the end of the test. I think this is the main thing that leads to such a high memory consumption. I propose refactoring of the TrendSink, so that we wouldn't need to keep all the samples.

The Avg, Med and percentiles can be calculated by using Moving percentile algorithm, it only needs the last sample, new sample and current value. The only problem is that we have to know before-hand what percentiles we want to calculate, but looking at the current code there are only two percentiles in the output, they are 90th and 95th percentile.

@na--
Copy link
Member

na-- commented Jun 30, 2019

Yeah, we know about the Trend issues with long-running tests... My proposal to completely solve this was to refactor the TrendSink to use HdrHistograms (#763), and/or for k6 to do smarter threshold and summary value calculations (#764) to reduce their impact. Unfortunately, those tasks have gotten delayed a few times, because of other bugs or important features 😞

As a potential workaround, there's currently a way to completely avoid using any TrendSinks in k6, but it only works if you're not interested in the end-of-test summary stats and you don't have any thresholds. This is done by running k6 with --no-thresholds --no-summary and is usually only useful when you use an external output like InfluxDB or Load Impact Insights.

@Sirozha1337
Copy link
Contributor Author

Your proposal is great, it would fix the memory leak and won't cost you much data. The thing is, right now we have all the data in memory and the only things that we need from that data are averages and percentiles. Maybe we can use the Rolling Average algorithm to fix it now? And if we want to show more data in the summary, we will implement the Hdr Histogram for that.

@na--
Copy link
Member

na-- commented Jul 1, 2019

The averages are easy to calculate without storing all the values, that has never been the issue. Percentiles on the other hand are much trickier... I remember researching some memory-efficient percentile algorithms when I originally found the issue (I was searching for a memory leak 😅 ), but I then concluded that the HdrHistogram would be both easier to implement and more flexible. Can you share what specific moving percentile algorithm you recommend?

@Sirozha1337
Copy link
Contributor Author

Sirozha1337 commented Jul 1, 2019

I am talking about this algorithm: https://mjambon.com/2016-07-23-moving-percentile/

If xi<m_i−1 then
    m_i=m_{i−1}−δ/p
else if xi>mi−1 then
    m_i=m_{i−1}+δ/(1−p)
else xi=m_i−1 and keep the previous value:
    m_i=m_{i−1}

Here m_i is the current value of the percentile, p is the percentile itself and δ is a user chosen constant that expresses how fast the calculated percentile will converge to the real value, this constant matters only if you want to see realtime results, because in the end it converges to the real value anyway.

It seems much easier to implement than an Hdr Histogram, though not as flexible.

@na--
Copy link
Member

na-- commented Jul 1, 2019

Ah, I remembered something else that precluded the use of such a moving percentile algorithm...To use it, we need to 1) know the specific percentiles we need to calculate from the start and 2) refactor the code quite a bit so that the various Trend sinks "know" from the start that they need to calculate these specific percentiles.

The first part, knowing the needed percentiles at the start of the test, is probably the harder one... There are currently two things that may require percentiles - the end-of-test summary stats and the thresholds.

The summary stats at the end of the test, as you've said, it's relatively easy - we know what we need. It's p(90) and p(95) by default, though that can be changed by the summaryTrendStats option, e.g. k6 run --summary-trend-stats="min,med,avg,max,p(99),p(99.9),p(99.99)" script.js. But in either case, we can easily know them at the start of the test.

The thresholds are the problem here... For example, if we have this k6 script:

import http from "k6/http";

export let options = {
  thresholds: {
    http_req_duration: ["p(95)<500"],
  }
};

export default function () {
  http.get("https://httpbin.org/");
}

k6 won't actually parse the expression p(95)<500 properly, in a way that we'll know that we need p(95). Instead, in the local k6 execution, we actually evaluate the thresholds by executing them in a restricted JS runtime that has the sink as a local variable and p() is just a simple helper function there: https://github.com/loadimpact/k6/blob/bdd417e8a4206c6bbbef36d7ce7bea48f34a0804/stats/thresholds.go#L33-L35

I assume that this was done as a quick fix to get thresholds working, since there was already a JS runtime in k6 anyway... But it prevents us from properly reasoning about what percentile values we'll need at the start of the test... 🤦‍♂️ There's a separate issue to fix this (#961), but we can't use the simple moving percentiles algorithms before we do... So, the HdrHistogram seemed (and it still does) like the simpler solution we can just plug into the current k6 architecture (because with it we won't need to know the percentiles we require, among other benefits) and it will allow us to fix the thresholds architecture separately, at a later time.

@Sirozha1337
Copy link
Contributor Author

Ah, that's what I was missing. I didn't take into account that you can specify any percentile in the thresholds.
Looks like HdrHistogram is the best solution then. Sorry for wasting your time and thank you for explanation, now the code makes more sense to me.

@na--
Copy link
Member

na-- commented Jan 27, 2021

Closing this in favor of #763

@na-- na-- closed this as completed Jan 27, 2021
@na-- na-- removed this from the v1.0.0 milestone Feb 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants