
"sliding window" thresholds #2379

Open
LaserPhaser opened this issue Feb 7, 2022 · 4 comments

LaserPhaser commented Feb 7, 2022

Feature Description

The current implementation of the threshold mechanism works only on absolute values computed over the whole run.
For example:
When I set an "autostop" at 5% errors, it means I need to accumulate 5% errors over the whole run before it triggers.
But degradations usually happen when the RPS becomes really high.
And if you reach that RPS step by step, you have to wait for some time: you can have, for example, a 100% error rate for the last 1 minute while it still amounts to only 10% of errors for the whole run.
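
For context, assuming the "autostop" mentioned above maps to a regular k6 threshold with abortOnFail, the current whole-run behaviour would look roughly like this (a minimal sketch; the 5% figure is just the example from above):

export const options = {
  thresholds: {
    // rate is computed over all samples since the start of the test,
    // and abortOnFail stops the run once the threshold is crossed
    http_req_failed: [{ threshold: 'rate<0.05', abortOnFail: true }],
  },
};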

As a numeric example, we run the following configuration:
10 RPS for 1 minute - 600 requests in total - all 200 OK
20 RPS for 1 minute - 1200 requests in total - all 200 OK
30 RPS for 1 minute - 1800 requests in total - all 200 OK
50 RPS for 1 minute - 3000 requests in total - all 200 OK
60 RPS for 1 minute - 3600 requests in total - and here the crash happens during the last 10 seconds, so 3000 requests are OK and 600 are errors

So in the end we have
600 + 1200 + 1800 + 3000 + 3000 = 9600 "200 OK" responses
and 600 "500 fail" responses.

Those 600 errors are only about 5.9% of the total.

But for the last 10 seconds, the error rate is 100%.

Suggested Solution (optional)

My suggestion is to add "sliding window" support to thresholds.
For example, I might be interested in the "error rate" only for the last 1 minute, or even the last 10 seconds.
Something like:

export const options = {
  thresholds: {
    http_req_failed: ['rate<0.01[1m]'], // http errors should be less than 1% for the last 1m
    http_req_duration: ['p(95)<200[10m]'], // 95% of requests should be below 200ms for the last 10min
  },
};

Already existing or connected issues / PRs (optional)

No response

@LaserPhaser LaserPhaser changed the title Thresholds for "sliding window" "sliding window" thresholds Feb 7, 2022
na-- (Member) commented Feb 7, 2022

This is somewhat of a duplicate of #1136, but it's much better explained (😊) and the other issue has become more of a catch-all that just collects various semi-related threshold improvement ideas, so I'll leave both open for now...

Implementing this efficiently will be quite complicated though. Sliding time windows are probably easy and efficient to implement for Counter metrics, but not so much for Trend ones... And I have no idea how HDR histograms (#763) will work with them 😕 The syntax might also look different from what you propose - there are other issues with the current threshold syntax and we might adopt a v2 syntax that resembles something like PromQL, for example... 🤷‍♂️ Still, it's definitely a very valid use case we need to address, so thank you for opening such a detailed issue.
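
To illustrate why Counter/Rate-style metrics are the "easy" case: a windowed failure rate only needs per-bucket counts, while Trend percentiles would need the raw samples (or mergeable digests) for every bucket. A purely illustrative sketch of the bucketed approach (this is not k6 code, and all names are made up):

// Keep one bucket per second; the windowed rate is then just a sum over buckets.
class SlidingWindowRate {
  constructor(windowSeconds) {
    this.windowSeconds = windowSeconds;
    this.buckets = new Map(); // second -> { fails, total }
  }
  add(timestampMs, failed) {
    const sec = Math.floor(timestampMs / 1000);
    const b = this.buckets.get(sec) || { fails: 0, total: 0 };
    b.total += 1;
    if (failed) b.fails += 1;
    this.buckets.set(sec, b);
  }
  rate(nowMs) {
    const cutoff = Math.floor(nowMs / 1000) - this.windowSeconds;
    let fails = 0;
    let total = 0;
    for (const [sec, b] of this.buckets) {
      if (sec < cutoff) {
        this.buckets.delete(sec); // evict buckets that fell out of the window
        continue;
      }
      fails += b.fails;
      total += b.total;
    }
    return total === 0 ? 0 : fails / total;
  }
}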

For now, as a workaround in some situations, you can approach the problem from the opposite direction... Instead of setting thresholds for time windows, you can set the thresholds for specific tags (sub-metrics) and use the recently introduced ability to manually set VU-wide custom metric tags through the vu.tags property from k6/execution. You can set different tag values based on the current test execution time, e.g. here's how you can tag metrics based on the stage the script is currently in: #796 (comment) It's not the same and it's much less flexible than sliding time windows, but it's a viable workaround for some simpler cases.
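
A minimal sketch of that workaround, assuming a simple two-stage ramp (the loadStage tag name, the stage boundaries, and the URL are made up for illustration):

import http from 'k6/http';
import exec from 'k6/execution';

export const options = {
  stages: [
    { duration: '4m', target: 50 }, // ramp-up
    { duration: '1m', target: 60 }, // high load
  ],
  thresholds: {
    // only samples tagged with loadStage:high are checked by this threshold
    'http_req_failed{loadStage:high}': ['rate<0.01'],
  },
};

export default function () {
  // tag every metric sample this VU emits based on how far into the test we are
  const elapsedMs = Date.now() - exec.scenario.startTime;
  exec.vu.tags['loadStage'] = elapsedMs > 4 * 60 * 1000 ? 'high' : 'ramp';
  http.get('https://test.k6.io/');
}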

LaserPhaser (Author) commented

@na-- maybe we can use https://pkg.go.dev/github.com/RussellLuo/slidingwindow#section-readme for example?
I think I can implement a sliding window for "rate" with this library as a proof of concept for the feature.

na-- (Member) commented Feb 10, 2022

maybe we can use https://pkg.go.dev/github.com/RussellLuo/slidingwindow#section-readme for example?

I am not sure this specific library could actually be used to calculate the sliding window thresholds for a Rate metric, it seems more like a rate-limiter implementation 😕 Maybe some of its internals can be reused, I don't know, but it doesn't matter all that much for now - that's probably the smallest potential problem I can see with this proposal. I don't want to dissuade you from trying to implement something like this, but there are a lot of issues and current in-progress work that surrounds these parts of k6 and that will probably prevent us from merging any such contribution soon, if ever... 😞

We are currently in the midst of some pretty big threshold refactoring (see #2356 and the connected issues, cc @oleiade), as the first step towards better thresholds. The problem is, we are still not sure about what steps 2, 3 and so on look like yet. We just know that there are plenty of deficiencies with the current thresholds, both in their capabilities and in their syntax, but we don't know exactly what the end goal looks like yet. For example, the syntax v2 might be PromQL-like, it might be something like what you propose (though rate[1m]<0.01 is probably better than rate<0.01[1m] 🤔 ), it might be something completely different 🤷‍♂️

Somewhat connected to the above, we are also in the middle of refactoring how we handle metrics and metric samples. Recently we introduced a metrics registry (#1832) and likely upcoming changes include the tracking of distinct time series (#1831), user control of which metrics and sub-metrics k6 actually emits (#1321), and refactoring in how we store metrics in-memory, likely including transitioning to something like HDR histograms (#763) for Trend metrics.

Finally, thresholds in k6 run are evaluated somewhat differently from thresholds in k6 cloud / distributed tests, since there you have multiple streams of metrics to crunch. So, even if the local implementation looks easy, the cloud/distributed execution needs its own evaluation and/or additional validation.

All of these things might introduce different tradeoffs and affect how we implement "sliding window" thresholds, and vice-versa. So, it's currently difficult to gauge if any one-off changes like the one you propose in this issue will be in the direction we want to go or in some different direction that ties our hands... 😞

srperf commented Sep 18, 2023

I would do this with a custom metric:
Create a threshold against it.
During the execution, add values to it as they are generated.
Every time we move from one time window to the next, increase the metric by an order of magnitude and adjust the threshold as well. That way the previous values are no longer significant enough for the threshold to take them into account.
That is one idea for this situation (a sketch of the custom-metric part follows below).

The other one I can think of is to add the ability to reset custom metrics while keeping the threshold against that metric.
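
A minimal sketch of the custom-metric part of that suggestion (the metric name, the URL, and the 5% figure are made up; any window-to-window rescaling would still have to be done by the script itself):

import http from 'k6/http';
import { Rate } from 'k6/metrics';

// a custom Rate metric that the script feeds itself, so the script decides
// which requests count towards it
const windowedErrors = new Rate('windowed_errors');

export const options = {
  thresholds: {
    windowed_errors: ['rate<0.05'],
  },
};

export default function () {
  const res = http.get('https://test.k6.io/');
  windowedErrors.add(res.status >= 400);
}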
