
[WIP] Add InterruptedIterations metric #1769

Closed
imiric wants to merge 10 commits into master from feat/877-interr-iters-metric

Conversation

imiric
Contributor

@imiric imiric commented Dec 14, 2020

These are partial changes for #877, but I'm creating a draft PR to clear up a few things and get some feedback. It's probably easier to review it commit by commit.

Currently only scenarios 1 and 2 from this comment are working (cause: error and cause: fail). cause: interrupted is commented out as it needs a different approach, and I'm not sure how to handle the last ^C scenario (cause: signal would make sense for that).

I settled on using a Rate metric for interrupted_iterations, but opted to hack around actually emitting the 0 value, in order to avoid an explosion of metric data: those samples would duplicate iterations, and given that #1321 is still open, we shouldn't exacerbate that problem. See e2b88ab.
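To illustrate the trade-off, here is a rough model in plain JavaScript (not k6 code; the RateSink name and shape are made up for this sketch) of how a Rate metric aggregates samples, and why the rate can still be reconstructed when only the 1 values are emitted, provided the total iteration count is known:

```javascript
// Rough model of a Rate metric sink: it counts non-zero samples
// ("trues") out of the total, and reports trues/total.
class RateSink {
  constructor() {
    this.trues = 0;
    this.total = 0;
  }
  add(value) {
    this.total++;
    if (value !== 0) this.trues++;
  }
  rate() {
    return this.total === 0 ? 0 : this.trues / this.total;
  }
}

// Emitting both 1s and 0s: 40 interrupted out of 70 iterations.
const full = new RateSink();
for (let i = 0; i < 40; i++) full.add(1);
for (let i = 0; i < 30; i++) full.add(0);
console.log(full.rate().toFixed(4)); // "0.5714"

// The hack: emit only the 1s, and reconstruct the rate afterwards from
// the separate `iterations` count (70 here), skipping 30 zero samples.
const sparse = new RateSink();
for (let i = 0; i < 40; i++) sparse.add(1);
const iterations = 70;
console.log((sparse.trues / iterations).toFixed(4)); // "0.5714"
```

The rate is identical either way; the sparse variant just shifts the division from the sink to whoever renders the summary.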

@imiric imiric requested review from mstoykov and na-- December 14, 2020 15:59
@imiric
Contributor Author

imiric commented Dec 14, 2020

Some UI questions:

fail('test') and throw 'test' currently look like this:

ERRO[0001] fail: test at github.com/loadimpact/k6/js/common.Bind.func1 (native)  executor=constant-vus scenario=default
ERRO[0001] test at file:///tmp/test-877-fail.js:5:11(8)  executor=constant-vus scenario=default

and the metric in the summary like this:

interrupted_iterations...: 57.14% ✓ 40  ✗ 30
  1. I would like to make the fail case also show the correct file and line number. I haven't dug into bridge.go yet and I'm not sure if it's possible with goja, but is this something that we want?
  2. The way Rate metrics are currently rendered in the summary makes the ✓ and ✗ numbers kind of ambiguous. Does ✓ refer to the "successful"/"complete" iterations or to how many were interrupted? The same goes for ✗; it could be interpreted both ways.
    Currently ✓ is the Sink.Trues value, so these are the interrupted iterations, but I'm not sure how we could make this clearer, besides using a Counter.
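One way to sidestep the ambiguity would be to label the two counts explicitly. A sketch in plain JavaScript (the labels and renderRate helper are hypothetical, not an existing k6 format):

```javascript
// Hypothetical summary line for a Rate metric with labeled counts
// instead of bare ✓/✗ marks. `trues` is the Sink.Trues equivalent,
// i.e. the number of interrupted iterations.
function renderRate(name, trues, total) {
  const pct = ((trues / total) * 100).toFixed(2);
  return `${name}...: ${pct}% interrupted=${trues} complete=${total - trues}`;
}

console.log(renderRate("interrupted_iterations", 40, 70));
// interrupted_iterations...: 57.14% interrupted=40 complete=30
```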

// TODO: Use enum?
// TODO: How to distinguish interruptions from signal (^C)?
// tags["cause"] = "duration"
// state.Samples <- stats.Sample{
Contributor Author

This is where we would handle the 3rd scenario, of executors interrupting the iteration. While it worked in manual tests and TestMetricsEmission, it caused other tests to hang (presumably because they were setting a nil channel). I'm also not sure it isn't racy, so I've commented it out for now.

Is this the right place/approach, or would we have to receive two contexts here or something more complicated?

Contributor Author

Digging through the context chains we're using, this shouldn't be racy since the Samples channel is closed only when the globalCtx is done, which happens after all metrics processing is finished, so it should be usable even if this VU "run context" is done. Yet some of our tests use the same context for both, e.g.:

https://github.com/loadimpact/k6/blob/420dd4161c280bdf0398af5e58f73c939057695a/api/v1/status_routes_test.go#L108

This explains why it works in real-world tests but not in the test suite. I'll see if I can fix the tests then.

Ivan Mirić added 7 commits December 15, 2020 13:11
This avoids outputting "GoError" as mentioned in
#877 (comment)
This is definitely racy and causes hanging in tests, so it needs a
different approach.
This isn't racy in real world tests since the Samples channel isn't
closed until the global context is done.
@imiric imiric force-pushed the feat/877-interr-iters-metric branch from 9697bab to c12f449 on December 15, 2020 12:12
@codecov-io

codecov-io commented Dec 15, 2020

Codecov Report

Merging #1769 (d52efc4) into master (420dd41) will increase coverage by 0.01%.
The diff coverage is 96.87%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1769      +/-   ##
==========================================
+ Coverage   71.47%   71.48%   +0.01%     
==========================================
  Files         178      178              
  Lines       13777    13829      +52     
==========================================
+ Hits         9847     9886      +39     
- Misses       3317     3329      +12     
- Partials      613      614       +1     
Flag Coverage Δ
ubuntu 71.44% <96.87%> (+0.02%) ⬆️
windows 70.06% <96.87%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
js/runner.go 80.62% <86.66%> (-0.49%) ⬇️
core/engine.go 91.08% <100.00%> (-1.88%) ⬇️
js/common/util.go 100.00% <100.00%> (ø)
js/modules/k6/k6.go 100.00% <100.00%> (ø)
lib/errors.go 91.30% <100.00%> (ø)
lib/executor/helpers.go 100.00% <100.00%> (+3.61%) ⬆️
lib/testutils/minirunner/minirunner.go 86.95% <100.00%> (+0.91%) ⬆️
loader/loader.go 79.28% <0.00%> (-3.58%) ⬇️
lib/execution.go 89.32% <0.00%> (-2.92%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 420dd41...d52efc4. Read the comment docs.

@@ -288,6 +288,14 @@ func (e *Engine) processMetrics(globalCtx context.Context, processMetricsAfterRu
if !e.NoThresholds {
e.processThresholds()
}
if iim, ok := e.Metrics[metrics.InterruptedIterations.Name]; ok {
Contributor Author

This map lookup is racy according to the current TestSentReceivedMetrics failure. So I guess we'll need to rethink or abandon this hack...

Collaborator

I am pretty sure this breaks any output, as you are only emitting the 1 values to them...

Contributor Author

I guess it depends on how this metric is aggregated. If their system allows it, they could factor it in with the iterations metric and get a percentage that way, which is kind of what we're doing here for the summary.

It makes no sense to me to emit the 0 value of this metric, considering it's equivalent to iterations, and it would add a considerable amount of output data and affect performance. A 5s/5VU test with an empty default function and no sleep does 2115298 iterations on my machine and generates a 1.5 GB JSON file when emitting the 0 value. The same test without it manages 3203002 iterations and generates a 1.9 GB JSON file, but only because it's doing ~50% more iterations.

Considering we already have complaints about outputting too much data and interest in #1321, I think we should rather work on #1321 before we consider adding another default metric.

Though we should probably rethink duplicating iterations...

How about, instead of a new metric, we simply reuse iterations and add an interrupted: "<cause>" tag to it? If the tag is omitted, the iteration completed; if it's set, the iteration was interrupted with the specific cause. WYT?
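A rough model of that tag-based alternative in plain JavaScript (the sample shape and emitIteration helper are simplified for illustration, not actual k6 internals):

```javascript
// Each iteration emits one `iterations` sample; interrupted ones carry
// an `interrupted` tag with the cause, complete ones carry no tag.
function emitIteration(samples, cause) {
  const tags = {};
  if (cause) tags.interrupted = cause; // e.g. "error", "fail", "signal"
  samples.push({ metric: "iterations", value: 1, tags });
}

const samples = [];
emitIteration(samples);          // a complete iteration
emitIteration(samples, "fail");  // interrupted by fail()
emitIteration(samples, "error"); // interrupted by an exception

// The interruption rate can be derived by filtering on the tag.
const interrupted = samples.filter((s) => "interrupted" in s.tags);
console.log(interrupted.length / samples.length); // ~0.67 interruption rate
```

This avoids duplicating the metric entirely, at the cost of pushing the rate computation onto whoever consumes the tagged samples.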

Collaborator

I agree with prioritizing #1321, but that will actually probably require quite the refactoring, IMO even for an MVP.
The reuse of iterations was discussed (I think) and we decided against it, based on the fact that the primary usage for this will be to check that a percentage of the iterations aren't actually interrupted, which isn't possible with the current thresholds if the metric isn't a rate... so :(.

I think maybe we should open another PR with some of the changes from this one (the test fixes and lib.Exception) and merge that, so we both get those in and make this PR easier to rebase. Then we can work on #1321 and merge this afterwards.
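For context on the threshold constraint mentioned above: k6 thresholds use per-metric-type aggregations, and the `rate` aggregation only applies to Rate metrics. Assuming the metric keeps the interrupted_iterations name, a user-facing threshold would look something like:

```javascript
// k6 script options: a threshold on a Rate metric uses the `rate`
// aggregation, which isn't available for Counter metrics.
export const options = {
  thresholds: {
    // fail the test run if more than 10% of iterations were interrupted
    interrupted_iterations: ["rate<0.10"],
  },
};
```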

Contributor Author

The reuse of iterations was discussed (I think) and we decided against it, based on the fact that the primary usage for this will be to check that a percentage of the iterations aren't actually interrupted, which isn't possible with the current thresholds if the metric isn't a rate

Then maybe that should be a feature? I'll see what I can do.

@@ -348,7 +348,7 @@ func TestBind(t *testing.T) {
}},
{"Error", bridgeTestErrorType{}, func(t *testing.T, obj interface{}, rt *goja.Runtime) {
_, err := RunString(rt, `obj.error()`)
assert.Contains(t, err.Error(), "GoError: error")
assert.Contains(t, err.Error(), "error")
Collaborator

I think this can be moved to another PR that we can merge today and make this ... more focused ;)

Contributor Author

Sure, will do.

Contributor Author

See #1775.

tags["cause"] = er.Cause()
}
logger.Error(err.Error())
case fmt.Stringer:
Collaborator

This was deliberately left as fmt.Stringer because I didn't want to add one more place where lib, and lib/executor specifically, depend on goja. So I think it's better if we don't actually start using it. Also, you need to fix the fact that exceptions now don't have source=stacktrace ;)

I do think now is a good time to actually add an error type (under lib), probably just an "Exception" interface, implemented in the js package and wrapping goja.Exception, so that we don't use goja directly.

Contributor Author

Hhmm sure, I'll give that a try, thanks.

Contributor Author

I managed to remove the goja dependency in d52efc4, but I'm not happy with how it turned out...

I wanted to treat all errors and exceptions consistently, avoid logging in runFn(), and delegate that to getIterationRunner(), but this took an embarrassing amount of trial and error, and I'm still not sure all errors will be handled correctly. :-/

Some notes:

  • I didn't think making lib.Exception an interface was needed, as it's generic enough to have a single implementation.
  • This needs a lot more tests to ensure errors are rendered properly.
  • The source=stacktrace logger field doesn't make sense to me. If anything the source should be the same as the "cause" value, as "stacktrace" isn't really a "source".
  • Should all errors output the full stack trace? I was trying to keep it consistent with master, but maybe errors from fail() should only output a short single-line trace with the last stack frame.

This was motivated by wanting to remove the goja dependency from the
lib/executor package, see #1769 (comment)

The idea is for runFn to always return a lib.Exception error that can
handle all JS errors and Go panics consistently. In practice I'm not
sure if this hack is worth it as it might mess up handling of some
errors...
@na--
Copy link
Member

na-- commented Jun 15, 2022

We didn't merge this as it was because we realized we needed #1321 (or something like it) for this PR to not introduce performance problems. At this point it has so many merge conflicts that it'd be easier to start from scratch, so I'll close it to reduce noise.

@na-- na-- closed this Jun 15, 2022
@na-- na-- deleted the feat/877-interr-iters-metric branch June 15, 2022 16:09