Track and improve k6 memory usage and performance #1167

Open

na-- opened this issue Sep 20, 2019 · 19 comments
na-- (Member) commented Sep 20, 2019

We currently don't have a good idea of how changes in k6 affect its performance. In particular, we know that core-js uses a lot of memory (#1036 (comment)), but we never systematically tracked that, so we lack any baseline to compare against if we ever want to update our currently bundled version. I did some very unscientific measurements with /usr/bin/time on my laptop, and I've shared the formatted results below. I used a range of current and previous k6 versions to run a script with 200 VUs executing a million iterations of a default function that just makes a simple http.get() call against a localhost Go server (a sketch of such a script is included after the table footnotes). Here are the results:

| k6 version | Used memory | RPS [1] | Times | Pagefaults and inputs/outputs |
| --- | --- | --- | --- | --- |
| 0.16.0 | 271700k | 21739 | 0:47.80elapsed - 91.32user 21.64system (236%CPU) | 0major+70139minor, 0inputs+0outputs |
| 0.17.0 | 274496k | 18185 | 0:57.17elapsed - 90.21user 23.45system (198%CPU) | 0major+72619minor, 0inputs+0outputs |
| 0.17.1 | 272884k | 17857 | 0:58.07elapsed - 88.25user 24.53system (194%CPU) | 0major+72161minor, 0inputs+0outputs |
| 0.17.2 | 914512k [2] | 17241 | 1:04.02elapsed - 104.05user 26.49system (203%CPU) | 0major+314970minor, 0inputs+0outputs |
| 0.18.0 | 905072k | 15765 | 1:08.58elapsed - 109.52user 31.23system (205%CPU) | 0major+305646minor, 0inputs+0outputs |
| 0.18.2 | 904848k | 15745 | 1:08.75elapsed - 110.31user 31.64system (206%CPU) | 0major+307379minor, 0inputs+0outputs |
| 0.19.0 | 987608k | 14812 | 1:12.95elapsed - 118.03user 35.35system (210%CPU) | 0major+339240minor, 0inputs+0outputs |
| 0.20.0 | 1011588k | 14419 | 1:15.08elapsed - 122.31user 33.90system (208%CPU) | 103major+339912minor, 40232inputs+0outputs |
| 0.21.0 | 1006776k | 14170 | 1:16.20elapsed - 123.62user 33.52system (206%CPU) | 121major+335868minor, 42240inputs+0outputs |
| 0.21.1 | 988204k | 14026 | 1:17.11elapsed - 124.65user 33.28system (204%CPU) | 115major+334937minor, 41656inputs+0outputs |
| 0.22.0 | 1915272k [3] | 14910 | 1:17.75elapsed - 128.98user 33.49system (208%CPU) | 0major+616641minor, 0inputs+0outputs |
| 0.22.1 | 1928812k | 14949 | 1:18.53elapsed - 129.59user 33.54system (207%CPU) | 114major+622426minor, 29792inputs+0outputs |
| 0.23.0 | 1940520k | 14427 | 1:20.25elapsed - 131.79user 33.97system (206%CPU) | 115major+627730minor, 29968inputs+0outputs |
| 0.23.1 | 1950936k | 15220 | 1:17.29elapsed - 128.74user 33.22system (209%CPU) | 131major+615768minor, 34048inputs+0outputs |
| 0.24.0 | 1904352k | 14202 | 1:21.48elapsed - 133.21user 33.30system (204%CPU) | 139major+570873minor, 35696inputs+0outputs |
| 0.25.0 | 1096308k [4] | 14050 | 1:16.88elapsed - 131.29user 32.66system (213%CPU) | 145major+341927minor, 37144inputs+0outputs |
| 0.25.1 | 1129756k | 14882 | 1:12.81elapsed - 127.30user 31.59system (218%CPU) | 10major+354609minor, 1832inputs+0outputs |
| latest master | 1115896k | 14760 | 1:13.01elapsed - 122.58user 31.42system (210%CPU) | 0major+343543minor, 0inputs+0outputs |
| new executors | 1168584k | 14809 | 1:12.07elapsed - 132.81user 34.58system (232%CPU) | 0major+349069minor, 0inputs+0outputs |
| new executors, no core-js [5] | 346344k | 16683 | 1:01.75elapsed - 101.36user 29.45system (211%CPU) | 0major+63926minor, 0inputs+0outputs |
| new executors, no core-js, no babel [5] | 337876k | 17737 | 0:58.24elapsed - 101.83user 29.48system (225%CPU) | 0major+62637minor, 0inputs+0outputs |
| new executors, core-js updated from 2.5.1 to 3.2.1 | 1295056k | 15331 | 1:10.01elapsed - 129.55user 35.52system (235%CPU) | 0major+385981minor, 0inputs+0outputs |

[1] Not actually very representative, because these tests were run against a localhost server on my laptop, but still useful as a measure of the overhead that the rest of k6 introduces. For comparison, I created this simple script that just hammers a URL, and I got ~42000 requests per second out of it on my machine.

I'm currently willing to chalk up most of the difference between this simple script and k6 to the overhead of JS execution: when I replaced the JS VUs with a lib.MiniRunnerVU, so that each iteration had no JS execution and just made a single request to the localhost server, I also got ~40000 RPS out of k6.

And to repeat, none of that is representative or matters much in the real world: when I tested a server on the local network, both k6 and the simple Go script achieved the same result of ~7000 RPS, likely because the test was then I/O-bound and not CPU-bound. Probably the only use for tracking this metric is to monitor whether it suddenly drops, since that would likely mean we introduced something that significantly worsened performance.

[2] This memory increase was caused by the addition of core-js in every VU.

[3] Again, this was caused by a core-js and babel update.

[4] The huge drop was caused by #1038 🎉, but again, core-js was responsible.

[5] I just commented out these lines, but this is useful to know, because we plan to allow users to choose not to use core-js: #1049 (comment)
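
(For reference, here's a minimal sketch of what such a benchmark script could look like; this isn't the exact script used above, and the local server address is a placeholder.)

```js
import http from "k6/http";

export let options = {
    vus: 200,
    iterations: 1000000, // since v0.17.0, iterations are shared across all VUs
};

export default function () {
    http.get("http://localhost:8080/"); // placeholder address for the local Go server
}
```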

So yeah, in summary, we should have some sort of CI task that monitors the memory consumption and performance of k6, so that we can spot sudden changes in either. Whether that runs on every PR, once daily, or on some other schedule needs more discussion and investigation. We should probably also avoid CircleCI and do this in either Azure Pipelines or GitHub Actions, whichever turns out to be better (#959). For the performance part, if we track it through the iterations-per-second metric, we'd need #355, which is currently in progress.

@na-- na-- added this to the v0.26.0 milestone Sep 20, 2019
@ragnarlonn

Do you have any idea why the RPS numbers are dropping from 21k to 15-16k over time?

Btw, I just tested (with k6 0.25.1) whether starting new iterations involves any kind of large overhead. I compared a script doing a single HTTP request per iteration with one doing a hundred:

import http from "k6/http";

const URL = "http://localhost:8080/"; // placeholder, the actual target URL wasn't shown

export default function () {
  for (var i = 0; i < 100; i++) {
    http.get(URL);
  }
}

At least for k6 0.25.1, the difference doesn't seem to be big. Using the above script that loops through a hundred requests seems to result in 5-10% higher RPS numbers.

na-- (Member Author) commented Sep 20, 2019

No, I currently don't have a good idea why the iterations per second (and thus, the RPS) count dropped from 21k to 15-16k. The biggest drop was between 0.16.0 and 0.17.0, and I tried to spot something suspicious in the diff, but there are just too many changes to tell that way, so we'd have to do it commit by commit... 😞

I initially thought it might be some scheduling change, since it turns out that k6 v0.16.0 was the last k6 version that still used iterations per VU. In v0.17.0 the iterations were switched to be shared across all VUs. But I tried the new per-vu-iterations executor from #1007, which I know doesn't have any unnecessary synchronization overhead, and I still got only 16.5-17.9k iterations per second (when I disabled core-js), so at best it's a slight improvement.

Regarding multiple requests per iteration, I haven't tested that yet either; I planned to explore it today. But I think http.batch() should perform better than a for loop with http.get(), as in the sketch below.
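
(A hedged sketch of that http.batch() variant; the URL is a placeholder, and the batch size mirrors the hundred-request loop above.)

```js
import http from "k6/http";

const URL = "http://localhost:8080/"; // placeholder target URL

export default function () {
    // Build 100 GET requests and let k6 issue them concurrently from a
    // single iteration, instead of 100 sequential http.get() calls.
    let requests = [];
    for (let i = 0; i < 100; i++) {
        requests.push(["GET", URL]);
    }
    http.batch(requests);
}
```

Note that the actual per-VU parallelism of http.batch() is capped by the batch and batchPerHost options.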

na-- (Member Author) commented Sep 20, 2019

Just to confirm @ragnarlonn's results, tweaking the script in various ways (for loops inside of the default function, http.batch(), reducing the number of VUs, bigger metricSamplesBufferSize, enabling discardResponseBodies) produces 5-15% improvements at best.

na-- (Member Author) commented Sep 20, 2019

@mstoykov, I ran the same benchmark with the updated core-js library and added it at the bottom of the table. As you can see, there's a ~600KB increase in memory usage per VU ((1295056k - 1168584k) / 200 VUs ≈ 630KB) 😞

na-- (Member Author) commented Sep 24, 2019

For posterity/comparison, here's an issue about a previous performance investigation: #540

@na-- na-- modified the milestones: v0.26.0, v0.27.0 Oct 10, 2019
@ppcano ppcano pinned this issue Oct 11, 2019
@ppcano ppcano unpinned this issue Oct 14, 2019
@hassanbakerrts

Just to add more statistics: I ran a test with 1K VUs for 30 minutes and it needed 30 GB of RAM to run. Otherwise, I get "dial tcp XX.XXX.XXX.XX:80: socket: too many open files".

na-- (Member Author) commented Oct 25, 2019

@hassanbakerrts, can you give some more details about the script you ran? What was the size of the JS code you were executing (either in KB or lines of code)? Did you import some browserified library, or open a big data file? I'm asking because 1k VUs taking 30GB of RAM means each VU was using ~30MB - very different from what we've observed with simple scripts. 😕 Also, which k6 version and OS did you use, and did you run the k6 Docker container or a native k6 executable?

Regarding "too many open files", you can increase that limit at the OS level (for example, as shown below): https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux
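
(For example, on Linux the soft limit for the current shell session can be raised like this; the exact value is just an illustration.)

```
$ ulimit -n 250000
$ k6 run script.js
```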

@ragnarlonn

@na-- @hassanbakerrts Could it be the results storage that takes up the extra space? At least my tests have been 30-second ones, but if you run one for 30 minutes, I guess it will collect a lot more results data.

na-- (Member Author) commented Oct 27, 2019

@ragnarlonn, it could be, though usually that becomes a problem for longer-running tests. Still, it mostly depends on the number of HTTP requests, so with 1k VUs and little sleep, it might be the cause... Here are the related issues: #763 and #1068

@hassanbakerrts

@na-- @ragnarlonn I imported lodash during setup but not while the VUs were running. My VU code made four GET requests. There is a sleep(random(1,4)) for each VU, where random returns seconds. We had to run it on an ECU (AWS) cluster; I think the OS there is Linux, but I'm not sure which one (those things were handled by the DevOps team). I used the Docker image to run k6.

na-- (Member Author) commented Oct 29, 2019

Ah, given that each k6 VU is its own separate JS runtime, just importing lodash would add significant memory overhead to all VUs, even if you only use it in the setup() function. Since you imported it in the global scope (also called the init context in k6), each VU would load and evaluate that JS code, even if it never subsequently uses it. And unfortunately, you currently can't import lodash only in the setup function 😞
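
(A hypothetical sketch to illustrate the point; ./lodash.js stands in for whatever browserified bundle was imported.)

```js
import _ from "./lodash.js"; // evaluated in the init context, i.e. once per VU
import http from "k6/http";

export function setup() {
    // The only place lodash is actually used...
    return { ids: _.range(10) };
}

export default function (data) {
    // ...yet every VU still paid the memory cost of loading it.
    http.get("http://localhost:8080/?id=" + data.ids[0]); // placeholder URL
}
```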

@ragnarlonn

@na-- That sounds like a potentially interesting feature to add: being able to import & use things in the setup() context only, which means they would not affect memory usage during the test. Would it be hard to implement? (Although perhaps it would be necessary to stagger the execution of the setup function across the VUs, as otherwise peak memory usage would not change - it would just no longer be constantly high, like it is when lodash is present all the time in each VU context.)

na-- (Member Author) commented Oct 30, 2019

@ragnarlonn, I thought about this some yesterday, but there's a big problem with having different imports/dependencies for setup() or anything else: cloud/distributed execution.

As you probably recall, one of the first things k6 cloud script.js does when you run it on your machine is execute the init context of your script once. This is done in order to get the exported script options, but also to build a .tar archive of your script and all of its dependencies (similar to k6 archive script.js, shown below). Crucially, it doesn't and can't execute setup() at that point - that's executed on a single machine after we start the test in our cloud. So, yes, it would be very hard to implement different imports for setup(), since in order to do it, we'd have to figure out some completely different way of defining setup than the current user-friendly export function setup() { /* ... */ }, and that way would have to allow explicit declaration of dependencies without actually executing them.
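
(For reference, the local equivalent of that bundling step:)

```
$ k6 archive script.js   # bundles script.js and all of its imports into archive.tar
$ k6 run archive.tar     # the resulting archive can be run as-is
```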

Unless we can think of a clever and easy solution, I think we shouldn't spend a lot of time on this, compared to the other memory-saving improvements we want to add to k6: HDR histograms, reduction of core-js, goja improvements, shared read-only data files, streaming reads of big files, etc. The reason is that this seems unlikely to deliver very big benefits, while requiring substantial changes.

After all, I'd argue that most people who import hefty JS libraries don't use them only in their setup. Also, I forgot to mention yesterday, but I tried to run a script that imports lodash (this version) with k6 run --paused --vus 1000 script.js, and it takes only around 4GB of RAM. For comparison, running the exact same script without importing lodash takes ~2.4GB... So the majority of the memory usage in @hassanbakerrts's case wasn't caused by that library...

@ragnarlonn

@na-- Oh yeah, I didn't think about the distributed execution... And I can't think of a simple and user-friendly way to do this either. The only thing that comes to mind is adding some special import statement to init() that allows you to specify which imports you want available in other contexts (i.e. export function init() { import setup::lodash.js }), which may not be very user-friendly/intuitive. Sounds like a wise decision to focus on the other improvements you mention.

na-- (Member Author) commented Oct 30, 2019

The problem with custom imports/exports, or custom syntax of any kind, is that it would run afoul of ECMAScript parsers (specifically, in our case, Babel). We could do it with an exported method that returns a simple JS string array of extra files that need to be included in any archive bundle, or with a property in the exported options, but that again doesn't solve the issue of how to then import one of these files only in the VU that's executing setup(). To be clear, none of these issues are unsolvable if we spend enough time and accept some sub-optimal UX (or rather, developer experience) at times; I just don't think this should be high on the optimization priority list right now.

na-- (Member Author) commented Nov 14, 2019

Now that #1206 was merged in master, and somewhat prompted by https://community.k6.io/t/k6-cpu-use-during-a-test/303, I decided to check how performance differed for a simple script with the two compatibility modes. Here's the ultra-simple script I used:

var http = require("k6/http");
var sleep = require("k6").sleep;

exports.default = function () {
    http.get("https://httpbin.test.loadimpact.com/get");
    sleep(Math.random() * 2 + 3);
}

Running it with /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=extended produced:

100.54user 10.45system 1:53.56elapsed 97%CPU (0avgtext+0avgdata 9113532maxresident)k
0inputs+0outputs (0major+2869751minor)pagefaults 0swaps

and /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base produced:

21.59user 6.59system 1:01.27elapsed 46%CPU (0avgtext+0avgdata 1147084maxresident)k
0inputs+0outputs (0major+335435minor)pagefaults 0swaps

So, roughly an 8-fold memory consumption improvement (from ~9.11GB to ~1.14GB), much less CPU usage and a lot faster initialization.

The initialization time was actually the most noticeable difference, if you have RAM to spare. Running the same commands again, but with --paused, and immediately aborting k6 as soon as the status switched from init to paused (i.e. as soon as all VUs were initialized) produced very stark results. For /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=extended --paused:

72.90user 5.37system 0:54.90elapsed 142%CPU (0avgtext+0avgdata 7855780maxresident)k
0inputs+0outputs (0major+2502677minor)pagefaults 0swaps

And for /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base --paused:

1.80user 0.55system 0:02.28elapsed 103%CPU (0avgtext+0avgdata 483576maxresident)k
0inputs+0outputs (0major+147906minor)pagefaults 0swaps

Almost immediate initialization of 3500 distinct JS runtimes is quite impressive 😄 But yeah, for the context of this particular issue, we should have CI tasks that monitor both the extended (i.e. Babel + CoreJS) and base performance, since they're completely different beasts.

mstoykov (Collaborator) commented Nov 14, 2019

I decided to put my test webpack packager for k6 through a benchmark. I tested the simple script above (custom_main.js), the same script using ES6 import/export syntax (main.js), and the webpack transform of the latter (dist/app.bundle.js). Here are the results:

$ /usr/bin/time k6 run --vus 3500 --duration 60s custom_main.js
http_reqs..................: 47070  784.498669/s
147.14user 21.68system 2:09.00elapsed 130%CPU (0avgtext+0avgdata 9650724maxresident)k
0inputs+0outputs (0major+3015214minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base custom_main.js
http_reqs..................: 47431  790.515908/s
50.68user 23.59system 1:01.78elapsed 120%CPU (0avgtext+0avgdata 1333832maxresident)k
168inputs+0outputs (4major+390294minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s main.js
http_reqs..................: 48057  800.949499/s
151.07user 20.54system 2:12.46elapsed 129%CPU (0avgtext+0avgdata 9279780maxresident)k
1416inputs+0outputs (2major+2909761minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s build/app.bundle.js
http_reqs..................: 45230  753.832637/s
142.08user 20.62system 2:11.20elapsed 124%CPU (0avgtext+0avgdata 9676844maxresident)k
63600inputs+0outputs (517major+3031793minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base build/app.bundle.js
http_reqs..................: 45808  763.461697/s
51.19user 17.97system 1:01.87elapsed 111%CPU (0avgtext+0avgdata 1432368maxresident)k
0inputs+0outputs (0major+417146minor)pagefaults 0swaps

Table:

| test script | time elapsed | memory (KB) | RPS |
| --- | --- | --- | --- |
| custom_main.js + corejs & babel | 2:09.00 | 9650724 | 784 |
| custom_main.js + base | 1:01.78 | 1333832 | 790 |
| main.js + corejs & babel | 2:12.46 | 9279780 | 800 |
| main.js + webpack + corejs & babel | 2:11.20 | 9676844 | 753 |
| main.js + webpack + base | 1:01.87 | 1432368 | 763 |

All the numbers fluctuate, but they generally stay in the same ballpark, and the RPS is apparently not affected in this particular case.

For me this confirms that the webpack setup I propose is now a viable way to make k6 scripts much more performant when using --compatibility-mode=base.

mstoykov (Collaborator) commented Aug 12, 2020

I tested the latest goja, which has some ES6 support, with and without core-js. There is... an increase in the binary size, which is definitely because of goja, which also grew by more than 10k lines:

29836   k6-noes6
31640   k6-es6
31552   k6-es6-nocorejs

The tested script was pretty simple:

import http from "k6/http";
import { sleep } from "k6";

export default function() {
    http.get("https://httpbin.test.loadimpact.com/get");
    sleep(Math.random() * 2 + 3);
}

With --vus 3500 --duration 60s, the results were:

| k6 binary | run time | memory per VU (KB) | CPU used | RPS |
| --- | --- | --- | --- | --- |
| k6-noes6 | 1:36.21 | 2871 | 270% | 759 |
| k6-es6 | 1:26.18 | 2673 | 232% | 743 |
| k6-es6-nocorejs | 1:07.37 | 531 | 95% | 734 |

The numbers move somewhat between runs, but stay around those magnitudes. As the script itself always takes 1m05.2s (the sleep is apparently pretty unrandom ;) ), it looks like the no-core-js version takes only ~2 seconds to initialize, while the other two take around 30 and 20 seconds respectively - which for 3500 VUs isn't bad, but isn't as instant.
The memory difference of 5x is probably the most significant benefit.
The CPU difference is probably (I haven't had time to test this extensively) because of the faster initialization.

@na-- na-- modified the milestones: v1.0.0, TBD Nov 9, 2022
imiric (Contributor) commented Feb 15, 2023

Hey all, an update on this old issue: we recently updated our "Running large tests" guide, and added benchmark results for v0.42.0 in our k6-benchmarks repository.

We plan to automate benchmarking soon™, so that we can continually track k6 performance for all releases.

@codebien codebien removed this from the TBD milestone Sep 27, 2023