
looks like there is memory leak in k6 #1267

Closed
divfor opened this issue Dec 10, 2019 · 14 comments

divfor commented Dec 10, 2019


top - 02:31:20 up 4 days, 28 min,  8 users,  load average: 2.73, 3.26, 3.18
Tasks: 450 total,   1 running, 237 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.8 us,  0.2 sy,  0.0 ni, 97.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 39468006+total, 31763724+free, 74487840 used,  2554968 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 35436342+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  946 root      20   0 77.364g 0.068t  17620 S 101.3 18.4   3840:10 k6

divfor commented Dec 10, 2019

In my overnight testing, the RES memory keeps increasing (see above), and k6 gets stuck in 30-second cycles: requests are only sent out at the end of each 30-second window, while a single core stays at 100% for the rest of the period.

[screenshot: request rate over time]


cuonglm commented Dec 10, 2019

Hi @divfor, thanks for reporting.

Can you share the script you used for this test?

cuonglm added the "evaluation needed" and "performance" labels on Dec 10, 2019

divfor commented Dec 10, 2019

import http from "k6/http";
import { check, sleep, fail } from "k6";
import { Trend, Rate, Counter, Gauge } from "k6/metrics";

export let TrendRTT = new Trend("RTT");
//export let RateContentOK = new Rate("Content OK");
//export let GaugeContentSize = new Gauge("ContentSize");
//export let CounterErrors = new Counter("Errors");

export let options = {
	hosts: {
		"s1.fred.com": "27.126.207.43",
		"s2.fred.com": "27.126.207.43",
		"s3.fred.com": "27.126.207.43",
		"s4.fred.com": "27.126.207.43",
		"s5.fred.com": "27.126.207.43",
		"s6.fred.com": "27.126.207.43",
		"ip6.fred.com": "2404:ae00:9::8",
	},
	vus: 4000,
	//duration: "10h",
	stages: [
		{duration: "1m", target: 4000},
		{duration: "24h", target: 4000},
		{duration: "10s", target: 10},
	],
	thresholds: {
	        //avg=23.89 min=0 med=21.22 max=70.96 p(90)=60.16 p(95)=65.33
		"RTT": [ "p(95)<30000", "p(90)<18000", "avg<10000", "med<10000", "min<3000"],
		//"Content OK": ["rate>0.95"],
		//"ContentSize": ["value<4000"],
		//"Errors": ["count<100"]
	},
	insecureSkipTLSVerify: true,
	noConnectionReuse: true,
	noVUConnectionReuse: false,
	discardResponseBodies: true,
	minIterationDuration: 1,
	setupTimeout: "30m",
	teardownTimeout: "10m",
	userAgent: "FredK6LoadTesting/1.0"
};

export default function() {
	let params = { timeout: 40000 };
	let reqs = [
		["GET", "http://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "http://s4.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s1.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s2.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s3.fred.com/string_10.txt", null, {timeout: 30000}],
		["GET", "https://s4.fred.com/string_10.txt", null, {timeout: 30000}],
	];
	let res = http.batch(reqs);
	for (var i = 0; i < reqs.length; i++) {
		check(res[i], {
			"is status 200": (r) => r.status === 200,
			//"url is correct": (r) => r.url === reqs[i][1]
		});
		TrendRTT.add(res[i].timings.duration);
		//RateContentOK.add(contentOK);
		//GaugeContentSize.add(res.body.length);
		//CounterErrors.add(!contentOK);
	}
}
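(Editor's note: the 40-entry reqs array above repeats the same eight URLs five times, so it could equally be generated with nested loops. A sketch in plain JavaScript, so the array-building logic can be checked outside the k6 runtime:)

```javascript
// Build the same 40 [method, url, body, params] tuples as the literal array:
// 5 rounds of 8 requests (http and https against s1..s4.fred.com).
const reqs = [];
for (let round = 0; round < 5; round++) {
  for (const scheme of ["http", "https"]) {
    for (let i = 1; i <= 4; i++) {
      reqs.push(["GET", `${scheme}://s${i}.fred.com/string_10.txt`, null, { timeout: 30000 }]);
    }
  }
}
console.log(reqs.length); // 40
```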


cuonglm commented Dec 10, 2019

@divfor Thanks.

There's currently a bug in per-host batching; we will fix it in #1259, which also improves the performance of http.batch.

I assume you used v0.25.1 for your test? If possible, can you retry your test with the fix from #1259?


divfor commented Dec 10, 2019

OK, will try it.
BTW, I tried v0.25.1 first and hit this issue. I then moved to the --nic support version (https://github.com/ofauchon/k6/tree/feature/476_MultipleNics) to add 4000 secondary IP addresses as source IPs (./k6m run --nic eth1 http_mixed.js), but got the same issue: after running for a few minutes, k6 gets stuck and the request rate keeps oscillating.


na-- commented Dec 10, 2019

This is probably the same issue as #1068 / #763 - Trend metrics (either built-in or custom ones) grow in memory indefinitely. If you run your script with --no-thresholds --no-summary and some external --out, k6 shouldn't leak memory. That said, 4000 VUs might be a bit too much for a single machine anyway, unless the machine is seriously beefy or you use the new --compatibility-mode (#1206 - in master, but not yet in an official release).
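For reference, such an invocation might look like this (the JSON output path and script name are illustrative):

```shell
k6 run --no-thresholds --no-summary --out json=results.json script.js
```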

na-- added the "duplicate" label and removed the "evaluation needed" label on Dec 10, 2019

divfor commented Dec 10, 2019

I have not tried #1259, but when I use http.get() in a for loop instead of http.batch(), the issue still exists, except the wave period gets shorter:

root@AXDDOS-AUTO-INST:~# while true; do netstat -antp | grep -c ESTA; sleep 1; done
1128
799
4132
1103
165
2120
646
3721
824
7
2058
423
4099
844
24
2049
425
3536
79
7
2076
368
3255
665
7
^C
root@AXDDOS-AUTO-INST:~#


divfor commented Dec 11, 2019

This is probably the same issue as #1068 / #763 - Trend metrics (either system or custom ones) constantly grow in memory. If you ran your script with --no-thresholds --no-summary and some external --output, k6 shouldn't leak memory. That said, 4000 VUs might be a bit too much for a single machine anyway, unless the machine is seriously beefy or you use the new --compatibility-mode (#1206 - in master, but not yet in an official release).

I resolved k6's performance waving issue by doing two things: 1) adding the --no-thresholds option, and 2) using http.get() in a for loop instead of http.batch(). Thanks for the information.

Yes, my machine is powerful enough to handle 4000 VUs at only 50% usage across all cores:
2 x Intel Xeon Gold 6230 CPU @ 2.10GHz (20 cores each), 384 GB memory, 2 x 25Gbps NICs (Intel XXV710), 2 TB SSD.

However, it looks like threshold processing runs on only one CPU core; when that core reaches 100%, all requests stall and performance oscillates. On a powerful machine this is easy to hit. Could threshold processing be spread across multiple CPU cores?
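(Editor's note: the for-loop replacement described above can be sketched as follows. Here `get` and `recordRTT` are hypothetical injectable stand-ins for k6's http.get and TrendRTT.add, so the loop shape can be exercised as plain JavaScript; in a real k6 script the loop body would sit directly in the default export.)

```javascript
// One VU iteration: sequential GETs instead of http.batch().
// `get(url, params)` stands in for http.get; `recordRTT(ms)` for TrendRTT.add.
const hosts = ["s1", "s2", "s3", "s4"];

function runIteration(get, recordRTT) {
  let ok = 0;
  for (const scheme of ["http", "https"]) {
    for (const h of hosts) {
      const res = get(`${scheme}://${h}.fred.com/string_10.txt`, { timeout: 30000 });
      if (res.status === 200) ok++;    // mirrors the "is status 200" check
      recordRTT(res.timings.duration); // mirrors TrendRTT.add(...)
    }
  }
  return ok; // number of successful requests in this iteration
}
```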


divfor commented Dec 11, 2019

Here is k6's CPU usage with the issue resolved:

[screenshot: top output showing k6 CPU usage]


na-- commented Dec 11, 2019

@divfor, thanks for pointing this out - you are completely right, there's no good reason for thresholds to run only on a single core... 😞 I made a note of that in #1136 (comment).


divfor commented Dec 17, 2019

@divfor Thanks.

There's currently a bug in batch per host, we will fix it in #1259, and also improve the performance of http.batch.

I assume you use v0.25.1 for your test? If possible, can you try your test with fix in #1259.

@cuonglm I tried the master version with #1259 (and re-added the 476_MultipleNICs feature), and got a crash from k6 on both boxes during an overnight run:

github.com/loadimpact/k6/lib/netext/httpext.MakeRequest(0x10c3fa0, 0xd7a704dc50, 0xd2e0378b00, 0xd5a92c7b60, 0xc1191baef8, 0x406975)
        /root/go/src/github.com/loadimpact/k6/lib/netext/httpext/request.go:328 +0x7d9
github.com/loadimpact/k6/lib/netext/httpext.MakeBatchRequests.func1(0xd2e0378b00, 0xdb6461e780)
        /root/go/src/github.com/loadimpact/k6/lib/netext/httpext/batch.go:63 +0xf4
github.com/loadimpact/k6/lib/netext/httpext.MakeBatchRequests.func2(0xd2b216f9f0, 0x7fa40000003c, 0xd7a7ff5980, 0xce71a38000, 0x3c, 0x3c)
        /root/go/src/github.com/loadimpact/k6/lib/netext/httpext/batch.go:78 +0x5a
created by github.com/loadimpact/k6/lib/netext/httpext.MakeBatchRequests
        /root/go/src/github.com/loadimpact/k6/lib/netext/httpext/batch.go:72 +0x199

I am trying to reproduce it to get a complete error log.
BTW, the code I built is at https://github.com/divfor/k6/tree/MultipleIPs


na-- commented Dec 17, 2019

😕 I'm hoping this isn't some newly introduced bug or data race from #1259... 🤞 We haven't hit something like that in any of our tests, so more information would be very much appreciated. As we mentioned in the other bug report (#1271), the end of the panic trace isn't as useful as the beginning - I have no idea what this error was - a data race, nil pointer, out of memory, etc...


mstoykov commented Nov 6, 2020

Hi @divfor, have you been able to reproduce it and get a complete error log? If not, I think the remaining issues were either solved or already reported elsewhere.

mstoykov commented:

Closing this, as there has been no new information and, AFAIK, no remaining issues to fix.
