This repository has been archived by the owner on Feb 24, 2020. It is now read-only.
We've noticed that the time it takes to `rkt rm` an exited pod (and likewise `rkt gc`) grows with the number of pods on the machine. We use rkt to run scheduled and one-off scripts on our machines, and we eventually accumulated enough exited pods that removing a single pod took upwards of 7 minutes; as the gc progressed, that eventually dropped to about 1 second. We've since updated our scripts to `rkt rm` each pod as soon as its script finishes, but we're filing this issue to ask whether removal time growing as a function of the number of pods is expected, or whether it should be constant with respect to the pod count.
Environment
We're running CentOS, but the issue also appears on the coreos-vagrant image. Here's our CentOS version info:
We had a node where a kubelet (run via kubelet-wrapper in rkt) went into what was essentially a crash loop and created ~560 pods. It's taking upwards of 30 minutes per pod to GC them.
The GC service is also eating a whole core to do this.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
76324 root 20 0 2236592 61512 13116 R 99.3 0.0 14105:07 rkt
Dec 15 00:00:15 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c0ca999c-efac-4e9e-a464-bf00b0d72d3a"
Dec 15 01:16:07 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c1a7b702-d8fc-44f5-8575-bb24beb8ac55"
Dec 15 02:36:21 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c3d248fe-3798-434f-93b1-9ed2fad97811"
Dec 15 03:54:10 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c4292b36-47fe-4cb7-adf8-802c469710d3"
Dec 15 05:08:15 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c43d72c7-0cb4-44ec-8a27-ed4bf650ecb9"
Dec 15 06:23:04 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c45120c4-4c48-4836-8d05-30f9fe5bffa1"
Dec 15 07:42:14 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c54f7c66-9d66-46fe-8791-d0cf12a386c2"
Dec 15 09:05:46 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c788cd33-3bb9-4bdd-b059-00763198d24f"
Dec 15 10:25:12 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c7f3b4aa-5654-49f8-a9f5-ee557a461626"
Dec 15 11:40:44 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c9400ae6-9f8e-485f-8051-8eeed8208c4a"
Dec 15 13:03:38 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "ca7b2930-f639-4d25-9233-368ee4fc014c"
Dec 15 14:26:47 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "caac4e22-9b9e-42b8-bd0b-04465b203457"
Dec 15 15:46:37 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cb004db8-b542-4142-9ae2-1f5bcc761ed2"
Dec 15 17:02:46 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cb6c74bb-0a77-4048-924b-b2e0e60c6a94"
Dec 15 18:24:35 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cc3dbd6d-c787-4338-8a65-afb5bd8db0fb"
Dec 15 19:47:09 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cd0f0012-8363-4334-a113-099482edbf22"
Dec 15 21:06:00 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cd7bf2df-21c6-4fec-ae87-d746276eafc1"
Dec 15 22:23:34 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cf696a25-4d93-479a-b9bf-f1355b37dc0f"
Dec 15 23:44:23 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "d035c90b-b77d-4f6b-a241-39ed25040daa"
Dec 16 01:09:11 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "d1253798-d9f2-4e35-99b1-94a60dcfb7cf"
Assuming it starts collecting each pod right after finishing the previous one, that's a long time per pod — roughly 75 to 85 minutes between consecutive log lines.
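The per-pod times can be estimated directly from the journal excerpt by diffing consecutive timestamps. A minimal sketch (GNU `date -d` assumed; the sample lines are abbreviated copies of the log above):

```shell
# Rough per-pod GC time: minutes between consecutive
# "Garbage collecting pod" lines in the journal excerpt.
log='Dec 15 00:00:15 host rkt[76324]: Garbage collecting pod "c0ca999c"
Dec 15 01:16:07 host rkt[76324]: Garbage collecting pod "c1a7b702"
Dec 15 02:36:21 host rkt[76324]: Garbage collecting pod "c3d248fe"'

mins=$(printf '%s\n' "$log" | awk '{
  # first three fields are the syslog timestamp, e.g. "Dec 15 00:00:15"
  cmd = "date -d \"" $1 " " $2 " " $3 "\" +%s"
  cmd | getline epoch
  close(cmd)
  if (prev) printf "%d\n", (epoch - prev) / 60   # minutes since previous pod
  prev = epoch
}')
echo "$mins"
```

For the three sample lines this prints 75 and 80 — consistent with the ~30-minutes-per-pod claim being, if anything, an underestimate on this node.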
What did you do?
To reproduce the issue:
Clone the CoreOS Vagrant image:
Create and gc 100 pods:
Create and gc 500 pods:
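The steps above can be sketched as a small timing harness (ours, not rkt tooling). The rkt invocations named in the comments are assumptions about what we ran; CREATE and GC default to no-ops so the loop itself runs anywhere:

```shell
#!/bin/sh
# Measurement harness sketch. On a box with rkt, you would set something like:
#   CREATE='rkt run --insecure-options=image docker://busybox --exec /bin/true'
#   GC='rkt gc --grace-period=0s'
# (both invocations are assumptions, not copied from the issue).
N=${N:-100}
CREATE=${CREATE:-true}
GC=${GC:-true}

i=0
while [ "$i" -lt "$N" ]; do
  $CREATE            # leave one exited pod behind
  i=$((i + 1))
done

start=$(date +%s)
$GC                  # collect all N exited pods
elapsed=$(( $(date +%s) - start ))
echo "gc: ${elapsed}s total for $N pods"
```

Dividing the total by N gives the per-pod figure reported below; running it once with N=100 and once with N=500 reproduces the comparison.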
At 100 pods, it took ~0.05 seconds per pod.
At 500 pods, it took ~0.19 seconds per pod.
That's roughly 4x the per-pod cost for 5x the pods, so per-pod removal time appears to grow about linearly with the total pod count. Just wondering if that correlation is expected.
Thanks!