
Pod removal time depends on number of pods #3960

Open
petekinnecom opened this issue Sep 27, 2018 · 1 comment

Comments

@petekinnecom

Hi,

We've noticed that the time it takes to rkt rm an exited pod (as well as to rkt gc) grows with the number of pods on the machine. We use rkt to run scheduled and one-off scripts on our machines, and we eventually accumulated enough exited pods that rm-ing an individual pod took upwards of 7 minutes. As the gc progressed, that number eventually dropped to 1 second. We've since updated our scripts to rm the pod immediately after the script finishes (a rough sketch of that workaround is below), but we thought we'd file an issue to ask whether removal time growing as a function of the number of pods is expected, or whether it should be roughly constant regardless of how many pods exist.
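
For reference, the workaround looks roughly like this (a sketch only; it assumes rkt's --uuid-file-save/--uuid-file flags and uses an illustrative path):

$ # run the one-off script and record the pod UUID
$ sudo rkt run --insecure-options=image --uuid-file-save=/tmp/pod.uuid docker://busybox:1.29 --exec echo -- done
$ # remove the exited pod right away instead of leaving it for gc
$ sudo rkt rm --uuid-file=/tmp/pod.uuid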

Environment

We're running CentOS, but the issue also appears to be present on the coreos-vagrant image. Here's our CentOS version info:

rkt Version: 1.29.0
appc Version: 0.8.11
Go Version: go1.8.3
Go OS/Arch: linux/amd64
Features: -TPM +SDJOURNAL
--
Linux 3.10.0-693.el7.x86_64 x86_64
--
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
--
systemd 219
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 -SECCOMP +BLKID +ELFUTILS +KMOD +IDN

What did you do?

To reproduce the issue:
Clone the coreos-vagrant repository:

$ git clone https://github.com/coreos/coreos-vagrant.git
$ cd coreos-vagrant

Create and gc 100 pods:

$ vagrant destroy -f && vagrant up && vagrant ssh -c 'sudo rkt fetch --insecure-options=image --net=host docker://busybox:1.29 && for i in {1..100}; do sudo rkt run --insecure-options=image docker://busybox:1.29 --exec echo -- $i; done; time sudo rkt gc --grace-period 0'
[snip]
real    0m5.373s
user    0m0.519s
sys     0m1.068s

Create and gc 500 pods:

$ vagrant destroy -f && vagrant up && vagrant ssh -c 'sudo rkt fetch --insecure-options=image --net=host docker://busybox:1.29 && for i in {1..500}; do sudo rkt run --insecure-options=image docker://busybox:1.29 --exec echo -- $i; done; time sudo rkt gc --grace-period 0'
[snip]
real    1m35.220s
user    0m36.784s
sys     0m11.488s

At 100 pods, it took ~0.05 seconds per pod
At 500 pods, it took ~0.19 seconds per pod
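
(Those per-pod figures are just the real time above divided by the pod count, e.g.:)

$ echo 'scale=3; 5.373/100' | bc
.053
$ echo 'scale=3; 95.220/500' | bc
.190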

Just wondering if that correlation is expected.

Thanks!


SleepyBrett commented Dec 17, 2018

We had a node where a kubelet (we run it via kubelet-wrapper in rkt) went into essentially a crash loop and created ~560 pods. It's taking upwards of 30 minutes per pod to GC them.

The GC service is also eating a whole core to do this.

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 76324 root      20   0 2236592  61512  13116 R  99.3   0.0  14105:07 rkt
Dec 15 00:00:15 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c0ca999c-efac-4e9e-a464-bf00b0d72d3a"
Dec 15 01:16:07 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c1a7b702-d8fc-44f5-8575-bb24beb8ac55"
Dec 15 02:36:21 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c3d248fe-3798-434f-93b1-9ed2fad97811"
Dec 15 03:54:10 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c4292b36-47fe-4cb7-adf8-802c469710d3"
Dec 15 05:08:15 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c43d72c7-0cb4-44ec-8a27-ed4bf650ecb9"
Dec 15 06:23:04 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c45120c4-4c48-4836-8d05-30f9fe5bffa1"
Dec 15 07:42:14 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c54f7c66-9d66-46fe-8791-d0cf12a386c2"
Dec 15 09:05:46 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c788cd33-3bb9-4bdd-b059-00763198d24f"
Dec 15 10:25:12 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c7f3b4aa-5654-49f8-a9f5-ee557a461626"
Dec 15 11:40:44 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "c9400ae6-9f8e-485f-8051-8eeed8208c4a"
Dec 15 13:03:38 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "ca7b2930-f639-4d25-9233-368ee4fc014c"
Dec 15 14:26:47 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "caac4e22-9b9e-42b8-bd0b-04465b203457"
Dec 15 15:46:37 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cb004db8-b542-4142-9ae2-1f5bcc761ed2"
Dec 15 17:02:46 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cb6c74bb-0a77-4048-924b-b2e0e60c6a94"
Dec 15 18:24:35 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cc3dbd6d-c787-4338-8a65-afb5bd8db0fb"
Dec 15 19:47:09 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cd0f0012-8363-4334-a113-099482edbf22"
Dec 15 21:06:00 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cd7bf2df-21c6-4fec-ae87-d746276eafc1"
Dec 15 22:23:34 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "cf696a25-4d93-479a-b9bf-f1355b37dc0f"
Dec 15 23:44:23 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "d035c90b-b77d-4f6b-a241-39ed25040daa"
Dec 16 01:09:11 ip-172-16-194-167 rkt[76324]: Garbage collecting pod "d1253798-d9f2-4e35-99b1-94a60dcfb7cf"

Assuming it starts collecting each pod right after it finishes the previous one, that's a long time per pod.
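
For what it's worth, the per-pod gap can be computed directly from the journal timestamps; a rough sketch (assuming the GC runs under a unit named rkt-gc.service, which is a guess for this setup, and that this journalctl supports -o short-unix):

$ journalctl -u rkt-gc.service -o short-unix --no-pager \
    | grep 'Garbage collecting pod' \
    | awk '{ if (prev) printf("%.0f min\n", ($1 - prev) / 60); prev = $1 }'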
