This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

Kubernetes liveness probes with docker exec fail randomly #2605

Open
hingstarne opened this issue Aug 3, 2019 · 4 comments

Comments

@hingstarne

hingstarne commented Aug 3, 2019

Issue Report

Bug

Kubernetes liveness probes fail randomly on this version of Container Linux. There is a bug in the bundled runc version ...

runc --version
runc version 1.0.0-rc5+dev.docker-18.06
commit: a592beb5bc4c4092b1b1bac971afed27687340c5
spec: 1.0.0

See here

user 5m 5m 1 user-sqsworker-55f4f9494f-glnm7.15b76be66de646eb Pod spec.containers{rails} Warning Unhealthy kubelet, ip-172-31-101-183.eu-west-1.compute.internal Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "process_linux.go:90: adding pid 18580 to cgroups caused \"failed to write 18580 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/podb72c8c56-b538-11e9-9f9c-06d6b3e699b6/1075b19a94bb045fdf72bfb0133bbbc721f3e04c43a99ded2d0f5eba6f34e7ca/cgroup.procs: invalid argument\"": unknown

This error happens randomly and we cannot provoke it. Because it also happens with our CNI pods, it is a big issue for us.
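For triaging events like the one above, it can help to pull the pid and the cgroup path out of the message. The following is only a minimal sketch: the function name `parse_cgroup_write_error` and the regular expression are illustrative assumptions based on the message format shown in this report, not part of runc or the kubelet.

```python
import re

# Assumed message shape, taken from the event quoted in this report:
#   failed to write <pid> to cgroup.procs: write <path>/cgroup.procs: <reason>
# Quotes in the message may be backslash-escaped, so the reason stops at
# the first backslash or double quote.
CGROUP_WRITE_ERROR = re.compile(
    r'failed to write (?P<pid>\d+) to cgroup\.procs: '
    r'write (?P<path>/\S+/cgroup\.procs): (?P<reason>[^\\"]+)'
)

# The event message from this report, copied verbatim.
SAMPLE_MESSAGE = r'Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "process_linux.go:90: adding pid 18580 to cgroups caused \"failed to write 18580 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/podb72c8c56-b538-11e9-9f9c-06d6b3e699b6/1075b19a94bb045fdf72bfb0133bbbc721f3e04c43a99ded2d0f5eba6f34e7ca/cgroup.procs: invalid argument\"": unknown'


def parse_cgroup_write_error(message):
    """Return a dict with pid, path and reason, or None if nothing matches."""
    match = CGROUP_WRITE_ERROR.search(message)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "path": match.group("path"),
        "reason": match.group("reason").strip(),
    }


if __name__ == "__main__":
    print(parse_cgroup_write_error(SAMPLE_MESSAGE))
```

The extracted path shows which cgroup hierarchy (here `cpu,cpuacct`) and which container the failed write targeted, which may be useful when comparing several occurrences.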

Container Linux Version

cat /etc/os-release 
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2135.6.0
VERSION_ID=2135.6.0
BUILD_ID=2019-07-30-0722
PRETTY_NAME="Container Linux by CoreOS 2135.6.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

aws ec2 instance m5.xlarge

Expected Behavior

Liveness probe fails only on error within the pod

Actual Behavior

Liveness probes fail randomly when the exec'd process is written to cgroup.procs.

Reproduction Steps

  1. Use kubernetes 1.11.10 deployed by kops 1.11.1 on aws
  2. Wait and watch events in all namespaces for this error to occur
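Since the error cannot be provoked, step 2 amounts to scanning events for it. One way to do that is sketched below, assuming the events are exported as JSON (for example with `kubectl get events --all-namespaces -o json`) and passed in as a string. The function name `find_cgroup_probe_failures` and the marker string are illustrative choices, not an existing tool.

```python
import json

# Substring assumed to identify the failure, taken from the event message
# quoted in this report.
MARKER = "to cgroup.procs"


def find_cgroup_probe_failures(events_json):
    """Return (namespace, object name, message) for matching Unhealthy events.

    events_json is expected to be a JSON document with an "items" array of
    Kubernetes Event objects.
    """
    data = json.loads(events_json)
    hits = []
    for item in data.get("items", []):
        message = item.get("message", "")
        if item.get("reason") == "Unhealthy" and MARKER in message:
            involved = item.get("involvedObject", {})
            hits.append((involved.get("namespace"), involved.get("name"), message))
    return hits
```

Running this periodically over the exported events would show how often the failure occurs and which pods it affects, without having to watch the event stream by hand.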

Other Information

@bgilbert
Contributor

bgilbert commented Aug 3, 2019

Thanks for the report. Did this work properly in a previous version of Container Linux?

@hingstarne
Author

hingstarne commented Aug 5, 2019

We use an immutable approach and disable the update-engine.

It started when we migrated to CoreOS-stable-2079.3.0-hvm and still occurs with CoreOS-stable-2135.5.0-hvm, which we are using now.
Is there any best practice on how to replace runc on the system properly for testing?

@bgilbert
Contributor

bgilbert commented Aug 5, 2019

It started when we migrated to CoreOS-stable-2079.3.0-hvm

Which version were you using before that?

@dotbalo

dotbalo commented Jan 13, 2020

Hi, I have the same problem. It looks like this:

Events:
  Type     Reason     Age        From                 Message
  ----     ------     ----       ----                 -------
  Warning  Unhealthy  <invalid>  kubelet, k8s-node47  Readiness probe failed: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "process_linux.go:90: adding pid 9319 to cgroups caused \"failed to write 9319 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/kubepods/burstable/podb54e8b22-eb09-11e9-a0b0-fa163e623fb1/a76f45c56400a1e956b12d0977b2522cf4856d6744cf2b02075334014cdcb57c/cgroup.procs: invalid argument\"": unknown

kubernetes version:

[root@k8s-node47 ~]# kubelet --version
Kubernetes v1.13.5

CentOS info:

[root@k8s-node47 ~]# uname -a
Linux k8s-node47 4.18.9-1.el7.elrepo.x86_64 #1 SMP Thu Sep 20 09:04:54 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@k8s-node47 ~]# cat /etc/redhat-release 
CentOS Linux release 7.7.1908 (Core)

Docker info:

[root@k8s-node47 ~]# docker info
Containers: 38
 Running: 34
 Paused: 0
 Stopped: 4
Images: 19
Server Version: 18.06.3-ce
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.18.9-1.el7.elrepo.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.66GiB
Name: k8s-node47
ID: PW4G:TDGS:X3N3:5JBB:NOXS:R3F7:5BGF:NX3N:TCAI:PDWQ:WLQH:6ZR6
Docker Root Dir: /data/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false

Have you found the cause?
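Both reports in this thread show the same runc commit (`a592beb5bc4c4092b1b1bac971afed27687340c5`), once in `runc --version` output and once in `docker info` output. A small sketch for checking this on other affected nodes follows. The helper names are hypothetical, and the patterns assume the output layout shown in this thread, which other runc or Docker versions may not share.

```python
import re


def runc_commit_from_docker_info(text):
    """Return the hash on the 'runc version:' line of `docker info` output, or None."""
    match = re.search(r"^\s*runc version:\s*([0-9a-f]+)\s*$", text, re.MULTILINE)
    return match.group(1) if match else None


def runc_commit_from_runc_version(text):
    """Return the hash on the 'commit:' line of `runc --version` output, or None."""
    match = re.search(r"^\s*commit:\s*([0-9a-f]+)\s*$", text, re.MULTILINE)
    return match.group(1) if match else None


# Excerpts copied from the two reports in this thread.
RUNC_VERSION_OUTPUT = """runc version 1.0.0-rc5+dev.docker-18.06
commit: a592beb5bc4c4092b1b1bac971afed27687340c5
spec: 1.0.0
"""

DOCKER_INFO_EXCERPT = """containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: a592beb5bc4c4092b1b1bac971afed27687340c5
init version: fec3683
"""

if __name__ == "__main__":
    same = (runc_commit_from_runc_version(RUNC_VERSION_OUTPUT)
            == runc_commit_from_docker_info(DOCKER_INFO_EXCERPT))
    print("same runc commit:", same)
```

A matching commit on two different distributions is consistent with the suspicion that the problem lies in runc rather than in the host OS, though it does not prove it.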
