Pods do not start on upgrade from 1.24.8-rancher1-1 to 1.24.9-rancher1-1 #3160

Open · spyder007 opened this issue Feb 3, 2023 · 46 comments

@spyder007

RKE version:
1.4.2
Docker version: (docker version, docker info preferred)

> docker --version
Docker version 20.10.21, build baeda1f
> docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
  compose: Docker Compose (Docker Inc., v2.15.1)
  scan: Docker Scan (Docker Inc., v0.23.0)

Server:
 Containers: 51
  Running: 7
  Paused: 0
  Stopped: 44
 Images: 40
 Server Version: 20.10.21
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 5b842e528e99d4d4c1686467debf2bd4b88ecd86
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.0-58-generic
 Operating System: Ubuntu 22.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.7GiB
 Name: k8-internal-000001
 ID: 6ATJ:2X3N:RNNU:MF66:TSKE:T5CU:4KCU:SOQJ:66VS:RPNQ:WI3V:FEUV
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

~$ uname -r
5.15.0-58-generic

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO)
Hyper-V Ubuntu 22.04 nodes

cluster.yml file:

kubernetes_version: v1.24.8-rancher1-1
nodes:
    # k8-internal-000001 
    - address: 192.168.1.xx
      user: nodeuser
      role:
        - worker
        - etcd
        - controlplane
    # k8-internal-000002
    - address: 192.168.1.xx
      user: nodeuser
      role:
        - worker
        - etcd
    # k8-internal-000003
    - address: 192.168.1.xx
      user: nodeuser
      role:
        - worker
        - etcd
    # k8-internal-000004
    - address: 192.168.1.xx
      user: nodeuser
      role:
        - worker

# Cluster level SSH private key
# Used if no ssh information is set for the node
ssh_key_path: ~/.ssh/id_ed25519

services:
  kube-api:
    secrets_encryption_config:
      enabled: true
      
# Set the name of the Kubernetes cluster  
cluster_name: internal

# Specify network plug-in (canal, calico, flannel, weave, or none)
network:
    plugin: flannel

ingress:
  provider: none

authentication:
  strategy: x509
  sans:
    - 'cp-internal.local.net'
    - 'cp-internal'
    - '127.0.0.1'
    - 'localhost'
    - 'kubernetes'
    - 'kubernetes.default'
    - 'kubernetes.default.svc'
    - 'kubernetes.default.svc.cluster.local'

dns:
  provider: coredns
  upstreamnameservers:
   - 192.168.1.xx

Steps to Reproduce:
I continue to see issues in trying to upgrade clusters, even small jumps.
Example:
I have two clusters running 1.24.8-rancher1-1 and I attempted to upgrade to 1.24.9-rancher1-1 using RKE 1.4.2. The rke up command runs successfully, but every pod basically goes into an error state with errors such as this:

unable to ensure pod container exists: failed to create container for [kubepods besteffort pod3544c64a-54a2-472a-9d70-39328bde7a0a] : unable to start unit "kubepods-besteffort-pod3544c64a_54a2_472a_9d70_39328bde7a0a.slice" (properties [{Name:Description Value:"libcontainer container kubepods-besteffort-pod3544c64a_54a2_472a_9d70_39328bde7a0a.slice"} {Name:Wants Value:["kubepods-besteffort.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Unit kubepods-besteffort.slice not found.

Results:

The only success I have had in the past is to provision new nodes and then move them, which I usually do in multiple rke up commands. If I sneak a k8s upgrade in when I'm provisioning new nodes and then get rid of the old ones, the cordon/drain process seems to restart everything...

This is really getting worrisome, as I'm unable to perform a simple upgrade of the cluster.


@spyder007 (Author)

Update:
I tried this again. Same issue, but I then went in and ran a full update (apt upgrade -y) on my nodes, which took them all the way to Docker 23.0.0. And, on this cluster, the upgrade then worked.

However, I then "reversed" the upgrade steps: a full node upgrade first, then the k8s upgrade. Big mistake. The node upgrade stopped all running pods on the cluster, and rke up would not execute because it said the Docker version was unsupported.

My current node provisioning script runs a full apt upgrade BEFORE installing Docker using the install-docker script. It would seem that running an apt update/upgrade on the nodes after the fact puts an unsupported version of Docker on the node, which means one of my clusters is in a precarious position, running Docker 23 with 1.24.9-rancher1-1.

I'm really not sure why this is as difficult as it is. Either I am doing something incredibly wrong or there is a lack of stability in these updates. For now, my safest bet is to provision new nodes before I do any upgrades of any kind. Some guidance on this would be much appreciated.

@sbonds commented Feb 8, 2023

I'm staying on Docker 20.10.22 until Rancher supports Docker 23.

Issue requesting support for Docker 23: rancher/rancher#40417
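
For anyone trying to stay on a working Docker release, one way to keep a routine apt upgrade from pulling in Docker 23 is to hold the Docker packages; a minimal sketch, assuming the packages come from Docker's official apt repository:

# Pin the currently installed Docker packages so apt upgrade leaves them alone
sudo apt-mark hold docker-ce docker-ce-cli containerd.io

# Check which packages are held
apt-mark showhold

# Release the hold later, once the newer Docker version is supported
sudo apt-mark unhold docker-ce docker-ce-cli containerd.io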

@spyder007 (Author)

@sbonds I honestly didn't mean to upgrade to 23: the apt upgrade pulled it automatically. I typically provision my machines using the docker-install script on Rancher's site.

I seem to have a lot of problems when replacing nodes, similar to the above. There's no telling whether a small Kubernetes upgrade will cause my pods to stop completely or not, and I can't tell whether a restart is required.

@MarUoneB

It looks like I have similar issues: my containers were all in an error state, and rebooting the VMs helped and let everything work again. But I would like to upgrade without having to reboot my VMs, because with Longhorn running it takes some time before the whole cluster is up and running again.

@steve-todorov

Having the same issue and also running docker 23. Was attempting to upgrade from 1.24.8 to 1.25 and encountered the same error:

unable to ensure pod container exists: failed to create container for [kubepods besteffort pod549daf46-1133-4c4f-a0c8-7cef48d8b35d] : unable to start unit "kubepods-besteffort-pod549daf46_1133_4c4f_a0c8_7cef48d8b35d.slice" (properties [{Name:Description Value:"libcontainer container kubepods-besteffort-pod549daf46_1133_4c4f_a0c8_7cef48d8b35d.slice"} {Name:Wants Value:["kubepods-besteffort.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Unit kubepods-besteffort.slice not found.

@priitr-ent

Same issue with v1.24.10-rancher4-1; any kubelet restart causes errors in the kubelet log:
E0426 12:38:36.926506 360191 qos_container_manager_linux.go:374] "Failed to update QoS cgroup configuration" err="unable to set unit properties: Unit kubepods.slice not found."
I0426 12:38:36.926541 360191 kubelet.go:1658] "Failed to update QoS cgroups while syncing pod" pod="argocd/argocd-redis-68bc48958b-sdvn7" err="unable to set unit properties: Unit kubepods.slice not found."
E0426 12:38:36.930385 360191 pod_workers.go:965] "Error syncing pod, skipping" err="failed to ensure that the pod: 828ad3df-a0df-46a0-bee8-53576030ff50 cgroups exist and are correctly applied: failed to create container for [kubepods besteffort pod828ad3df-a0df-46a0-bee8-53576030ff50] : unable to start unit \"kubepods-besteffort-pod828ad3df_a0df_46a0_bee8_53576030ff50.slice\" (properties [{Name:Description Value:\"libcontainer container kubepods-besteffort-pod828ad3df_a0df_46a0_bee8_53576030ff50.slice\"} {Name:Wants Value:[\"kubepods-besteffort.slice\"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Unit kubepods-besteffort.slice not found." pod="argocd/argocd-redis-68bc48958b-sdvn7" podUID=828ad3df-a0df-46a0-bee8-53576030ff50
E0426 12:38:37.025963 360191 summary_sys_containers.go:48] "Failed to get system container stats" err="failed to get cgroup stats for \"/kubepods.slice\": failed to get container info for \"/kubepods.slice\": unknown container \"/kubepods.slice\"" containerName="/kubepods.slice"
Issue is present using both docker-ce 20.10.24 and docker-ce 23.0.4
systemctl restart docker is a workaround.

@steve-todorov

@priitr-ent have you tried to restart the entire machine? What we ended up doing was run the upgrade, watch this message pop up and fail the upgrade, then reboot all of our cluster machines and trigger the rke upgrade again. The upgrade was then successful. Unsure of the cause.

@priitr-ent commented Apr 28, 2023

@priitr-ent have you tried to restart the entire machine?

Yeah, that also remediates the error.
Interestingly enough, this does not happen on all nodes (during the upgrade/kubelet configuration change) but often enough to fail all rolling cluster upgrades.
The created cgroups seem to be OK when I do docker stop kubelet / docker start kubelet on the host, and the kubelet has the host's /sys bind mounted read-write, so its view of the cgroups should survive restarts just fine.
However, RKE itself first renames the running kubelet container to old-kubelet, starts the new kubelet and then removes the old-kubelet container. Is it possible that the removal of old-kubelet removes the cgroups it created before the new version can see them? Some kind of timing issue?
Edit: even docker stop kubelet / docker start kubelet seems to fix it after a failed upgrade run.
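
For reference, the workaround described in this thread boils down to bouncing the kubelet container on the affected node and checking that the QoS slices come back; a minimal sketch, with slice paths taken from the errors reported here:

# On the affected node, after a failed upgrade run:
docker stop kubelet
docker start kubelet

# Verify the QoS slices were recreated
ls /run/systemd/transient/ | grep kubepods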

@electrical

We've been hitting this issue as well, either when upgrading from 1.23.x to 1.24.x or from 1.24.x to 1.24.x.
A restart of the kubelet does indeed seem to resolve the issue.

@kinarashah (Member)

@priitr-ent @electrical If possible, could you post docker info and cat /etc/os-release?

@electrical

@priitr-ent

@kinarashah https://gist.github.com/priitr-ent/8c0129d92ef081bea4403518640a32ec
I'm currently running with ignore_docker_version: true
However, when the issue first appeared I downgraded to 20.10.24 and verified that the behaviour remains the same.

@kgrando commented Jul 21, 2023

I see the same issue on a test cluster running on Azure VMs, on the Rancher cluster itself running on vSphere, and on one of its downstream clusters when I upgrade to Kubernetes 1.26.6. If I go back to version 1.24.8 the issue disappears.
I tested with Docker versions 20.10.21, 23.0.6 and 24.0.4 and get the same results.
We are using Debian 11 on all nodes.

I always get this if the kubelet container restarts on Kubernetes 1.26.6.
If I do docker restart kubelet, all pods restart; I think this is related to this issue: #3280
Actually, with Kubernetes 1.24.8, all pods restart and are running afterwards. If I upgrade Kubernetes / on my other clusters, all pods restart as well but do not come up again; in Rancher they are all in the Error state.
Most of the time I can "fix" it if I restart the kubelet container again; sometimes I need to restart the Docker service, but either way this cannot be the solution.
docker logs kubelet gives a lot of this output:

E0721 05:39:24.273198 2994965 qos_container_manager_linux.go:374] "Failed to update QoS cgroup configuration" err="unable to set unit properties: Unit kubepods.slice not found."

I0721 05:39:24.273230 2994965 kubelet.go:1767] "Failed to update QoS cgroups while syncing pod" pod="kube-system/coredns-66b64c55d4-x7clr" err="unable to set unit properties: Unit kubepods.slice not found."

E0721 05:39:24.274904 2994965 pod_workers.go:965] "Error syncing pod, skipping" err="failed to ensure that the pod: f6a31cb4-fd35-4dce-9747-01c12edd48b2 cgroups exist and are correctly applied: failed to create container for [kubepods burstable podf6a31cb4-fd35-4dce-9747-01c12edd48b2] : unable to start unit "kubepods-burstable-podf6a31cb4_fd35_4dce_9747_01c12edd48b2.slice" (properties [{Name:Description Value:"libcontainer container kubepods-burstable-podf6a31cb4_fd35_4dce_9747_01c12edd48b2.slice"} {Name:Wants Value:["kubepods-burstable.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Unit kubepods-burstable.slice not found." pod="kube-system/coredns-66b64c55d4-x7clr" podUID=f6a31cb4-fd35-4dce-9747-01c12edd48b2

E0721 05:39:24.275317 2994965 qos_container_manager_linux.go:374] "Failed to update QoS cgroup configuration" err="unable to set unit properties: Unit kubepods.slice not found."

I0721 05:39:24.275346 2994965 kubelet.go:1767] "Failed to update QoS cgroups while syncing pod" pod="cattle-fleet-system/fleet-controller-56786984f4-gz58x" err="unable to set unit properties: Unit kubepods.slice not found."

E0721 05:39:24.277832 2994965 pod_workers.go:965] "Error syncing pod, skipping" err="failed to ensure that the pod: 40086747-8be0-46ce-9663-e858e6efe4cb cgroups exist and are correctly applied: failed to create container for [kubepods besteffort pod40086747-8be0-46ce-9663-e858e6efe4cb] : unable to start unit "kubepods-besteffort-pod40086747_8be0_46ce_9663_e858e6efe4cb.slice" (properties [{Name:Description Value:"libcontainer container kubepods-besteffort-pod40086747_8be0_46ce_9663_e858e6efe4cb.slice"} {Name:Wants Value:["kubepods-besteffort.slice"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Unit kubepods-besteffort.slice not found." pod="cattle-fleet-system/fleet-controller-56786984f4-gz58x" podUID=40086747-8be0-46ce-9663-e858e6efe4cb

@electrical

@kinarashah https://gist.github.com/electrical/76f8567b1243320829704729e7b40da7

@kinarashah any update after supplying the logs?
I'm a bit worried that the thread has been so quiet while this seems to be a major issue.

@itplayer-de

Same for me (and of course #3280 too).

RKE: 1.4.6 or 1.4.8

Ubuntu: 22.04.2

Kernel: Linux 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Client: Docker Engine - Community
Version: 23.0.1
API version: 1.41 (downgraded from 1.42)
Go version: go1.19.5
Git commit: a5ee5b1
Built: Thu Feb 9 19:46:56 2023
OS/Arch: linux/amd64
Context: default

Server: Docker Engine - Community
Engine:
Version: 20.10.23
API version: 1.41 (minimum version 1.12)
Go version: go1.18.10
Git commit: 6051f14
Built: Thu Jan 19 17:34:14 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.19
GitCommit: 1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0

Workaround that works: switch back to cgroup v1 via the kernel parameter systemd.unified_cgroup_hierarchy=0

This is surprising, since the SUSE support matrix lists this Ubuntu and Docker version combination as supported for RKE 1.4.6.
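
For anyone wanting to try the same workaround: on Ubuntu the parameter is typically added via GRUB. A minimal sketch, assuming a default GRUB setup (the node has to be rebooted for it to take effect):

# /etc/default/grub -- append the parameter to the existing kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="... systemd.unified_cgroup_hierarchy=0"

# Regenerate the GRUB configuration and reboot
sudo update-grub
sudo reboot

# Afterwards, verify the node is back on cgroup v1 (tmpfs instead of cgroup2fs)
stat -fc %T /sys/fs/cgroup/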

@superseb (Contributor) commented Oct 3, 2023

My basic attempts to reproduce this have not led to success. If anyone can reproduce this with a stock cloud image, can you please share which cloud and which image so I can use it? I have used Ubuntu 22.04 on AWS and Debian 11 on DigitalOcean without success. As not everyone is hitting this issue, there must be some specific software version/configuration/deployment that causes it.

If you don't have a stock cloud image to reproduce, please share the following outputs:

  • cat /proc/self/mountinfo | grep cgroup
  • systemd --version
  • containerd --version
  • runc --version

If you can only reproduce on your own infra, maybe you can add verbose logging to the kubelet (-v 9) to extract more info on what is happening exactly.
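
For anyone wondering where to add that flag in an RKE-managed cluster: a minimal sketch, assuming the standard extra_args mechanism in cluster.yml (remove it again after collecting logs, since level 9 is very verbose):

services:
  kubelet:
    extra_args:
      # kubelet log verbosity; 9 is extremely chatty
      v: "9"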

@TomyLobo commented Oct 10, 2023

Thanks for the offer, @superseb.
I also have this issue after upgrading from 1.24.9 to 1.24.17.

I'm using a self-built image based on https://github.com/David-VTUK/Rancher-Packer
I had to make a bunch of modifications to make it work though.
I can enumerate them if necessary.

Here are the version numbers you requested:

  • 34 24 0:29 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:9 - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot
  • systemd 249 (249.11-0ubuntu3.7)
    +PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified
  • containerd containerd.io 1.6.18 2456e983eb9e37e47538f59ea18f2043c9a73640
  • runc version 1.1.4
    commit: v1.1.4-0-g5fd4c4d
    spec: 1.0.2-dev
    go: go1.19.6
    libseccomp: 2.5.3

What's really odd is that I see this in the rancher provisioning log:

[INFO ] Initiating Kubernetes cluster
[INFO ] Successfully Deployed state file at [management-state/rke/rke-548008336/cluster.rkestate]
[INFO ] Building Kubernetes cluster
[INFO ] [dialer] Setup tunnel for host [<IP of the control/etcd node>]
[ERROR] Failed to set up SSH tunneling for host [<IP of the control/etcd node>]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": can not build dialer to [c-hppzq:m-t4p4g]
[ERROR] Removing host [<IP of the control/etcd node>] from node lists

I don't see any traffic between the VM that hosts the rancher docker-compose environment and the control/etcd node.
I can establish SSH connections between the machines fine.

btw, I also posted on the Rancher Users slack instance before finding this issue:
https://rancher-users.slack.com/archives/C3ASABBD1/p1696943943581939

@superseb (Contributor)

@TomyLobo if still available, can you upload the full rancher (provisioning) logs? It can't take any action on the node if it can't connect so I suspect there are more logs from before that actually did something. In the Slack message you mention that there was a network issue during the upgrade, did it only happen during that time or can you reproduce it from scratch?

@km-bh commented Oct 23, 2023

I've been encountering this issue as well for all upgrades when using Rancher 2.7.6. However, there's a caveat: this only seems to be a problem with clusters with more than 3-6 nodes. I've created and destroyed a vSphere-type cluster multiple times: the initial build always succeeds without issue, but when upgrading, some part of that process invariably breaks it. I can see that the Kubernetes-related slices are removed from the node OS (they do not exist in /run/systemd/transient), but a restart of the kubelet container is enough to fix it.

Unfortunately, this can happen multiple times to a node during the upgrade process, especially when it fails and retries at a later time once the node is healthy again. What's worse is that the nodes aren't affected every time; only about 80% of the time.

From what I can see, only one thing avoids this issue: using the cgroupfs driver instead of the systemd one. I can replicate this issue 100% of the time with a 3-control/3-etcd/6-worker cluster on Ubuntu 22.04, no matter which version of Docker is used.
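
For reference, if you want to experiment with pinning the kubelet to the cgroupfs driver rather than relying on auto-detection, a minimal cluster.yml sketch follows. This is an assumption about where to set it (the rke-tools entrypoint may still auto-detect or override the driver depending on its version), and the Docker daemon's native.cgroupdriver setting would need to match:

services:
  kubelet:
    extra_args:
      # ask the kubelet to manage pod cgroups via cgroupfs instead of systemd
      cgroup-driver: cgroupfs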

@TomyLobo

This happened to me with a 3-node and a 4-node cluster, so it's definitely also happening with smaller clusters.

@TomyLobo if still available, can you upload the full rancher (provisioning) logs? It can't take any action on the node if it can't connect so I suspect there are more logs from before that actually did something. In the Slack message you mention that there was a network issue during the upgrade, did it only happen during that time or can you reproduce it from scratch?

I might have mentioned this before: the 2nd cluster upgrade was not during a network issue.

About logs:
I think I left the cluster on "upgrading", so I should be able to get them.
Which logs do you want?
The provisioning logs I can get via the cluster management interface didn't look very interesting.

@TomyLobo commented Nov 5, 2023

This also happens if I just change anything in the cluster config, say the vSphere CSI password.
docker restart kubelet fixes it reliably so far, but I have to do that on each node.
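
When several nodes are affected at once, that workaround can be scripted over SSH; a small sketch where the node names and user are placeholders:

# node1..node3 and nodeuser are placeholders -- substitute your own inventory
for node in node1 node2 node3; do
  ssh nodeuser@"$node" 'docker restart kubelet'
done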

@superseb (Contributor) commented Nov 6, 2023

Any logs relating to this issue help, as there is no solid lead yet on what is causing this. If you can reproduce it so reliably, can you also get the logs in debug or even trace (watch for sensitive info being logged in trace)?

Did you see the SSH tunneling error again? Also please include the output of docker info for completeness.

@bananflugan

I have just hit this for the first time. We upgraded our DEV and TEST clusters yesterday from 1.24.10 to 1.26.8. Both clusters have 8 worker nodes, of which 6 run Ubuntu 20.04 and 2 run Ubuntu 22.04.
The interesting thing is that only the 22.04 nodes got this problem; the rest were fine.

The syslog on the 22.04 nodes shows many rows like this:

Nov 7 13:48:38 k8s-staging-w6 systemd[1]: kubepods-burstable-pod3d6197a0_e1e3_4d48_baab_341f10d00adf.slice: Failed to open /run/systemd/transient/kubepods-burstable-pod3d6197a0_e1e3_4d48_baab_341f10d00adf.slice: No such file or directory

@kgrando commented Nov 8, 2023

I saw the same issue in our cluster a while ago. After some investigation we traced the problem to the change from cgroup v1 to cgroup v2, which came with the change from Debian 10 to 11 if I remember right. --> https://kubernetes.io/docs/concepts/architecture/cgroups/
There was an issue in the entrypoint script of the kubelet container from RKE. It should be fixed now.
#3280

I was able to change the script (not persistently) as in this PR, and then the issue was gone.
rancher/rke-tools#164

Additionally, I ran into some related issues where systemd (cgroup v2) was not able to create the cgroups right after the first boot; it was necessary to reboot the VM first. In that case no container is able to run, not even the kubelet.
I don't have much information about it. On Azure I was not able to recreate the issue, and in our environment it only occurred for a short time: we started the template again and shut it down, and then the problem was gone.
If you check the cgroup controllers with cat /sys/fs/cgroup/cgroup.controllers, the output should look like this:
cpuset cpu io memory hugetlb pids rdma
In our case 'cpu' was missing.
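
A quick way to run that check on a node, sketched from the commands above:

cat /sys/fs/cgroup/cgroup.controllers

# explicit check for the cpu controller
grep -qw cpu /sys/fs/cgroup/cgroup.controllers && echo "cpu controller present" || echo "cpu controller missing"

# controllers actually delegated to child cgroups
cat /sys/fs/cgroup/cgroup.subtree_control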

@superseb (Contributor)

@bananflugan @kgrando Any information you have on reproducing helps here as we cannot reproduce it from just using Ubuntu 22.04.

@bananflugan Are the Ubuntu 22.04 nodes using cgroupsv2? And do you have any additional info on the way the nodes were added/updated? Was it a new image with Ubuntu 22.04, was it updated from Ubuntu 20.04? Was it updated while in the cluster? Did it include a Docker update as well? From what to what version? Was the node rebooted after the update(s)?

@kgrando A restart of all pods is different from pods not starting; are you seeing pods not starting after the upgrade?

@kgrando commented Nov 16, 2023

@superseb Sorry, it was a while ago and I no longer have all the details in mind. We definitely had the situation where only the kubelet was running and all other pods were not, because of this cgroup error.
I think the bug produces two symptoms. One is that if you restart the kubelet, all pods restart too. The other is that after the kubelet update, all Kubernetes-scheduled pods end up in a state where they show as running but are not working, and all of them log this cgroup error.
To fix it, we had to restart the kubelet again. Maybe it has to do with the starting sequence after a cluster update?

@bananflugan

@superseb

cat /proc/self/mountinfo | grep cgroup
34 24 0:29 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:9 - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot
20554 28 0:29 /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6d651a34_fbbe_4620_820b_704d43f5bc55.slice/docker-82f499cfdfd9522adb9c4e95b525ed7d984740c7914df44fc7851aa4288024a5.scope /run/calico/cgroup rw,relatime shared:5415 - cgroup2 none rw,nsdelegate,memory_recursiveprot

My 22.04 nodes were added using the stock Ubuntu Server ISO. The only non-default packages installed on them are docker, realmd (for joining AD) and zabbix-agent.

Except for the k8s upgrade, nothing else was changed on any host. When the upgrade was done I noticed that every container on the 22.04 hosts had stopped working, while the 20.04 hosts were fine. I then rebooted the 22.04 servers and all was fine after that.

The Docker version I'm running in all clusters at the moment is 23.0.6.
K8s was upgraded from 1.24.10 -> 1.26.8.

@TomyLobo

My nodes are Ubuntu 22.04 as well, with no relevant extras either.
Since you say the 20.04 nodes upgraded fine, maybe the issue is caused by some difference between 20.04 and 22.04?

superseb removed their assignment Feb 1, 2024

@steffeneichler commented Feb 8, 2024

Hi,

any news on that? We are facing this problem on our clusters too.

kubernetes_version: v1.25.13-rancher1-1
Kernel Version: 6.2.0-1017-aws
Operating System: Ubuntu 22.04.3 LTS
Docker Version: 23.0.6

If you need some logs, I could provide logs collected with the log collector script:
https://www.suse.com/support/kb/doc/?id=000020191

From the kubelet log:

E0208 11:05:30.650902 4086838 qos_container_manager_linux.go:374] "Failed to update QoS cgroup configuration" err="unable to set unit properties: Unit kubepods.slice not found."
I0208 11:05:30.650923 4086838 kubelet.go:1684] "Failed to update QoS cgroups while syncing pod" pod="me-vanaheim-default/victoria-metrics-agent-scg-7479b644b7-6szfv" err="unable to set unit properties: Unit kubepods.slice not found."
E0208 11:05:30.651795 4086838 pod_workers.go:965] "Error syncing pod, skipping" err="failed to ensure that the pod: 77cbb602-08ca-460e-ae7f-5c68005bf2b3 cgroups exist and are correctly applied: failed to create container for [kubepods pod77cbb602-08ca-460e-ae7f-5c68005bf2b3] : unable to start unit \"kubepods-pod77cbb602_08ca_460e_ae7f_5c68005bf2b3.slice\" (properties [{Name:Description Value:\"libcontainer container kubepods-pod77cbb602_08ca_460e_ae7f_5c68005bf2b3.slice\"} {Name:Wants Value:[\"kubepods.slice\"]} {Name:MemoryAccounting Value:true} {Name:CPUAccounting Value:true} {Name:IOAccounting Value:true} {Name:TasksAccounting Value:true} {Name:DefaultDependencies Value:false}]): Unit kubepods.slice not found." pod="me-vanaheim-default/victoria-metrics-agent-scg-7479b644b7-6szfv" podUID=77cbb602-08ca-460e-ae7f-5c68005bf2b3
E0208 11:05:30.713580 4086838 controller.go:187] failed to update lease, error: Put "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/worker-ip-10-175-33-213-eu-central-1b?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Kind regards
Steffen

@steffeneichler

The only way to fix this problem for us (for the moment) was to reboot the affected node. The problem occurs during the rke run on all nodes where the kubelet has to be restarted; the etcd, controlplane and worker roles are all affected.

The rke run on the downstream cluster gets stuck at the kubelet health check on the affected node.

@kinarashah (Member)

@steffeneichler Could you upload the logs? You could also DM me on slack or email it to kinara.shah@suse.com if that's preferable. I want to compare the before and after kubelet process args. I haven't been able to reproduce this myself, so I don't have a good root cause for this issue yet.

Seems like there are 2 issues here: a kubelet restart is restarting all user pods (which I do see), but I don't see the kubelet error for those pods. If anyone else has a sample workload yaml + cluster yaml they're running, that'd be helpful as well.

Versions I have checked:

  • RKE v1.4.8 - v1.4.13
  • Ubuntu 22.04, cgroup2fs
  • Docker 20.10.24, runc version: v1.1.12-0-g51d5e94

@jhoblitt

I've seen this as well with rke 1.4.6 updating 1.23.7 -> 1.25.9 with both dockerd 23.0.6 and 24.0.9. rke will ultimately fail with

INFO[0126] [addons] Executing deploy job rke-network-plugin 
FATA[0178] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system 

at which point rebooting the nodes seems to be the fix. Rerunning rke up will complete normally. The journal on all nodes is spammed with messages similar to

Feb 14 20:39:31 XXX systemd[1]: kubepods-besteffort.slice: Failed to open /run/systemd/transient/kubepods-besteffort.slice: No such file or directory
Feb 14 20:39:31 XXX systemd[1]: kubepods-besteffort-pod4b180a53_b12b_4944_ac11_6168c7b58064.slice: Failed to open /run/systemd/transient/kubepods-besteffort-pod4b180a53_b12b_4944_ac11_6168c7b58064.slice: No such file or directory

until the host is rebooted.

@steffeneichler

@kinarashah

@steffeneichler Could you upload the logs? You could also DM me on slack or email it to kinara.shah@suse.com if that's preferable. I want to compare the before and after kubelet process args. I haven't been able to reproduce this myself, so I don't have a good root cause for this issue yet.

Seems like there are 2 issues here, kubelet restart is restarting all user pods (which I see) but I don't see the kubelet error for those pods. If anyone else has a sample workload yaml + cluster yaml they're running, that'd be helpful as well.

Versions I have checked:

* RKE v1.4.8 - v1.4.13

* Ubuntu 22.04, cgroup2fs

* Docker 20.10.24, runc version: v1.1.12-0-g51d5e94

Last week we updated 7 clusters from k8s 1.25.13 to 1.26.11. All of these clusters run Ubuntu 22.04. "Unfortunately" the problem didn't occur again.
I will collect logs the next time this problem happens.
During my research I also found information that this problem could also be caused by RKE itself, but I lost the link to the article.
How can I find out which rke version is used to update the downstream clusters?
We used v1.5.3 to update the upstream. If this version is also used for downstreams, maybe the problem is fixed there.

@kinarashah (Member)

@steffeneichler Thank you for trying! @shalomjacob has also been trying to reproduce but couldn't. I'll look at the diff between v1.4.x and v1.5.x to see if something stands out. Rancher embeds the RKE version, so it'll depend on the Rancher version. Are you using Rancher v2.8.x? v2.8.x versions will use RKE v1.5.x.

@kinarashah (Member)

@jhoblitt Any chance you collected logs or kubelet container args? Are you able to reproduce this consistently? If so, any logs you collected would be helpful.

@jhoblitt

@kinarashah I have a cluster update scheduled for Monday. I can collect any logs you're interested in then.

@huv95 commented Mar 5, 2024

I've seen this as well with rke 1.4.6 updating 1.23.7 -> 1.25.9 with both dockerd 23.0.6 and 24.0.9. rke will ultimately fail with

INFO[0126] [addons] Executing deploy job rke-network-plugin 
FATA[0178] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system 

at which point rebooting the nodes seems to be fix. Rerunning rke up will complete normally. The journal on all nodes is spammed with messages similar to

Feb 14 20:39:31 XXX systemd[1]: kubepods-besteffort.slice: Failed to open /run/systemd/transient/kubepods-besteffort.slice: No such file or directory
Feb 14 20:39:31 XXX systemd[1]: kubepods-besteffort-pod4b180a53_b12b_4944_ac11_6168c7b58064.slice: Failed to open /run/systemd/transient/kubepods-besteffort-pod4b180a53_b12b_4944_ac11_6168c7b58064.slice: No such file or directory

until the host is rebooted.

Which CNI are you using? cilium or calico?

@huv95 commented Mar 5, 2024

I've seen this as well with rke 1.4.6 updating 1.23.7 -> 1.25.9 with both dockerd 23.0.6 and 24.0.9. rke will ultimately fail with

INFO[0126] [addons] Executing deploy job rke-network-plugin 
FATA[0178] Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system 

at which point rebooting the nodes seems to be fix. Rerunning rke up will complete normally. The journal on all nodes is spammed with messages similar to

Feb 14 20:39:31 XXX systemd[1]: kubepods-besteffort.slice: Failed to open /run/systemd/transient/kubepods-besteffort.slice: No such file or directory
Feb 14 20:39:31 XXX systemd[1]: kubepods-besteffort-pod4b180a53_b12b_4944_ac11_6168c7b58064.slice: Failed to open /run/systemd/transient/kubepods-besteffort-pod4b180a53_b12b_4944_ac11_6168c7b58064.slice: No such file or directory

until the host is rebooted.

@jhoblitt I got the same error as you, but it happens on some nodes, not all.
I noticed that pods whose QoS class is BestEffort hit this error, while Burstable pods do not.
Searching on the nodes where the problem occurred, I found they had no file
/run/systemd/transient/kubepods-besteffort.slice
Copying the file from a working node to the same path solved my problem.
This is the content of my file:

# This is a transient unit file, created programmatically via the systemd API. Do not edit.
[Unit]
Description=libcontainer container kubepods-besteffort.slice
Wants=kubepods.slice

[Slice]
MemoryAccounting=yes
CPUAccounting=yes
IOAccounting=yes
TasksAccounting=yes

[Unit]
DefaultDependencies=no

Try it if your file cannot be found.
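
One way to script that copy over SSH, sketched from the description above (healthy-node and nodeuser are placeholders; only the file copy itself is what was reported to work):

# Copy the transient slice definition from a healthy node to the broken one
scp nodeuser@healthy-node:/run/systemd/transient/kubepods-besteffort.slice /tmp/kubepods-besteffort.slice
sudo install -m 0644 /tmp/kubepods-besteffort.slice /run/systemd/transient/kubepods-besteffort.slice

# Extra step, an assumption beyond the report above: ask systemd to re-read unit definitions
sudo systemctl daemon-reload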

@kinarashah (Member)

@huv95 Thanks for sharing what worked for you. To clarify: there was no /run/systemd/transient/kubepods-besteffort.slice, but /run/systemd/transient/kubepods-burstable.slice/ existed?

@huv95 commented Mar 12, 2024

@kinarashah No, when the error occurs it does not exist either.
After I recreated /run/systemd/transient/kubepods-besteffort.slice, the process worked and /run/systemd/transient/kubepods-burstable.slice/ was created automatically.
I also found another issue that I think is related to this one: #3280
I think that when the kubelet restarts, it drops the QoS cgroup hierarchy and does not recreate it.
My env is rke v1.4.8 (rke-tools:v0.1.89).
You can reproduce the error by restarting the kubelet container, or, more brutally, by killing the container.
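
A minimal way to reproduce and check for the symptom, based on this description (paths and log patterns taken from the errors earlier in the thread):

# Bounce (or kill) the kubelet container on a node
docker restart kubelet            # or: docker kill kubelet && docker start kubelet

# Then check whether the QoS slices disappeared and the errors show up
ls /run/systemd/transient/ | grep kubepods
docker logs --since 5m kubelet 2>&1 | grep -i "kubepods.*not found"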

@kinarashah (Member)

@huv95 Thank you for explaining the behavior, appreciate it. Yeah I've been looking into the issue you linked and kubelet's code around when it updates cgroups but didn't reproduce the error yet, will try killing the container and see if it works.

@huv95 commented Mar 12, 2024

@kinarashah Or an easier way
sudo rm /run/systemd/transient/kubepods-besteffort.slice :))

@kinarashah (Member) commented Mar 13, 2024

@huv95 The versions you're upgrading to (v1.25.9, or the original issue's v1.24.9) do not have the fix for the pod restart issue; it was fixed with rke-tools v0.1.92.

Could you try the latest RKE and upgrade to v1.25.16-rancher2-3?

I will look into the upstream code more if you can still reproduce the issue, but I suspect this should reduce or fix the pod restart issue.

@huv95 commented Apr 11, 2024

@kinarashah I just upgraded from v1.25.12 to v1.26.14, using rke:1.5.7.
It works fine!

@tunaman commented Apr 25, 2024

We just got hit by this bug, except for us it happened weeks after updating Rancher from 2.7.10 to 2.8.2 and was triggered by adding a node to an existing RKE1 v1.26.7 cluster. This resulted in all pods on the cluster ending up in the Error state; the Docker daemon was in D state, and the only thing that worked for us was a hard reboot of all worker nodes. All nodes are running Ubuntu 22.04.3 with Docker 23.0.6.

The only thing I could find in the Rancher control plane logs was the following:

rancher-d65b6886-dl567 rancher 2024/04/25 09:26:50 [INFO] Updated cluster [c-xxxxx] with node version [20]
rancher-d65b6886-dl567 rancher 2024/04/25 09:26:50 [INFO] Provisioned cluster [c-xxxxx]
rancher-d65b6886-dl567 rancher 2024/04/25 09:26:50 [INFO] checking cluster [c-xxxxx] for worker nodes upgrade
rancher-d65b6886-dl567 rancher 2024/04/25 09:26:50 [INFO] env changed for [kubelet] old: [RKE_KUBELET_CRIDOCKERD=true RKE_CLOUD_CONFIG_CHECKSUM=6b3ae7041eee769322e929bdd278031c RKE_KUBELET_DOCKER_CONFIG=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX== RKE_KUBELET_DOCKER_FILE=/var/lib/kubelet/config.json] new [RKE_KUBELET_CRIDOCKERD=true RKE_CLOUD_CONFIG_CHECKSUM=b5b36c507f054284a4fd3bb538da6444 RKE_KUBELET_DOCKER_CONFIG=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX== RKE_KUBELET_DOCKER_FILE=/var/lib/kubelet/config.json]
rancher-d65b6886-dl567 rancher 2024/04/25 09:26:50 [ERROR] error syncing 'c-xxxxx/m-pxjp9': handler rke-worker-upgrader: getNodePlan error for node [m-pxjp9]: failed to find plan for 10.15.72.222, requeuing
rancher-d65b6886-dl567 rancher 2024/04/25 09:26:50 [INFO] Provisioning cluster [c-xxxxx]
rancher-d65b6886-dl567 rancher 2024/04/25 09:26:50 [INFO] Updating cluster [c-xxxxx]
rancher-d65b6886-dl567 rancher 2024/04/25 09:26:51 [INFO] checking cluster [c-xxxxx] for worker nodes upgrade
rancher-d65b6886-dl567 rancher 2024/04/25 09:26:51 [INFO] env changed for [kubelet] old: [RKE_KUBELET_CRIDOCKERD=true RKE_CLOUD_CONFIG_CHECKSUM=6b3ae7041eee769322e929bdd278031c RKE_KUBELET_DOCKER_CONFIG=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX== RKE_KUBELET_DOCKER_FILE=/var/lib/kubelet/config.json] new [RKE_KUBELET_CRIDOCKERD=true RKE_CLOUD_CONFIG_CHECKSUM=b5b36c507f054284a4fd3bb538da6444 RKE_KUBELET_DOCKER_CONFIG=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX== RKE_KUBELET_DOCKER_FILE=/var/lib/kubelet/config.json]

@TomyLobo

@tunaman
You're describing different symptoms happening under different circumstances, at a different time, in a different component and the issue is resolved with a different workaround.
Are you sure you're in the right thread? :)
