Unable to create multiple clusters using kind 0.20.0+ on RedHat 8 with cgroup v1/cgroupns #3558

Open
sbonds opened this issue Mar 21, 2024 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


sbonds commented Mar 21, 2024

This bug is nearly identical to #3340, which was thought to be specific to Amazon Linux, but it also appears on RedHat 8 and Rocky Linux 8.

What happened:

Creating a second cluster fails. The message seen from the kind command is:

ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged lp-cluster-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1

The Docker logs show more info such as:

Failed to attach 180 to compat systemd cgroup /dev-hugepages.mount: No such file or directory
         Mounting dev-hugepages.mount - Huge Pages File System...
Failed to attach 180 to compat systemd cgroup /dev-hugepages.mount: No such file or directory
Failed to attach 181 to compat systemd cgroup /sys-kernel-debug.mount: No such file or directory
         Mounting sys-kernel-debug.… - Kernel Debug File System...
Failed to attach 181 to compat systemd cgroup /sys-kernel-debug.mount: No such file or directory
Failed to attach 182 to compat systemd cgroup /sys-kernel-tracing.mount: No such file or directory
         Mounting sys-kernel-tracin… - Kernel Trace File System...
Failed to attach 182 to compat systemd cgroup /sys-kernel-tracing.mount: No such file or directory
         Starting kmod-static-nodes…ate List of Static Device Nodes...
         Starting modprobe@configfs…m - Load Kernel Module configfs...
         Starting modprobe@dm_mod.s… - Load Kernel Module dm_mod...
         Starting modprobe@fuse.ser…e - Load Kernel Module fuse...
         Starting modprobe@loop.ser…e - Load Kernel Module loop...
Failed to attach 188 to compat systemd cgroup /system.slice/systemd-journald.service: No such file or directory
         Starting systemd-journald.service - Journal Service...
Failed to attach 188 to compat systemd cgroup /system.slice/systemd-journald.service: No such file or directory

What you expected to happen:

Creating a second cluster succeeds.

How to reproduce it (as minimally and precisely as possible):

Well, I figure you want precisely more than minimally. These steps may also be found at https://pastebin.com/RjGt0RBN. Apologies for the ads they will insert.

Reproduce on a RedHat Azure VM

You will need an Azure account and be logged in with the ability to create resource groups and VMs (e.g. "Contributor"). Running this test will cost between $0.25 and $1.00, provided you move at a reasonable pace.

These commands can be run from several places, but one of the most consistent is the Microsoft Azure CLI Docker container:

  • docker run -i -t mcr.microsoft.com/azure-cli

The only real annoyance about the Microsoft container is that it doesn't have ssh installed. It can be installed with:

  • apk add openssh
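
Once inside the container, authenticate to Azure and confirm the active subscription (an optional convenience step; without a browser, az login falls back to the device-code flow):

az login
az account show --output table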

From a bash shell, run the following, adjusting names as needed. It should work as-is if you don't want to change away from the "sbonds" names:

# Based on https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-cli
export RESOURCE_GROUP_NAME=rg-sbonds-kind-cgroups
export VNET_NAME=vnet-sbonds-kind-cgroups
export LOCATION=westus2
export VM_NAME=sbonds-kind-cgroups-bug-demo
export VM_IMAGE=RedHat:RHEL:8_8:latest
export ADMIN_USERNAME=sbonds_admin
# Generate an SSH key
ssh-keygen -t rsa -b 2048 -N "" -C sbonds-kind-cgroups -f "$HOME/.ssh/id_rsa_sbonds-kind-cgroups"
export SSH_PUBLIC_KEY_CONTENT=$(cat "$HOME/.ssh/id_rsa_sbonds-kind-cgroups.pub")
# Create resource group to hold VM and virtual network (vnet)
az group create \
  --name "$RESOURCE_GROUP_NAME" \
  --location "$LOCATION"
# Create VM with access provided by a local ssh key
az vm create \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --name "$VM_NAME" \
  --image "$VM_IMAGE" \
  --admin-username "$ADMIN_USERNAME" \
  --ssh-key-values "$SSH_PUBLIC_KEY_CONTENT" \
  --size Standard_B2s \
  --public-ip-sku Basic

Log in to the new VM

Put the public IP returned from the above "az vm create" command into VM_IP:

export VM_IP="52.252.199.204" # your IP will be different
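
Alternatively, the public IP can be queried from Azure (a convenience sketch, assuming the RESOURCE_GROUP_NAME and VM_NAME variables from earlier are still set):

export VM_IP=$(az vm show -d \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --name "$VM_NAME" \
  --query publicIps -o tsv)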

SSH to the public IP address using the SSH private key and the admin username:

ssh -i "$HOME/.ssh/id_rsa_sbonds-kind-cgroups" -o "StrictHostKeyChecking no" "$ADMIN_USERNAME@$VM_IP"

Install Docker

To become root:

sudo su -

To stop being root later, type "exit"; this leaves the sudo session and reverts to your original login.

Docker installation

As root:

dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
dnf -y install docker-ce docker-ce-cli containerd.io
systemctl start docker
systemctl enable docker
docker image ls
# Add the admin user to the docker group (append, keeping existing group memberships)
usermod -aG docker sbonds_admin

Test from the non-root ("sbonds_admin") account with docker image ls. No images yet, so the listing shows only the header:

REPOSITORY   TAG       IMAGE ID   CREATED   SIZE

Install Kind

As root:

curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod 0755 ./kind
mv ./kind /usr/local/bin/kind
kind version

Log out and back in

Log out and back in again to pick up the results of the "usermod", which allows non-root access to Docker.

exit
exit

Up-arrow will restore your prior ssh command:

ssh -i "$HOME/.ssh/id_rsa_sbonds-kind-cgroups" -o "StrictHostKeyChecking no" "$ADMIN_USERNAME@$VM_IP"
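
Once logged back in, Docker should work without sudo; a quick sanity check:

id          # the group list should now include "docker"
docker ps   # should succeed without a permission error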

Create default kind cluster

kind create cluster

Output:

Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.29.2) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Not sure what to do next? 😅  Check out https://kind.sigs.k8s.io/docs/user/quick-start/

Create non-default kind cluster

  • kind create cluster --kubeconfig $HOME/.kube/kubeconfig_nondefault-cluster --name nondefault-cluster --retain

Collect logs

mkdir $HOME/kind-logs
kind export logs $HOME/kind-logs --name nondefault-cluster

Copy the logs to your local location

Exit the remote ssh connection

exit

scp the logs locally

Also create a tar archive if you'll need to copy the files out of the Docker container.

cd /root
scp -r -i "$HOME/.ssh/id_rsa_sbonds-kind-cgroups" -o "StrictHostKeyChecking no" "$ADMIN_USERNAME@$VM_IP:kind-logs" .
tar czvf kind-logs.tar.gz kind-logs

If necessary, copy it out of your Docker container:

From a local command line (bash, Powershell, etc.):

docker ps

Find the ID of your mcr.microsoft.com/azure-cli container and use docker cp to copy out the kind-logs.tar.gz file created earlier.

docker cp bbede2b3ace1:/root/kind-logs.tar.gz .

7-zip and other utilities can handle .tar.gz files.

Destroy your test environment

This stops those Azure charges from accumulating. Be sure to get the resource group name right since this command is extremely destructive and does not prompt for confirmation.

echo $RESOURCE_GROUP_NAME

Last chance to stop! Be sure this is correct since the next command can do a lot of damage quickly if the wrong group is used.

az group delete --resource-group "$RESOURCE_GROUP_NAME" --yes

Anything else we need to know?:

Environment:

  • kind version: (use kind version): v0.22.0
  • Runtime info: (use docker info, podman info or nerdctl info): Docker 26.0.0
  • OS (e.g. from /etc/os-release): RHEL 8.9
  • Kubernetes version: (use kubectl version): didn't install kubectl
  • Any proxies or other special environment settings?: nope

@sbonds sbonds added the kind/bug Categorizes issue or PR as related to a bug. label Mar 21, 2024

sbonds commented Mar 21, 2024

This didn't seem to take the first time; here are the logs from when I ran the process above:

kind-logs.tar.gz

BenTheElder (Member) commented:

Thanks for the detailed post. To be clear: I'm not able to use Azure at work, the Kubernetes project does not currently have funding from Azure, and I'm not going to pay for Azure personally (especially to debug a commercial, non-open Linux distro).

I am, however, looking at the logs you uploaded.

This bit from kubelet logs:

Mar 21 19:17:01 nondefault-cluster-control-plane kubelet[3472]: E0321 19:17:01.574844 3472 kubelet.go:1542] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubelet kubepods] doesn't exist"

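One way to see the cgroup layout kubelet is complaining about is to inspect it from inside the retained node container (a diagnostic sketch, using the node container name from the repro above):

docker exec nondefault-cluster-control-plane cat /proc/1/cgroup
docker exec nondefault-cluster-control-plane ls /sys/fs/cgroup/

This is the hierarchy in which kubelet expects to find and create the kubelet and kubepods cgroups.
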
Looking around a bit more:

Kernel Version: 4.18.0-477.10.1.el8_8.x86_64

Unfortunately, it's not too surprising that a six-year-old kernel is giving issues; I don't think the rest of the container ecosystem (i.e. Docker, runc, containerd, Kubernetes, ...) is actively testing on anything like this. It's difficult for us to extend better coverage than the underlying ecosystem does.

I'm guessing we are hitting issues with cgroupns=private on the 4.18 kernel. Not using cgroupns leaves bigger problems on current distros.

You could try kind v0.19 as a stopgap to confirm whether it's cgroupns related.
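
For example (a sketch reusing the install steps from earlier with v0.19.0, plus a quick check of the host's cgroup version; cgroup2fs means v2, tmpfs means v1):

# Which cgroup version is the host using?
stat -fc %T /sys/fs/cgroup/
# Stopgap: install kind v0.19.0 and retry the second cluster
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.19.0/kind-linux-amd64
chmod 0755 ./kind
sudo mv ./kind /usr/local/bin/kind
kind create cluster --name nondefault-cluster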


sbonds commented Mar 25, 2024

I completely understand your feelings about RedHat, given the recent rise in their evilness. It remains a popular distro, especially among larger companies of the sort that employ folks learning about Kubernetes. If it helps, I noted this problem on Rocky Linux 8 as well, so it's not specific to the commercial version of RedHat.

Azure is not required to replicate the issue, but is a convenient way to do so. It should be relatively easy to replicate using a local Rocky Linux 8 VM.

So far my workaround has been to use Ubuntu. :-)

BenTheElder (Member) commented:

Rephrasing a bit: I really appreciate how thorough this bug report is, including the reproduction guidance!

Just trying to be clear and realistic about prioritization of my own time: RHEL/Azure aren't high for me personally (happy to review bug fixes and look at logs etc., though!), and KIND itself is only a portion of what we do.

Ubuntu is a good workaround; the ecosystem has pretty solid coverage there (also Fedora, in KIND at least).

It's really strange that this happens only with the second cluster; it doesn't look like one of the common resource exhaustion failure modes such as inotify limits.

Could you grab the logs from the first cluster, just to confirm the difference in the serial log etc.?
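
For example (a sketch reusing the export command from the repro, with the default cluster's name "kind"):

mkdir $HOME/kind-logs-default
kind export logs $HOME/kind-logs-default --name kind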


sbonds commented Mar 26, 2024

Here are both sets of logs from a repeated repro session using Rocky Linux 8. I also added the system log from /var/log/messages:

kind-logs-20240326.tar.gz

Repro step change: set VM_IMAGE to use Rocky Linux 8 instead:

# Based on https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-cli
export RESOURCE_GROUP_NAME=rg-sbonds-kind-cgroups
export VNET_NAME=vnet-sbonds-kind-cgroups
export LOCATION=westus2
export VM_NAME=sbonds-kind-cgroups-bug-demo
export VM_IMAGE=erockyenterprisesoftwarefoundationinc1653071250513:rockylinux:free:latest
export ADMIN_USERNAME=sbonds_admin
# Generate an SSH key
ssh-keygen -t rsa -b 2048 -N "" -C sbonds-kind-cgroups -f "$HOME/.ssh/id_rsa_sbonds-kind-cgroups"
export SSH_PUBLIC_KEY_CONTENT=$(cat "$HOME/.ssh/id_rsa_sbonds-kind-cgroups.pub")
# Create resource group to hold VM and virtual network (vnet)
az group create \
  --name "$RESOURCE_GROUP_NAME" \
  --location "$LOCATION"
# Create VM with access provided by a local ssh key
az vm create \
  --resource-group "$RESOURCE_GROUP_NAME" \
  --name "$VM_NAME" \
  --image "$VM_IMAGE" \
  --admin-username "$ADMIN_USERNAME" \
  --ssh-key-values "$SSH_PUBLIC_KEY_CONTENT" \
  --size Standard_B2s \
  --public-ip-sku Basic


sbonds commented Mar 26, 2024

The second cluster comes up fine on Rocky Linux 9.


sbonds commented Mar 26, 2024

And v0.19 works fine on Rocky Linux 8. Should you want the logs for comparison, here they are:

kind-logs-20240326-v19.tar.gz


sbonds commented Mar 26, 2024

Given that the old kind works on the old RedHat and the new kind works on the new RedHat, this seems like a low priority to fix. Perhaps a documentation update and a compatibility check could be added to make it easier to tell that kind v0.20+ is known to have issues on RedHat/Rocky 8.

BenTheElder (Member) commented:

On a related note: there is open discussion in kubernetes/enhancements#4572 about dropping cgroup v1 support in Kubernetes.

Similarly, systemd is going to drop v1; once that filters into our base image (and we'll want to keep getting systemd patches), we'll have the same issue there: systemd/systemd#30852

Hard to say on the exact timeline, but the ecosystem around us is moving away from supporting v1, which will make attempts by kind to support v1 moot and makes it harder to prioritize doubling back to dig deeper into this case.

Per Kubernetes's own docs, a 5.8+ kernel will be needed for cgroup v2.

Some time after that happens, we will probably add a blanket check for docker/podman/nerdctl to be on a cgroup v2 host, which will be a lot less brittle than attempting to detect the set of distros with broken v1+cgroupns. (I'm also not sure docker etc. expose enough information for that, but they do expose v1 vs v2; inspecting the host local to the kind process would be incorrect, since remote daemons otherwise mostly work.)
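
A rough sketch of what such a blanket check could look like (hypothetical, not kind's actual code; note it asks the container runtime rather than inspecting the local host, since the daemon may be remote):

# Ask the runtime which cgroup version its host uses
if [ "$(docker info --format '{{.CgroupVersion}}')" != "2" ]; then
  echo "ERROR: a cgroup v2 host is required" >&2
  exit 1
fi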
