
kubeadm join --control-plane to create HA setup killed the cluster #2275

Closed
balboah opened this issue Sep 2, 2020 · 8 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@balboah

balboah commented Sep 2, 2020

BUG REPORT

Versions

kubeadm version (use kubeadm version): v1.19.0
Environment:

  • Kubernetes version (use kubectl version): v1.19.0
  • Cloud provider or hardware configuration: bare metal onprem
  • OS (e.g. from /etc/os-release): Ubuntu LTS 20.04
  • Kernel (e.g. uname -a): 5.4.0-1015-raspi aarch64
  • Other: cluster originally initialized with kubeadm v1.18.6, containerd 1.3.3, etcd 3.4.9

What happened?

While following the high availability guide at https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/ to join another controller node for replication and high availability, the cluster instead stopped (ironically) working.

With the first controller node fully functional, with working nodes and pods scheduling as far as I could tell, these steps were taken on controller1 (working) and controller2 (to be joined into HA):

  1. controller1$ kubeadm init phase upload-certs --upload-certs
  2. controller1$ kubeadm token create
  3. controller2$ kubeadm join --token <copied-from-controller1-output> --discovery-token-unsafe-skip-ca-verification --control-plane --certificate-key <copied-from-controller1-output> api.example.com:6443

Now the output on controller2 stopped with:

[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
To see the stack trace of this error execute with --v=5 or higher

Going back to controller1, it could no longer connect to the API server; the cluster doesn't respond any more.
Restarting kubelet resulted in a looping log of: node "controller1" not found

It seems to me the etcd data on controller1 somehow vanished or became corrupt after the attempted join by controller2. However, I'm not sure exactly how to check the etcd logs while it runs as a static pod on containerd instead of Docker.

What you expected to happen?

I would never expect that the first controller might break while joining the second one.

How to reproduce it (as minimally and precisely as possible)?

The setup is as follows:

  • Raspberry Pi 4 with 4 GB of RAM as controllers, with "A2"-rated SD cards
  • WireGuard mesh between all nodes, whose addresses are set as the node IPs, configured outside of k8s
  • kubernetes 1.8.6 upgraded to 1.8.9, then 1.9.0
  • containerd CRI
  • calico "default" CNI

Anything else we need to know?

It's a small cluster; there have been no performance issues with the rPi controllers and no etcd corruption before this.

@neolit123
Member

neolit123 commented Sep 2, 2020

hi,

kubernetes 1.8.6 upgraded to 1.8.9, then 1.9.0

just to double check, you meant 1.1{8|9}.*?

While following the high availability guide at https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/ to join another controller node for replication and high availability, the cluster instead stopped (ironically) working.

and

api.example.com:6443

is that a "controlPlaneEndpoint"?
please see:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#initializing-your-control-plane-node

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#considerations-about-apiserver-advertise-address-and-controlplaneendpoint

if no, then the second controller will not work after the "upload-certs" command.
it's complicated to adjust that.

Turning a single control plane cluster created without --control-plane-endpoint into a highly available cluster is not supported by kubeadm.
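
for reference, a join into an HA setup assumes the cluster was initialized with something along these lines (api.example.com:6443 taken from your join command; a minimal sketch, not your exact invocation):

  kubeadm init --control-plane-endpoint "api.example.com:6443" --upload-certs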

It seems to me the etcd data on controller1 somehow vanished or became corrupt after the attempted join by controller2. However, I'm not sure exactly how to check the etcd logs while it runs as a static pod on containerd instead of Docker.

ctr had a logs sub-command IIRC.

I would never expect that the first controller might break while joining the second one.

the original CP node should not break unless something happened with etcd.

/triage support
(note that we close support tickets shortly and recommend that users use the support channels)

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Sep 2, 2020
@balboah
Author

balboah commented Sep 2, 2020

Hey, thanks for replying.

just to double check, you meant 1.1{8|9}.*?

Oh right, I mean that the initial cluster had all nodes with 1.18.6, then all nodes upgraded to 1.18.9.* and finally 1.19.0. The current deb package version is 1.19.0-00.

is that a "controlPlaneEndpoint"?

Yeah, I've been using the DNS name to talk to the cluster from my desktop client. It had both controlPlaneEndpoint and advertise-address set in the kubeadm-config, which I saved at least while doing the previous 1.19.0 upgrade. However, the DNS entry for controlPlaneEndpoint has not yet been pointed at the IP of the 2nd controller, as I wanted to leave that for the next step. To my understanding this setting doesn't affect etcd replication at all.
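
For completeness, the configuration the cluster itself has stored can be dumped from the kubeadm-config ConfigMap (assuming the API server is reachable, which it currently isn't here):

  kubectl -n kube-system get configmap kubeadm-config -o yaml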

Turning a single control plane cluster created without --control-plane-endpoint into a highly available cluster is not supported by kubeadm.

I did not specify --control-plane-endpoint while doing the init phase upload-certs, if that makes a difference.

ctr had a logs sub-command IIRC.

I was able to list containers and can see etcd etcd:3.4.9-1 running, but I failed to find anything about logs.

the original CP node should not break unless something happened with etcd.

Yeah, perhaps it's an etcd issue rather than something with how kubeadm works. If only I could see what etcd is logging.

@balboah
Author

balboah commented Sep 2, 2020

While trying to connect to etcd with curl and etcdctl, there is no reply at all, even when the TCP connect is successful.
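
For reference, a health probe against a kubeadm-managed etcd would look roughly like this (a sketch, assuming the standard kubeadm certificate paths):

  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health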

@balboah
Author

balboah commented Sep 2, 2020

After a few container deletes and an etcdctl snapshot restore, while also stopping the 2nd controller from trying to join, it seems I at least have the cluster back in a functional state with one controller.

@neolit123
Member

neolit123 commented Sep 2, 2020

I did not specify --control-plane-endpoint while doing the init phase upload-certs, if that makes a difference.

in general, you should pass the same --config or flags you passed to kubeadm init to its phases if you are calling them on demand. otherwise the phases could generate "content" that is different from what you want.
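
e.g. something along these lines, assuming the original settings were saved to a kubeadm-config.yaml (the file name is just a placeholder):

  kubeadm init phase upload-certs --upload-certs --config kubeadm-config.yaml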

I was able to list containers and can see etcd etcd:3.4.9-1 running, but I failed to find anything about logs.

i don't have ctr handy to check this, maybe ctr c:
https://manpages.debian.org/experimental/containerd/ctr.1.en.html
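
if crictl is installed and pointed at the containerd socket, that is usually the shorter route (a sketch, not verified on your setup):

  crictl ps -a --name etcd
  crictl logs <etcd-container-id>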

@neolit123
Member

neolit123 commented Sep 2, 2020

While trying to connect to etcd with curl and etcdctl, there is no reply at all, even when the TCP connect is successful.

etcd could have crashed, you could file logs in a new issue in the kubernetes/kubernetes repository or etcd repository if you have them and see e.g. panics.

After a few container deletes and an etcdctl snapshot restore, while also stopping the 2nd controller from trying to join, it seems I at least have the cluster back in a functional state with one controller.

that is good.

all of our kubeadm CI uses the following:

  • creates a VIP / LB and a single CP node
  • joins more CP nodes
  • joins worker nodes
  • runs tests ... etc

so, this is a supported scenario:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm

yet, unclear what happened in your case.

i'm going to close this support ticket, but please drop a message if you find out what happened.
thanks
/close

@k8s-ci-robot
Contributor

@neolit123: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@balboah
Author

balboah commented Sep 3, 2020

Alright I figured it out eventually.
So on the Ubuntu install, logs will get collected at /var/log/pods/kube-system*.
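
For example (the exact directory name includes the pod UID, so a glob is the easy way in):

  tail -n 100 /var/log/pods/kube-system_etcd-*/etcd/*.log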

The first issue was that I did not provide --apiserver-advertise-address on the new controller join, so it resolved to the default interface, which was incorrect in my case (even though --node-ip is passed to kubelet).
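
A join that pins the advertise address would look roughly like this (the address, token and key are placeholders):

  kubeadm join api.example.com:6443 --control-plane \
    --apiserver-advertise-address <wireguard-ip-of-controller2> \
    --token <token> --certificate-key <certificate-key> \
    --discovery-token-unsafe-skip-ca-verification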

The second issue, which probably only happened because I ran kubeadm reset; rm -rf /etc/kubernetes to retry, was that kubelet refused to start etcd since I had provided --resolv-conf=/etc/kubernetes/resolv.conf, which no longer existed after the rm.

So if there are any issues with etcd connectivity while joining a new controller, your cluster will go down since etcd can't figure out who the leader is.

To anyone else getting into this broken state, this is what I did earlier to recover:

  1. (on both controllers) service kubelet stop
  2. ctr -n k8s.io c list; ctr -n k8s.io c delete <container-id>
  3. cp /var/lib/etcd/member/snap/db ~/backup
  4. rm -rf /var/lib/etcd
  5. ETCDCTL_API=3 etcdctl snapshot restore ~/backup --name=controller1 --initial-cluster=controller1=https://10.96.0.1:2380 --initial-advertise-peer-urls=https://10.96.0.1:2380 --data-dir=/var/lib/etcd --skip-hash-check=true

This makes sure etcd doesn't try to hook up with a 2nd peer, which would break quorum when that peer doesn't respond.
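
After the restore, starting the kubelet again and listing members should show only controller1 (a sketch, standard kubeadm certificate paths assumed):

  service kubelet start
  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list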
