
kubeadm join --control-plane to create HA setup killed the cluster #2275

Closed
balboah opened this issue Sep 2, 2020 · 8 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@balboah

balboah commented Sep 2, 2020

BUG REPORT

Versions

kubeadm version (use kubeadm version): v1.19.0
Environment:

  • Kubernetes version (use kubectl version): v1.19.0
  • Cloud provider or hardware configuration: bare metal onprem
  • OS (e.g. from /etc/os-release): Ubuntu LTS 20.04
  • Kernel (e.g. uname -a): 5.4.0-1015-raspi aarch64
  • Other: cluster originally initialized with kubeadm v1.18.6, containerd 1.3.3, etcd 3.4.9

What happened?

While following the high availability guide at https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/ to join another controller node for replication and high availability, the cluster instead stopped (ironically) working.

With the first controller node fully functional, with working nodes and pods scheduling as far as I could tell, these steps were taken on controller1 (working) and controller2 (to be joined into HA):

  1. controller1$ kubeadm init phase upload-certs --upload-certs
  2. controller1$ kubeadm token create
  3. controller2$ kubeadm join --token <copied-from-controller1-output> --discovery-token-unsafe-skip-ca-verification --control-plane --certificate-key <copied-from-controller1-output> api.example.com:6443

Now the output on controller2 stopped with:

[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
To see the stack trace of this error execute with --v=5 or higher

Going back to controller1, it could no longer connect to the API server; the cluster doesn't respond any more.
Restarting kubelet resulted in a looping log of: node "controller1" not found

It seems to me the etcd data on controller1 somehow vanished or became corrupt after the attempted join by controller2. However, I'm not sure exactly how to check the etcd logs while it runs as a static pod on containerd instead of Docker.

What you expected to happen?

I would never expect that the first controller might break while joining the second one.

How to reproduce it (as minimally and precisely as possible)?

The setup is as follows:

  • Raspberry Pi 4 with 4 GB of RAM as controllers, with "A2"-rated SD cards
  • WireGuard mesh between all nodes, whose addresses are set as the node IPs, configured outside of k8s
  • kubernetes 1.8.6 upgraded to 1.8.9, then 1.9.0
  • containerd CRI
  • calico "default" CNI

Anything else we need to know?

It's a small cluster; there have been no performance issues with the rPi controllers and no etcd corruption before this.

@neolit123
Member

neolit123 commented Sep 2, 2020

hi,

kubernetes 1.8.6 upgraded to 1.8.9, then 1.9.0

just to double check, you meant 1.1{8|9}.*?

While following the high availability guide at https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/ to join another controller node for replication and high availability, the cluster instead stopped (ironically) working.

and

api.example.com:6443

is that a "controlPlaneEndpoint"?
please see:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#initializing-your-control-plane-node

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#considerations-about-apiserver-advertise-address-and-controlplaneendpoint

if no, then the second controller will not work after the "upload-certs" command.
it's complicated to adjust that.

Turning a single control plane cluster created without --control-plane-endpoint into a highly available cluster is not supported by kubeadm.
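
for reference, a join into an HA setup assumes the cluster was initialized with something along these lines (api.example.com:6443 taken from your join command; a minimal sketch, not your exact invocation):

  kubeadm init --control-plane-endpoint "api.example.com:6443" --upload-certs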

It seems to me the etcd data on controller1 somehow vanished or became corrupt after the attempted join by controller2. However, I'm not sure exactly how to check the etcd logs while it runs as a static pod on containerd instead of Docker.

ctr had a logs sub-command IIRC.

I would never expect that the first controller might break while joining the second one.

the original CP node should not break unless something happened with etcd.

/triage support
(note that we close support tickets shortly and recommend that users use the support channels)

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Sep 2, 2020
@balboah
Author

balboah commented Sep 2, 2020

Hey, thanks for replying.

just to double check, you meant 1.1{8|9}.*?

Oh right, I mean that the initial cluster had all nodes with 1.18.6, then all nodes upgraded to 1.18.9.* and finally 1.19.0. The current deb package version is 1.19.0-00.

is that a "controlPlaneEndpoint"?

Yeah, I've been using the DNS name to talk to the cluster from my desktop client. It had both controlPlaneEndpoint and advertise-address set in the kubeadm-config, which I saved at least while doing the previous 1.19.0 upgrade. However, the DNS entry for controlPlaneEndpoint has not yet been pointed at the IP of the 2nd controller, as I wanted to leave that for the next step. To my understanding this setting doesn't affect etcd replication at all.
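
For completeness, the configuration the cluster itself has stored can be dumped from the kubeadm-config ConfigMap (assuming the API server is reachable, which it currently isn't here):

  kubectl -n kube-system get configmap kubeadm-config -o yaml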

Turning a single control plane cluster created without --control-plane-endpoint into a highly available cluster is not supported by kubeadm.

I did not specify --control-plane-endpoint while doing the init phase upload-certs, if that makes a difference.

ctr had a logs sub-command IIRC.

I was able to list containers and can see etcd etcd:3.4.9-1 running, but I failed to find anything about logs.

the original CP node should not break unless something happened with etcd.

Yeah, perhaps it's an etcd issue rather than something with how kubeadm works. If only I could see what etcd is logging.

@balboah
Author

balboah commented Sep 2, 2020

While trying to connect to etcd with curl and etcdctl, there is no reply at all, even when the TCP connect is successful.
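
For reference, a health probe against a kubeadm-managed etcd would look roughly like this (a sketch, assuming the standard kubeadm certificate paths):

  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health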

@balboah
Author

balboah commented Sep 2, 2020

After a few container deletes and an etcdctl snapshot restore, while also stopping the 2nd controller from trying to join, it seems I at least have the cluster back in a functional state with one controller.

@neolit123
Member

neolit123 commented Sep 2, 2020

I did not specify --control-plane-endpoint while doing the init phase upload-certs, if that makes a difference.

in general, you should pass the same --config or flags you passed to kubeadm init to its phases if you are calling them on demand. otherwise the phases could generate "content" that is different from what you want.
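
e.g. something along these lines, assuming the original settings were saved to a kubeadm-config.yaml (the file name is just a placeholder):

  kubeadm init phase upload-certs --upload-certs --config kubeadm-config.yaml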

I was able to list containers and can see etcd etcd:3.4.9-1 running, but I failed to find anything about logs.

i don't have ctr handy to check this, maybe ctr c:
https://manpages.debian.org/experimental/containerd/ctr.1.en.html
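
if crictl is installed and pointed at the containerd socket, that is usually the shorter route (a sketch, not verified on your setup):

  crictl ps -a --name etcd
  crictl logs <etcd-container-id>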

@neolit123
Member

neolit123 commented Sep 2, 2020

While trying to connect to etcd with curl and etcdctl, there is no reply at all, even when the TCP connect is successful.

etcd could have crashed, you could file logs in a new issue in the kubernetes/kubernetes repository or etcd repository if you have them and see e.g. panics.

After a few container deletes and an etcdctl snapshot restore, while also stopping the 2nd controller from trying to join, it seems I at least have the cluster back in a functional state with one controller.

that is good.

all of our kubeadm CI uses the following:

  • creates a VIP / LB and a single CP node
  • joins more CP nodes
  • joins worker nodes
  • runs tests ... etc

so, this is a supported scenario:
https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm

yet, unclear what happened in your case.

i'm going to close this support ticket, but please drop a message if you find out what happened.
thanks
/close

@k8s-ci-robot
Contributor

@neolit123: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@balboah
Author

balboah commented Sep 3, 2020

Alright I figured it out eventually.
So on the Ubuntu install, logs will get collected at /var/log/pods/kube-system*.
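
For example (the exact directory name includes the pod UID, so a glob is the easy way in):

  tail -n 100 /var/log/pods/kube-system_etcd-*/etcd/*.log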

The first issue was that I did not provide --apiserver-advertise-address on the new controller join, so it resolved to the default interface, which was incorrect in my case (even though --node-ip is passed to kubelet).
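
A join that pins the advertise address would look roughly like this (the address, token and key are placeholders):

  kubeadm join api.example.com:6443 --control-plane \
    --apiserver-advertise-address <wireguard-ip-of-controller2> \
    --token <token> --certificate-key <certificate-key> \
    --discovery-token-unsafe-skip-ca-verification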

The second issue, which probably only happened because I ran kubeadm reset; rm -rf /etc/kubernetes to retry, was that kubelet refused to start etcd since I had provided --resolv-conf=/etc/kubernetes/resolv.conf, which no longer existed after the rm.

So if there are any issues with etcd connectivity while joining a new controller, your cluster will go down since etcd can't figure out who the leader is.

To anyone else getting into this broken state, this is what I did earlier to recover:

  1. (on both controllers) service kubelet stop
  2. ctr -n k8s.io c list; ctr -n k8s.io c delete <container-id>
  3. cp /var/lib/etcd/member/snap/db ~/backup
  4. rm -rf /var/lib/etcd
  5. ETCDCTL_API=3 etcdctl snapshot restore ~/backup --name=controller1 --initial-cluster=controller1=https://10.96.0.1:2380 --initial-advertise-peer-urls=https://10.96.0.1:2380 --data-dir=/var/lib/etcd --skip-hash-check=true

This makes sure etcd doesn't try to hook up with a 2nd peer, which would break quorum when that peer doesn't respond.
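
After the restore, starting the kubelet again and listing members should show only controller1 (a sketch, standard kubeadm certificate paths assumed):

  service kubelet start
  ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    member list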
