
Kubeadm update to 1.10 fails on ha k8s/etcd cluster #837

Closed
brokenmass opened this issue May 20, 2018 · 13 comments
Labels: area/HA, area/upgrades, area/UX, documentation/improvement, kind/bug, priority/important-soon

Comments

brokenmass commented May 20, 2018

BUG REPORT

Versions

kubeadm version: 1.10.2

Environment:

  • Kubernetes version: 1.9.3
  • Cloud provider or hardware configuration: 3 x k8s master HA
  • OS: RHEL7
  • Kernel: 3.10.0-693.11.6.el7.x86_64

What happened?

A couple of months ago I created a Kubernetes 1.9.3 HA cluster using kubeadm 1.9.3, following the 'official' documentation (https://kubernetes.io/docs/setup/independent/high-availability/) and hosting the etcd HA cluster on the master nodes as static pods.

I wanted to upgrade my cluster to k8s 1.10.2 using the latest kubeadm. After updating kubeadm, running kubeadm upgrade plan produced the following error:

[root@shared-cob-01 tmp]# kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[upgrade/plan] computing upgrade possibilities
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.9.3
[upgrade/versions] kubeadm version: v1.10.2
[upgrade/versions] Latest stable version: v1.10.2
[upgrade/versions] FATAL: context deadline exceeded

I investigated the issue and found two root causes:

1) kubeadm doesn't identify etcd cluster as TLS enabled

The guide instructs you to use the following command in the etcd static pod:

etcd --name <name> \
  --data-dir /var/lib/etcd \
  --listen-client-urls http://localhost:2379 \
  --advertise-client-urls http://localhost:2379 \
  --listen-peer-urls http://localhost:2380 \
  --initial-advertise-peer-urls http://localhost:2380 \
  --cert-file=/certs/server.pem \
  --key-file=/certs/server-key.pem \
  --client-cert-auth \
  --trusted-ca-file=/certs/ca.pem \
  --peer-cert-file=/certs/peer.pem \
  --peer-key-file=/certs/peer-key.pem \
  --peer-client-cert-auth \
  --peer-trusted-ca-file=/certs/ca.pem \
  --initial-cluster etcd0=https://<etcd0-ip-address>:2380,etcd1=https://<etcd1-ip-address>:2380,etcd2=https://<etcd2-ip-address>:2380 \
  --initial-cluster-token my-etcd-token \
  --initial-cluster-state new

kubeadm >= 1.10 checks (here: https://github.com/kubernetes/kubernetes/blob/release-1.10/cmd/kubeadm/app/util/etcd/etcd.go#L56) whether etcd has TLS enabled by looking for the presence of the following flags in the static pod command:

"--cert-file=",
"--key-file=",
"--trusted-ca-file=",
"--client-cert-auth=",
"--peer-cert-file=",
"--peer-key-file=",
"--peer-trusted-ca-file=",
"--peer-client-cert-auth=",

but as the instructions use the flags --client-cert-auth and --peer-client-cert-auth without any value (they are booleans), kubeadm did not recognise the etcd cluster as TLS enabled (see the sketch below).
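
Here is a minimal Go sketch approximating that check (it is not the actual kubeadm source): each required flag is matched as a prefix that already contains the '=', so the guide's bare boolean flags never match.

package main

import (
    "fmt"
    "strings"
)

// Flags kubeadm 1.10 looks for in the etcd static pod command (note the '=').
var requiredFlags = []string{
    "--cert-file=", "--key-file=",
    "--trusted-ca-file=", "--client-cert-auth=",
    "--peer-cert-file=", "--peer-key-file=",
    "--peer-trusted-ca-file=", "--peer-client-cert-auth=",
}

// hasTLS reports whether every required flag appears in the pod command.
func hasTLS(command []string) bool {
    for _, flag := range requiredFlags {
        found := false
        for _, arg := range command {
            if strings.HasPrefix(arg, flag) {
                found = true
                break
            }
        }
        if !found {
            return false
        }
    }
    return true
}

func main() {
    // The TLS flags exactly as the HA guide writes them: booleans without "=true".
    fromGuide := []string{
        "--cert-file=/certs/server.pem", "--key-file=/certs/server-key.pem",
        "--client-cert-auth", "--trusted-ca-file=/certs/ca.pem",
        "--peer-cert-file=/certs/peer.pem", "--peer-key-file=/certs/peer-key.pem",
        "--peer-client-cert-auth", "--peer-trusted-ca-file=/certs/ca.pem",
    }
    fmt.Println(hasTLS(fromGuide)) // false: "--client-cert-auth" lacks the '='
}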

PERSONAL FIX:
I updated my etcd static pod command to use --client-cert-auth=true and --peer-client-cert-auth=true.

GENERAL FIX:
Update the instructions to use --client-cert-auth=true and --peer-client-cert-auth=true, and relax the kubeadm checks to match flags like "--peer-cert-file" and "--peer-key-file" without the trailing equals (see the sketch below).
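
A drop-in variant of the matcher in the sketch above, showing the proposed relaxation (the name is mine, not kubeadm's): a boolean flag passes with or without a value.

// flagPresent accepts both "--client-cert-auth" and "--client-cert-auth=true".
func flagPresent(command []string, name string) bool {
    for _, arg := range command {
        if arg == name || strings.HasPrefix(arg, name+"=") {
            return true
        }
    }
    return false
}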

2) kubeadm didn't use the correct certificates

After fixing point 1 the problem still persisted, because kubeadm was not using the right certificates.
Following the kubeadm HA guide, the created certificates are ca.pem, ca-key.pem, peer.pem, peer-key.pem, client.pem and client-key.pem, but the latest kubeadm expects ca.crt, ca.key, peer.crt, peer.key, healthcheck-client.crt and healthcheck-client.key instead.
The kubeadm-config MasterConfiguration keys etcd.caFile, etcd.certFile and etcd.keyFile are ignored.

PERSONAL FIX:
I renamed the .pem certificates to their .crt and .key equivalents and updated the etcd static pod configuration to use them.

GENERAL FIX:
Use the kubeadm-config data.caFile, data.certFile and data.keyFile values, infer the right certificates from the etcd static pod definition (pod path + volumes hostPath), and/or create a new temporary client certificate to use during the upgrade; a possible inference helper is sketched below.
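
For the inference option, the certificate paths are already present as flag values in the static pod command; a hypothetical helper (continuing the Go sketch above; the name is mine, not kubeadm's) could read them instead of assuming fixed file names:

// flagValue extracts the value of a "--name=value" flag from the pod command;
// an empty string means the flag is absent.
func flagValue(command []string, name string) string {
    prefix := "--" + name + "="
    for _, arg := range command {
        if strings.HasPrefix(arg, prefix) {
            return strings.TrimPrefix(arg, prefix)
        }
    }
    return ""
}

// e.g. flagValue(etcdCommand, "trusted-ca-file") returns "/certs/ca.pem"
//      flagValue(etcdCommand, "cert-file") returns "/certs/server.pem"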

What you expected to happen?

The upgrade plan should have executed correctly.

How to reproduce it (as minimally and precisely as possible)?

Create a k8s HA cluster using kubeadm 1.9.3 following https://kubernetes.io/docs/setup/independent/high-availability/ and try to upgrade it to k8s >= 1.10 using the latest kubeadm.

@neolit123 neolit123 added kind/bug Categorizes issue or PR as related to a bug. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. documentation/improvement area/HA area/upgrades area/UX labels May 20, 2018

brokenmass commented May 22, 2018

This issue seems to be fixed in kubeadm 1.10.3, even though it will not automatically update the static etcd pods, as it recognises them as 'external'.


FloMedja commented May 22, 2018

I am using kubeadm 1.10.3 and have the same issue. My cluster is 1.10.2 with an external secure etcd.


FloMedja commented May 22, 2018

@brokenmass Do the values for your personal fix to the second cause you noted look like this:

  caFile: /etc/kubernetes/pki/etcd/ca.crt
  certFile: /etc/kubernetes/pki/etcd/healthcheck-client.crt
  keyFile: /etc/kubernetes/pki/etcd/healthcheck-client.key

FloMedja commented:

@detiber can you help, please?

brokenmass (Author) commented:

@FloMedja in my case the values look like:

  caFile: /etc/kubernetes/pki/etcd/ca.pem
  certFile: /etc/kubernetes/pki/etcd/client.pem
  keyFile: /etc/kubernetes/pki/etcd/client-key.pem

and 1.10.3 is working correctly
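
For anyone comparing configs, a standalone Go sketch that checks whether a given CA/client cert/key trio completes a TLS handshake with etcd. The paths come from the values above; the endpoint (127.0.0.1:2379) and the use of TLS on the client port are assumptions about the local setup.

package main

import (
    "crypto/tls"
    "crypto/x509"
    "fmt"
    "io/ioutil"
)

func main() {
    // Client certificate pair referenced in kubeadm-config (paths assumed).
    cert, err := tls.LoadX509KeyPair(
        "/etc/kubernetes/pki/etcd/client.pem",
        "/etc/kubernetes/pki/etcd/client-key.pem",
    )
    if err != nil {
        panic(err)
    }

    // Trust the etcd CA when verifying the server certificate.
    caPEM, err := ioutil.ReadFile("/etc/kubernetes/pki/etcd/ca.pem")
    if err != nil {
        panic(err)
    }
    pool := x509.NewCertPool()
    if !pool.AppendCertsFromPEM(caPEM) {
        panic("could not parse CA certificate")
    }

    // A failed handshake here means wrong certs, a non-TLS etcd, or a server
    // certificate whose SANs do not cover the dialed address.
    conn, err := tls.Dial("tcp", "127.0.0.1:2379", &tls.Config{
        Certificates: []tls.Certificate{cert},
        RootCAs:      pool,
    })
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    fmt.Println("TLS handshake with etcd succeeded")
}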


@brokenmass So with kubeadm 1.10.3 everything works without any need for your personal fixes. In that case I am a little confused: I have kubeadm 1.10.3 but get the same error message that you mention in this bug report. I will double check my config; maybe I made a mistake elsewhere.


brokenmass commented May 23, 2018

Add here (or join the Kubernetes Slack and send me a direct message) your kubeadm-config, your etcd static pod YAMLs, and the full output of kubeadm upgrade plan.


detiber commented May 24, 2018

My apologies, I'm just now seeing this. @chuckha did the original work for the static-pod HA etcd docs; I'll work with him over the next couple of days to see if we can help straighten out the HA upgrades.

FloMedja commented:

@detiber thank you. The upgrade plan finally works, but I face some race condition issues when trying to upgrade the cluster. Sometimes it works, sometimes I get the same error as kubernetes/kubeadm/issues/850: kubeadm runs into a race condition when trying to restart a pod on one node.


detiber commented May 25, 2018

I ran into some snags getting a test env set up for this today, and I'm running out of time before my weekend starts. I'll pick back up on this early next week.

timothysc commented:

/assign @chuckha @detiber

@timothysc timothysc added this to the v1.11 milestone May 25, 2018
@timothysc timothysc removed the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 25, 2018
@timothysc timothysc added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label May 29, 2018

luxas commented Jun 12, 2018

@chuckha @detiber @stealthybox any update on this?

@timothysc timothysc modified the milestones: v1.11, v1.12 Jun 13, 2018
@timothysc timothysc assigned timothysc and unassigned chuckha Aug 21, 2018
timothysc commented:

So the 1.9->1.10 HA upgrade was not a supported or vetted path.

We are currently updating our docs for the 1.11->1.12 upgrade path, which we do plan to maintain going forward.
