
Cannot access Kubernetes Cluster after Master node Dynamic IP changes #108453

Closed

hrabhijith opened this issue Mar 2, 2022 · 16 comments
Labels
kind/support: Categorizes issue or PR as a support question.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments


hrabhijith commented Mar 2, 2022

What happened?

I had a single-node (master-only) Kubernetes cluster up and running with a couple of workloads. The cluster was created with kubeadm.

Then the IP address of the master node changed, either via DHCP or because I had to move the node to another network or unplugged the Ethernet cable and connected it to a different NIC (the machine has dual NICs).

The following issues came up:

1. kubectl access was lost, so I changed the IP in my kubeconfig. Requests then failed with:

The connection to the server <newIP>:6443 was refused - did you specify the right host or port?
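
(For reference, the kubeconfig edit was equivalent to something like the following, assuming the kubeadm default cluster name "kubernetes"; <newIP> is a placeholder:)

kubectl config set-cluster kubernetes --server=https://<newIP>:6443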

2. The kubelet still had the old IP.

So I edited /etc/kubernetes/kubelet.env to use the new IP, ran a daemon-reload, and restarted the kubelet.
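
Roughly (the sed is illustrative; kubelet.env carries the kubelet flags, including --hostname-override=node1, in my setup):

sudo sed -i 's/<oldIP>/<newIP>/g' /etc/kubernetes/kubelet.env   # swap the node IP in the kubelet flags
sudo systemctl daemon-reload
sudo systemctl restart kubelet

The kubelet picked up the new IP, but logged the following errors: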

Mär 02 13:41:45 katpk028 kubelet[1523]: I0302 13:41:45.376675    1523 csi_plugin.go:1024] Failed to contact API server when waiting for CSINode publishing: Get "https://127.0.0.1:6443/apis/storage.k8s.io/v1/csinodes/node1": dial tcp 127.0.0.1:6443: connect: connection refused
Mär 02 13:41:45 katpk028 kubelet[1523]: E0302 13:41:45.421006    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:45 katpk028 kubelet[1523]: I0302 13:41:45.509577    1523 kubelet_node_status.go:362] "Setting node annotation to enable volume controller attach/detach"
Mär 02 13:41:45 katpk028 kubelet[1523]: E0302 13:41:45.521636    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:45 katpk028 kubelet[1523]: I0302 13:41:45.528637    1523 kubelet_node_status.go:554] "Recording event message for node" node="node1" event="NodeHasSufficientMemory"
Mär 02 13:41:45 katpk028 kubelet[1523]: I0302 13:41:45.528659    1523 kubelet_node_status.go:554] "Recording event message for node" node="node1" event="NodeHasNoDiskPressure"
Mär 02 13:41:45 katpk028 kubelet[1523]: I0302 13:41:45.528669    1523 kubelet_node_status.go:554] "Recording event message for node" node="node1" event="NodeHasSufficientPID"
Mär 02 13:41:45 katpk028 kubelet[1523]: I0302 13:41:45.528837    1523 scope.go:111] "RemoveContainer" containerID="e84ead912867f6fe36d33560b49c4cd7eb93ee836ac3ad2c21c6d1058596a941"
Mär 02 13:41:45 katpk028 kubelet[1523]: E0302 13:41:45.621905    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:45 katpk028 kubelet[1523]: E0302 13:41:45.722709    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:45 katpk028 kubelet[1523]: E0302 13:41:45.823109    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:45 katpk028 kubelet[1523]: E0302 13:41:45.923448    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:46 katpk028 kubelet[1523]: E0302 13:41:46.023789    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:46 katpk028 kubelet[1523]: E0302 13:41:46.124139    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:46 katpk028 kubelet[1523]: E0302 13:41:46.224458    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:46 katpk028 kubelet[1523]: E0302 13:41:46.325133    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:41:46 katpk028 kubelet[1523]: E0302 13:41:46.425867    1523 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"

I restarted Docker and also tried reinstalling it. No change.

I tried the following methods to restore the cluster with the new IP address.

Method 1: I saved a copy of /etc/kubernetes/kubeadm-config.yaml.

Once the IP changed, I moved the /etc/kubernetes folder to a backup location, and in the kubeadm-config.yaml saved in the previous step I replaced the old IP with the new one. I stopped the kubelet and then ran:

kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd --config=kubeadm-config.yaml

But I got the following error:

[init] Using Kubernetes version: v1.21.6
[preflight] Running pre-flight checks
        [WARNING FileExisting-ethtool]: ethtool not found in system path

error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ExternalEtcdVersion]: Get "https://<NEWIP>:2379/version": dial tcp <newIP>:2379: connect: connection refused
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

I removed the config file from the command and ran again:

kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd

But I got the following error:

I0302 15:09:29.104514  162666 version.go:254] remote version is much newer: v1.23.4; falling back to: stable-1.21
[init] Using Kubernetes version: v1.21.10
[preflight] Running pre-flight checks
        [WARNING FileExisting-ethtool]: ethtool not found in system path
        [WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [<hostname> kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 <newIP>]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [<hostname> localhost] and IPs [<newIP> 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [<hostname> localhost] and IPs [<newIP> 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

After some research: a few posts mentioned swap, but it is disabled. A few suggested reinstalling Docker; that did not help. A few pointed at the cgroup driver differing between Docker and the kubelet, but Docker uses systemd and the kubelet ran fine on the old IP, so how could that become a problem when only the IP changed?

However, I even switched Docker's cgroup driver to cgroupfs, but kubeadm init still fails after the timeout.
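
(For reference, both of those claims can be checked like this:)

swapon --show                              # prints nothing when swap is disabled
docker info --format '{{.CgroupDriver}}'   # reports the cgroup driver, "systemd" here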

Method 2: Once the IP changed, I used the command below to replace the old IP with the new IP in all files under /etc/kubernetes:

sudo find /etc/kubernetes -type f | sudo xargs sed -i "s/{{OLD_IP}}/{{NEW_IP}}/"

The files updated with the new IP were:

  1. /etc/kubernetes/manifests/kube-apiserver.yaml
  2. /etc/kubernetes/kubeadm-images.yaml
  3. /etc/kubernetes/kubelet.env
  4. /etc/kubernetes/admin.conf
  5. /etc/kubernetes/kubeadm-config.yaml
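
A recursive grep can confirm that no stale references remain (note the sed above has no /g flag, so it replaces only the first occurrence per line):

sudo grep -r '<OLD_IP>' /etc/kubernetes    # no output means nothing was missed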

Then I deleted the old apiserver certs from /etc/kubernetes/ssl (the pki folder is empty by default in this setup).
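
Along these lines (the exact file names are setup-dependent and assumed here):

sudo rm /etc/kubernetes/ssl/apiserver.crt /etc/kubernetes/ssl/apiserver.key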

Then I ran the command below to generate new certs:

kubeadm init phase certs apiserver --config=/etc/kubernetes/kubeadm-config.yaml

Then I ran a daemon-reload and restarted the kubelet and Docker, but the kubelet still logs the following errors:

Mär 02 13:50:17 katpk028 kubelet[1508]: E0302 13:50:17.792885    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:17 katpk028 kubelet[1508]: E0302 13:50:17.893543    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:17 katpk028 kubelet[1508]: E0302 13:50:17.994090    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.094771    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.195480    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.295748    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: I0302 13:50:18.308115    1508 csi_plugin.go:1024] Failed to contact API server when waiting for CSINode publishing: Get "https://127.0.0.1:6443/apis/storage.k8s.io/v1/csinodes/node1": dial tcp 127.0.0.1:6443: connect: connection refused
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.396733    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.497307    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.597797    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.698457    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.799237    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:18 katpk028 kubelet[1508]: E0302 13:50:18.900251    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.000831    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.101441    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.202033    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.303079    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: I0302 13:50:19.308486    1508 csi_plugin.go:1024] Failed to contact API server when waiting for CSINode publishing: Get "https://127.0.0.1:6443/apis/storage.k8s.io/v1/csinodes/node1": dial tcp 127.0.0.1:6443: connect: connection refused
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.403643    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.504064    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.604653    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.705629    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.806383    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:19 katpk028 kubelet[1508]: E0302 13:50:19.906777    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.007469    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.108269    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.208458    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: I0302 13:50:20.308233    1508 csi_plugin.go:1024] Failed to contact API server when waiting for CSINode publishing: Get "https://127.0.0.1:6443/apis/storage.k8s.io/v1/csinodes/node1": dial tcp 127.0.0.1:6443: connect: connection refused
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.309302    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.409717    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.510087    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.610935    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.711247    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.812115    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:20 katpk028 kubelet[1508]: E0302 13:50:20.912478    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.013087    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.113562    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.214178    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: I0302 13:50:21.307955    1508 csi_plugin.go:1024] Failed to contact API server when waiting for CSINode publishing: Get "https://127.0.0.1:6443/apis/storage.k8s.io/v1/csinodes/node1": dial tcp 127.0.0.1:6443: connect: connection refused
Mär 02 13:50:21 katpk028 kubelet[1508]: I0302 13:50:21.311093    1508 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-scheduler-node1" status=Running
Mär 02 13:50:21 katpk028 kubelet[1508]: I0302 13:50:21.311136    1508 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-apiserver-node1" status=Running
Mär 02 13:50:21 katpk028 kubelet[1508]: I0302 13:50:21.311148    1508 kubelet_getters.go:176] "Pod status updated" pod="kube-system/kube-controller-manager-node1" status=Running
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.315203    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.415941    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.480326    1508 eviction_manager.go:255] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.516060    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.617035    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.717364    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.805911    1508 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/node1?timeout=10s": dial tcp 127.0.0.1:6443: connect: connection refused
Mär 02 13:50:21 katpk028 kubelet[1508]: E0302 13:50:21.818137    1508 kubelet.go:2291] "Error getting node" err="node \"node1\" not found"

What did you expect to happen?

I expected the kubelet and kube-apiserver to pick up the new IP from the config files, so that I could access the cluster again with all resources intact (apart from a few IP-related issues).

I also expected kubeadm init to bring the kubelet up.

How can we reproduce it (as minimally and precisely as possible)?

Change the IP address of the Kubernetes master node, then try to access the cluster.

Anything else we need to know?

After Method 1, if I actually change the node's IP back to the old IP, the kubelet starts working and I can access and see the cluster resources without any issues.

However, after Method 2, if I revert all the config changes back to the old IP, actually reconnect the old IP, and restart, I see the same kubelet errors and cannot access the cluster even from the former IP address.

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6", GitCommit:"d921bc6d1810da51177fbd0ed61dc811c5228097", GitTreeState:"clean", BuildDate:"2021-10-27T17:50:34Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6", GitCommit:"d921bc6d1810da51177fbd0ed61dc811c5228097", GitTreeState:"clean", BuildDate:"2021-10-27T17:44:26Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
node1 Ready control-plane,master 22m v1.21.6 Ubuntu 20.04.3 LTS 5.11.0-27-generic docker://20.10.8

Cloud provider

No cloud provider. Installation at the Edge.

OS version

NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Linux 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Install tools

Kubeadm

Container runtime (CRI) and version (if applicable)

Client: Docker Engine - Community
Version: 20.10.8
API version: 1.41
Go version: go1.16.6
Git commit: 3967b7d
Built: Fri Jul 30 19:54:27 2021
OS/Arch: linux/amd64
Context: default
Experimental: true

Server: Docker Engine - Community
Engine:
Version: 20.10.8
API version: 1.41 (minimum version 1.12)
Go version: go1.16.6
Git commit: 75249d8
Built: Fri Jul 30 19:52:33 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.9
GitCommit: e25210fe30a0a703442421b0f60afac609f950a3
runc:
Version: 1.0.1
GitCommit: v1.0.1-0-g4144b63
docker-init:
Version: 0.19.0
GitCommit: de40ad0

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

/sig k8s-infra
/sig network
/sig node

hrabhijith added the kind/bug label Mar 2, 2022
k8s-ci-robot added the needs-sig label Mar 2, 2022
@k8s-ci-robot
Contributor

@hrabhijith: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-triage label Mar 2, 2022
@hrabhijith
Author

/sig k8s-infra
/sig network
/sig node

k8s-ci-robot added the sig/k8s-infra, sig/network, and sig/node labels and removed the needs-sig label Mar 2, 2022
@aojea
Member

aojea commented Mar 2, 2022

The cluster was created with kubeadm.

@neolit123 does kubeadm allow changing the IP? I don't think that can work, since the certificates embed those IPs, right?

@hrabhijith
Author

hrabhijith commented Mar 2, 2022

The idea behind running kubeadm init a second time is to regenerate the certs with the new IP. I had deleted the certs before running kubeadm init. I also tried passing just --apiserver-advertise-address=<newIP> to kubeadm init, but it still fails.

I have no idea what kubeadm does the second time, after the IP change; I just followed steps that had worked for others in this scenario.

@neolit123
Member

neolit123 commented Mar 2, 2022

Then the IP address of the master node changed, either via DHCP or because I had to move the node to another network or unplugged the Ethernet cable and connected it to a different NIC (the machine has dual NICs).

changing the IP of the CP node is unsupported and complicated. you should have used the --control-plane-endpoint option with a DNS name instead of --apiserver-advertise-address. see here:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#considerations-about-apiserver-advertise-address-and-controlplaneendpoint

and then just change the IP behind the DNS.
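
e.g., at cluster-creation time (the DNS name here is illustrative):

kubeadm init --control-plane-endpoint=cluster-endpoint.example.com:6443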

also related to switching from a single CP node to HA:
kubernetes/website#27629

The idea behind running kubeadm init a second time is to regenerate the certs with the new IP. I had deleted the certs before running kubeadm init. I also tried passing just --apiserver-advertise-address=<newIP> to kubeadm init, but it still fails.

running the whole kubeadm init twice is not supported without a kubeadm reset first.

you could (a rough shell sketch follows the list):

  • stop the kubelet on the CP node (downtime)
  • delete the certs on disk
  • use kubeadm init phase certs --config, and switch to using a DNS control-plane-endpoint
  • update the kube-public/cluster-info config map
  • restart kubelet on CP node
  • rejoin workers
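
a rough, untested sketch of those steps on a single CP node (kubeadm default paths; one conservative reading that regenerates only the apiserver serving cert, with kubeadm-config.yaml assumed to already carry the DNS controlPlaneEndpoint):

systemctl stop kubelet                              # downtime starts here
rm /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.key
kubeadm init phase certs apiserver --config=kubeadm-config.yaml
kubectl -n kube-public edit configmap cluster-info  # point the embedded server: field at the DNS name
systemctl start kubelet
# workers then need to `kubeadm join` again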

other ideas here:
kubernetes/kubeadm#338

this is generally untested and unsupported.
if you'd like to contribute to the kubeadm docs, PRs are welcome.

/kind support
/close

k8s-ci-robot added the kind/support label Mar 2, 2022
@k8s-ci-robot
Contributor

@neolit123: Closing this issue.


@neolit123
Member

/remove-sig node network k8s-infra
/remove-kind bug
/sig cluster-lifecycle

k8s-ci-robot added the sig/cluster-lifecycle label and removed the sig/node, sig/network, sig/k8s-infra, and kind/bug labels Mar 2, 2022
@hrabhijith
Author

hrabhijith commented Mar 2, 2022

use kubeadm init phase certs --config, and switch to using a DNS control-plane-endpoint

How do I switch to a DNS name for control-plane-endpoint in an existing cluster that uses an IP address for both apiserver-advertise-address and control-plane-endpoint? Should I edit the kubeadm-config ConfigMap in kube-system? If so, should I replace the IP with the DNS name in both fields, or set the DNS name on control-plane-endpoint only and delete the apiserver-advertise-address field?
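
(i.e., something along the lines of:)

kubectl -n kube-system edit configmap kubeadm-config   # adjust controlPlaneEndpoint in the ClusterConfiguration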

After setting a DNS name for control-plane-endpoint and generating new certs using that kubeadm-config, can I then change the IP of the CP node and expect everything to work?

@hrabhijith
Author

hrabhijith commented Mar 3, 2022

I created a new cluster using kubeadm:

sudo kubeadm init --control-plane-endpoint=<DNS>

Then I checked the /etc/kubernetes contents for address references.
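
A recursive grep along these lines produced the output below:

sudo grep -r -e '<IP>' -e '<DNS>' /etc/kubernetes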

/etc/kubernetes/controller-manager.conf:    server: https://<IP>:6443
/etc/kubernetes/manifests/etcd.yaml:    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://<IP>:2379
/etc/kubernetes/manifests/etcd.yaml:    - --advertise-client-urls=https://<IP>:2379
/etc/kubernetes/manifests/etcd.yaml:    - --initial-advertise-peer-urls=https://<IP>:2380
/etc/kubernetes/manifests/etcd.yaml:    - --initial-cluster=kadasd048=https://<IP>:2380
/etc/kubernetes/manifests/etcd.yaml:    - --listen-client-urls=https://127.0.0.1:2379,https://<IP>:2379
/etc/kubernetes/manifests/etcd.yaml:    - --listen-peer-urls=https://<IP>:2380
/etc/kubernetes/manifests/kube-scheduler.yaml:      value: <IP>,10.0.0.0/8,localhost,127.0.0.1
/etc/kubernetes/manifests/kube-scheduler.yaml:      value: <IP>,10.0.0.0/8,localhost,127.0.0.1
/etc/kubernetes/manifests/kube-controller-manager.yaml:      value: <IP>,10.0.0.0/8,localhost,127.0.0.1
/etc/kubernetes/manifests/kube-controller-manager.yaml:      value: <IP>,10.0.0.0/8,localhost,127.0.0.1
/etc/kubernetes/manifests/kube-apiserver.yaml:    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: <IP>:6443
/etc/kubernetes/manifests/kube-apiserver.yaml:    - --advertise-address=<IP>
/etc/kubernetes/manifests/kube-apiserver.yaml:      value: <IP>,10.0.0.0/8,localhost,127.0.0.1
/etc/kubernetes/manifests/kube-apiserver.yaml:      value: <IP>,10.0.0.0/8,localhost,127.0.0.1
/etc/kubernetes/manifests/kube-apiserver.yaml:        host: <IP>
/etc/kubernetes/manifests/kube-apiserver.yaml:        host: <IP>
/etc/kubernetes/manifests/kube-apiserver.yaml:        host: <IP>
/etc/kubernetes/scheduler.conf:    server: https://<IP>:6443

All of these files contain the IP, not the DNS name.

The only places the DNS name is used are:

/etc/kubernetes/admin.conf:    server: https://<DNS>:6443
/etc/kubernetes/kubelet.conf:    server: https://<DNS>:6443

How does this argument help with IP changes of the node? If the files are changed manually, I run into the issues above.

@aojea @neolit123

@neolit123
Member

The DNS name would at least spare the worker nodes from having to rejoin.
As you found, the IP appears in a lot of places. You have to change it manually everywhere: certificates, manifests, configmaps.

@neolit123
Member

delete the certs on disk

If you already did this step on the CP node and generated a new CA, the worker nodes would have to rejoin anyway, because they no longer trust this server and have no client credentials for it.

It's actually less work to recreate the cluster.
Maybe a backup with Velero can help, but it should be taken while the cluster is active.

@hrabhijith
Author

Thanks for the replies.
In my case there are no worker nodes; the cluster has a single node, which is the CP. When the IP of the CP changes, even if I change the /etc/kubernetes contents everywhere (manifests, configmaps) and generate new apiserver certs with the new IP, I cannot regain access to the cluster, and the kubelet keeps reporting errors, mainly connection refused.

I will check the backup option you mentioned.

@ATP-55

ATP-55 commented Nov 24, 2022

Hello @neolit123, does kubeadm have a way to set up the cluster so that it can handle dynamic IP changes of master or worker nodes? As I understand from this thread, we can only handle this for worker nodes; IP changes of master nodes are not supported.

@neolit123
Member

not being able to handle dynamic IP changes is a wider k8s problem, not only a kubeadm one.
to avoid the situation from the start, one can use a DNS name for the workers to connect to the api server(s), e.g. a load balancer DNS name.

this comment has more info:
#108453 (comment)

@ATP-55

ATP-55 commented Dec 8, 2022

Thank you very much @neolit123 for your response! One last query: as you mentioned in the comment linked below, your suggestion can handle dynamic IP changes of worker nodes but does not apply to IP changes of master nodes, correct?

#108453 (comment)

@neolit123
Member

yes, if the LB DNS name stays the same workers will be fine.
