Please use the ip:port from the kubeadm join command to render the kubelet config and kube-proxy config on the node #664

Closed
fanux opened this issue Jan 19, 2018 · 48 comments
Labels
area/UX
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@fanux

fanux commented Jan 19, 2018

FEATURE REQUEST

What happened?

kubeadm join does not use the ip:port given in the join command when rendering the node configs. I want to use the LB IP and port to join nodes.

master0 1.1.1.1:6443
master1 2.2.2.2:6443
LB 3.3.3.3:6443

using kubeadm join 3.3.3.3:6443 ..., but the kubelet config and kube-proxy config may still end up with the master0 or master1 IP; this behaviour is not expected in HA.

What you expected to happen?

I want kubeadm to render the configs using the ip:port given in the kubeadm join command.

Anything else we need to know?

Right now I have to change the node's kubelet config and kube-proxy config manually.
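
(For reference, a minimal sketch of that manual workaround, assuming the LB endpoint from the example above, 3.3.3.3:6443; the paths and ConfigMap name are the standard kubeadm ones, but treat this as illustrative only, not a supported procedure:)

# Point the node's kubelet at the LB instead of the single master it was rendered with.
sed -i 's#server: https://.*#server: https://3.3.3.3:6443#' /etc/kubernetes/kubelet.conf
systemctl restart kubelet

# kube-proxy takes its kubeconfig from the kube-proxy ConfigMap in kube-system;
# change the server: field there, then recreate the kube-proxy pods to pick it up.
kubectl -n kube-system edit configmap kube-proxy
kubectl -n kube-system delete pod -l k8s-app=kube-proxy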

@timothysc
Member

/assign @liztio

@timothysc timothysc added area/UX kind/feature Categorizes issue or PR as related to a new feature. labels Apr 6, 2018
@timothysc timothysc added this to the v1.11 milestone Apr 6, 2018
@timothysc timothysc added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 6, 2018
@jethrogb

jethrogb commented Apr 10, 2018

See #598. Clearly a bug, not a feature request, based on the logging output of kubeadm.

@timothysc timothysc removed this from the v1.11 milestone May 14, 2018
@timothysc timothysc added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label May 14, 2018
@timothysc timothysc added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jul 3, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2018
@jethrogb

jethrogb commented Oct 1, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2018
@timothysc
Member

timothysc commented Oct 11, 2018

/assign @rdodev

could you verify that this still exists in 1.12, and assign the 1.13 milestone if you can repro, given all our shuffling

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 11, 2018
@jethrogb

#598 has easy repro steps

@rdodev

rdodev commented Oct 16, 2018

Finally getting around to this.

/lifecycle active

@k8s-ci-robot k8s-ci-robot added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Oct 16, 2018
@rdodev

rdodev commented Oct 16, 2018

@timothysc I was unable to replicate this in 1.12:

root@ip-10-0-0-43:~#  kubeadm join 10.0.0.106:6000 --token nwoa2x.cqar2ndxrtnw9arc --discovery-token-ca-cert-hash sha256:d993ceed705830e8a10fcf2cb29d7c2030217039c6ebafcfb2766dceb45ed885
[preflight] running pre-flight checks
	[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

[discovery] Trying to connect to API Server "10.0.0.106:6000"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.106:6000"
[discovery] Requesting info from "https://10.0.0.106:6000" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.106:6000"
[discovery] Successfully established connection with API Server "10.0.0.106:6000"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ip-10-0-0-43" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

root@ip-10-0-0-106:~# kubectl get nodes
NAME            STATUS     ROLES    AGE 
ip-10-0-0-106   NotReady   master   3m37s 
ip-10-0-0-43    NotReady   <none>   86s

@jethrogb

@rdodev I reproduced it last week on 1.12. Why do you think kubeadm is actually connecting to 10.0.0.106:6000?

@rdodev

rdodev commented Oct 16, 2018

@jethrogb firewall rules. In the repro steps you linked, they're forcing it via iptables.
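
(For context, a minimal sketch of the kind of iptables redirect such repros rely on, assuming the API server listens on 6443 and the test exposes it on 6000; this is illustrative only, not the exact steps from #598:)

# On the control-plane host: make incoming connections to :6000 reach the API server on :6443.
iptables -t nat -A PREROUTING -p tcp --dport 6000 -j REDIRECT --to-port 6443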

@rdodev

rdodev commented Oct 16, 2018

@jethrogb

root@ip-10-0-0-43:~# kubeadm join 10.0.0.106:6443 --token nwoa2x.cqar2ndxrtnw9arc --discovery-token-ca-cert-hash sha256:d993ceed705830e8a10fcf2cb29d7c2030217039c6ebafcfb2766dceb45ed885
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "10.0.0.106:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.106:6443"
[discovery] Failed to request cluster info, will try again: [Get https://10.0.0.106:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.0.0.106:6443: connect: connection refused]
[discovery] Failed to request cluster info, will try again: [Get https://10.0.0.106:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.0.0.106:6443: connect: connection refused]
^C
root@ip-10-0-0-43:~# kubeadm join 10.0.0.106:6000 --token nwoa2x.cqar2ndxrtnw9arc --discovery-token-ca-cert-hash sha256:d993ceed705830e8a10fcf2cb29d7c2030217039c6ebafcfb2766dceb45ed885
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "10.0.0.106:6000"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.106:6000"
[discovery] Requesting info from "https://10.0.0.106:6000" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.106:6000"
[discovery] Successfully established connection with API Server "10.0.0.106:6000"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
[kubelet] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[preflight] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ip-10-0-0-43" as an annotation

This node has joined the cluster:

@jethrogb

testuser@ali0:~$ sudo kubeadm join 10.198.0.221:6443 --token cykhjx.3kabrvhgdkwohqz5 --discovery-token-ca-cert-hash sha256:c2a5e209423b6dd23fe865d0de7a62e42a3638ae40b243885545e4b5152564db --ignore-preflight-errors=SystemVerification
[preflight] running pre-flight checks
	[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_sh ip_vs_rr ip_vs_wrr] or no builtin kernel ipvs support: map[ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{} ip_vs:{} ip_vs_rr:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

[discovery] Trying to connect to API Server "10.198.0.221:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.198.0.221:6443"
[discovery] Failed to request cluster info, will try again: [Get https://10.198.0.221:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.198.0.221:6443: connect: connection refused]
^C
testuser@ali0:~$ sudo kubeadm join 10.198.0.221:6000 --token cykhjx.3kabrvhgdkwohqz5 --discovery-token-ca-cert-hash sha256:c2a5e209423b6dd23fe865d0de7a62e42a3638ae40b243885545e4b5152564db --ignore-preflight-errors=SystemVerification
[preflight] running pre-flight checks
	[WARNING RequiredIPVSKernelModulesAvailable]: the IPVS proxier will not be used, because the following required kernel modules are not loaded: [ip_vs_sh ip_vs_rr ip_vs_wrr] or no builtin kernel ipvs support: map[ip_vs:{} ip_vs_rr:{} ip_vs_wrr:{} ip_vs_sh:{} nf_conntrack_ipv4:{}]
you can solve this problem with following methods:
 1. Run 'modprobe -- ' to load missing kernel modules;
2. Provide the missing builtin kernel ipvs support

[discovery] Trying to connect to API Server "10.198.0.221:6000"
[discovery] Created cluster-info discovery client, requesting info from "https://10.198.0.221:6000"
[discovery] Requesting info from "https://10.198.0.221:6000" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.198.0.221:6000"
[discovery] Successfully established connection with API Server "10.198.0.221:6000"
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.12" ConfigMap in the kube-system namespace
Get https://10.198.0.221:6443/api/v1/namespaces/kube-system/configmaps/kubelet-config-1.12: dial tcp 10.198.0.221:6443: connect: connection refused

@jethrogb

On the master:

$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get cm -n kube-public cluster-info -oyaml
apiVersion: v1
data:
  jws-kubeconfig-cykhjx: eyJhbGciOiJIUzI1NiIsImtpZCI6ImN5a2hqeCJ9..BiYLnM2uq2lehUOez8n0tBuMqkErikP0ULsGzyAf_go
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRFNE1UQXhOakl6TVRNME5sb1hEVEk0TVRBeE16SXpNVE0wTmxvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTXNPCkN3OVpVazZJSTVBTjJSUVVyNmRlM1dpMmhOM2hUVSt5aEhCZVMrU0ttQUFPVkp0SmxSTHMwa0c0eXBSb3pENXIKQUphOVRaSi9XOFhLTWdIOUR3ckdHWC9OUzRVRzNoNXdyME5xMlBxeVVqMGZETUNBR2d2MGc3NlNGaTlCWGcrcwoyaEFmOEl5UFlOZ2F1WXFvMUttdjdleXVHUmp2Z2JnU1J2WVIwZWVWYkhxWTIvdlA3T2RBeXRBcytKcGFTS28zCmpVZTR3dGtEcTYralo4ZnlnUS9EbkkwY0pRK1pMaUVIS0d0T2JscnRNWlRxS0RsTXVQd0Y4TE4yQ1kyZUh1WUgKaTM3cUgxMHp1SmlQZXBmOXdVdzc1QkR3eUNlVTVTbUJWUFo0b2xJT3c3ZW5JdDhoNGVpWTlOSklDbHdPNUhDWApaWG0xYmp6L0FKdEhoejg5QXFVQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFBSzlGRkg5eDB0RXhaTGREWjkzQm4walYzTnEKRWl5VzBmUTVpOHBBdlBVV3dMUVd1dkpuM1BLSUtTcjlpZGdwSUthUk1FKzMyRGRXQzVvZDIyakdBQ1REdjBjdAoxbFBSM3RBSTAwQnY2bS9BM09NQVRqY1JKd1hhL0ZHMDdRMU1sbkxibGhXMTlqaVMxQU9ycjRGZ2l1Z3VJQy9uCmd0UWZ3ZHJqTEhZSDY1KzJPRGtnWldNVjBqbjdpZlNMdnlpamJjRUttVXpSZm44T0hmYldWNXRMd2dRN295dHYKRE5tWHdkRkc3WFh3MVZVZjJKQkhDVGJHNndVU1diVFRPbzB1NnJLazJQakZoKzU5QVl4R2I1Ynp4N2thTW8xZwpYZktrUVVWSVcxaGZhelpSUHYzbWEzTmNsSis0R3VIMGc2OThvaEpHZGFkVHpXNmx2WnhoUW9NKzgycz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
        server: https://10.198.0.221:6443
      name: ""
    contexts: []
    current-context: ""
    kind: Config
    preferences: {}
    users: []
kind: ConfigMap
metadata:
  creationTimestamp: 2018-10-16T23:14:15Z
  name: cluster-info
  namespace: kube-public
  resourceVersion: "288"
  selfLink: /api/v1/namespaces/kube-public/configmaps/cluster-info
  uid: 3318106a-d199-11e8-b21c-54e1ad024614

@rdodev

rdodev commented Oct 17, 2018

@jethrogb don't know what to tell you. This is a mint fresh install on fresh infra.

root@ip-10-0-0-106:~# KUBECONFIG=/etc/kubernetes/admin.conf kubectl get cm -n kube-public cluster-info -oyaml
apiVersion: v1
data:
  jws-kubeconfig-nwoa2x: eyJhbGciOiJIUzI1NiIsImtpZCI6Im53b2EyeCJ9..Be2U7ch__XzQ7em8vLEw8WAX6dQZeeLXaKVjh_a7YYA
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRFNE1UQXhOakl5TXpNME4xb1hEVEk0TVRBeE16SXlNek0wTjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBS2cxCjREbzhNbGtBSVZJM29xem9XK2trbUtmYjIyOGFLd1FzaXJsSTNMN2F1QnlrWC9JaEk0Tm9UYkZmMFpXbEdkRTYKUlVJNFdUZml1L2RqWXJqZG9YM2pZcGtxRERmTm5KNWxteGkzUStwbmVmM3hTWGtEbTNEOXFadWV0R0JXRTZzRwppNHIycUZxSmRnS21MMCswdnlXNmhkRUNUY1VwdFFTSzkzQmUxTzBMQnFRa1BLd0I0QjQ3Z3d6bGtSOFpaeTAyCm1zN1IvaE9lK0h5NEl2c0FQTmFQbHBpVFhQRyt5d2lLMkoxcXJBb0hzUDhNelNhdzN3OHB4bkJmc2V2NmErYWsKZm42b1p3QVJibi9yTDRNbHJaSlNpWC8vVEdvWTN5YlZYZ2lDWWVzMHNZQWR6T1Q3Sjl2VDBzYkRHK0Z2STFTYQpha05WUDJwdVNkdlhvcmtoc1JFQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFEbHJ5eklBRExwbUVEaURxdXhIdURPb2hmS0sKbVhMeVc4SkllYXFzT0k0cGd0aDJITDRJcG4vbm14VWF3bVh4SVB4YWc3N2I1cXZHcm5YTXN6SHd4WUp2SnJ0cgpJU2VyOVdvSmpuY0xVUnhkeTVBb3ZMWFZYZ3Y2S1dHVlFnMkt2dXpmNGMyL1ZVN09jNnpQMlRhNVJJaHgrcVU2CnBSeWN5Q2RJOUdaMUFpN0JSSTd1M3VtUjRiT3BhckpMaVRvZ2hsMGNDTlBDRDBhZ2dlNHBGemxSd0VLbEpINmMKMmgzcGFxZ0dQUU5YY1ZzcGdtbTgvQ2JvbFVta1d1RjZRTm1KemxuK2tUdlhkRTJiY3NkSUxyeU5Nb0J0L2paUQpoaVZxTnhBVWVuV1hEVk8wVnd5ZXRxY3crL2ZGb05jZndUL1FERXduQXpJd29SM3FHdUZXVk1aQllVZz0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
        server: https://10.0.0.106:6000
      name: ""
    contexts: []
    current-context: ""
    kind: Config
    preferences: {}
    users: []
kind: ConfigMap
metadata:
  creationTimestamp: 2018-10-16T22:34:14Z
  name: cluster-info
  namespace: kube-public
  resourceVersion: "314"
  selfLink: /api/v1/namespaces/kube-public/configmaps/cluster-info
  uid: 9c0579c2-d193-11e8-b95c-026da1fc2270

@jethrogb

jethrogb commented Oct 17, 2018

@rdodev That cluster-info looks like it was modified from the default. Using it will not reproduce the issue. What was your kubeadm init command?

@timothysc timothysc added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Oct 30, 2018
@neolit123 neolit123 removed the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Jan 3, 2019
@timothysc timothysc added this to the v1.14 milestone Jan 4, 2019
@rosti

rosti commented Feb 15, 2019

Folks, the interpretation of the API server endpoint command line argument during join is misleading. In truth, it is used only during bootstrap and for nothing else. And even then, it is used only during token-based discovery. It will be ignored (without even a single warning) with kubeconfig-based discovery.

So there are really a couple of problems here:

  • The bootstrap-token-based discovery API server endpoint has a misleading UX. My best bet is to deprecate supplying it as a standalone argument and introduce a descriptive command line switch for it (something like --discovery-token-apiserver). The supplied value then goes to joinCfg.Discovery.BootstrapToken.APIServerEndpoint (see the sketch after this list).

  • If someone wishes to overwrite the actual API server on a per-node basis, we may have to modify the config (probably add a field in NodeRegistrationOptions and/or possibly a command line switch?).
    Not persisting it has the potential to break something on subsequent kubeadm runs (such as on upgrade), so we may need to store it as an annotation too.
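
(For reference, the field named in the first bullet already exists; a minimal sketch of feeding the discovery endpoint through JoinConfiguration instead of the positional argument, with placeholder token and CA hash values:)

cat <<EOF > join-config.yaml
apiVersion: kubeadm.k8s.io/v1beta1
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "3.3.3.3:6443"     # LB endpoint used for discovery
    token: "abcdef.0123456789abcdef"      # placeholder bootstrap token
    caCertHashes:
      - "sha256:<hash>"                   # placeholder CA cert hash
EOF
kubeadm join --config join-config.yaml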

@yagonobre
Member

  • The bootstrap-token-based discovery API server endpoint has a misleading UX. My best bet is to deprecate supplying it as a standalone argument and introduce a descriptive command line switch for it (something like --discovery-token-apiserver). The supplied value then goes to joinCfg.Discovery.BootstrapToken.APIServerEndpoint.

+1

  • If someone wishes to overwrite the actual API server on a per-node basis, we may have to modify the config (probably add a field in NodeRegistrationOptions and/or possibly a command line switch?).
    Not persisting it has the potential to break something on subsequent kubeadm runs (such as on upgrade), so we may need to store it as an annotation too.

IMO we shouldn't overwrite this at join time; maybe we can have a command to update the API server endpoint in the cluster-info ConfigMap.
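
(A minimal manual sketch of what such a command would have to do, assuming the new endpoint is 3.3.3.3:6443; it only edits the kubeconfig embedded in the cluster-info ConfigMap and does not touch kubelet.conf on already-joined nodes:)

# Rewrite the server: field of the kubeconfig stored in the cluster-info ConfigMap.
kubectl -n kube-public get configmap cluster-info -o yaml \
  | sed 's#server: https://.*#server: https://3.3.3.3:6443#' \
  | kubectl apply -f -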

@rosti

rosti commented Feb 15, 2019

I wasn't clear, actually. By "modify the config" I meant to actually add a new field somewhere in the next config format (possibly v1beta2) and probably persist it somewhere in the cluster (a node annotation?).
This needs some discussion though, and probably won't happen in the current cycle (especially if we go the "adding a config option" route).

What we can certainly do in this cycle is to add a command line switch for the bootstrap token discovery API server and deprecate supplying it as an anonymous argument.

@neolit123 @fabriziopandini WDYT?

@neolit123
Member

neolit123 commented Feb 15, 2019

What we can certainly do in this cycle is to add a command line switch for the bootstrap token discovery API server and deprecate supplying it as an anonymous argument.

Tim and Fabrizio sort of disagreed.
but i'm all +1 for running the GA deprecation policy on that arg.

it's nothing but trouble.

@rosti

rosti commented Feb 18, 2019

@neolit123 even if we don't go down the command line switch track, we can do a better job of documenting the arg, both in the inline tool help (kubeadm join --help) and in the website docs.
I assume that better docs can (sort of) "fix" the problem too, and that this can be done as part of the join phases docs write-up.

@neolit123 neolit123 modified the milestones: v1.14, v1.15 Mar 11, 2019
@timothysc timothysc modified the milestones: v1.15, Next Apr 30, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 29, 2019
@neolit123
Member

neolit123 commented Aug 2, 2019

reading through the whole ticket, there are multiple issues here that are either misunderstandings, covered elsewhere, or not documented.

the original issue:

the original issue talks about:

but kubelet config and kube-proxy config may also be master0 ip or master1 ip, this behaviour is not expected in HA.

i do not understand this problem, so if you feel it still applies to 1.13 (1.12 is not supported anymore), please log a new ticket with full details and examples.
this ticket has deviated a lot.

the misleading anonymous join argument:

this was tracked here:
#1375

we were planning to switch to a named argument for discovery, e.g. --discovery-endpoint, but opted out of this idea, so you will have to continue using kubeadm join addr:port

transitioning from single control-plane to HA:

was tracked here:
#1664

a PR for this topic just merged in our docs:
kubernetes/website#15524

the TL;DR is that if you modify the address of the api-server on a control-plane node after the cluster is created, you need to regenerate certificates and patch cluster objects. to better transition to HA, use DNS names.
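
(For example, a minimal sketch of starting with a stable DNS name so the endpoint can later be re-pointed at a load balancer without regenerating certificates; the domain is a placeholder:)

# use a DNS name as the control plane endpoint from day one;
# the name can later resolve to an LB instead of a single control-plane node.
kubeadm init --control-plane-endpoint "k8s-api.example.com:6443"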


if there is something else at play, please create clean tickets with full details so that the maintainers can grasp the problem.

@jethrogb

@neolit123 #598 got merged into this but it's not clear to me if it is resolved.

@neolit123
Member

neolit123 commented Aug 26, 2019

@jethrogb

#598 got merged into this

if your problem from #598 still applies, please reopen that issue, but please mind that it has to be reproducible with 1.13+, because older versions are outside of the support skew and are not supported by the kubeadm team.

@masaeedu

masaeedu commented Jul 29, 2020

@neolit123 I have a Kubernetes control plane node running inside a Docker container via https://github.com/kubernetes-sigs/kind. The api server is exposed on the Docker host via port forwarding. I need to add a worker node in the Docker host's network to the cluster.

Obviously the IP address and hostname of the container running the kubelet and Docker host where the API is exposed via port forwarding differ, so we run into the problems being described in this issue. For one thing, when someone reaches the master node's API via the forwarded port on the Docker host, the IP address does not match the certificate. This is easy to fix: we can just add the Docker host's IP to certificateSANs when using kubeadm to deploy the cluster.
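
(A minimal sketch of the certSANs part, assuming the Docker host is reachable at the placeholder address 192.168.1.10; with kind this would be passed through its kubeadm config patching mechanism, but the underlying kubeadm config looks roughly like this:)

cat <<EOF > cluster-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  certSANs:
    - "192.168.1.10"   # Docker host IP (placeholder) where the API is port-forwarded
EOF
kubeadm init --config cluster-config.yaml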

The other problem (which is harder to solve) is that when we try to join the worker node to the cluster we need to consistently override the API endpoint being used to reach the master node (i.e. it should use the Docker host's IP everywhere, and not the internal IP address of the Docker container, to which it has no access).

As far as I understand there's still no way to do this, or at least I can't see one from looking at the flags for kubeadm join, and making the host:port positional argument serve this purpose is what this issue was asking for (admittedly I didn't fully understand the counterargument against this). Am I missing something?

@neolit123
Member

neolit123 commented Jul 29, 2020

The other problem (which is harder to solve) is that when we try to join the worker node to the cluster we need to consistently override the API endpoint being used to reach the master node (i.e. it should use the Docker host's IP everywhere, and not the internal IP address of the Docker container, to which it has no access).

this seems like an unsupported scenario by kind.
did you get feedback from the kind maintainers (or #kind on k8s slack)?

As far as I understand there's still no way to do this, or at least I can't see one from looking at the flags for kubeadm join, and making the host:port positional argument serve this purpose is what this issue was asking for (admittedly I didn't fully understand the counterargument against this). Am I missing something?

the OP of this issue was confusing and i don't think it's related to your problem.

@masaeedu

@neolit123 The important question is whether it's a supported scenario by kubeadm; kind just wraps kubeadm and a container runtime. I can think of a number of other scenarios that don't involve kind where a kubernetes control plane node's API port is forwarded somewhere else, and the worker node must be registered to it in a network where the original control plane address is not accessible. E.g. using an SSH tunnel, or a TCP reverse proxy.

@neolit123
Member

neolit123 commented Jul 29, 2020

kubeadm join needs a kube-apiserver endpoint to perform discovery and Node bootstrap.
that kube-apiserver could be anywhere - same network or another network - and kubeadm does support those cases.
the endpoint can be a load balancer endpoint too.

that endpoint is then written to the worker node's kubelet.conf file, which is used to communicate with the API server.

you can omit the positional argument completely from kubeadm join and use JoinConfiguration's Discovery field.

The other problem (which is harder to solve) is that when we try to join the worker node to the cluster we need to consistently override the API endpoint being used to reach the master node (i.e. it should use the Docker host's IP everywhere, and not the internal IP address of the Docker container, to which it has no access).

this seems like a problem of the high-level software that uses kubeadm (e.g. kind).
the high level software is not executing kubeadm join with the endpoint you desire.

The important question is whether it's a supported scenario by kubeadm; kind just wraps kubeadm and a container runtime. I can think of a number of other scenarios that don't involve kind where a kubernetes control plane node's API port is forwarded somewhere else, and the worker node must be registered to it in a network where the original control plane address is not accessible. E.g. using an SSH tunnel, or a TCP reverse proxy.

if a kube-apiserver is not accessible, kubeadm cannot join this new Node to the cluster. period.
kubeadm join needs a valid endpoint to which a k8s client can connect to perform discovery and validation, which would then lead to TLS bootstrap and the creation of a new Node object.

so yes, kubeadm join does need a valid / reachable API server endpoint.
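
(A minimal sketch of the file-based flavour of that Discovery field, which skips the positional argument entirely; the kubeconfig path, endpoint and token are placeholders, and the referenced kubeconfig must contain the cluster CA and a server: entry pointing at the endpoint that is actually reachable from the worker:)

cat <<EOF > join-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  file:
    kubeConfigPath: /root/discovery.conf        # placeholder; its server: must be the reachable endpoint
  tlsBootstrapToken: "abcdef.0123456789abcdef"  # placeholder bootstrap token for TLS bootstrap
EOF
kubeadm join --config join-config.yaml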

@masaeedu

masaeedu commented Jul 29, 2020

the high level software is not executing kubeadm join with the endpoint you desire.

It's not kind that's executing kubeadm join, it's me. I'm executing kubeadm join manually, providing the address of the Docker host where the API is exposed via port forwarding (note that this does not match the --control-plane-endpoint that was used to start the control plane node itself; that address is not accessible to the worker node).

The problem is that the address I provide to kubeadm join is not used consistently throughout the join process: it is only used in the initial stages, after which the process fails because at some point the worker node downloads configuration from the control plane API, and then starts using the original, inaccessible address corresponding to the --control-plane-endpoint argument that was used to start the control plane node.

@masaeedu

masaeedu commented Jul 29, 2020

if a kube-apiserver is not accessible, kubeadm cannot join this new Node to the cluster. period.

The kube-apiserver is accessible via port forwarding. It is not accessible at the original address that was specified using --advertise-addr or --control-plane-endpoint when kubeadm init was used, because that address is a function of the network in which the control plane node itself is running, and not necessarily of the network in which the joining worker is running.

@neolit123
Member

please log a separate issue and provide IP addresses and concrete examples of your setup.

@jethrogb

@neolit123 it's not clear to me why yet another issue is needed. This issue has already been reported several times over the past several years and it's the same problem every time: you run kubeadm join ADDRESS and at some point ADDRESS (which works) is swapped out for something else (which doesn't).

@neolit123
Member

let's start in a fresh issue to see:

  • the precise minimal reproduction steps.
  • affected kubeadm versions.

@neolit123
Member

affected kubeadm versions.

in fact, i'd be really curious about the above because looking at our logic for 1.17 and 1.18 under:
https://github.com/kubernetes/kubernetes/tree/release-1.17/cmd/kubeadm/app/discovery
https://github.com/kubernetes/kubernetes/tree/release-1.18/cmd/kubeadm/app/discovery

the endpoint you feed as a positional argument or via JoinConfiguration ends up in the validated bootstrap-kubelet.conf that is written to disk.
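
(If anyone retests this, a quick sketch for checking which endpoint actually got persisted on a worker after join; these are the standard kubeadm paths, and bootstrap-kubelet.conf may already have been cleaned up depending on the version:)

# which API server endpoint ended up on disk after kubeadm join?
grep 'server:' /etc/kubernetes/bootstrap-kubelet.conf
grep 'server:' /etc/kubernetes/kubelet.conf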
