Timeout issue waiting for kube-apiserver #167

Open
owenmorgan opened this issue Sep 17, 2016 · 5 comments

@owenmorgan

owenmorgan commented Sep 17, 2016

When I run:

ansible-playbook -u core --ssh-common-args="-i /tmp/kubeform/terraform/aws/public-cloud/id_rsa -q" --inventory-file=inventory site.yml -e kube_apiserver_vip=$(cd /tmp/kubeform/terraform/aws/public-cloud && terraform output master_elb_hostname)

It runs fine until the "wait for kube-apiserver up" task, which times out on all three masters:

TASK [kube-master : wait for kube-apiserver up] ********************************
fatal: [kube-master-2]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for 127.0.0.1:8080"}
fatal: [kube-master-0]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for 127.0.0.1:8080"}
fatal: [kube-master-1]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for 127.0.0.1:8080"}

I checked one of the masters and see this:

ip-10-0-2-98 core # curl http://127.0.0.1:8080
curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection refused
ip-10-0-2-98 core # sudo systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2016-09-17 18:06:58 BST; 4min 55s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Main PID: 5072 (kubelet)
Tasks: 7
Memory: 37.8M
CPU: 4.551s
CGroup: /system.slice/kubelet.service
└─5072 /kubelet --api-servers=http://127.0.0.1:8080 --network-plugin-dir=/etc/kubernetes/cni/net.d --network-plugin=cni --register-schedulable=false --allow-privileged=true --config=/etc/kubernetes/manifests --hostn

Sep 17 18:11:52 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: I0917 17:11:52.114644 5072 kubelet.go:1197] Attempting to register node ip-10-0-2-98.eu-west-1.compute.internal
Sep 17 18:11:52 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: I0917 17:11:52.114881 5072 kubelet.go:1200] Unable to register ip-10-0-2-98.eu-west-1.compute.internal with the apiserver: Post http://127.0.0.1
Sep 17 18:11:52 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: E0917 17:11:52.405733 5072 reflector.go:205] pkg/kubelet/kubelet.go:286: Failed to list *api.Node: Get http://127.0.0.1:8080/api/v1/nodes?fieldS
Sep 17 18:11:52 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: E0917 17:11:52.420330 5072 generic.go:197] GenericPLEG: Unable to retrieve pods: Cannot connect to the Docker daemon. Is the docker daemon runni
Sep 17 18:11:52 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: E0917 17:11:52.420417 5072 reflector.go:205] pkg/kubelet/config/apiserver.go:43: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?
Sep 17 18:11:52 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: E0917 17:11:52.421192 5072 reflector.go:205] pkg/kubelet/kubelet.go:267: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?
Sep 17 18:11:53 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: E0917 17:11:53.406291 5072 reflector.go:205] pkg/kubelet/kubelet.go:286: Failed to list *api.Node: Get http://127.0.0.1:8080/api/v1/nodes?fieldS
Sep 17 18:11:53 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: E0917 17:11:53.420731 5072 reflector.go:205] pkg/kubelet/config/apiserver.go:43: Failed to list *api.Pod: Get http://127.0.0.1:8080/api/v1/pods?
Sep 17 18:11:53 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: E0917 17:11:53.420890 5072 generic.go:197] GenericPLEG: Unable to retrieve pods: Cannot connect to the Docker daemon. Is the docker daemon runni
Sep 17 18:11:53 ip-10-0-2-98.eu-west-1.compute.internal kubelet-wrapper[5072]: E0917 17:11:53.421620 5072 reflector.go:205] pkg/kubelet/kubelet.go:267: Failed to list *api.Service: Get http://127.0.0.1:8080/api/v1/services?

Any ideas?

Thanks

@owenmorgan
Author

It might be worth noting that when I SSH into one of the masters, I receive this message:

Failed Units: 3
docker.service
setup-network-environment.service
docker.socket

@enxebre
Contributor

enxebre commented Sep 21, 2016

Hey @owenmorgan, what do the systemd logs for docker say?

@tamsky
Contributor

tamsky commented Nov 17, 2016

I see the same thing as well. Workers look like:

Failed Units: 2
  docker.service
  setup-network-environment.service

@enxebre here are the systemd docker.service logs for a master

core@ip-10-0-1-61 ~ $ journalctl --unit=docker.service  | cat
[snip until reboot]
-- Reboot --
Nov 17 20:45:22 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:4] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:22 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:5] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:29 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:4] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:29 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:5] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:30 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:4] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:30 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: [/etc/systemd/system/docker.service.d/60-wait-for-flannel-config.conf:5] Unknown lvalue 'Restart' in section 'Unit'
Nov 17 20:45:35 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: Started Docker Application Container Engine.
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal dockerd[1028]: dockerd: "dockerd" requires 0 arguments.
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal dockerd[1028]: Usage:        dockerd [OPTIONS]
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: docker.service: Unit entered failed state.
Nov 17 20:45:37 ip-10-0-1-61.us-west-2.compute.internal systemd[1]: docker.service: Failed with result 'exit-code'.
[snip]

@enxebre
Contributor

enxebre commented Nov 18, 2016

https://github.com/Capgemini/kubeform/blob/master/terraform/aws/public-cloud/master-cloud-config.yml.tpl#L46 seems wrong and duplicated.
Restart=always should be inside [Service], and the same probably applies to the other cloud-configs.
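
For anyone hitting the same "Unknown lvalue 'Restart' in section 'Unit'" warning, here is a minimal sketch of a check that flags Restart= lines sitting outside [Service]. The helper name check_restart_section and the sample drop-in contents are assumptions made for illustration; the actual drop-in generated by the cloud-config may look different.

```shell
#!/bin/sh
# Hypothetical helper: report any Restart= directive that is not inside [Service].
# systemd only honours Restart= in the [Service] section of a service unit.
check_restart_section() {
  awk '
    BEGIN { bad = 0 }
    /^\[.*\]$/ { section = $0; next }   # remember the current section header
    /^ *Restart *=/ && section != "[Service]" {
      printf "line %d: Restart= found in %s\n", NR, section
      bad = 1
    }
    END { exit bad }                    # non-zero exit if anything was misplaced
  ' "$1"
}

# Illustrative drop-in (assumed contents, not copied from the repository).
sample=$(mktemp)
cat > "$sample" <<'EOF'
[Unit]
After=flanneld.service
Restart=always

[Service]
Restart=always
EOF

check_restart_section "$sample" || echo "drop-in needs fixing"
rm -f "$sample"
```

Pointing the function at the real file on a master (the path in the journal above) should show whether lines 4 and 5 are the ones to move.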

@methril

methril commented Nov 29, 2016

I had similar issues on DO.
I get fewer SSH timeouts after adding the following to my .ssh/config file:
Host kube-*
Port 22
User core
StrictHostKeyChecking=no
UserKnownHostsFile=/dev/null

Later on I hit more ansible-playbook errors caused by out-of-memory conditions in some steps; the processes picked to be killed included the kube-apiserver, among others.
I solved this by using 4 GB machines on DO.
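
A rough sketch of a pre-flight memory check along those lines, in case it helps someone sizing their droplets. The helper name has_min_memory and the 3500 MB threshold are assumptions (a nominal 4 GB machine usually reports a bit less than 4096 MB); adjust to taste.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MemTotal in a meminfo-style file is at least MIN_MB.
# Usage: has_min_memory FILE MIN_MB
has_min_memory() {
  awk -v min_mb="$2" '
    $1 == "MemTotal:" { total_mb = int($2 / 1024); found = 1 }   # value is in kB
    END {
      if (!found) exit 2                                         # no MemTotal line seen
      printf "MemTotal: %d MB (minimum %d MB)\n", total_mb, min_mb
      exit ((total_mb >= min_mb) ? 0 : 1)
    }
  ' "$1"
}

# On a Linux host, check the live value; the threshold here is an assumption.
has_min_memory /proc/meminfo 3500 || echo "host may be too small for the master components"
```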
