Can't spin up a working cluster with default settings #1803

Closed
shadycuz opened this issue Oct 15, 2017 · 27 comments


@shadycuz

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Environment:

  • Cloud provider or hardware configuration:
    Scaleway (bare hardware)
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 4.4.92-mainline-rev1 x86_64
    NAME="Ubuntu"
    VERSION="16.04.1 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.1 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial
  • Version of Ansible (ansible --version):
    ansible 2.4.0.0
    config file = /etc/ansible/ansible.cfg
    configured module search path = [u'/etc/ansible/module']
    ansible python module location = /usr/lib/python2.7/dist-packages/ansible
    executable location = /usr/bin/ansible
    python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]

Kubespray version (commit) (git rev-parse --short HEAD):
92d0380

Network plugin used:
Default

Copy of your inventory file:

Command used to invoke ansible:
ansible-playbook -b kubespray/cluster.yml -u root -i /etc/ansible/inventory/hosts

Output of ansible run:

Anything else we need to know:

The cluster seems healthy; I can launch deployments etc. from the console of one of the three master servers. When launching a dashboard with kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.6.3.yaml
everything is created normally. But when trying to access it remotely via /ui, I was able to log in and then got an error. That's when I noticed this user doesn't seem to be able to do anything?

Trying to reach /api/v1/nodes

returns


{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "nodes is forbidden: User \"kube\" cannot list nodes at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "kind": "nodes"
  },
  "code": 403
}

???

root@Master03:~# cat /etc/kubernetes/users/known_users.csv
changeme,kube,admin

Maybe I am missing something? It's been a while since I stood something up with kubespray, but in the past using defaults was always a sure thing.
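For what it's worth, that 403 means the basic-auth user kube authenticates fine but has no RBAC bindings behind it. A minimal sketch of a binding that would let it through on a throwaway cluster (the binding name is made up here, and cluster-admin is deliberately blunt):

kubectl create clusterrolebinding kube-basic-auth-admin \
  --clusterrole=cluster-admin --user=kube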

@shadycuz shadycuz changed the title Can't access api with basic auth Can't access api with basic auth (user kube has no permission for anything) Oct 15, 2017
@shadycuz
Author

Possibly being affected by kubernetes/kubeadm#484

@shadycuz
Author

@mattymo

@mattymo
Contributor

mattymo commented Oct 18, 2017

@shadycuz

I deployed an environment by hand with the following settings:

kubeadm_enabled: false
kube_basic_auth: true

Then I tried to list nodes with my generated credentials:

root@k8s-mattymo-test-1:~# curl -s https://kube:qppUCGxPJoBlCwE@localhost:6443/api/v1/nodes | head -5
{
  "kind": "NodeList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/nodes",

The UI works as well.

If you want to access the UI but not enable kube_basic_auth (or can't, because you are using kubeadm mode), you can access the UI with kubectl proxy: https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#using-kubectl-proxy. Keep in mind you want to set the following options on your host to allow this:

kubectl_localhost: true
kubeconfig_localhost: true
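
With those set, a typical session from the deploy host might look like this (the artifacts path is an assumption; kubeconfig_localhost fetches the admin kubeconfig into the inventory's artifacts directory):

kubectl --kubeconfig inventory/artifacts/admin.conf proxy
# then browse to http://127.0.0.1:8001/ui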

@shadycuz
Author

@mattymo Thanks for this. Like I said, I used all defaults, and that used to give me what I wanted. I guess things have changed and I will need to look through those vars files from now on. I also need to look up kubeadm, as I don't know what that is. I will recreate the cluster soon with those settings.

Thanks.

@mattymo
Contributor

mattymo commented Oct 18, 2017

We disabled basic auth because it leaves an account with admin privileges exposed through a much weaker mechanism than a proper x509 cert.
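
For comparison, hitting the API with the generated x509 client certs instead of basic auth looks roughly like this (the paths are assumptions about where kubespray drops the certs on a master):

curl --cacert /etc/kubernetes/ssl/ca.pem \
     --cert /etc/kubernetes/ssl/admin-master01.pem \
     --key /etc/kubernetes/ssl/admin-master01-key.pem \
     https://localhost:6443/api/v1/nodes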

@shadycuz
Author

@mattymo I need to look into it more. I have an ELB pointed at my masters on port 6883? Or whichever port the API is listening on; that is how I always reached the cluster, and I used basic auth to hit the dashboard. If I can still reach the UI and run commands like kubectl create while going through that load balancer, then I am fine with using kubeadm and certs. I just didn't think I could log into the GUI via a cert? So I need to look into it more.

@shadycuz
Author

@mattymo Ahh, look at what user floreks wrote here: kubernetes/kubernetes#31665

I will try to install that x509 cert in my browser and see if I can reach the API and dashboard through my Nginx reverse proxy, mainly because I don't want to use a VPN to hit the cluster.
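
Browsers generally want a client cert and key bundled as PKCS#12 rather than raw PEM, so a conversion step along these lines is usually needed first (file names assumed):

openssl pkcs12 -export -in admin.pem -inkey admin-key.pem -out admin.p12
# then import admin.p12 via the browser's certificate manager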

@mattymo
Contributor

mattymo commented Oct 18, 2017

@shadycuz Enabling kube_basic_auth is easier than authenticating with a cert in a browser, in my experience.

@shadycuz
Author

@mattymo I will just do that; it's a personal cluster for fun and learning. I will set a really long random password and call it a day =). Thanks for adding that kubeconfig role to set it up for end users, super nice!

@shadycuz
Author

@mattymo Maybe I should start a new issue, but I spun up a new cluster this morning. I used:

kubeadm_enabled: false
kube_basic_auth: true
kubectl_localhost: true
kubeconfig_localhost: true

Unfortunately I lost the ansible output, but all steps were changed or okay until it was time to wait for the API servers to come up. 20 tries, I think, and all 3 failed. It was checking 127.0.0.1:8080/health or something like that.

I checked Docker on the host and something isn't right...

root@Master01:/# service docker status
● docker.service - Docker Application Container Engine
   Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─docker-dns.conf, docker-options.conf
   Active: active (running) since Thu 2017-10-19 11:58:31 UTC; 36min ago
     Docs: http://docs.docker.com
 Main PID: 9240 (dockerd)
    Tasks: 35
   Memory: 1.0G
      CPU: 3min 38.876s
   CGroup: /system.slice/docker.service
           ├─ 9240 dockerd --insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --log-opt max-size=50m --log-opt max-file=5 --iptables=false --dns 10.233.0.3 --dns 10.1.94.8 --dns-search default.svc.cluster.local
           ├─ 9252 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim
           └─15654 docker-containerd-shim c0ae8f0a2e9b6e7484463b977586047a4539f7093bde041f90cd89e2b158e52f /var/run/docker/libcontainerd/c0ae8f0a2e9b6e7484463b977586047a4539f7093bde041f90cd89e2b158e52f docker-runc

Oct 19 12:05:13 Master01 docker[9240]: time="2017-10-19T12:05:13.935797440Z" level=error msg="Handler for GET /v1.26/containers/498ffffcfd05/json returned error: No such container: 498ffffcfd05"
Oct 19 12:05:13 Master01 docker[9240]: time="2017-10-19T12:05:13.941314558Z" level=error msg="Handler for GET /v1.26/containers/ff1e9c00bb46/json returned error: No such container: ff1e9c00bb46"
Oct 19 12:05:13 Master01 docker[9240]: time="2017-10-19T12:05:13.945310473Z" level=error msg="Handler for GET /v1.26/containers/00bc1e841a8f/json returned error: No such container: 00bc1e841a8f"
Oct 19 12:05:13 Master01 docker[9240]: time="2017-10-19T12:05:13.949787846Z" level=error msg="Handler for GET /v1.26/containers/99e59f495ffa/json returned error: No such container: 99e59f495ffa"
Oct 19 12:09:45 Master01 docker[9240]: time="2017-10-19T12:09:45.526205467Z" level=error msg="Handler for DELETE /v1.26/containers/etcdctl-binarycopy returned error: No such container: etcdctl-binarycopy"
Oct 19 12:10:37 Master01 docker[9240]: time="2017-10-19T12:10:37.676144794Z" level=error msg="Handler for DELETE /v1.26/containers/etcd1 returned error: No such container: etcd1"
Oct 19 12:13:36 Master01 docker[9240]: time="2017-10-19T12:13:36.054842758Z" level=error msg="Handler for DELETE /v1.26/containers/etcdctl-binarycopy returned error: No such container: etcdctl-binarycopy"
Oct 19 12:19:46 Master01 docker[9240]: time="2017-10-19T12:19:46.363721789Z" level=error msg="Handler for GET /v1.26/containers/kubelet/archive returned error: No such container: kubelet"
Oct 19 12:19:52 Master01 docker[9240]: time="2017-10-19T12:19:52.395258535Z" level=error msg="Handler for POST /v1.26/containers/kubelet/stop returned error: No such container: kubelet"
Oct 19 12:20:07 Master01 docker[9240]: time="2017-10-19T12:20:07.494390285Z" level=error msg="Handler for POST /v1.26/containers/create returned error: No such image: quay.io/coreos/hyperkube:v1.6.7_coreos.0"
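
That last log line suggests the apiserver container never started because the hyperkube image was never pulled. A quick manual check on the master, as a sketch:

docker images | grep hyperkube                          # is the image present at all?
docker pull quay.io/coreos/hyperkube:v1.6.7_coreos.0    # can it be pulled by hand?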

@mattymo
Contributor

mattymo commented Oct 19, 2017

Why are you deploying kubernetes version 1.6.7 instead of 1.8.0?

@shadycuz
Author

@mattymo Idk?

ansible@rundeck:~/kubespray$ git pull
Already up-to-date.
ansible@rundeck:~/kubespray$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   roles/kubernetes/master/defaults/main.yml
        modified:   roles/kubespray-defaults/defaults/main.yaml

no changes added to commit (use "git add" and/or "git commit -a")
ansible@rundeck:~/kubespray$ grep version roles/kubespray-defaults/defaults/main.yaml
## Change this to use another Kubernetes version, e.g. a current beta release
kube_version: v1.6.7
## When OpenStack is used, Cinder version can be explicitly specified if autodetection fails (https://github.com/kubernetes/kubernetes/issues/50461)
# openstack_blockstorage_version: "v1/v2/auto (default)"
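
If that stale role default is the culprit, pinning the version in an extra-vars file should override it, along these lines (v1.8.0 taken from the comment above; the exact tag to use is an assumption):

# my_vars.yml
kube_version: v1.8.0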

@shadycuz
Author

I guess I thought the defaults, without changing much, would give me a good cluster out of the box. I know yesterday when I launched a cluster with all defaults it came up okay, except basic auth wasn't enabled.

@mattymo
Contributor

mattymo commented Oct 19, 2017

You shouldn't remove the group vars that are included. We can't guarantee that the role defaults are set correctly, because they're not covered by CI.

I have a fix on review for the kube version issue: #1845

@shadycuz
Author

@mattymo What do you mean, remove group vars? I didn't? You mean I need to put my inventory in the kubespray inventory directory and run kubespray from inside the kubespray directory?

@shadycuz
Author

@mattymo I think most of my problems have been from not using the group vars..... =/ I will stand up a new cluster and put my inventory in the proper place...

@shadycuz
Author

And then I will submit my vars via the CLI so they take precedence over anything else, because I'm not really sure where I should change what.

@mattymo
Contributor

mattymo commented Oct 19, 2017

@shadycuz You can make your own json or yaml file with vars and then specify it on the CLI like -e @my_vars.yml
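
Note the @ prefix: it is what tells ansible-playbook to load the argument as a vars file rather than as a literal key=value string, so a full invocation would look like this (inventory path per the commands used earlier in this thread):

ansible-playbook -b -u root -i inventory/hosts -e @my_vars.yml cluster.yml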

@shadycuz
Author

@mattymo The ansible config file inside of kubespray does not specify a path to an inventory, so dropping an inventory into kubespray/inventory/hosts and then running ansible-playbook -b cluster.yml fails. I had to run mine with ansible-playbook -e my_vars.yml -i inventory/hosts -u root -b cluster.yml but I have no clue if the group variables got called into "action".

@shadycuz
Author

Okay, group_vars should have worked?

In addition to storing variables directly in the inventory file, host and group variables can be stored in individual files relative to the inventory file (not directory, it is always the file).
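
Concretely, that means the group_vars directory has to sit next to the hosts file, e.g. (a sketch of the conventional kubespray layout):

kubespray/
├── cluster.yml
└── inventory/
    ├── hosts
    └── group_vars/
        ├── all.yml
        └── k8s-cluster.yml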

@shadycuz
Author

@mattymo The playbook ran without errors; the first thing I noticed was no artifacts dir?

I used ansible-playbook -i inventory/hosts -b -e my_vars.yml -u root cluster.yml

and

ansible@rundeck:~/kubespray$ cat my_vars.yml
kubeadm_enabled: false
kube_basic_auth: true
kubectl_localhost: true
kubeconfig_localhost: true

@shadycuz
Author

@mattymo Well, this ain't good. I don't know what has happened to kubespray, but at one point in time it actually worked with defaults out of the box with my exact setup.

root@Master01:~# kubectl get pods -n kube-system
NAME                                    READY     STATUS              RESTARTS   AGE
calico-node-456fb                       0/1       ContainerCreating   0          33m
calico-node-9x89z                       0/1       ContainerCreating   0          33m
calico-node-g5w8b                       0/1       ContainerCreating   0          33m
calico-node-h2q8m                       0/1       ContainerCreating   0          33m
calico-node-mcgvr                       0/1       ContainerCreating   0          33m
calico-node-xkss4                       0/1       ContainerCreating   0          33m
kube-apiserver-master01                 1/1       Running             0          36m
kube-apiserver-master02                 1/1       Running             0          36m
kube-apiserver-master03                 1/1       Running             0          36m
kube-controller-manager-master01        1/1       Running             0          36m
kube-controller-manager-master02        1/1       Running             0          36m
kube-controller-manager-master03        1/1       Running             0          36m
kube-dns-596d7c8f8-whdzz                0/3       ContainerCreating   0          30m
kube-proxy-compute01                    1/1       Running             0          37m
kube-proxy-compute02                    1/1       Running             0          37m
kube-proxy-compute03                    1/1       Running             0          36m
kube-proxy-master01                     1/1       Running             0          37m
kube-proxy-master02                     1/1       Running             0          37m
kube-proxy-master03                     1/1       Running             0          37m
kube-scheduler-master01                 1/1       Running             0          37m
kube-scheduler-master02                 1/1       Running             0          37m
kube-scheduler-master03                 1/1       Running             0          37m
kubedns-autoscaler-86c47697df-rsjr7     0/1       ContainerCreating   0          30m
kubernetes-dashboard-7fd45476f8-9c7pz   0/1       ContainerCreating   0          30m
nginx-proxy-compute01                   1/1       Running             0          36m
nginx-proxy-compute02                   1/1       Running             0          37m
nginx-proxy-compute03                   1/1       Running             0          37m

@shadycuz
Author

@mattymo From the dashboard pod:

Tolerations:     node-role.kubernetes.io/master:NoSchedule
Events:
  Type     Reason       Age                From                Message
  ----     ------       ----               ----                -------
  Normal   Scheduled    32m                default-scheduler   Successfully assigned kubernetes-dashboard-7fd45476f8-9c7pz to compute02
  Warning  FailedSync   16m (x7 over 30m)  kubelet, compute02  Error syncing pod
  Warning  FailedMount  12m (x9 over 30m)  kubelet, compute02  Unable to mount volumes for pod "kubernetes-dashboard-7fd45476f8-9c7pz_kube-system(2496f520-b4e4-11e7-bf2a-de19444b4004)": timeout expired waiting for volumes to attach/mount for pod "kube-system"/"kubernetes-dashboard-7fd45476f8-9c7pz". list of unattached/unmounted volumes=[kubernetes-dashboard-token-slvz2]
  Warning  FailedMount  1m (x23 over 32m)  kubelet, compute02  MountVolume.SetUp failed for volume "kubernetes-dashboard-token-slvz2" : secrets "kubernetes-dashboard-token-slvz2" is forbidden: User "system:node:Compute02" cannot get secrets in the namespace "kube-system": no path found to object
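
One detail worth noticing in that last event (an observation, not a confirmed diagnosis): the kubelet identifies itself as system:node:Compute02 with a capital C, while the pod was scheduled to compute02 in lowercase. The Node authorizer matches on the exact node name, so a hostname-case mismatch like that can produce exactly this kind of "no path found to object" denial. Comparing the two is quick:

kubectl get nodes -o name    # the node names the cluster registered
hostname                     # run on the node itself: the name its credentials derive from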

@shadycuz shadycuz reopened this Oct 19, 2017
@shadycuz shadycuz changed the title Can't access api with basic auth (user kube has no permission for anything) Can't spin up a working cluster with default settings Oct 19, 2017
@shadycuz
Author

I didn't use the @ sign with -e, so my vars file was never actually loaded....

@shadycuz
Author

Tried again with a new cluster:

TASK [network_plugin/calico : Calico | wait for etcd] **********************************************************************************************
Thursday 19 October 2017  17:53:35 -0400 (0:00:02.749)       0:26:59.961 ******
FAILED - RETRYING: Calico | wait for etcd (10 retries left).
FAILED - RETRYING: Calico | wait for etcd (9 retries left).
FAILED - RETRYING: Calico | wait for etcd (8 retries left).
FAILED - RETRYING: Calico | wait for etcd (7 retries left).
FAILED - RETRYING: Calico | wait for etcd (6 retries left).
FAILED - RETRYING: Calico | wait for etcd (5 retries left).
FAILED - RETRYING: Calico | wait for etcd (4 retries left).
FAILED - RETRYING: Calico | wait for etcd (3 retries left).
FAILED - RETRYING: Calico | wait for etcd (2 retries left).
FAILED - RETRYING: Calico | wait for etcd (1 retries left).
fatal: [Compute02 -> None]: FAILED! => {"attempts": 10, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error ('_ssl.c:574: The handshake operation timed out',)>", "redirected": false, "status": -1, "url": "https://localhost:2379/health"}

@rhino5oh

rhino5oh commented Mar 5, 2018

@shadycuz did you ever figure out a solution to this error you were getting?

fatal: [Compute02 -> None]: FAILED! => {"attempts": 10, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error ('_ssl.c:574: The handshake operation timed out',)>", "redirected": false, "status": -1, "url": "https://localhost:2379/health"}

I get the same thing sometimes for the "wait for etcd" task, and every single time on the "Master | wait for the apiserver to be running" handler.

@shadycuz
Author

@rhino5oh I did get past this, but I don't remember how. If you are using the latest code from here, it might be something different from my issue. I would ask on the Kubernetes Slack; Kubespray has a room and I always get good help from people there. You might want to SSH into your etcd nodes and check the logs to see what they say. Also, make a new issue =)
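
For anyone landing here with the same etcd handshake timeout, a diagnostic sketch (the cert paths are assumptions based on kubespray's usual layout, and etcd1 is the container name visible in the Docker logs above):

# on an etcd node
docker logs etcd1 2>&1 | tail -n 50
curl --cacert /etc/ssl/etcd/ssl/ca.pem \
     --cert /etc/ssl/etcd/ssl/node-<hostname>.pem \
     --key /etc/ssl/etcd/ssl/node-<hostname>-key.pem \
     https://localhost:2379/health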
