Can't spin up a working cluster with default settings #1803

Closed
shadycuz opened this issue Oct 15, 2017 · 27 comments


@shadycuz

Is this a BUG REPORT or FEATURE REQUEST? (choose one):
BUG REPORT

Environment:

  • Cloud provider or hardware configuration:
    Scaleway (bare hardware)
  • OS (printf "$(uname -srm)\n$(cat /etc/os-release)\n"):
    Linux 4.4.92-mainline-rev1 x86_64
    NAME="Ubuntu"
    VERSION="16.04.1 LTS (Xenial Xerus)"
    ID=ubuntu
    ID_LIKE=debian
    PRETTY_NAME="Ubuntu 16.04.1 LTS"
    VERSION_ID="16.04"
    HOME_URL="http://www.ubuntu.com/"
    SUPPORT_URL="http://help.ubuntu.com/"
    BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
    VERSION_CODENAME=xenial
    UBUNTU_CODENAME=xenial
  • Version of Ansible (ansible --version):
    ansible 2.4.0.0
    config file = /etc/ansible/ansible.cfg
    configured module search path = [u'/etc/ansible/module']
    ansible python module location = /usr/lib/python2.7/dist-packages/ansible
    executable location = /usr/bin/ansible
    python version = 2.7.12 (default, Nov 19 2016, 06:48:10) [GCC 5.4.0 20160609]

Kubespray version (commit) (git rev-parse --short HEAD):
92d0380

Network plugin used:
Default

Copy of your inventory file:

Command used to invoke ansible:
ansible-playbook -b kubespray/cluster.yml -u root -i /etc/ansible/inventory/hosts

Output of ansible run:

Anything else we need to know:

The cluster seems healthy; I can launch deployments etc. from the console of one of the three master servers. When launching a dashboard with kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.6.3.yaml
everything is created normally. But when trying to access it remotely via /ui, I was able to log in and then got an error. That's when I noticed this user doesn't seem to be able to do anything?

Trying to reach /api/v1/nodes

returns


{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "nodes is forbidden: User \"kube\" cannot list nodes at the cluster scope",
  "reason": "Forbidden",
  "details": {
    "kind": "nodes"
  },
  "code": 403
}

???

root@Master03:~# cat /etc/kubernetes/users/known_users.csv
changeme,kube,admin

Maybe I am missing something? It's been a while since I stood something up with kubespray, but in the past using defaults was always a sure thing.
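For what it's worth, that 403 means the basic-auth user kube authenticates fine but has no RBAC bindings behind it. A minimal sketch of a binding that would let it through on a throwaway cluster (the binding name is made up here, and cluster-admin is deliberately blunt):

kubectl create clusterrolebinding kube-basic-auth-admin \
  --clusterrole=cluster-admin --user=kube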

@shadycuz shadycuz changed the title Can't access api with basic auth Can't access api with basic auth (user kube has no permission for anything) Oct 15, 2017
@shadycuz
Author

Possibly being affected by kubernetes/kubeadm#484

@shadycuz
Author

@mattymo

@mattymo
Contributor

mattymo commented Oct 18, 2017

@shadycuz

I deployed an environment by hand with the following settings:

kubeadm_enabled: false
kube_basic_auth: true

Then I tried to list nodes with my generated credentials:

root@k8s-mattymo-test-1:~# curl -s https://kube:qppUCGxPJoBlCwE@localhost:6443/api/v1/nodes | head -5
{
  "kind": "NodeList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/nodes",

The UI works as well.

If you want to access the UI but not enable kube_basic_auth (or can't, because you are using kubeadm mode), you can access the UI with kubectl proxy: https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#using-kubectl-proxy. Keep in mind you want to set the following options on your host to allow this:

kubectl_localhost: true
kubeconfig_localhost: true
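
With those set, a typical session from the deploy host might look like this (the artifacts path is an assumption; kubeconfig_localhost fetches the admin kubeconfig into the inventory's artifacts directory):

kubectl --kubeconfig inventory/artifacts/admin.conf proxy
# then browse to http://127.0.0.1:8001/ui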

@shadycuz
Author

@mattymo Thanks for this. Like I said, I used all defaults, and that used to give me what I wanted. I guess things have changed and I will need to look through those vars files from now on. I also need to look up kubeadm, as I don't know what that is. I will recreate the cluster soon with those settings.

Thanks.

@mattymo
Contributor

mattymo commented Oct 18, 2017

We disabled basic auth because it leaves an account with admin privileges exposed through a much weaker mechanism than a proper x509 cert.
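
For comparison, hitting the API with the generated x509 client certs instead of basic auth looks roughly like this (the paths are assumptions about where kubespray drops the certs on a master):

curl --cacert /etc/kubernetes/ssl/ca.pem \
     --cert /etc/kubernetes/ssl/admin-master01.pem \
     --key /etc/kubernetes/ssl/admin-master01-key.pem \
     https://localhost:6443/api/v1/nodes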

@shadycuz
Author

@mattymo I need to look into it more. I have an ELB pointed at my masters on port 6883? Or whichever port the API is listening on; that is how I always reached the cluster, and I used basic auth to hit the dashboard. If I can still reach the UI and run commands like kubectl create while going through that load balancer, then I am fine with using kubeadm and certs. I just didn't think I could log into the GUI via a cert? So I need to look into it more.

@shadycuz
Author

@mattymo Ahh, look at what user floreks wrote here: kubernetes/kubernetes#31665

I will try to install that x509 cert in my browser and see if I can reach the API and dashboard through my Nginx reverse proxy, mainly because I don't want to use a VPN to hit the cluster.
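
Browsers generally want a client cert and key bundled as PKCS#12 rather than raw PEM, so a conversion step along these lines is usually needed first (file names assumed):

openssl pkcs12 -export -in admin.pem -inkey admin-key.pem -out admin.p12
# then import admin.p12 via the browser's certificate manager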

@mattymo
Contributor

mattymo commented Oct 18, 2017

@shadycuz Enabling kube_basic_auth is easier than authenticating with a cert in a browser, in my experience.

@shadycuz
Author

@mattymo I will just do that; it's a personal cluster for fun and learning. I will set a really long random password and call it a day =). Thanks for adding that kubeconfig role to set it up for end users, super nice!

@shadycuz
Author

@mattymo Maybe I should start a new issue, but I spun up a new cluster this morning. I used:

kubeadm_enabled: false
kube_basic_auth: true
kubectl_localhost: true
kubeconfig_localhost: true

Unfortunately I lost the ansible output, but all steps were changed or okay until it was time to wait for the API servers to come up. 20 tries, I think, and all 3 failed. It was checking 127.0.0.1:8080/health or something like that.

I checked Docker on the host and something isn't right...

root@Master01:/# service docker status
● docker.service - Docker Application Container Engine
   Loaded: loaded (/etc/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─docker-dns.conf, docker-options.conf
   Active: active (running) since Thu 2017-10-19 11:58:31 UTC; 36min ago
     Docs: http://docs.docker.com
 Main PID: 9240 (dockerd)
    Tasks: 35
   Memory: 1.0G
      CPU: 3min 38.876s
   CGroup: /system.slice/docker.service
           ├─ 9240 dockerd --insecure-registry=10.233.0.0/18 --graph=/var/lib/docker --log-opt max-size=50m --log-opt max-file=5 --iptables=false --dns 10.233.0.3 --dns 10.1.94.8 --dns-search default.svc.cluster.local
           ├─ 9252 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --shim docker-containerd-shim
           └─15654 docker-containerd-shim c0ae8f0a2e9b6e7484463b977586047a4539f7093bde041f90cd89e2b158e52f /var/run/docker/libcontainerd/c0ae8f0a2e9b6e7484463b977586047a4539f7093bde041f90cd89e2b158e52f docker-runc

Oct 19 12:05:13 Master01 docker[9240]: time="2017-10-19T12:05:13.935797440Z" level=error msg="Handler for GET /v1.26/containers/498ffffcfd05/json returned error: No such container: 498ffffcfd05"
Oct 19 12:05:13 Master01 docker[9240]: time="2017-10-19T12:05:13.941314558Z" level=error msg="Handler for GET /v1.26/containers/ff1e9c00bb46/json returned error: No such container: ff1e9c00bb46"
Oct 19 12:05:13 Master01 docker[9240]: time="2017-10-19T12:05:13.945310473Z" level=error msg="Handler for GET /v1.26/containers/00bc1e841a8f/json returned error: No such container: 00bc1e841a8f"
Oct 19 12:05:13 Master01 docker[9240]: time="2017-10-19T12:05:13.949787846Z" level=error msg="Handler for GET /v1.26/containers/99e59f495ffa/json returned error: No such container: 99e59f495ffa"
Oct 19 12:09:45 Master01 docker[9240]: time="2017-10-19T12:09:45.526205467Z" level=error msg="Handler for DELETE /v1.26/containers/etcdctl-binarycopy returned error: No such container: etcdctl-binarycopy"
Oct 19 12:10:37 Master01 docker[9240]: time="2017-10-19T12:10:37.676144794Z" level=error msg="Handler for DELETE /v1.26/containers/etcd1 returned error: No such container: etcd1"
Oct 19 12:13:36 Master01 docker[9240]: time="2017-10-19T12:13:36.054842758Z" level=error msg="Handler for DELETE /v1.26/containers/etcdctl-binarycopy returned error: No such container: etcdctl-binarycopy"
Oct 19 12:19:46 Master01 docker[9240]: time="2017-10-19T12:19:46.363721789Z" level=error msg="Handler for GET /v1.26/containers/kubelet/archive returned error: No such container: kubelet"
Oct 19 12:19:52 Master01 docker[9240]: time="2017-10-19T12:19:52.395258535Z" level=error msg="Handler for POST /v1.26/containers/kubelet/stop returned error: No such container: kubelet"
Oct 19 12:20:07 Master01 docker[9240]: time="2017-10-19T12:20:07.494390285Z" level=error msg="Handler for POST /v1.26/containers/create returned error: No such image: quay.io/coreos/hyperkube:v1.6.7_coreos.0"
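
That last log line suggests the apiserver container never started because the hyperkube image was never pulled. A quick manual check on the master, as a sketch:

docker images | grep hyperkube                          # is the image present at all?
docker pull quay.io/coreos/hyperkube:v1.6.7_coreos.0    # can it be pulled by hand?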

@mattymo
Contributor

mattymo commented Oct 19, 2017

Why are you deploying kubernetes version 1.6.7 instead of 1.8.0?

@shadycuz
Author

@mattymo Idk?

ansible@rundeck:~/kubespray$ git pull
Already up-to-date.
ansible@rundeck:~/kubespray$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   roles/kubernetes/master/defaults/main.yml
        modified:   roles/kubespray-defaults/defaults/main.yaml

no changes added to commit (use "git add" and/or "git commit -a")
ansible@rundeck:~/kubespray$ grep version roles/kubespray-defaults/defaults/main.yaml
## Change this to use another Kubernetes version, e.g. a current beta release
kube_version: v1.6.7
## When OpenStack is used, Cinder version can be explicitly specified if autodetection fails (https://github.com/kubernetes/kubernetes/issues/50461)
# openstack_blockstorage_version: "v1/v2/auto (default)"
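
If that stale role default is the culprit, pinning the version in an extra-vars file should override it, along these lines (v1.8.0 taken from the comment above; the exact tag to use is an assumption):

# my_vars.yml
kube_version: v1.8.0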

@shadycuz
Author

I guess I thought the defaults, without changing much, would give me a good cluster out of the box. I know yesterday when I launched a cluster with all defaults it came up okay, except basic auth wasn't enabled.

@mattymo
Contributor

mattymo commented Oct 19, 2017

You shouldn't remove the group vars that are included. We can't guarantee that the role defaults are set correctly, because they're not covered by CI.

I have a fix on review for the kube version issue: #1845

@shadycuz
Author

@mattymo What do you mean, remove group vars? I didn't? You mean I need to put my inventory in the kubespray inventory directory and run kubespray from inside the kubespray directory?

@shadycuz
Author

@mattymo I think most of my problems have been from not using the group vars..... =/ I will stand up a new cluster and put my inventory in the proper place...

@shadycuz
Author

And then I will submit my vars via the CLI so they take precedence over anything else, because I'm not really sure where I should change what.

@mattymo
Contributor

mattymo commented Oct 19, 2017

@shadycuz You can make your own json or yaml file with vars and then specify it on the CLI like -e @my_vars.yml
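
Note the @ prefix: it is what tells ansible-playbook to load the argument as a vars file rather than as a literal key=value string, so a full invocation would look like this (inventory path per the commands used earlier in this thread):

ansible-playbook -b -u root -i inventory/hosts -e @my_vars.yml cluster.yml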

@shadycuz
Author

@mattymo The ansible config file inside of kubespray does not specify a path to an inventory, so dropping an inventory into kubespray/inventory/hosts and then running ansible-playbook -b cluster.yml fails. I had to run mine with ansible-playbook -e my_vars.yml -i inventory/hosts -u root -b cluster.yml but I have no clue if the group variables got called into "action".

@shadycuz
Author

Okay, group_vars should have worked?

In addition to storing variables directly in the inventory file, host and group variables can be stored in individual files relative to the inventory file (not directory, it is always the file).
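
Concretely, that means the group_vars directory has to sit next to the hosts file, e.g. (a sketch of the conventional kubespray layout):

kubespray/
├── cluster.yml
└── inventory/
    ├── hosts
    └── group_vars/
        ├── all.yml
        └── k8s-cluster.yml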

@shadycuz
Author

@mattymo The playbook ran without errors; the first thing I noticed was no artifacts dir?

I used ansible-playbook -i inventory/hosts -b -e my_vars.yml -u root cluster.yml

and

ansible@rundeck:~/kubespray$ cat my_vars.yml
kubeadm_enabled: false
kube_basic_auth: true
kubectl_localhost: true
kubeconfig_localhost: true

@shadycuz
Author

@mattymo Well, this ain't good. I don't know what has happened to kubespray, but at one point in time it actually worked with defaults out of the box with my exact setup.

root@Master01:~# kubectl get pods -n kube-system
NAME                                    READY     STATUS              RESTARTS   AGE
calico-node-456fb                       0/1       ContainerCreating   0          33m
calico-node-9x89z                       0/1       ContainerCreating   0          33m
calico-node-g5w8b                       0/1       ContainerCreating   0          33m
calico-node-h2q8m                       0/1       ContainerCreating   0          33m
calico-node-mcgvr                       0/1       ContainerCreating   0          33m
calico-node-xkss4                       0/1       ContainerCreating   0          33m
kube-apiserver-master01                 1/1       Running             0          36m
kube-apiserver-master02                 1/1       Running             0          36m
kube-apiserver-master03                 1/1       Running             0          36m
kube-controller-manager-master01        1/1       Running             0          36m
kube-controller-manager-master02        1/1       Running             0          36m
kube-controller-manager-master03        1/1       Running             0          36m
kube-dns-596d7c8f8-whdzz                0/3       ContainerCreating   0          30m
kube-proxy-compute01                    1/1       Running             0          37m
kube-proxy-compute02                    1/1       Running             0          37m
kube-proxy-compute03                    1/1       Running             0          36m
kube-proxy-master01                     1/1       Running             0          37m
kube-proxy-master02                     1/1       Running             0          37m
kube-proxy-master03                     1/1       Running             0          37m
kube-scheduler-master01                 1/1       Running             0          37m
kube-scheduler-master02                 1/1       Running             0          37m
kube-scheduler-master03                 1/1       Running             0          37m
kubedns-autoscaler-86c47697df-rsjr7     0/1       ContainerCreating   0          30m
kubernetes-dashboard-7fd45476f8-9c7pz   0/1       ContainerCreating   0          30m
nginx-proxy-compute01                   1/1       Running             0          36m
nginx-proxy-compute02                   1/1       Running             0          37m
nginx-proxy-compute03                   1/1       Running             0          37m

@shadycuz
Author

@mattymo From the dashboard pod:

Tolerations:     node-role.kubernetes.io/master:NoSchedule
Events:
  Type     Reason       Age                From                Message
  ----     ------       ----               ----                -------
  Normal   Scheduled    32m                default-scheduler   Successfully assigned kubernetes-dashboard-7fd45476f8-9c7pz to compute02
  Warning  FailedSync   16m (x7 over 30m)  kubelet, compute02  Error syncing pod
  Warning  FailedMount  12m (x9 over 30m)  kubelet, compute02  Unable to mount volumes for pod "kubernetes-dashboard-7fd45476f8-9c7pz_kube-system(2496f520-b4e4-11e7-bf2a-de19444b4004)": timeout expired waiting for volumes to attach/mount for pod "kube-system"/"kubernetes-dashboard-7fd45476f8-9c7pz". list of unattached/unmounted volumes=[kubernetes-dashboard-token-slvz2]
  Warning  FailedMount  1m (x23 over 32m)  kubelet, compute02  MountVolume.SetUp failed for volume "kubernetes-dashboard-token-slvz2" : secrets "kubernetes-dashboard-token-slvz2" is forbidden: User "system:node:Compute02" cannot get secrets in the namespace "kube-system": no path found to object
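
One detail worth noticing in that last event (an observation, not a confirmed diagnosis): the kubelet identifies itself as system:node:Compute02 with a capital C, while the pod was scheduled to compute02 in lowercase. The Node authorizer matches on the exact node name, so a hostname-case mismatch like that can produce exactly this kind of "no path found to object" denial. Comparing the two is quick:

kubectl get nodes -o name    # the node names the cluster registered
hostname                     # run on the node itself: the name its credentials derive from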

@shadycuz shadycuz reopened this Oct 19, 2017
@shadycuz shadycuz changed the title Can't access api with basic auth (user kube has no permission for anything) Can't spin up a working cluster with default settings Oct 19, 2017
@shadycuz
Author

I didn't use the @ sign with -e, so my vars file was never actually loaded....

@shadycuz
Author

Tried again with a new cluster:

TASK [network_plugin/calico : Calico | wait for etcd] **********************************************************************************************
Thursday 19 October 2017  17:53:35 -0400 (0:00:02.749)       0:26:59.961 ******
FAILED - RETRYING: Calico | wait for etcd (10 retries left).
FAILED - RETRYING: Calico | wait for etcd (9 retries left).
FAILED - RETRYING: Calico | wait for etcd (8 retries left).
FAILED - RETRYING: Calico | wait for etcd (7 retries left).
FAILED - RETRYING: Calico | wait for etcd (6 retries left).
FAILED - RETRYING: Calico | wait for etcd (5 retries left).
FAILED - RETRYING: Calico | wait for etcd (4 retries left).
FAILED - RETRYING: Calico | wait for etcd (3 retries left).
FAILED - RETRYING: Calico | wait for etcd (2 retries left).
FAILED - RETRYING: Calico | wait for etcd (1 retries left).
fatal: [Compute02 -> None]: FAILED! => {"attempts": 10, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error ('_ssl.c:574: The handshake operation timed out',)>", "redirected": false, "status": -1, "url": "https://localhost:2379/health"}

@rhino5oh

rhino5oh commented Mar 5, 2018

@shadycuz did you ever figure out a solution to this error you were getting?

fatal: [Compute02 -> None]: FAILED! => {"attempts": 10, "changed": false, "content": "", "failed": true, "msg": "Status code was not [200]: Request failed: <urlopen error ('_ssl.c:574: The handshake operation timed out',)>", "redirected": false, "status": -1, "url": "https://localhost:2379/health"}

I get the same thing sometimes for the "wait for etcd" task, and every single time on the "Master | wait for the apiserver to be running" handler.

@shadycuz
Author

@rhino5oh I did get past this, but I don't remember how. If you are using the latest code from here, it might be something different from my issue. I would ask on the Kubernetes Slack; Kubespray has a room and I always get good help from people there. You might want to SSH into your etcd nodes and check the logs to see what they say. Also, make a new issue =)
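
For anyone landing here with the same etcd handshake timeout, a diagnostic sketch (the cert paths are assumptions based on kubespray's usual layout, and etcd1 is the container name visible in the Docker logs above):

# on an etcd node
docker logs etcd1 2>&1 | tail -n 50
curl --cacert /etc/ssl/etcd/ssl/ca.pem \
     --cert /etc/ssl/etcd/ssl/node-<hostname>.pem \
     --key /etc/ssl/etcd/ssl/node-<hostname>-key.pem \
     https://localhost:2379/health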
