
coredns CrashLoopBackOff due to dnsmasq #1292

Closed
aravind-craft opened this issue Dec 2, 2018 · 44 comments
Labels
area/ecosystem
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/documentation: Categorizes issue or PR as related to documentation.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@aravind-craft

What keywords did you search in kubeadm issues before filing this one?

Ubuntu 16.04 coredns crashloopbackoff

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version):

root@k8s-master:~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.3", GitCommit:"435f92c719f279a3a67808c80521ea17d5715c66", GitTreeState:"clean", BuildDate:"2018-11-26T12:54:02Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
root@k8s-master:~# kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.3", GitCommit:"435f92c719f279a3a67808c80521ea17d5715c66", GitTreeState:"clean", BuildDate:"2018-11-26T12:57:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.3", GitCommit:"435f92c719f279a3a67808c80521ea17d5715c66", GitTreeState:"clean", BuildDate:"2018-11-26T12:46:57Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Local Virtual Machine, 2 CPU, 4 GB RAM

  • OS (e.g. from /etc/os-release):

root@k8s-master:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.5 LTS
Release:	16.04
Codename:	xenial

  • Kernel (e.g. uname -a):
root@k8s-master:~# uname -a
Linux k8s-master 4.15.0-29-generic #31~16.04.1-Ubuntu SMP Wed Jul 18 08:54:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Others:
root@k8s-master:~# docker --version
Docker version 18.06.0-ce, build 0ffa825

root@k8s-master:~# sestatus
The program 'sestatus' is currently not installed. You can install it by typing:
apt install policycoreutils
root@k8s-master:~# kubectl -n kube-system get deployment coredns -o yaml | \
>   sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
>   kubectl apply -f -
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
deployment.extensions/coredns configured
root@k8s-master:~# grep nameserver /etc/resolv.conf 
nameserver 127.0.1.1
root@k8s-master:~# cat /run/systemd/resolve/resolv.conf
cat: /run/systemd/resolve/resolv.conf: No such file or directory
root@k8s-master:~# cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS=--cgroup-driver=systemd --network-plugin=cni
root@k8s-master:~# systemctl list-unit-files | grep enabled | grep systemd-resolved
root@k8s-master:~# ps auxww | grep kubelet
root       501  3.3  2.6 496440 106152 ?       Ssl  07:09   0:41 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-plugin=cni
root@k8s-master:~# ufw disable
Firewall stopped and disabled on system startup

What happened?

root@k8s-master:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS             RESTARTS   AGE
kube-system   coredns-576cbf47c7-822v6             0/1     CrashLoopBackOff   11         24m
kube-system   coredns-576cbf47c7-n9tw9             0/1     CrashLoopBackOff   11         24m
kube-system   etcd-k8s-master                      1/1     Running            1          23m
kube-system   kube-apiserver-k8s-master            1/1     Running            1          23m
kube-system   kube-controller-manager-k8s-master   1/1     Running            1          23m
kube-system   kube-flannel-ds-amd64-qbff2          1/1     Running            1          20m
kube-system   kube-proxy-4bbbk                     1/1     Running            1          24m
kube-system   kube-scheduler-k8s-master            1/1     Running            1          23m

What you expected to happen?

I expected coredns pods to start properly

How to reproduce it (as minimally and precisely as possible)?

  1. Install CRI = Docker
    https://kubernetes.io/docs/setup/cri/#docker
  2. Install Kubeadm
    https://kubernetes.io/docs/setup/independent/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl
  3. Initialize the Kubernetes control plane
    kubeadm init --pod-network-cidr=10.244.0.0/16
  4. Install Flannel
    https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network & scroll down to section "Installing a pod network add-on" & select tab "Flannel"
  5. Check Pods are in running state:
    root@k8s-master:~# kubectl get pods --all-namespaces

Anything else we need to know?

The "hack" solution mentioned in https://stackoverflow.com/a/53414041/5731350 works, but I am not comfortable disabling something (the loop plugin) that is supposed to be working as intended.

@neolit123
Member

neolit123 commented Dec 2, 2018

please provide logs from the CoreDNS pods.

Google reveals numerous reports that this is problematic on Ubuntu:
nameserver 127.0.1.1
read this: https://askubuntu.com/questions/627899/nameserver-127-0-1-1-in-resolv-conf-wont-go-away

in 1.12.x you can also try --feature-gates=CoreDNS=false to enable kube-dns.
just for the sake of testing.
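
For reference, a rough sketch of that test (assuming kubeadm 1.12.x, where the CoreDNS feature gate still exists, and a throwaway test cluster, since kubeadm reset wipes the node):

# re-initialize the control plane with kube-dns instead of CoreDNS (testing only)
kubeadm reset
kubeadm init --pod-network-cidr=10.244.0.0/16 --feature-gates=CoreDNS=false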

@neolit123 neolit123 added area/ecosystem priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Dec 2, 2018
@bart0sh

bart0sh commented Dec 3, 2018

@aravind-murthy Do you happen to run NetworkManager or dnsmasq on your machine?
systemd-resolved uses 'nameserver 127.0.0.53' in the /etc/resolv.conf and saves original resolv.conf in /run/systemd/resolve/resolv.conf. You don't have it, so looks like systemd-resolved is not enabled.

@aravind-craft
Author

@aravind-murthy Do you happen to run NetworkManager or dnsmasq on your machine?
systemd-resolved uses 'nameserver 127.0.0.53' in the /etc/resolv.conf and saves original resolv.conf in /run/systemd/resolve/resolv.conf. You don't have it, so looks like systemd-resolved is not enabled.

Hi @bart0sh - please advise how I can check for NM and dnsmasq? This is a brand new VM, so I haven't configured anything explicitly (as far as I am aware) unless it already comes enabled by default in the OS. However, I can check and report back.

@bart0sh

bart0sh commented Dec 3, 2018

@aravind-murthy Please check if you have /etc/NetworkManager/NetworkManager.conf and what is mentioned there as dns, e.g. dns=dnsmasq. You can also check if a dnsmasq process is in the 'ps aux' output and/or its service status: 'sudo systemctl status dnsmasq'
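
Pulled together, those checks look roughly like this (a sketch; the paths are the stock Ubuntu 16.04 defaults):

# is NetworkManager configured to spawn its own dnsmasq for DNS?
grep -i '^dns=' /etc/NetworkManager/NetworkManager.conf
# is a dnsmasq process running?
ps aux | grep [d]nsmasq
# is dnsmasq installed as a standalone service?
sudo systemctl status dnsmasq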

@aravind-craft
Author

aravind-craft commented Dec 3, 2018

please provide logs from the CoreDNS pods.

Google reveals numerous reports that this is problematic on Ubuntu:
nameserver 127.0.1.1
read this: https://askubuntu.com/questions/627899/nameserver-127-0-1-1-in-resolv-conf-wont-go-away

in 1.12.x you can also try --feature-gates=CoreDNS=false to enable kube-dns.
just for the sake of testing.

root@k8s-master:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS             RESTARTS   AGE
kube-system   coredns-576cbf47c7-dkhx5             0/1     CrashLoopBackOff   1          9m9s
kube-system   coredns-576cbf47c7-qlnqn             0/1     CrashLoopBackOff   1          9m9s
kube-system   etcd-k8s-master                      1/1     Running            0          18s
kube-system   kube-apiserver-k8s-master            1/1     Running            0          18s
kube-system   kube-controller-manager-k8s-master   1/1     Running            0          18s
kube-system   kube-flannel-ds-amd64-nr8lx          1/1     Running            0          32s
kube-system   kube-proxy-8s48m                     1/1     Running            0          9m9s
kube-system   kube-scheduler-k8s-master            1/1     Running            0          18s
root@k8s-master:~# kubectl describe pod coredns-576cbf47c7-dkhx5
Error from server (NotFound): pods "coredns-576cbf47c7-dkhx5" not found

@aravind-craft
Author

please provide logs from the CoreDNS pods.

Google reveals numerous reports that this is problematic on Ubuntu:
nameserver 127.0.1.1
read this: https://askubuntu.com/questions/627899/nameserver-127-0-1-1-in-resolv-conf-wont-go-away

in 1.12.x you can also try --feature-gates=CoreDNS=false to enable kube-dns.
just for the sake of testing.

root@k8s-master:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS             RESTARTS   AGE
kube-system   coredns-576cbf47c7-dkhx5             0/1     CrashLoopBackOff   4          11m
kube-system   coredns-576cbf47c7-qlnqn             0/1     CrashLoopBackOff   4          11m
kube-system   etcd-k8s-master                      1/1     Running            0          2m14s
kube-system   kube-apiserver-k8s-master            1/1     Running            0          2m14s
kube-system   kube-controller-manager-k8s-master   1/1     Running            0          2m14s
kube-system   kube-flannel-ds-amd64-nr8lx          1/1     Running            0          2m28s
kube-system   kube-proxy-8s48m                     1/1     Running            0          11m
kube-system   kube-scheduler-k8s-master            1/1     Running            0          2m14s
root@k8s-master:~# kubectl logs coredns-576cbf47c7-dkhx5
Error from server (NotFound): pods "coredns-576cbf47c7-dkhx5" not found
root@k8s-master:~# kubectl logs coredns-576cbf47c7-dkhx5 --previous
Error from server (NotFound): pods "coredns-576cbf47c7-dkhx5" not found

@aravind-craft
Author

aravind-craft commented Dec 3, 2018

@aravind-murthy Please check if you have /etc/NetworkManager/NetworkManager.conf and what is mentioned there as dns, e.g. dns=dnsmasq. You can also check if a dnsmasq process is in the 'ps aux' output and/or its service status: 'sudo systemctl status dnsmasq'

root@k8s-master:~# cat /etc/NetworkManager/NetworkManager.conf
[main]
plugins=ifupdown,keyfile,ofono
dns=dnsmasq

[ifupdown]
managed=false
root@k8s-master:~# systemctl status dnsmasq
● dnsmasq.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
root@k8s-master:~#

@neolit123
Member

root@k8s-master:~# kubectl describe pod coredns-576cbf47c7-dkhx5
Error from server (NotFound): pods "coredns-576cbf47c7-dkhx5" not found

^ you need to add -n kube-system to specify the namespace too.

@aravind-craft
Author

aravind-craft commented Dec 3, 2018

root@k8s-master:~# kubectl describe pod coredns-576cbf47c7-dkhx5
Error from server (NotFound): pods "coredns-576cbf47c7-dkhx5" not found

^ you need to add -n kube-system to specify the namespace too.

root@k8s-master:~# kubectl describe pod coredns-576cbf47c7-dkhx5 -n kube-system

<output snipped>

Events:
  Type     Reason            Age                     From                 Message
  ----     ------            ----                    ----                 -------
  Warning  FailedScheduling  11m (x34 over 16m)      default-scheduler    0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Normal   Started           6m46s (x4 over 7m33s)   kubelet, k8s-master  Started container
  Normal   Pulled            6m5s (x5 over 7m33s)    kubelet, k8s-master  Container image "k8s.gcr.io/coredns:1.2.2" already present on machine
  Normal   Created           6m5s (x5 over 7m33s)    kubelet, k8s-master  Created container
  Warning  BackOff           2m20s (x30 over 7m31s)  kubelet, k8s-master  Back-off restarting failed container

@neolit123
Member

once the container enters CrashLoopBackOff you can also call docker ps to see the running containers and then docker logs [coredns-container-id] to see the logs from the container itself.
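
Roughly (container names and IDs will differ per node):

# list containers, including ones that already exited after crashing
docker ps -a | grep coredns
# dump the logs of the crashed container, using the ID from the first column
docker logs <coredns-container-id>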

@bart0sh

bart0sh commented Dec 3, 2018

@aravind-murthy that's interesting. The service is enabled in the network manager, but not running. Please, comment out 'dns=dnsmasq' line in the config and restart network manager: sudo systemctl restart network-manager

Then restore original name servers in /etc/resolv.conf (they're probably commented out there). That should help core-dns pods when they're restarted by kubelet.
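
As a sketch, assuming the stock Ubuntu 16.04 paths (back up both files before editing):

# stop NetworkManager from spawning its own dnsmasq instance
sudo sed -i 's/^dns=dnsmasq/#dns=dnsmasq/' /etc/NetworkManager/NetworkManager.conf
sudo systemctl restart network-manager
# then edit /etc/resolv.conf so that only the real upstream nameservers remain
# (i.e. remove or comment out the 127.x loopback entry)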

@bart0sh

bart0sh commented Dec 3, 2018

I was able to reproduce this on my machine with kubeadm 1.12.3:

Warning  FailedScheduling  7s (x21 over 3m19s)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

@aravind-craft
Author

once the container enters CrashLoopBackOff you can also call docker ps to see the running containers and then docker logs [coredns-container-id] to see the logs from the container itself.

root@k8s-master:~# docker logs k8s_coredns_coredns-576cbf47c7-l6rkc_kube-system_59d21f34-f712-11e8-a662-001c423e384e_4
.:53
2018/12/03 15:47:02 [INFO] CoreDNS-1.2.2
2018/12/03 15:47:02 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/12/03 15:47:02 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/12/03 15:47:02 [FATAL] plugin/loop: Seen "HINFO IN 2526125915973168889.7333277912286930769." more than twice, loop detected

@neolit123
Member

neolit123 commented Dec 3, 2018

see:
coredns/coredns#2087 (comment)

@aravind-craft
Author

Then restore original name servers in /etc/resolv.conf (they're probably commented out there)

I haven't commented anything out in this file (/etc/resolv.conf). It currently has the following entries:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.211.55.1
nameserver 127.0.1.1
search localdomain

@bart0sh

bart0sh commented Dec 3, 2018

@aravind-murthy can you comment out the line with the dnsmasq address, nameserver 127.0.1.1?
Then you can delete the core-dns pods. They'll be restarted by kubelet and hopefully will work.

@bart0sh

bart0sh commented Dec 3, 2018

@neolit123 my case is different. core-dns pods are in pending state, logs are empty

@aravind-craft
Author

aravind-craft commented Dec 3, 2018

@aravind-murthy can you comment out the line with the dnsmasq address, nameserver 127.0.1.1?
Then you can delete the core-dns pods. They'll be restarted by kubelet and hopefully will work.

cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.211.55.1
#nameserver 127.0.1.1
search localdomain
root@k8s-master:~# kubectl -n kube-system delete pod -l k8s-app=kube-dns
pod "coredns-576cbf47c7-l6rkc" deleted
pod "coredns-576cbf47c7-mhd4d" deleted


root@k8s-master:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS             RESTARTS   AGE
kube-system   coredns-576cbf47c7-2zjm4             0/1     Error              1          20s
kube-system   coredns-576cbf47c7-854dn             0/1     CrashLoopBackOff   1          20s
kube-system   etcd-k8s-master                      1/1     Running            0          13m
kube-system   kube-apiserver-k8s-master            1/1     Running            0          13m
kube-system   kube-controller-manager-k8s-master   1/1     Running            0          13m
kube-system   kube-flannel-ds-amd64-96724          1/1     Running            0          13m
kube-system   kube-proxy-4gq5w                     1/1     Running            0          14m
kube-system   kube-scheduler-k8s-master            1/1     Running            0          13m


root@k8s-master:~# docker logs k8s_coredns_coredns-576cbf47c7-2zjm4_kube-system_4a9a1492-f714-11e8-a662-001c423e384e_5
.:53
2018/12/03 16:02:16 [INFO] CoreDNS-1.2.2
2018/12/03 16:02:16 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/12/03 16:02:16 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/12/03 16:02:22 [FATAL] plugin/loop: Seen "HINFO IN 4180232279452050349.8122308360122098305." more than twice, loop detected

@aravind-craft
Author

see:
coredns/coredns#2087 (comment)

Yes, this works, as I noted in the "Anything else we need to know?" section, but it looks like a hacky solution.

root@k8s-master:~# kubectl -n kube-system edit configmap coredns
<comment out the line containing 'loop' here, and save the file>
configmap/coredns edited
root@k8s-master:~#

root@k8s-master:~# kubectl -n kube-system delete pod -l k8s-app=kube-dns
pod "coredns-576cbf47c7-2zjm4" deleted
pod "coredns-576cbf47c7-854dn" deleted
root@k8s-master:~# kubectl get pods --all-namespaces
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
kube-system   coredns-576cbf47c7-7ls7n             1/1     Running   0          14s
kube-system   coredns-576cbf47c7-lvbnq             1/1     Running   0          14s
kube-system   etcd-k8s-master                      1/1     Running   0          20m
kube-system   kube-apiserver-k8s-master            1/1     Running   0          20m
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          20m
kube-system   kube-flannel-ds-amd64-96724          1/1     Running   0          20m
kube-system   kube-proxy-4gq5w                     1/1     Running   0          21m
kube-system   kube-scheduler-k8s-master            1/1     Running   0          20m
root@k8s-master:~#

@neolit123
Member

Yes, this works, as I noted in the "Anything else we need to know?" section, but it looks like a hacky solution.

you can also try kube-dns, but it seems to me like Ubuntu is doing things wrong.

@aravind-craft
Author

@neolit123 crashloopbackoff-or-error-state solution does not work BTW.
Copy link
Member

crashloopbackoff-or-error-state solution does not work BTW.

sadly there can be more than one reason for a pod's CrashLoopBackOff state.

@bart0sh

bart0sh commented Dec 3, 2018

@aravind-murthy Can you check whether the 'nameserver 127.0.1.1' line is still commented out in your /etc/resolv.conf? It could have been rewritten, and that can cause a DNS loop and the core-dns crash.

@neolit123
Member

@chrisohaver what is the official way for solving the loop issues that are present in stock Ubuntu 16.04 that we should include in our TS guide?

@alejandrox1

alejandrox1 commented Dec 3, 2018

@aravind-murthy in your setup instructions you mentioned that you did the following:

  1. Initialize the Kubernetes control plane
    kubeadm init --pod-network-cidr=10.244.0.0/16
  2. Install Flannel
    https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network & scroll down to section "Installing a pod network add-on" & select tab "Flannel"

Before installing flannel via kubectl apply -f https://..., did you set /proc/sys/net/bridge/bridge-nf-call-iptables to 1 by running sysctl net.bridge.bridge-nf-call-iptables=1 ?
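
For reference, a minimal sketch of that prerequisite (the file name under /etc/sysctl.d is just an example):

# the bridge sysctls only exist once the br_netfilter module is loaded
sudo modprobe br_netfilter
# apply immediately
sudo sysctl net.bridge.bridge-nf-call-iptables=1
# persist across reboots
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
sudo sysctl --system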

@chrisohaver

@neolit123, IMO the best way to fix the DNS self-forwarding loop issue is to fix the underlying deployment issue in kubelet: kubelet is sending invalid upstream servers to Pods with dnsPolicy="default". In effect, make kubelet use a copy of resolv.conf which contains the actual upstreams (wherever that is in a stock Ubuntu 16.04). This would fix the issue for all dnsPolicy="default" Pods, not just CoreDNS.

The tricky bit is locating where the "real" resolv.conf exists on your systems. It seems that Ubuntu has moved this around from release to release. So the steps would be, for each node:

  1. find the path to the resolv.conf that lists only actual IP addresses of your upstream servers, no loopback addresses.
  2. configure kubelet to use this path instead of the default /etc/resolv.conf.

A nuclear option would be to manually create a resolv.conf on each node that contains the upstream servers you want k8s to use, and point each kubelet to those.
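
A minimal sketch of that per-node change, assuming the real upstream is 10.211.55.1 (as in the resolv.conf above), kubelet was set up by kubeadm, and /etc/kubernetes/kubelet-resolv.conf is just an example path:

# 1. create a resolv.conf that lists only real upstream servers (no loopback addresses)
printf 'nameserver 10.211.55.1\n' | sudo tee /etc/kubernetes/kubelet-resolv.conf
# 2. point kubelet at it, e.g. by appending
#    --resolv-conf=/etc/kubernetes/kubelet-resolv.conf
#    to KUBELET_KUBEADM_ARGS in /var/lib/kubelet/kubeadm-flags.env, then:
sudo systemctl restart kubelet
# 3. recreate the CoreDNS pods so they pick up the corrected upstreams
kubectl -n kube-system delete pod -l k8s-app=kube-dns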

@neolit123
Member

thanks a lot for the details @chrisohaver !

we need to add this info to the troubleshooting guide.
this is a distro / resolv.conf problem and not a kubeadm problem, but our users face it.

page to update:
https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#coredns-pods-have-crashloopbackoff-or-error-state

@neolit123 neolit123 added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Dec 3, 2018
@neolit123 neolit123 added kind/documentation Categorizes issue or PR as related to documentation. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Dec 3, 2018
@aravind-craft
Author

Before installing flannel via kubectl apply -f https://..., did you set /proc/sys/net/bridge/bridge-nf-call-iptables to 1 by running sysctl net.bridge.bridge-nf-call-iptables=1 ?

I did not, thanks for pointing that out. I will retry on a clean VM once more and advise shortly.

@bart0sh

bart0sh commented Dec 3, 2018

@neolit123 in this particular case we need to figure out how to disable dnsmasq completely. Looks like disabling it in NetworkManager is not enough, and the DHCP client or some other piece of software puts 'nameserver 127.0.1.1' back into /etc/resolv.conf, which triggers loop detection in core-dns.
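
To track down which component keeps injecting the loopback entry, a diagnostic pass like this may help (a sketch; these are the usual suspects on Ubuntu 16.04):

# resolvconf assembles /etc/resolv.conf from these snippets
grep -rn '127.0' /etc/resolvconf/resolv.conf.d/ 2>/dev/null
# dhclient can prepend nameservers on every DHCP renewal
grep -n 'prepend domain-name-servers' /etc/dhcp/dhclient.conf 2>/dev/null
# NetworkManager may still be configured to use its internal dnsmasq
grep -rn 'dns=' /etc/NetworkManager/ 2>/dev/null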

@neolit123
Member

neolit123 commented Dec 3, 2018

@neolit123 in this particular case we need to figure out how to disable dnsmasq completely

I'm realizing that we automate what is passed to the kubelet depending on systemd-resolved, and that's problematic. But instructing the user to write a custom resolv.conf is still an option too, I guess.

Looks like disabling it in NetworkManager is not enough, and the DHCP client or some other piece of software puts 'nameserver 127.0.1.1' back into /etc/resolv.conf, which triggers loop detection in core-dns.

if we use the default resolv.conf, we need to disable the automatic addition of these loopback nameservers.

so the steps are:

  1. comment out dns=dnsmasq in the network manager config
  2. this post has some info of what has to be done next:
    https://askubuntu.com/a/716813
    but to me it seems that the custom resolv.conf solution is easier to formulate in the TS guide.

@chrisohaver

I've proposed a PR that generalizes the loop troubleshooting docs in the coredns loop plugin readme, so it more clearly applies to any kind of local DNS caching server, not just systemd-resolved. coredns/coredns#2363

@alejandrox1

@aravind-murthy have you retried this ?

@aravind-craft
Author

aravind-craft commented Dec 4, 2018

@aravind-murthy have you retried this ?

@alejandrox1 Not yet, please give me some time; I need the k8s cluster up and running to test the jenkins-k8s-kubectl plugin (an unrelated proof-of-concept to this issue). I will report back when I can.

@neolit123 neolit123 changed the title from "Ubuntu 16.04 coredns CrashLoopBackOff on new VM & fresh/clean k8s install" to "coredns CrashLoopBackOff due to dnsmasq" Dec 4, 2018
@neolit123
Member

looks like we already have an entry for this in the TS guide here:
https://kubernetes.io/docs/setup/independent/troubleshooting-kubeadm/#coredns-pods-have-crashloopbackoff-or-error-state

@neolit123
Member

@aravind-murthy please feel free to report your findings, still.

@aravind-craft
Author

aravind-craft commented Dec 12, 2018

Hi @alejandrox1, @neolit123 - I don't know whether it's the combination of the new Kubernetes v1.13 (when I raised this ticket I used v1.12, because that was the version available at the time) or me following the instructions properly this time, but: on Ubuntu 16.04.5, installing the latest Kubernetes v1.13, following https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#tabs-pod-install-4 (i.e. Flannel), setting net.bridge.bridge-nf-call-iptables to 1 with the command sysctl net.bridge.bridge-nf-call-iptables=1, rebooting the machine/VM to allow the sysctl setting to 'take hold', and then installing Flannel...

No more CrashLoopBackOff errors for coredns!!!

:-) :-)

Thank you so much

@dimmg

dimmg commented Jan 28, 2019

@aravind-murthy that's interesting. The service is enabled in the network manager, but not running. Please, comment out 'dns=dnsmasq' line in the config and restart network manager: sudo systemctl restart network-manager

Then restore original name servers in /etc/resolv.conf (they're probably commented out there). That should help core-dns pods when they're restarted by kubelet.

Disabling dnsmasq for NetworkManager and commenting out the dnsmasq nameservers did the trick for me!

@Ramane19

I had this same issue after deleting the loop.

Can someone help me with this?

kubectl logs coredns-fb8b8dccf-j6mjl -n kube-system
Error from server (BadRequest): container "coredns" in pod "coredns-fb8b8dccf-j6mjl" is waiting to start: ContainerCreating
master@master:~$ sudo kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS              RESTARTS   AGE
kube-system   coredns-fb8b8dccf-j6mjl                 0/1     ContainerCreating   0          7m31s
kube-system   coredns-fb8b8dccf-lst4v                 0/1     ContainerCreating   0          7m31s
kube-system   etcd-master.testcluster.com             1/1     Running             0          25m
kube-system   kube-apiserver-master.testcluster.com   1/1     Running

@chrisohaver

@Ramane19, your pods are stuck in "ContainerCreating", which is a different issue.

@Ramane19

Before, the pods were showing Pending; after I deleted the loop they showed ContainerCreating.

Is there any other way I can resolve this issue?

@chrisohaver

@Ramane19, Have you read https://github.com/coredns/coredns/tree/master/plugin/loop#troubleshooting

@Ramane19

I did, it talks about the looping issue, not the creation of the containers

@chrisohaver

OK, then it seems to be unrelated to this issue - i.e. off topic.

@katpagavalli

Then restore original name servers in /etc/resolv.conf (they're probably commented out there)

I haven't commented anything out in this file (/etc/resolv.conf). It currently has the following entries:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 10.211.55.1
nameserver 127.0.1.1
search localdomain

This worked for me
