Pod's access to the internet suddenly stopped working, DNS resolution fails #4459

Open
McToel opened this issue Mar 14, 2024 · 8 comments
Labels
kind/support Question with a workaround

Comments

@McToel

McToel commented Mar 14, 2024

Summary

Suddenly, pods can no longer access the internet. When I try to curl google.com from inside a pod, it fails with "connection reset by peer" or a different error. On rare occasions it returns a result, but one that differs from the output of curl google.com on the host machine. The DNS add-on is enabled, and MicroK8s is up to date and running on an Ubuntu Server 22.04 host.

I have made the following observations running commands in pods:

Running nslookup returns the same result for every external domain:

root@dnsutils:/# nslookup deb.debian.org
Server:		10.152.183.10
Address:	10.152.183.10#53

Non-authoritative answer:
Name:	deb.debian.org.fritz.box
Address: 45.76.93.104

Running host gives the same result as nslookup:

root@dnsutils:/# host deb.debian.org
deb.debian.org.fritz.box has address 45.76.93.104
deb.debian.org.fritz.box has IPv6 address 2001:19f0:6c00:1b0e:5400:4ff:fecd:7828

dig works fine, returning the correct IP address:

root@dnsutils:/# dig deb.debian.org

; <<>> DiG 9.9.5-9+deb8u19-Debian <<>> deb.debian.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41191
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;deb.debian.org.			IN	A

;; ANSWER SECTION:
deb.debian.org.		5	IN	CNAME	debian.map.fastlydns.net.
debian.map.fastlydns.net. 5	IN	A	146.75.122.132

;; Query time: 20 msec
;; SERVER: 10.152.183.10#53(10.152.183.10)
;; WHEN: Thu Mar 14 12:10:46 UTC 2024
;; MSG SIZE  rcvd: 135

Running curl against the correct IP address of google.com works and returns the correct result.

I have gone through the official Kubernetes DNS troubleshooting guide, but none of the errors mentioned there occurred. /etc/resolv.conf in the pods looks like this:

search default.svc.cluster.local svc.cluster.local cluster.local fritz.box
nameserver 10.152.183.10
options ndots:5

I guess that some part of my DNS configuration is incorrect, but I have not changed anything before the internet broke, and the DNS add-on should work out of the box as far as I understand.
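
The `search` and `ndots:5` lines above are relevant to the `deb.debian.org.fritz.box` answers. As a minimal sketch (a simplified model of a glibc-style stub resolver, not the resolver's actual code), the hypothetical `candidates` function below lists the names that would be tried, in order, for a given query:

```shell
#!/bin/sh
# Simplified model of resolv.conf(5) search-list handling (glibc-style).
# "candidates" is a hypothetical helper, not part of any real tool.
# Usage: candidates NAME NDOTS SEARCH_DOMAIN...
candidates() {
    name=$1
    ndots=$2
    shift 2
    # A trailing dot marks the name as fully qualified: no search list is applied.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac
    # Count the dots in the name.
    dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
    # With at least ndots dots, the name is tried as-is first.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
    # Each search domain is appended in turn.
    for domain in "$@"; do
        printf '%s.%s.\n' "$name" "$domain"
    done
    # With fewer than ndots dots, the bare name is only tried last.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s.\n' "$name"
    fi
}

# Search list and ndots value taken from the pod's /etc/resolv.conf above.
candidates deb.debian.org 5 default.svc.cluster.local svc.cluster.local cluster.local fritz.box
```

Under this model, `deb.debian.org` (two dots, fewer than 5) is tried with every search domain before the bare name, so an upstream that returns an address for `deb.debian.org.fritz.box` ends the lookup early. `dig` does not apply the search list unless `+search` is given, which would be consistent with `dig` returning the expected address. Appending a trailing dot (`nslookup deb.debian.org.`) skips the search list.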

Introspection Report

inspection-report-20240313_232359.tar.gz

@McToel
Author

McToel commented Mar 14, 2024

After two days of troubleshooting, I found a solution: disabling DNS and re-enabling it with my router's IP address as the upstream DNS server:

microk8s disable dns
microk8s enable dns:192.168.178.1

I found the IP in /run/systemd/resolve/resolv.conf which, according to the Kubernetes docs, should be the correct resolve file when the host machine is using systemd-resolved.
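
The lookup of the upstream address can be scripted. The following is a rough sketch, assuming the host uses systemd-resolved and that the first `nameserver` entry is the one wanted; `first_nameserver` is a hypothetical helper, and the script only prints the commands rather than running them:

```shell
#!/bin/sh
# Print the first "nameserver" address found in a resolv.conf-style file.
# Hypothetical helper for illustration; comment lines are ignored because
# their first field is "#", not "nameserver".
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "$1"
}

resolv=/run/systemd/resolve/resolv.conf
if [ -r "$resolv" ]; then
    upstream=$(first_nameserver "$resolv")
    if [ -n "$upstream" ]; then
        # Only print the commands; review them before running anything.
        echo "microk8s disable dns"
        echo "microk8s enable dns:$upstream"
    fi
fi
```

Printing instead of executing keeps the sketch safe to run on any host, including one without MicroK8s installed.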

What did not help was setting a different DNS server (I tried Cloudflare's 1.1.1.1). Also, dig worked the whole time, so I really do not understand what is going on.

@neoaggelos
Member

neoaggelos commented Mar 14, 2024

Hi @McToel

Which microk8s version are you using? Starting from MicroK8s 1.26, MicroK8s will attempt to pick the upstream nameservers from /run/systemd/resolve/resolv.conf by default.

  1. How do you (and did you) enable dns?

  2. What's the output of snap run --shell microk8s -c '$SNAP/scripts/find-resolv-conf.py'?

  3. What's in your /run/systemd/resolve/resolv.conf?
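
A related check, not one of the questions above, is which upstream servers CoreDNS is currently forwarding to. The sketch below extracts them from a Corefile. The `corefile_upstreams` helper is hypothetical, the sample Corefile is illustrative and not taken from the reporter's cluster, and the ConfigMap name `coredns` in `kube-system` is an assumption:

```shell
#!/bin/sh
# Print the upstream targets of every "forward" directive in a Corefile
# read from stdin. Hypothetical helper for illustration.
corefile_upstreams() {
    awk '$1 == "forward" {
        # Field 2 is the zone (usually "."); the upstreams follow it.
        for (i = 3; i <= NF; i++) if ($i != "{") print $i
    }'
}

# Illustrative input only, not the reporter's actual configuration.
corefile_upstreams <<'EOF'
.:53 {
    errors
    forward . 8.8.8.8 8.8.4.4
    cache 30
}
EOF

# Against a live cluster (assumes the ConfigMap is named "coredns"):
# microk8s kubectl -n kube-system get configmap coredns \
#     -o jsonpath='{.data.Corefile}' | corefile_upstreams
```

If the output lists public resolvers rather than `/etc/resolv.conf` or the local router, the pods are not using the same upstream as the host.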

@neoaggelos neoaggelos added the kind/support Question with a workaround label Mar 14, 2024
@McToel
Author

McToel commented Mar 14, 2024

I'm running MicroK8s v1.29.2 revision 6641.

It could be that I was running a version prior to 1.26 when I first enabled DNS. But while investigating the problem, I have disabled and re-enabled DNS a few times. In the beginning, I enabled it with microk8s enable dns.

Here is the output for 2. and 3.:

➜  ~ snap run --shell microk8s -c '$SNAP/scripts/find-resolv-conf.py'
/run/systemd/resolve/resolv.conf
➜  ~ cat /run/systemd/resolve/resolv.conf
# This is /run/systemd/resolve/resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 192.168.178.1
search fritz.box

@ThijsBorst

I have exactly the same problem. I can't resolve it by pointing it to my Pi-hole either. It works for a second, and then the problems start again.

@BennyDeeDev

BennyDeeDev commented Mar 22, 2024

I had a similar issue and fixed it with conditional forwarding on my Pi-hole:

(screenshot of the conditional forwarding settings; image not preserved)

Maybe someone with more network knowledge can explain why this only happens to pods but not on other devices in my network.

Hoping this can help you out

@ThijsBorst

ThijsBorst commented Mar 25, 2024

I recreated the dns service with the following settings, which worked for me.

First remove it:
microk8s disable dns

Then recreate it:
microk8s enable dns:<pi-hole address>

@ErnyTech

root@dnsutils:/# nslookup deb.debian.org
Server:		10.152.183.10
Address:	10.152.183.10#53

Non-authoritative answer:
Name:	deb.debian.org.fritz.box
Address: 45.76.93.104

It looks like you are suffering from DNS hijacking. Please check this: https://crapts.org/2024/04/21/all-fritz-box-modems-have-been-hijacked/
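
One way to test this hypothesis is to query a name that should not exist under the search domain; if it still returns an address, the domain is answering for arbitrary names. A rough sketch, assuming `dig` is available in the pod (the `lookup` and `answers_for_anything` helpers and the `LOOKUP` variable are hypothetical):

```shell
#!/bin/sh
# Resolve a name and print any addresses found.
# Defaults to "dig +short"; set LOOKUP to substitute another command.
lookup() {
    ${LOOKUP:-dig +short} "$1"
}

# Succeed if a name that should not exist under the given domain still resolves.
# The probe label embeds the shell's PID to make a real record unlikely.
answers_for_anything() {
    probe="no-such-host-$$.$1."
    [ -n "$(lookup "$probe")" ]
}

# Example (requires network access, so it is left commented out):
# answers_for_anything fritz.box && echo "fritz.box answers for arbitrary names"
```

Running the check both inside a pod and on the host would show whether the two see different answers for the same domain.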

@syedhaidy

syedhaidy commented Apr 23, 2024

Hi,
I applied the steps below, and the pods started to communicate with each other.

  1. microk8s disable dns
  2. microk8s enable dns:<ip address from /run/systemd/resolve/resolv.conf>

But after 1 hour, the pods suddenly stopped communicating with each other.

Please help me to resolve this issue.
