k8s operator: tailscale ingress sometimes tries to connect to 127.0.0.1 instead of ClusterIP, fails with "netstack: could not connect to local server at ..." #12079

Open
garymm opened this issue May 9, 2024 · 2 comments

Comments

garymm commented May 9, 2024

What is the issue?

I'm really not sure how to reproduce this, but I've seen it a couple of times.

I set up two services (docker-registry and headlamp) with type ClusterIP, both listening on port 80.
Both services have a tailscale ingress.
This is a test cluster with only one node, so everything is on the same node.
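
For reference, the Ingresses look roughly like this (reconstructed from memory, so names and the tls hostname are approximate; the docker-registry one is analogous, pointing at the docker-registry Service on port 80):

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: headlamp-ingress
  namespace: headlamp
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: headlamp        # plain ClusterIP Service listening on port 80
      port:
        number: 80
  tls:
    - hosts:
        - berkeley-staging-headlamp
EOF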

When trying to connect, I see errors like this in the tailscale pod:

2024/05/09 22:55:03 Accept: TCP{100.115.199.49:52551 > 100.77.184.113:80} 64 tcp ok
2024/05/09 22:55:03 netstack: could not connect to local server at 127.0.0.1:80: dial tcp 127.0.0.1:80: connect: connection refused

Restarting the tailscale ingress pod seems to fix the issue.
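
("Restarting" here just means deleting the operator-created proxy pods and letting their StatefulSets recreate them, roughly:)

# pod names from "kubectl get pods -n tailscale"; the -0 suffix assumes the operator's usual StatefulSet naming
kubectl delete pod -n tailscale ts-docker-registry-ingress-4w69x-0 ts-headlamp-ingress-6mzrv-0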

I'm not a kubernetes expert, but it seems suspicious that tailscale is trying to connect to the service on 127.0.0.1:80 rather than using the ClusterIP.
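
One thing I can try next time it happens, to rule out a plain connectivity problem from the proxy pod to the backend (ClusterIP taken from the kubectl output below; this assumes the proxy image ships busybox wget):

# dial the docker-registry Service by ClusterIP from inside its proxy pod
kubectl exec -n tailscale ts-docker-registry-ingress-4w69x-0 -- \
  wget -qO- -T 2 http://10.233.22.50:80/v2/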

# kubectl get svc -A
NAMESPACE         NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
default           kubernetes                         ClusterIP   10.233.0.1     <none>        443/TCP                  30m
docker-registry   docker-registry                    ClusterIP   10.233.22.50   <none>        80/TCP                   17m
headlamp          headlamp                           ClusterIP   10.233.49.34   <none>        80/TCP                   27m
kube-system       coredns                            ClusterIP   10.233.0.3     <none>        53/UDP,53/TCP,9153/TCP   29m
tailscale         ts-docker-registry-ingress-4w69x   ClusterIP   None           <none>        <none>                   16m
tailscale         ts-headlamp-ingress-6mzrv          ClusterIP   None           <none>        <none>                   17m
# kubectl get ingress -A
NAMESPACE         NAME                      CLASS       HOSTS   ADDRESS                                      PORTS     AGE
docker-registry   docker-registry-ingress   tailscale   *       berkeley-staging-docker.taila1eba.ts.net     80, 443   36m
headlamp          headlamp-ingress          tailscale   *       berkeley-staging-headlamp.taila1eba.ts.net   80, 443   36m

I am able to connect to both services simultaneously using kubectl port-forward, so I'm pretty sure this is not an inherent limitation of my kubernetes setup.
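
Roughly what I ran for that check (local ports arbitrary, each port-forward in its own terminal):

kubectl port-forward -n docker-registry svc/docker-registry 8080:80
kubectl port-forward -n headlamp svc/headlamp 8081:80
# both respond at the same time
curl -s http://localhost:8080/v2/
curl -s http://localhost:8081/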

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

Linux

OS version

kubernetes

Tailscale version

1.62.1

Other software

calico CNI

Bug report

BUG-893cd6c8f00a44fda54bd05672e58e601316cc21d2892024305ff3afa3f4c675-20240509231821Z-5c7d259ff6d31521

garymm changed the title from 'k8s operator: tailscale ingress uses 127.0.0.1 rather than ClusterIP, causing failure when two services use the same port' to 'k8s operator: tailscale ingress sometimes fails with "netstack: could not connect to local server at ..."' on May 9, 2024

garymm commented May 13, 2024

I'm seeing this again. I tried upgrading to 1.64.2 (latest helm chart) and deleting all the pods, and this time I can't figure out a way to fix it.
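
Roughly the upgrade steps (release and namespace names as in my cluster; chart repo per the Tailscale docs):

helm repo add tailscale https://pkgs.tailscale.com/helmcharts
helm repo update
helm upgrade tailscale-operator tailscale/tailscale-operator -n tailscale
# then delete everything in the tailscale namespace so the pods come back on the new version
kubectl delete pod -n tailscale --all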

garymm changed the title from 'k8s operator: tailscale ingress sometimes fails with "netstack: could not connect to local server at ..."' to 'k8s operator: tailscale ingress sometimes tries to connect to 127.0.0.1 instead of ClusterIP, fails with "netstack: could not connect to local server at ..."' on May 13, 2024

garymm commented May 13, 2024

Restarting the kubernetes host seems to have fixed it, at least for now.
