
Ingress Controller: upstream connect error or disconnect/reset before headers. reset reason: connection failure #20942

karelvanhecke opened this issue Aug 17, 2022 · 33 comments
Labels
  • area/proxy: Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers.
  • area/servicemesh: GH issues or PRs regarding servicemesh.
  • kind/bug: This is a bug in the Cilium logic.
  • needs/triage: This issue requires triaging to establish severity and next steps.
  • sig/agent: Cilium agent related.

Comments

@karelvanhecke

karelvanhecke commented Aug 17, 2022

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When using the built-in ingress controller, the following issue occurs for a portion of requests to the ingress:
upstream connect error or disconnect/reset before headers. reset reason: connection failure

Cilium helm values:

enableIPv4Masquerade: false
enableIPv6Masquerade: false
ingressController:
  enabled: true
ipam:
  mode: kubernetes
ipv4NativeRoutingCIDR: 10.1.128.0/18
ipv6:
  enabled: true
ipv6NativeRoutingCIDR: <IPV6_PREFIX>:8e40::/59
k8s:
  requireIPv4PodCIDR: true
  requireIPv6PodCIDR: true
k8sServiceHost: cluster1-api
k8sServicePort: 6443
kubeProxyReplacement: strict
kubeProxyReplacementHealthzBindAddr: '[::]:10256'
loadBalancer:
  algorithm: maglev
  mode: dsr
tunnel: disabled
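
For reference, a minimal install sketch using these values (assuming they are saved as values.yaml and the Cilium Helm repo has been added):

# add the Cilium chart repo and install 1.12.1 with the values above
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --version 1.12.1 \
  --namespace kube-system \
  --values values.yaml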

Ingress configuration:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-cilium
spec:
  ingressClassName: cilium
  rules:
    - http:
        paths:
          - pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
            path: /

Nginx deployment and service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:stable
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
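
A quick sanity check that the Service has healthy endpoints before testing through the ingress (illustrative commands):

# confirm the nginx Service resolves to three ready pod IPs
kubectl get svc,endpoints nginx
# note which node each backend pod landed on
kubectl get pods -l app=nginx -o wide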

Http request:

$ curl -v [<IPV6_PREFIX>:8e60::1]
* Rebuilt URL to: [<IPV6_PREFIX>:8e60::1]/
*   Trying <IPV6_PREFIX>:8e60::1...
* TCP_NODELAY set
* Connected to <IPV6_PREFIX>:8e60::1 (<IPV6_PREFIX>:8e60::1) port 80 (#0)
> GET / HTTP/1.1
> Host: [<IPV6_PREFIX>:8e60::1]
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 91
< content-type: text/plain
< date: Wed, 17 Aug 2022 11:13:03 GMT
< server: envoy
<
* Connection #0 to host <IPV6_PREFIX>:8e60::1 left intact
upstream connect error or disconnect/reset before headers. reset reason: connection failure

Also occurs when using IPv4.

The issue cannot be reproduced when using the NGINX ingress controller.

Cilium Version

Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
Client: 1.12.1 4c9a630 2022-08-15T16:29:39-07:00 go version go1.18.5 linux/amd64
Daemon: 1.12.1 4c9a630 2022-08-15T16:29:39-07:00 go version go1.18.5 linux/amd64

Kernel Version

Linux cluster1-control1 5.14.0-70.22.1.el9_0.x86_64 #1 SMP PREEMPT Tue Aug 2 10:02:12 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"archive", BuildDate:"2022-07-21T00:00:00Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:23:26Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@karelvanhecke karelvanhecke added kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. labels Aug 17, 2022
@sayboras
Member

👋

Could you share the spec for the nginx service, or how it's installed, as well?

@sayboras sayboras self-assigned this Aug 17, 2022
@sayboras sayboras added the area/proxy Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers. label Aug 17, 2022
@karelvanhecke
Author

@sayboras Added the spec for the sample nginx service that's used as a backend.
I also included the helm values used during the Cilium installation.

@sayboras
Member

Thanks, let me test it out and get back to you soon.

@karelvanhecke
Author

karelvanhecke commented Aug 17, 2022

I've narrowed it down a bit: connections to the internal ClusterIP of the Cilium ingress service don't seem to have the problem.

internal client pod -> Cilium ingress ClusterIP: No issues
external client -> Cilium ingress nodeport: issue occurs for ~50% of requests
external client -> Cilium ingress loadbalancer IP: issue occurs for ~50% of requests

The issue never occurs when reproducing the same scenarios against the nginx service directly, nor via the NGINX ingress controller.

@karelvanhecke
Author

The connection to the backend service fails when the endpoint of that service is on the same node that received the request to the Cilium ingress controller.

For example:
Cilium ingress controller on node A receives a request:

  • when it proxies the request to an endpoint of the backend service on node B, the connection succeeds
  • when it proxies the request to an endpoint on node A locally, the connection fails

Cilium monitor output shows that the connection from node A (2001:db82:1d1c:8e04::a) to the backend service pod on node A (2001:db82:1d1c:8e0c::5a0f) fails in this scenario.

-> endpoint 2311 flow 0xa1f34b1c , identity host->10882 state established ifindex lxcd3a5bfa10428 orig-ip 2001:db82:1d1c:8e04::a: 2001:db82:1d1c:8e04::a -> 2001:db82:1d1c:8e0c::5a0f DestinationUnreachable(AddressUnreachable)

#20634 seems to describe the same issue.
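
For anyone trying to reproduce this, the monitor output above can be gathered with something like the following (the agent pod name is a placeholder):

# watch drops on the agent running on node A
kubectl -n kube-system exec <cilium-pod-on-node-A> -- cilium monitor --type drop
# or filter to the backend endpoint seen above
kubectl -n kube-system exec <cilium-pod-on-node-A> -- cilium monitor --related-to 2311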

@pranaypathik

Hello
Just a note that I am also facing similar issues when the ingress controller runs on the same node.
I was pointed to this GitHub issue via the Slack channel. I was testing out https://docs.cilium.io/en/v1.12/gettingstarted/tls-visibility/ and by chance the ingress controller and the source container were on the same node.
When I move the ingress controller to a separate node, it works as summarized by karelvanhecke.

@sayboras
Member

Can you test again with the latest snapshot release? Some users have confirmed that the issue no longer happens with https://github.com/cilium/cilium/releases/tag/v1.14.0-snapshot.2
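
For anyone else wanting to try the snapshot, an upgrade sketch (assuming the snapshot chart is published to the Cilium Helm repo):

# --devel may be needed for pre-release chart versions
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.14.0-snapshot.2 \
  --reuse-values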

@karelvanhecke
Author

Can you test it out again with latest snapshot release ? We have some users confirm that the issue is no longer happening with https://github.com/cilium/cilium/releases/tag/v1.14.0-snapshot.2

I spun up a fresh dualstack cluster, same configuration as before and was able to reproduce the issue.

The ingress controller and backend pod being on the same node still triggers:
upstream connect error or disconnect/reset before headers. reset reason: connection failure

Checked without DSR and it still occurred.

@karelvanhecke
Author

The issue only occurs when native routing is enabled.
In tunnel mode, the Cilium ingress is able to access the target pod on the same node.

@zs-ko

zs-ko commented Oct 6, 2023

I have the exact same issue using the Gateway API. Services/pods on the same node as the Envoy proxy receiving the request cause 503s. No issue when routing across nodes.

Also using native routing.

Edit: disabling native routing and using the Geneve tunnel resolves the issue.
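
For reference, the tunnel-mode Helm values used as the workaround here would look roughly like this (1.14+ naming; older releases use tunnel: geneve instead):

routingMode: tunnel
tunnelProtocol: geneve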

@thebearingedge

I haven't taken the time to build a minimal reproduction, but I did notice an error in either the API server pod or the scheduler pod when applying an Ingress.

There was a missing or incorrect resource API group, so there seemed to be a malformed object being created, and eventually the object was garbage collected.

When I switched to using contour envoy for ingress, the issue went away.

I wonder if anyone else experiencing these problems also sees the same errors logged in the API server pod or kube-scheduler.

Will try to reproduce when I have time.

@nebula-it

nebula-it commented Oct 15, 2023

I was trying to replicate my prod Talos Linux cluster in Docker and hit this issue 100% of the time, although the same Cilium config does not produce any errors when running on the actual cluster (i.e. nodes are separate VMs or physical machines).
So I think I have the easiest way to replicate this issue in just Docker. Let me know if that would be helpful for troubleshooting this further, and I can post the quick-start guide to get a Talos cluster up in Docker.

Edit: This is an issue on the actual cluster as well; it just gets hidden by the fact that most of the time the node receiving traffic and the one hosting the pod are different. If it's the same node, then we hit the same issue.

@youngnick youngnick added the area/servicemesh GH issues or PRs regarding servicemesh label Dec 5, 2023
@ahmedwarsama

Seems to be similar to issue #28837. I have a similar problem: when the traffic hits a pod on the same node as the Envoy proxy, it errors in native routing mode. It works perfectly with encapsulation mode, though.

@ekarlso

ekarlso commented Jan 28, 2024

I'm also seeing this on Oracle Linux 9, kernel 5.15.x, the Cilium 1.15.x CI image, and k8s 1.28.5.

@acelinkio

acelinkio commented Feb 5, 2024

Could you please try manually loading a couple of kernel modules and see if that resolves the issue? #25021 (comment)

sudo modprobe iptable_raw
sudo modprobe xt_socket

Manually loading the kernel modules resolved my upstream reset: reset reason: connection failure problem. Now I am fighting upstream connect error or disconnect/reset before headers. reset reason: connection timeout errors.
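
If those modules help, a sketch for loading them persistently on a systemd-based host (the file name is arbitrary):

# make the modules load on every boot via systemd-modules-load
printf 'iptable_raw\nxt_socket\n' | sudo tee /etc/modules-load.d/cilium-envoy.conf
sudo systemctl restart systemd-modules-load.service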

@aanm aanm added the sig/agent Cilium agent related. label Mar 25, 2024
@nebula-it

Hey @sayboras, I can confirm the latest release (1.15.2) does not fix this either.
I have found a quick way to reproduce this in docker using Talos, so if it helps with debugging this further I can post the instructions.

@DennisGlindhart

Try setting enable-endpoint-routes: true (Helm value: endpointRoutes.enabled). It solved something similar (issue #30510) for me, which could be related to same-node traffic.
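
A sketch of applying that suggestion via Helm (flags assume an existing Helm-managed install):

# equivalent Helm value: endpointRoutes.enabled=true
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set endpointRoutes.enabled=true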

@JustinLex

I'm having the same 503 "connection timeout" issue with the Gateway API, and enable-endpoint-routes: true fixed it for me.

I'm running Cilium 1.15.3 with native routing on IPv6-prioritized dual-stack. I have autoDirectNodeRoutes and WireGuard enabled, and I'm using bgpControlPlane for exposing loadbalancer services. Full values.yaml file here

I noticed a strange "ICMPv6 DestinationUnreachable" in the Hubble flows, but there was other traffic on the pod too, so it might not have been from ingress/gateway. Might be a red herring, but I also didn't see the third ACK packet from ingress, so there is definitely an issue during the handshake.

Apr  8 06:03:11.988: [2600:70ff:b04f:beef::3445]:37141 (ingress) -> kube-system/hubble-ui-6548d56557-szfhc:8081 (ID:22188) to-endpoint FORWARDED (TCP Flags: SYN)
Apr  8 06:03:11.988: [2600:70ff:b04f:beef::3445]:37141 (ingress) <- kube-system/hubble-ui-6548d56557-szfhc:8081 (ID:22188) to-network FORWARDED (TCP Flags: SYN, ACK)
Apr  8 06:03:18.120: 2600:70ff:b04f:beef::9b10 (host) -> kube-system/hubble-ui-6548d56557-szfhc (ID:22188) to-endpoint FORWARDED (ICMPv6 DestinationUnreachable(AddressUnreachable))
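
The flows above came from Hubble; something like the following should reproduce the capture (the pod name is specific to my cluster):

# follow flows to the backend pod while reproducing the 503
hubble observe --to-pod kube-system/hubble-ui-6548d56557-szfhc --follow
# show drops only
hubble observe --type drop --follow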

@chancez
Contributor

chancez commented Apr 8, 2024

I'm running 1.15.1 in EKS and enable-endpoint-routes: true doesn't seem to be making any difference, FWIW.

@Victorion

Victorion commented Apr 8, 2024

I'm running 1.15.1 in EKS and enable-endpoint-routes: true doesn't seem to be making any difference, FWIW.

Same, I tried 1.15.2 and 1.15.3 on EKS.
Native routing with and without Geneve, plus enable-endpoint-routes: true.

From the affected node, the connection to the NodePort fails with a timeout from Envoy, so Envoy cannot connect to a service on the same affected node:

curl -s localhost:xxxxx
* Connected to localhost (127.0.0.1) port xxxxx
upstream connect error or disconnect/reset before headers. reset reason: connection timeout

It works, though, if traffic comes in north-south via a non-affected node (and your app, i.e. the backend, runs on a different node than the one whose NodePort accepted the connection).

It looks related: #29967

issues you've encountered are scoped to one node failing at a time?

It's clearly one node at a time:
I have 3 nodes and have tried to replace and recreate them multiple times; it's always the Envoy behind a NodePort on a single (random?) node that fails to connect to a running app.
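
A quick loop to spot the affected node by curling the NodePort on each node (the IPs and port are placeholders):

for node_ip in 10.0.1.10 10.0.1.11 10.0.1.12; do
  echo "== ${node_ip} =="
  # print only the HTTP status code, or time out after 5 seconds
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "http://${node_ip}:30080/"
done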

@nebula-it

Tested enable-endpoint-routes: true with 1.15.2; I can confirm it still does not resolve the issue.

@sayboras
Member

👋

I'd appreciate it if you could test with the image digest below.

It seems like the issue happens with native routing, in which the reply packet from the backend to the Envoy proxy is dropped. Additionally, if you have bpf.masquerade enabled, you might need to set bpf.hostLegacyRouting as well.

I am still working on CI failures; however, I'd appreciate it if someone could test it out with the image digest below.

https://github.com/cilium/cilium/actions/runs/8648652179/job/23713378330?pr=31280

@nebula-it

So just the cilium image from that? I am assuming the first one, quay.io/cilium/cilium-ci:e14988e02ce8d0f3451b177a4f97a3833df65ab3@sha256:57082ec610e9132942f0be1ffd2469e338f72bd8905663c85d81a4b89ae232f9; not sure what the race and unstripped ones are.

By any chance, do you have a values.yaml file? I'd need to update the images if it's more than one.

@acelinkio

Still seeing the issue with Cilium version 1.15.4. Here are the values I used for deploying:

autoDirectNodeRoutes: true
bandwidthManager:
  enabled: true
  bbr: false
bpf:
  masquerade: true
cluster:
  name: home-cluster
  id: 1
containerRuntime:
  integration: containerd
  socketPath: /var/run/k3s/containerd/containerd.sock
endpointRoutes:
  enabled: true
hubble:
  enabled: true
  metrics:
    enabled:
      - dns:query
      - drop
      - tcp
      - flow
      - port-distribution
      - icmp
      - http
  relay:
    enabled: true
    rollOutPods: true
  ui:
    enabled: true
    rollOutPods: true
    ingress:
      enabled: false
ipam:
  mode: kubernetes
ipv4NativeRoutingCIDR: "10.42.0.0/16"
k8sServiceHost: "192.168.1.195"
k8sServicePort: 6443
kubeProxyReplacement: true
kubeProxyReplacementHealthzBindAddr: 0.0.0.0:10256
l2announcements:
  enabled: true
loadBalancer:
  algorithm: maglev
  mode: dsr
localRedirectPolicy: true
operator:
  replicas: 1
  rollOutPods: true
rollOutCiliumPods: true
securityContext:
  privileged: true
routingMode: native
gatewayAPI:
  enabled: true
  secretsNamespace:
    create: false
    name: certificate

Attached are the logs for the single bad Cilium pod (1.15.4), where gateway ingress traffic was unable to route to pods on the same node (echo-679b5c479f-r9qmk and grafana-57474b9568-vx54p). The logs offer a lot more information with this release. cilium-agent(1.15.4).log

Thanks @sayboras. Will do some testing later today with the CI image and the setting you suggested.

@nebula-it the way I find CI images is by going to the job that builds operator-generic, https://github.com/cilium/cilium/actions/runs/8648652179/job/23712986461?pr=31280, then finding the step called CI Build operator-generic; you'll see a --tag quay.io/cilium/operator-generic-ci: line that you can take the tag from.

values

image:
  override: "quay.io/cilium/cilium-ci:e14988e02ce8d0f3451b177a4f97a3833df65ab3"
operator:
  image:
    override: "quay.io/cilium/operator-generic-ci:e14988e02ce8d0f3451b177a4f97a3833df65ab3"

@sayboras
Member

Thanks. As part of the PR build, we also push a dev chart version.

https://github.com/cilium/cilium/actions/runs/8648766222/job/23713392131

Example commands:
helm template -n kube-system oci://quay.io/cilium-charts-dev/cilium --version 1.16.0-dev-dev.134-tam-proxy-tunnel-e14988e02c
helm install cilium -n kube-system oci://quay.io/cilium-charts-dev/cilium --version 1.16.0-dev-dev.134-tam-proxy-tunnel-e14988e02c

@nebula-it

Still the same results with:

image:
  override: "quay.io/cilium/cilium-ci:e14988e02ce8d0f3451b177a4f97a3833df65ab3"
operator:
  image:
    override: "quay.io/cilium/operator-generic-ci:e14988e02ce8d0f3451b177a4f97a3833df65ab3"

and I have tested with both:

bpf:
  masquerade: false

and

bpf:
  masquerade: true
  hostLegacyRouting: true

@acelinkio

Using Cilium chart 1.15.4 with the CI images from above. Testing via:

  • deleting all of the pods in the cilium daemonset + operator deployment
  • Waiting for pods to recreate and become healthy
  • Testing all of the websites behind Cilium's gateway
# still encounters bad Cilium pod that refuses to route to resources on same node.  Took about 5 tries to recreate.
bpf:
  masquerade: true
  hostLegacyRouting: false
# have not been able to recreate the issue.  Tried 20+ times
bpf:
  masquerade: true
  hostLegacyRouting: true

@JustinLex

The issue seems to have come back for me now after doing some unrelated node restarts.

I get upstream connect error or disconnect/reset before headers. reset reason: connection timeout from Envoy, and Hubble shows that the upstream pod is receiving either an "ICMPv6 DestinationUnreachable" or an "ICMPv6 TimeExceeded(HopLimitExceeded)" during the TCP handshake with Ingress.

I have tried switching between VXLAN and native, and I have tried options like endpointRoutes, autoDirectNodeRoutes, and hostLegacyRouting. I have tried switching between envoy daemonset and embedded envoy, and I have tried disabling wireguard.

I tried the 1.15.4 chart with the quay.io/cilium/cilium-ci:e14988e02ce8d0f3451b177a4f97a3833df65ab3 ci images, and restarted all cilium and upstream pods, and it still doesn't work. All other cluster traffic is unaffected, upstream pods are just unable to respond to Envoy. Full values.yaml here

I suspect that I have a similar node-based intermittency issue as the others, but I use BGP load-balancing so my connections to the gateway are always hitting the same node.

@lukapetrovic-git

Using Cilium chart 1.15.4 with the CI images from above. testing via:

  • deleting all of the pods in the cilium daemonset + operator deployment
  • Waiting for pods to recreate and become healthy
  • Testing all of the websites behind Cilium's gateway
# still encounters bad Cilium pod that refuses to route to resources on same node.  Took about 5 tries to recreate.
bpf:
  masquerade: true
  hostLegacyRouting: false
# have not been able to recreate the issue.  Tried 20+ times
bpf:
  masquerade: true
  hostLegacyRouting: true

This worked for me in initial tests as well (version 1.14.9). Though it is less performant than eBPF host routing, at least it seems to solve the issue.

@acelinkio

acelinkio commented Apr 16, 2024

Hey @sayboras, please let us know how the community can best assist.

Really appreciate the progress made over the last couple months with optimizing envoy configurations, addressing race conditions in the operator, and other bug fixes along the way. Big thanks to the Cilium team!

@falmar

falmar commented May 11, 2024

Hi all!

I have been trying all the suggestions in this issue and in the other similar issues related to Ingress/Gateway API problems, which all or most have in common the use of L2 announcements and LB IPAM. I got curious about the leases after reading issue #32148.

It turns out that in my particular case, if the backend pods are running on the same node that holds the lease for the service's LoadBalancer IP, then the upstream connect error occurs.

Using curl from outside the cluster would fail, and even inside the cluster (from the node itself) curl'ing both the LoadBalancer IP and the ClusterIP also failed, but from another node within the cluster it worked.
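
To check which node currently holds the L2 announcement lease for a given service, something like this should work (the lease name format may differ between versions):

# list the L2 announcement leases
kubectl -n kube-system get leases | grep cilium-l2announce
# show the current holder of a specific lease
kubectl -n kube-system get lease cilium-l2announce-<namespace>-<service> \
  -o jsonpath='{.spec.holderIdentity}{"\n"}'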

I tried with and without DSR/Geneve, with standalone or embedded Envoy, and I even tried using hostNetwork from PR #30840, and it would still not work.

(Although I was focusing my effort on the Gateway API.)

I have enabled (on 1.16.0-pre.2):

hostLegacyRouting: true

and it seems to be working; hopefully this is it.

v1.15.4 and 1.16.0-pre.2

k8sServiceHost: ~
k8sServicePort: ~
k8sClientRateLimit:
  qps: 50
  burst: 200

kubeProxyReplacement: true

ipam:
  operator:
    clusterPoolIPv4PodCIDRList: [ "10.32.0.0/16" ]

enableIPv4Masquerade: true
routingMode: native
ipv4NativeRoutingCIDR: "10.32.0.0/16"
autoDirectNodeRoutes: true
devices: eth+

endpointRoutes:
  enabled: true

bpf:
  masquerade: true
  tproxy: true
  hostLegacyRouting: true

nodePort:
  enabled: true

tunnelProtocol: ""
loadBalancer:
  mode: dsr
  dsrDispatch: opt
  acceleration: best-effort
  l7:
    backend: envoy

l2announcements:
  enabled: true

externalIps:
  enabled: true
  
rollOutCiliumPods: true

@BartoszGiza

Hey, we have the same issue. Setting host legacy routing to true fixes the issue with the Cilium Envoy communicating with a pod on the same host.
We are now using 1.16.0-pre.2.

@farcaller
Contributor

Same issue. I have masquerade turned off so I have to use native routing, and this breaks the ingress. hostLegacyRouting seems to fix it.
