
Wrong Cilium L2Announce Holder #32148

Open
phenomrascalov opened this issue Apr 24, 2024 · 5 comments
Labels
feature/l2-announcement kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@phenomrascalov

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Hello,

I have a Kubernetes cluster with 1 master and 2 worker nodes.
I have Kubernetes version 1.29.4 and Cilium version 1.15.4 installed. I want to use L2 announcement.
Previously I was using MetalLB; as part of the migration I replaced it with Cilium, installed with the Helm command and YAML files below. I'm having trouble with the Leases. For example, when the ingress controller is running on worker2, the holder of the cilium-l2announce-ingress-nginx-ingress-nginx-controller lease is worker2 and I can reach my application from the browser. So far, everything is fine. However, when I shut down or drain worker2, the ingress controller naturally moves to worker1, but the lease holder still shows worker2, and the problem persists until I manually delete the lease or delete all the Cilium pods. Until then I cannot access the application. Can you help me solve this problem?

helm install cilium cilium-1.15.4.tgz \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set l2announcements.enabled=true \
  --set loadBalancer.algorithm=maglev \
  --set externalIPs.enabled=true \
  --set nodePort.enabled=true \
  --set hostPort.enabled=true \
  --set ipam.mode=kubernetes \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set ipv4NativeRoutingCIDR=$(kubectl get cm -n kube-system kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' | yq '.networking.podSubnet') \
  --set k8sServiceHost=$(kubectl get nodes $(hostname) -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}') \
  --set k8sServicePort=6443 \
  --set k8sClientRateLimit.qps=50 \
  --set k8sClientRateLimit.burst=100 \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
  --set l2announcements.leaseDuration=20s \
  --set l2announcements.leaseRenewDeadline=10s \
  --set l2announcements.leaseRetryPeriod=200ms

apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
name: "pool"
spec:
cidrs:
- cidr: "192.168.154.131/32"

apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
name: l2policy
spec:
loadBalancerIPs: true
externalIPs: true
nodeSelector:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: DoesNotExist
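
For reference, a minimal sketch of the manual workaround mentioned above, assuming the L2 announcement leases live in the kube-system namespace alongside Cilium:

# List the L2 announcement leases and their current holders
kubectl -n kube-system get lease | grep cilium-l2announce

# Inspect the lease for the ingress controller service
kubectl -n kube-system get lease cilium-l2announce-ingress-nginx-ingress-nginx-controller -o yaml

# Workaround: delete the stale lease so a healthy node can acquire it
kubectl -n kube-system delete lease cilium-l2announce-ingress-nginx-ingress-nginx-controller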

Best regards.

Cilium Version

v1.15.4

Kernel Version

5.15.0-105.125.6.2.2.el9uek.x86_64

Kubernetes Version

v1.29.4

Regression

No response

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@phenomrascalov phenomrascalov added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Apr 24, 2024
@pstefka

pstefka commented Apr 24, 2024

I don't think there is a requirement that the L2 announcement lease be held on the same node as the target pod. Routing between the node holding the L2 announcement lease and the target pod should be handled by kube-proxy (replacement).

In our setup this seems to be working correctly for the ingress L2 announcement. However, we're experiencing something similar: for a syslog daemonset, the L2 announcement only works with the pod on the same node as the lease (even though Hubble says traffic to pods on other nodes is flowing fine). Which is strange, because the behavior differs between two daemonset deployments on the same cluster.

We think it might have something to do with #27151, although in our case it fails not only with externalIPs but with LoadBalancer services as well.
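
One way to check whether the lease-holding node actually programs backends for pods on other nodes is to look at that agent's service table. A rough sketch (the in-pod CLI is cilium-dbg on 1.15, plain cilium on older releases; <cilium-pod-on-lease-node> is a placeholder):

# Find the Cilium agent pod running on the node that currently holds the lease
kubectl -n kube-system get pod -l k8s-app=cilium -o wide

# Dump that agent's view of service frontends and backends
kubectl -n kube-system exec <cilium-pod-on-lease-node> -c cilium-agent -- cilium-dbg service list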

@ti-mo ti-mo added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. feature/l2-announcement and removed needs/triage This issue requires triaging to establish severity and next steps. labels Apr 25, 2024
@phenomrascalov
Author

While researching and experimenting with this, I noticed something interesting. I set up ingress-nginx with the upstream cloud deployment YAML below.

https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.9.4/deploy/static/provider/cloud/deploy.yaml

When, for testing, I delete the ingress-nginx-controller LoadBalancer service created by that YAML and re-expose it, it works even though, as you mentioned, the lease and the pods are on different nodes (a rough sketch of the commands follows the two manifests below). I'm sharing the YAML for both services.

---original---

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.4
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.103.115.253
  clusterIPs:
  - 10.103.115.253
  externalTrafficPolicy: Local
  healthCheckNodePort: 32650
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: http
    name: http
    nodePort: 32226
    port: 80
    targetPort: http
  - appProtocol: https
    name: https
    nodePort: 31934
    port: 443
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer

---original---

---exposed---

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.4
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.103.115.253
  clusterIPs:
  - 10.103.115.253
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: port-1
    nodePort: 31997
    port: 80
  - name: port-2
    nodePort: 32233
    port: 443
  - name: port-3
    nodePort: 30641
    port: 8443
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer

---exposed---

I don't understand exactly what's happening, but it works.
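
For reference, the delete/re-expose step was roughly the following (a sketch; the exact expose flags may have differed):

# Delete the LoadBalancer service created by the upstream manifest
kubectl -n ingress-nginx delete service ingress-nginx-controller

# Re-expose the controller deployment as a fresh LoadBalancer service
kubectl -n ingress-nginx expose deployment ingress-nginx-controller --type=LoadBalancer --name=ingress-nginx-controller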

@phenomrascalov
Author

phenomrascalov commented Apr 26, 2024

I think I finally understand the problem.
It seems the fix is not to delete and re-expose the default LoadBalancer service, but rather to set externalTrafficPolicy to 'Cluster' instead of 'Local'.
But now I cannot see the client IP on the pods: when I check with echoserver, the pod IP appears instead, because the policy is set to Cluster. I could only get the real client IP when the policy was Local. So it looks like the problem will have to be solved on the lease side after all. :(
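
For anyone who wants to try the same thing, a minimal sketch of switching the policy on the service from the manifests above:

# Set externalTrafficPolicy to Cluster on the ingress-nginx LoadBalancer service
kubectl -n ingress-nginx patch service ingress-nginx-controller -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'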

@pstefka

pstefka commented Apr 26, 2024

This is my observation too. The real client IP is visible only on application pods colocated on the same node as the lease. The other pods only see the IP of the cilium_host interface of the "lease" node, which is logical, as the traffic has to transit the overlay network (which we're using).

The solution for the client IP might be DSR mode. That is, however, only available in native-routing mode, not in encapsulation mode.

@CorneJB

CorneJB commented May 1, 2024

We're experiencing this in both DSR-opt and Geneve mode. The L2 announcement itself seems fine with both Local and Cluster externalTrafficPolicy, but traffic is only routed to the correct pod when the L2 announcement (lease) and the pod are on the same node. We are running 1.14.0.

bgpControlPlane:
  enabled: false
bpf:
  masquerade: true
routingMode: native
ipv4NativeRoutingCIDR: <redacted>
enableIPv4Masquerade: true
autoDirectNodeRoutes: true
l2announcements:
  enabled: true
k8sClientRateLimit:
  qps: 10
  burst: 50
externalIPs:
  enabled: true
k8sServiceHost: <redacted>
k8sServicePort: 6443
gatewayAPI:
  enabled: false
containerRuntime:
  integration: containerd
  socketPath: "/var/run/k3s/containerd/containerd.sock"
hubble:
  metrics:
    enabled:
    - dns
    - drop
    - tcp
    - flow
    - icmp
    - http
  relay:
    enabled: true
  ui:
    enabled: false
encryption:
  enabled: true
  type: wireguard
kubeProxyReplacement: strict
loadBalancer:
  mode: dsr
  dsrDispatch: opt
operator:
  prometheus:
    enabled: true
prometheus:
  enabled: true
