
Wrong Cilium L2Announce Holder #32148

Open
phenomrascalov opened this issue Apr 24, 2024 · 5 comments
Labels
feature/l2-announcement kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments

@phenomrascalov

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Hello,

I have a Kubernetes cluster with 1 master and 2 worker nodes.
I have Kubernetes version 1.29.4 and Cilium version 1.15.4 installed. I want to use L2 announcement.
Previously I was using MetalLB; as part of the migration I replaced it with Cilium, installed with the Helm command and YAML files below. I'm having trouble with the Leases. For example, when the ingress controller is running on worker2, the holder of the cilium-l2announce-ingress-nginx-ingress-nginx-controller lease is worker2 and I can reach my application from the browser. So far, everything is fine. However, when I shut down or drain worker2, the ingress controller naturally moves to worker1, but the lease holder still shows worker2, and the problem persists until I manually delete the lease or delete all the Cilium pods. Until then I cannot access the application. Can you help me solve this problem?

helm install cilium cilium-1.15.4.tgz \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set l2announcements.enabled=true \
  --set loadBalancer.algorithm=maglev \
  --set externalIPs.enabled=true \
  --set nodePort.enabled=true \
  --set hostPort.enabled=true \
  --set ipam.mode=kubernetes \
  --set hubble.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set ipv4NativeRoutingCIDR=$(kubectl get cm -n kube-system kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' | yq '.networking.podSubnet') \
  --set k8sServiceHost=$(kubectl get nodes $(hostname) -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}') \
  --set k8sServicePort=6443 \
  --set k8sClientRateLimit.qps=50 \
  --set k8sClientRateLimit.burst=100 \
  --set prometheus.enabled=true \
  --set operator.prometheus.enabled=true \
  --set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" \
  --set l2announcements.leaseDuration=20s \
  --set l2announcements.leaseRenewDeadline=10s \
  --set l2announcements.leaseRetryPeriod=200ms

apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
name: "pool"
spec:
cidrs:
- cidr: "192.168.154.131/32"

apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
name: l2policy
spec:
loadBalancerIPs: true
externalIPs: true
nodeSelector:
matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: DoesNotExist
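
For reference, a minimal sketch of the manual workaround mentioned above, assuming the L2 announcement leases live in the kube-system namespace alongside Cilium:

# List the L2 announcement leases and their current holders
kubectl -n kube-system get lease | grep cilium-l2announce

# Inspect the lease for the ingress controller service
kubectl -n kube-system get lease cilium-l2announce-ingress-nginx-ingress-nginx-controller -o yaml

# Workaround: delete the stale lease so a healthy node can acquire it
kubectl -n kube-system delete lease cilium-l2announce-ingress-nginx-ingress-nginx-controller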

Best regards.

Cilium Version

v1.15.4

Kernel Version

5.15.0-105.125.6.2.2.el9uek.x86_64

Kubernetes Version

v1.29.4

Regression

No response

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@phenomrascalov phenomrascalov added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Apr 24, 2024
@pstefka

pstefka commented Apr 24, 2024

I don't think there is a requirement that the L2 announcement lease be held on the same node as the target pod. Routing between the node holding the L2 announcement lease and the target pod should be handled by kube-proxy (replacement).

In our setup this seems to be working correctly for the ingress L2 announcement. However, we're experiencing something similar: for a syslog daemonset, the L2 announcement only works with the pod on the same node as the lease (even though Hubble says traffic to pods on other nodes is flowing fine). Which is strange, because the behavior differs between two daemonset deployments on the same cluster.

We think it might have something to do with #27151, although in our case it fails not only with externalIPs but with LoadBalancer services as well.
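
One way to check whether the lease-holding node actually programs backends for pods on other nodes is to look at that agent's service table. A rough sketch (the in-pod CLI is cilium-dbg on 1.15, plain cilium on older releases; <cilium-pod-on-lease-node> is a placeholder):

# Find the Cilium agent pod running on the node that currently holds the lease
kubectl -n kube-system get pod -l k8s-app=cilium -o wide

# Dump that agent's view of service frontends and backends
kubectl -n kube-system exec <cilium-pod-on-lease-node> -c cilium-agent -- cilium-dbg service list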

@ti-mo ti-mo added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. feature/l2-announcement and removed needs/triage This issue requires triaging to establish severity and next steps. labels Apr 25, 2024
@phenomrascalov
Author

While researching and experimenting with this, I noticed something interesting. I set up ingress-nginx with the upstream cloud deployment YAML below.

https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.9.4/deploy/static/provider/cloud/deploy.yaml

When, for testing, I delete the ingress-nginx-controller LoadBalancer service created by that YAML and re-expose it, it works even though, as you mentioned, the lease and the pods are on different nodes (a rough sketch of the commands follows the two manifests below). I'm sharing the YAML for both services.

---original---

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.4
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.103.115.253
  clusterIPs:
  - 10.103.115.253
  externalTrafficPolicy: Local
  healthCheckNodePort: 32650
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: http
    name: http
    nodePort: 32226
    port: 80
    targetPort: http
  - appProtocol: https
    name: https
    nodePort: 31934
    port: 443
    targetPort: https
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer

---original---

---exposed---

apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.4
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 10.103.115.253
  clusterIPs:
  - 10.103.115.253
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: port-1
    nodePort: 31997
    port: 80
  - name: port-2
    nodePort: 32233
    port: 443
  - name: port-3
    nodePort: 30641
    port: 8443
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer

---exposed---

I don't understand exactly what's happening, but it works.
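
For reference, the delete/re-expose step was roughly the following (a sketch; the exact expose flags may have differed):

# Delete the LoadBalancer service created by the upstream manifest
kubectl -n ingress-nginx delete service ingress-nginx-controller

# Re-expose the controller deployment as a fresh LoadBalancer service
kubectl -n ingress-nginx expose deployment ingress-nginx-controller --type=LoadBalancer --name=ingress-nginx-controller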

@phenomrascalov
Author

phenomrascalov commented Apr 26, 2024

I think I finally understand the problem.
It seems the fix is not to delete and re-expose the default LoadBalancer service, but rather to set externalTrafficPolicy to 'Cluster' instead of 'Local'.
But now I cannot see the client IP on the pods: when I check with echoserver, the pod IP appears instead, because the policy is set to Cluster. I could only get the real client IP when the policy was Local. So it looks like the problem will have to be solved on the lease side after all. :(
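
For anyone who wants to try the same thing, a minimal sketch of switching the policy on the service from the manifests above:

# Set externalTrafficPolicy to Cluster on the ingress-nginx LoadBalancer service
kubectl -n ingress-nginx patch service ingress-nginx-controller -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'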

@pstefka

pstefka commented Apr 26, 2024

This is my observation too. The real client IP is visible only on application pods colocated on the same node as the lease. The other pods only see the IP of the cilium_host interface of the "lease" node, which is logical, as the traffic has to transit the overlay network (which we're using).

The solution for the client IP might be DSR mode. That is, however, only available in native-routing mode, not in encapsulation mode.

@CorneJB

CorneJB commented May 1, 2024

We're experiencing this in both DSR-opt and Geneve mode. The L2 announcement itself seems fine with both Local and Cluster externalTrafficPolicy, but traffic is only routed to the correct pod when the L2 announcement (lease) and the pod are on the same node. We are running 1.14.0.

bgpControlPlane:
  enabled: false
bpf:
  masquerade: true
routingMode: native
ipv4NativeRoutingCIDR: <redacted>
enableIPv4Masquerade: true
autoDirectNodeRoutes: true
l2announcements:
  enabled: true
k8sClientRateLimit:
  qps: 10
  burst: 50
externalIPs:
  enabled: true
k8sServiceHost: <redacted>
k8sServicePort: 6443
gatewayAPI:
  enabled: false
containerRuntime:
  integration: containerd
  socketPath: "/var/run/k3s/containerd/containerd.sock"
hubble:
  metrics:
    enabled:
    - dns
    - drop
    - tcp
    - flow
    - icmp
    - http
  relay:
    enabled: true
  ui:
    enabled: false
encryption:
  enabled: true
  type: wireguard
kubeProxyReplacement: strict
loadBalancer:
  mode: dsr
  dsrDispatch: opt
operator:
  prometheus:
    enabled: true
prometheus:
  enabled: true
