
Cilium DSR with geneve in native routing mode does not work in Azure #31169

Open
QxBytes opened this issue Mar 5, 2024 · 2 comments
Labels

  • area/loadbalancing: Impacts load-balancing and Kubernetes service implementations
  • feature/dsr: Relates to Cilium's Direct-Server-Return feature for KPR.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.
  • sig/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.

Comments


QxBytes commented Mar 5, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Setup

[Setup diagram omitted. The blue path shows how the HTTP traffic flows; the TCP handshake is not shown.]

I have an nginx pod on each VM and a LoadBalancer service for the nginx pods. This is running on AKS in Azure.

I took packet captures on vm0 and vm1 and repeatedly sent requests to the load balancer IP until one was routed by Cilium to the nginx pod on the other VM. (If Cilium picks the nginx pod on the same VM that the Azure load balancer forwarded to, the request returns fine with the load balancer IP as the source.) The packet captures are below.
(pcap legend: 20.29.225.195 is my client IP; 20.112.45.212 is the LB; 10.10.0.4 is vm0; 10.10.0.5 is vm1; 192.168.1.174 is the nginx pod on vm0; the request hits vm1 first)
VM 0
[packet capture screenshot omitted]
VM 1
[packet capture screenshot omitted]
The Geneve headers show the node IPs (so if the packet goes from vm1 to vm0, the Geneve src is 10.10.0.5 and the dst is 10.10.0.4). The client only sees the handshake and its outgoing GET request, and no response after that. The TCP handshake completes with the load balancer IP, but the directly returned packet has the node's IP (10.10.0.4) as its source.

Second Setup

We also tried a different networking mode in Azure with similar results (the HTTP OK is sent with the node IP as its source rather than the load balancer IP). The Cilium mode is still DSR with Geneve dispatch and no tunnel. More details can be provided.

SNAT Behavior Setup

We compared the DSR behavior against regular SNAT mode (no DSR, no Geneve) and found that the HTTP OK is sent back with the load balancer IP as the source.

The request hits vm1 first; .76 is the nginx pod on vm0; 10.10.0.4 is vm0 and 10.10.0.5 is vm1.
VM 0 (starts at packet 48)
[packet capture screenshot omitted]

VM 1 (starts at packet 85)
[packet capture screenshot omitted]

Expected Behavior

The packet is load balanced to one machine, Cilium forwards it to another machine, and the reply (HTTP OK) should return directly to the client successfully (we get a response). Right now there is no response whenever the packet is rerouted by Cilium to a different VM.

Questions

  • Is it intended behavior that the returning packet has the node's IP rather than the load balancer's IP, even though the handshake is with the load balancer IP?
  • How does packet No. 148 get the load balancer IP as its source? It is not present in the preceding packet No. 147 or in the encapsulation, and there are no iptables or nftables rules that mention the load balancer IP.

Cilium Version

1.14.3

Kernel Version

Linux aks-nodepool1-11172287-vmss000000 5.15.0-1054-azure #62-Ubuntu SMP Mon Jan 15 15:51:19 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

1.27.9

Regression

No

Sysdump

cilium-sysdump-20240305-100604.zip

Relevant log output

No response

Anything else?

Cilium DSR Geneve no-tunnel config

apiVersion: v1
data:
  agent-not-ready-taint-key: node.cilium.io/agent-not-ready
  arping-refresh-period: 30s
  auto-direct-node-routes: "false"
  bpf-lb-dsr-dispatch: "geneve"
  bpf-lb-external-clusterip: "false"
  bpf-lb-map-max: "65536"
  bpf-lb-mode: dsr
  bpf-map-dynamic-size-ratio: "0.0025"
  bpf-policy-map-max: "16384"
  bpf-root: /sys/fs/bpf
  cgroup-root: /run/cilium/cgroupv2
  cilium-endpoint-gc-interval: 5m0s
  cluster-id: "0"
  cluster-name: default
  debug: "false"
  disable-cnp-status-updates: "true"
  disable-endpoint-crd: "false"
  enable-auto-protect-node-port-range: "true"
  enable-bgp-control-plane: "false"
  enable-bpf-clock-probe: "true"
  enable-endpoint-health-checking: "false"
  enable-endpoint-routes: "true"
  enable-health-check-nodeport: "true"
  enable-health-checking: "true"
  enable-host-legacy-routing: "true"
  enable-hubble: "false"
  enable-ipv4: "true"
  enable-ipv4-masquerade: "false"
  enable-ipv6: "false"
  enable-ipv6-masquerade: "false"
  enable-k8s-terminating-endpoint: "true"
  enable-l2-neigh-discovery: "true"
  enable-l7-proxy: "false"
  enable-local-node-route: "false"
  enable-local-redirect-policy: "false"
  enable-metrics: "true"
  enable-policy: default
  enable-remote-node-identity: "true"
  enable-session-affinity: "true"
  enable-svc-source-range-check: "true"
  enable-vtep: "false"
  enable-well-known-identities: "false"
  enable-xt-socket-fallback: "true"
  identity-allocation-mode: crd
  install-iptables-rules: "true"
  install-no-conntrack-iptables-rules: "false"
  ipam: delegated-plugin
  kube-proxy-replacement: "true"
  kube-proxy-replacement-healthz-bind-address: ""
  local-router-ipv4: 169.254.23.0
  metrics: +cilium_bpf_map_pressure
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  monitor-aggregation-interval: 5s
  node-port-bind-protection: "true"
  nodes-gc-interval: 5m0s
  operator-api-serve-addr: 127.0.0.1:9234
  operator-prometheus-serve-addr: :9963
  preallocate-bpf-maps: "false"
  procfs: /host/proc
  prometheus-serve-addr: :9962
  remove-cilium-node-taints: "true"
  routing-mode: "native"
  set-cilium-is-up-condition: "true"
  sidecar-istio-proxy-image: cilium/istio_proxy
  synchronize-k8s-nodes: "true"
  tofqdns-dns-reject-response-code: refused
  tofqdns-enable-dns-compression: "true"
  tofqdns-endpoint-max-ip-per-hostname: "50"
  tofqdns-idle-connection-grace-period: 0s
  tofqdns-max-deferred-connection-deletes: "10000"
  tofqdns-min-ttl: "3600"
  tofqdns-proxy-response-max-delay: 100ms
  unmanaged-pod-watcher-interval: "15"
  vtep-cidr: ""
  vtep-endpoint: ""
  vtep-mac: ""
  vtep-mask: ""
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: cilium
    meta.helm.sh/release-namespace: kube-system
  labels:
    app.kubernetes.io/managed-by: Helm
  name: cilium-config
  namespace: kube-system
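
For reference, the ConfigMap above corresponds roughly to the following Helm values. This is a hedged sketch: the value names are assumed from the Cilium 1.14 Helm chart and are not part of this issue.

# Assumed Helm values behind the DSR-with-Geneve ConfigMap above; they map to
# bpf-lb-mode, bpf-lb-dsr-dispatch, routing-mode, kube-proxy-replacement and
# enable-ipv4-masquerade in cilium-config.
routingMode: native
kubeProxyReplacement: "true"
loadBalancer:
  mode: dsr
  dsrDispatch: geneve
enableIPv4Masquerade: false
bpf:
  masquerade: false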

Cilium no-DSR (SNAT) config

apiVersion: v1
data:
  agent-not-ready-taint-key: node.cilium.io/agent-not-ready
  arping-refresh-period: 30s
  auto-direct-node-routes: "false"
  bpf-lb-external-clusterip: "false"
  bpf-lb-map-max: "65536"
  bpf-lb-mode: snat
  bpf-map-dynamic-size-ratio: "0.0025"
  bpf-policy-map-max: "16384"
  bpf-root: /sys/fs/bpf
  cgroup-root: /run/cilium/cgroupv2
  cilium-endpoint-gc-interval: 5m0s
  cluster-id: "0"
  cluster-name: default
  debug: "false"
  disable-cnp-status-updates: "true"
  disable-endpoint-crd: "false"
  enable-auto-protect-node-port-range: "true"
  enable-bgp-control-plane: "false"
  enable-bpf-clock-probe: "true"
  enable-endpoint-health-checking: "false"
  enable-endpoint-routes: "true"
  enable-health-check-nodeport: "true"
  enable-health-checking: "true"
  enable-host-legacy-routing: "true"
  enable-hubble: "false"
  enable-ipv4: "true"
  enable-ipv4-masquerade: "false"
  enable-ipv6: "false"
  enable-ipv6-masquerade: "false"
  enable-k8s-terminating-endpoint: "true"
  enable-l2-neigh-discovery: "true"
  enable-l7-proxy: "false"
  enable-local-node-route: "false"
  enable-local-redirect-policy: "false"
  enable-metrics: "true"
  enable-policy: default
  enable-remote-node-identity: "true"
  enable-session-affinity: "true"
  enable-svc-source-range-check: "true"
  enable-vtep: "false"
  enable-well-known-identities: "false"
  enable-xt-socket-fallback: "true"
  identity-allocation-mode: crd
  install-iptables-rules: "true"
  install-no-conntrack-iptables-rules: "false"
  ipam: delegated-plugin
  kube-proxy-replacement: strict
  kube-proxy-replacement-healthz-bind-address: ""
  local-router-ipv4: 169.254.23.0
  metrics: +cilium_bpf_map_pressure
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  monitor-aggregation-interval: 5s
  node-port-bind-protection: "true"
  nodes-gc-interval: 5m0s
  operator-api-serve-addr: 127.0.0.1:9234
  operator-prometheus-serve-addr: :9963
  preallocate-bpf-maps: "false"
  procfs: /host/proc
  prometheus-serve-addr: :9962
  remove-cilium-node-taints: "true"
  set-cilium-is-up-condition: "true"
  sidecar-istio-proxy-image: cilium/istio_proxy
  synchronize-k8s-nodes: "true"
  tofqdns-dns-reject-response-code: refused
  tofqdns-enable-dns-compression: "true"
  tofqdns-endpoint-max-ip-per-hostname: "50"
  tofqdns-idle-connection-grace-period: 0s
  tofqdns-max-deferred-connection-deletes: "10000"
  tofqdns-min-ttl: "3600"
  tofqdns-proxy-response-max-delay: 100ms
  # Replaces tunnel: disabled in v1.15
  routing-mode: "native"
  unmanaged-pod-watcher-interval: "15"
  vtep-cidr: ""
  vtep-endpoint: ""
  vtep-mac: ""
  vtep-mask: ""
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: cilium
    meta.helm.sh/release-namespace: kube-system
  labels:
    app.kubernetes.io/managed-by: Helm
  name: cilium-config
  namespace: kube-system
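
The SNAT comparison config differs from the DSR sketch above mainly in these values (same caveat: assumed Helm value names, shown only where they differ):

# Assumed Helm values for the SNAT comparison; only the entries that differ
# from the DSR sketch above.
kubeProxyReplacement: strict   # maps to kube-proxy-replacement: strict
loadBalancer:
  mode: snat                   # maps to bpf-lb-mode: snat; no dsrDispatch set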

Nginx yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1
        ports:
        - name: http
          containerPort: 80

Service yaml

apiVersion: v1
kind: Service
metadata:
  name: ngx
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
  selector:
    app: nginx

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@QxBytes QxBytes added kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. labels Mar 5, 2024
@QxBytes QxBytes changed the title Cilium DSR with geneve without tunnel does not return packets with load balancer ip in Azure Cilium DSR with geneve in native routing mode does not work in Azure Mar 5, 2024
@joestringer joestringer added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/loadbalancing Impacts load-balancing and Kubernetes service implementations labels Mar 5, 2024

github-actions bot commented May 5, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label May 5, 2024
@julianwiedmann (Member) commented

From the config you shared, it looks like you don't have BPF-based masquerading enabled, so you are most likely hitting the issue described in #32189.
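
If that is the cause, a minimal sketch of the Helm values that turn on BPF-based masquerading would look like the following (value names assumed from the Cilium Helm chart, not taken from this issue; verify against the docs for your version):

# Assumed Helm values to enable BPF masquerading; they map to
# enable-bpf-masquerade and enable-ipv4-masquerade in cilium-config.
bpf:
  masquerade: true
enableIPv4Masquerade: true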

@julianwiedmann julianwiedmann added the feature/dsr Relates to Cilium's Direct-Server-Return feature for KPR. label May 6, 2024
@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label May 7, 2024