
Hybrid cluster with wireguard-native: a NAT'd node that doesn't know its public IP address gets the wrong endpoint #1889

Open
vast0906 opened this issue Feb 27, 2024 · 6 comments

Comments

@vast0906

Cluster Configuration:

server:

  1. master
    EXTERNAL-IP: xx.xx.xx.xx
    INTERNAL-IP: 10.0.8.17

node:

  1. node-x86
    node-x86 is NAT'd and does not know its public IP address.
    EXTERNAL-IP: xx.xx.xx.yy
    INTERNAL-IP: 192.168.36.22

  2. node-arm
    EXTERNAL-IP: xx.xx.xx.zz
    INTERNAL-IP: 10.0.1.217

  • Installed K3s:
export PUBLIC_IP=`curl -sSL https://ipconfig.sh`
export INSTALL_K3S_EXEC="--disable servicelb --kube-proxy-arg proxy-mode=ipvs --kube-proxy-arg masquerade-all=true --kube-proxy-arg metrics-bind-address=0.0.0.0 --disable traefik --node-ip 10.0.8.17 --node-external-ip $PUBLIC_IP --flannel-backend wireguard-native --flannel-external-ip"
curl -sfL https://get.k3s.io | sh -
  • node-x86 configuration
/usr/local/bin/k3s \
    agent \
    '--node-ip' \
    '192.168.36.22' \

# cat /etc/systemd/system/k3s-agent.service.env
K3S_TOKEN='K10f09c8dffcb10a0d83dbd3eb2875327de80ffe9c03a208fe68ffb5b32fa51d78e::server:5d3906836799daaa8b70851155c11190'
K3S_URL='https://xx.xx.xx.xx:6443'
  • node-arm configuration
/usr/local/bin/k3s \
    agent \
    '--node-ip' \
    '10.0.1.217' \

# cat /etc/systemd/system/k3s-agent.service.env
K3S_TOKEN='K10f09c8dffcb10a0d83dbd3eb2875327de80ffe9c03a208fe68ffb5b32fa51d78e::server:5d3906836799daaa8b70851155c11190'
K3S_URL='https://xx.xx.xx.xx:6443'
  • master wg show
# wg show flannel-wg
interface: flannel-wg
  public key: Wxxxx
  private key: (hidden)
  listening port: 51820

peer: hldi2xxxx
  endpoint: xx.xx.xx.zz:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 25 seconds ago
  transfer: 11.72 MiB received, 6.53 MiB sent
  persistent keepalive: every 25 seconds

peer: Ap//Dxxx
  endpoint: 192.168.36.22:51820  # wrong: this is node-x86's private, NAT'd address
  allowed ips: 10.42.5.0/24
  transfer: 0 B received, 33.39 KiB sent
  persistent keepalive: every 25 seconds
  • node-x86 wg show
interface: flannel-wg
  public key: Ap//xxxx
  private key: (hidden)
  listening port: 51820

peer: hldi2xxx
  endpoint: xx.xx.xx.zz:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 28 seconds ago
  transfer: 1.52 KiB received, 3.16 KiB sent
  persistent keepalive: every 25 seconds

peer: Ww7xx
  endpoint: xx.xx.xx.xx:51820
  allowed ips: 10.42.0.0/24
  transfer: 0 B received, 30.06 KiB sent
  persistent keepalive: every 25 seconds
  • node-arm wg show
interface: flannel-wg
  public key: hldi26xxxx
  private key: (hidden)
  listening port: 51820

peer: Ww7xxxx
  endpoint: xx.xx.xx.xx:51820
  allowed ips: 10.42.0.0/24
  latest handshake: 8 seconds ago
  transfer: 6.53 MiB received, 15.16 MiB sent
  persistent keepalive: every 25 seconds

peer: Ap//xxxx
  endpoint: xx.xx.xx.yy:8598 # correct: the NAT-translated public endpoint
  allowed ips: 10.42.5.0/24
  latest handshake: 1 minute, 12 seconds ago
  transfer: 2.86 KiB received, 2.04 KiB sent
  persistent keepalive: every 25 seconds
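
For context on where that wrong endpoint comes from: flannel's kube subnet manager publishes each node's wireguard public key and advertised public IP as annotations on the Node object, and peers dial whatever is advertised there. A minimal sketch for inspecting this (the exact annotation names may differ between flannel versions):

# Compare the INTERNAL-IP / EXTERNAL-IP Kubernetes records for each node
kubectl get nodes -o wide
# Inspect the flannel annotations on the NAT'd node; the advertised
# public IP is what the other peers will dial as the wireguard endpoint
kubectl describe node node-x86 | grep -i flannel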

Expected Behavior

  • master wg show
# wg show flannel-wg
interface: flannel-wg
  public key: Wxxxx
  private key: (hidden)
  listening port: 51820

peer: hldi2xxxx
  endpoint: xx.xx.xx.zz:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 25 seconds ago
  transfer: 11.72 MiB received, 6.53 MiB sent
  persistent keepalive: every 25 seconds

peer: Ap//Dxxx
  endpoint: xx.xx.xx.yy:8598 # correct: the NAT-translated public endpoint
  allowed ips: 10.42.5.0/24
  transfer: 0 B received, 33.39 KiB sent
  persistent keepalive: every 25 seconds

Current Behavior

  • master wg show
# wg show flannel-wg
interface: flannel-wg
  public key: Wxxxx
  private key: (hidden)
  listening port: 51820

peer: hldi2xxxx
  endpoint: xx.xx.xx.zz:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 25 seconds ago
  transfer: 11.72 MiB received, 6.53 MiB sent
  persistent keepalive: every 25 seconds

peer: Ap//Dxxx
  endpoint: 192.168.36.22:51820  # wrong: this is node-x86's private, NAT'd address
  allowed ips: 10.42.5.0/24
  transfer: 0 B received, 33.39 KiB sent
  persistent keepalive: every 25 seconds
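
A compact way to compare the expected and current state on each node is wg's endpoints listing, which prints only the peer-key/endpoint pairs:

# On the master this currently shows 192.168.36.22:51820 for node-x86
wg show flannel-wg endpoints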

Possible Solution

The master and the other nodes should consistently use the endpoint that WireGuard itself negotiated, i.e. the NAT-translated source address learned from the latest handshake, rather than the node's private --node-ip.
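
WireGuard already supports this via endpoint roaming: a peer's endpoint is updated to the source address of its latest authenticated handshake, which is exactly how node-arm learned xx.xx.xx.yy:8598. As an illustration only (flannel would likely overwrite it on its next sync), the negotiated endpoint could be pinned on the master by hand:

# Point the master at node-x86's NAT-translated endpoint as learned by
# node-arm; 'Ap//Dxxx' is node-x86's (redacted) public key from above
wg set flannel-wg peer 'Ap//Dxxx' endpoint xx.xx.xx.yy:8598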

Steps to Reproduce (for bugs)

Context

Your Environment

  • Flannel version:
  • Backend used (e.g. vxlan or udp):
  • Etcd version:
  • Kubernetes version (if used): k3s -v
    k3s version v1.28.6+k3s2 (k3s-io/k3s@c9f49a3)
    go version go1.20.13
  • Operating System and version:
  • Link to your project (optional):

My English is not very good; please refer to this issue for the specific details. Thank you.

@manuelbuil
Collaborator

Hey again! In your proposal you are talking about server-client communication, where the client knows the endpoint of the server but the server only knows the public key of the client. In this scenario, the client can communicate with the server, but the server can't communicate with the client until the client contacts it first, right?

The problem with the previous approach in Kubernetes is that the architecture is not server-client when it comes to pod-pod communication. We are creating a mesh of tunnels between the nodes. Imagine a cluster of 3 nodes (node1, node2 and node3); I see, for example, two problems:
1 - When node3 comes up, should it know the endpoint of node1 and node2? Or only node1? How do we decide that?
2 - Imagine it knows the endpoint of both node1 and node2, but node1 and node2 don't know the endpoint of node3. If I understand correctly, node1 and node2 can't communicate with node3 unless node3 tries to communicate with them first. That means pods on node1 and node2 won't be able to contact pods on node3, right?
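
To make problem 2 concrete in plain wg terms: a peer that is added by public key without an endpoint can answer handshakes, but this side has no address to initiate one to, so the tunnel stays down until that peer dials in first. An illustrative sketch (not flannel's actual code path; the key and pod CIDR are placeholders):

# node1 adds NAT'd node3 by public key only, since no endpoint is known.
# node1 can now accept node3's handshake but cannot initiate one, so pods
# on node1 cannot reach pods on node3 until node3 dials out first.
wg set flannel-wg peer '<node3-public-key>' allowed-ips 10.42.3.0/24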

@vast0906
Author

the client can communicate with the server, but the server can't communicate with the client until the client contacts it first, right?

yes

There is no conflict between server-client and pod-pod. The pod-pod network is a tunnel created through the server-client connection: pods can communicate only after the server-client connection is established and the tunnel is created.
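
(To see that pod-pod traffic really rides the node-node tunnel: on any node, the routes for the remote pod subnets, e.g. 10.42.2.0/24, all point at the wireguard interface, matching the allowed ips in the wg show output above.)

# List the routes that send remote pod subnets into the tunnel
ip route show dev flannel-wg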

@manuelbuil
Collaborator

the client can communicate with the server, but the server can't communicate with the client until the client contacts it first, right?

yes

There is no conflict between server-client and pod-pod. The pod-pod network is a tunnel created through the server-client connection: pods can communicate only after the server-client connection is established and the tunnel is created.

Right, but the server needs to wait for the client to contact it. What if the client never contacts the server?

@vast0906
Author

Right, but the server needs to wait for the client to contact it. What if the client never contacts the server?

WireGuard contacts the server when it starts up. If the client never contacts the server, that means the node is not ready.
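
For what it's worth, that startup contact is visible in the outputs above as "persistent keepalive: every 25 seconds": the NAT'd side keeps sending keepalives, so it always initiates the handshake and keeps its NAT mapping open. In plain wg terms the option is set per peer, roughly like this ('<server-public-key>' is a placeholder):

# Make this (client) side dial out and keep the NAT mapping alive
wg set flannel-wg peer '<server-public-key>' persistent-keepalive 25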

@manuelbuil
Copy link
Collaborator

Right, but the server needs to wait for the client to contact it. What if the client never contacts the server?

WireGuard contacts the server when it starts up. If the client never contacts the server, that means the node is not ready.

Imagine we have 2 nodes: one node is the k8s control plane and one node is a k8s agent behind a NAT (let's call it node1). In this case, I can see your suggestion working.

However, what happens if we add a new k8s agent node behind a NAT (let's call it node2)? We need to know the endpoint of node1 or node2 to create the tunnel between those two nodes, right?

@vast0906
Author

Imagine we have 2 nodes: one node is the k8s control plane and one node is a k8s agent behind a NAT (let's call it node1). In this case, I can see your suggestion working.

However, what happens if we add a new k8s agent node behind a NAT (let's call it node2)? We need to know the endpoint of node1 or node2 to create the tunnel between those two nodes, right?

I'm not sure whether the WireGuard "master" will synchronize all the endpoint information to the other nodes.
