
Hybrid cluster with wireguard-native: a NAT'd node that doesn't know its public IP address gets the wrong endpoint #1889

Open
vast0906 opened this issue Feb 27, 2024 · 6 comments

Comments

@vast0906

Cluster Configuration:

server:

  1. master
    EXTERNAL-IP: xx.xx.xx.xx
    INTERNAL-IP: 10.0.8.17

node:

  1. node-x86
    node-x86 is NAT'd and does not know its public IP address.
    EXTERNAL-IP: xx.xx.xx.yy
    INTERNAL-IP: 192.168.36.22

  2. node-arm
    EXTERNAL-IP: xx.xx.xx.zz
    INTERNAL-IP: 10.0.1.217

  • Installed K3s:
export PUBLIC_IP=`curl -sSL https://ipconfig.sh`
export INSTALL_K3S_EXEC="--disable servicelb --kube-proxy-arg proxy-mode=ipvs --kube-proxy-arg masquerade-all=true --kube-proxy-arg metrics-bind-address=0.0.0.0 --disable traefik --node-ip 10.0.8.17 --node-external-ip $PUBLIC_IP --flannel-backend wireguard-native --flannel-external-ip"
curl -sfL https://get.k3s.io | sh -
  • node-x86 configuration
/usr/local/bin/k3s \
    agent \
    '--node-ip' \
    '192.168.36.22' \

# cat /etc/systemd/system/k3s-agent.service.env
K3S_TOKEN='K10f09c8dffcb10a0d83dbd3eb2875327de80ffe9c03a208fe68ffb5b32fa51d78e::server:5d3906836799daaa8b70851155c11190'
K3S_URL='https://xx.xx.xx.xx:6443'
  • node-arm configuration
/usr/local/bin/k3s \
    agent \
    '--node-ip' \
    '10.0.1.217' \

# cat /etc/systemd/system/k3s-agent.service.env
K3S_TOKEN='K10f09c8dffcb10a0d83dbd3eb2875327de80ffe9c03a208fe68ffb5b32fa51d78e::server:5d3906836799daaa8b70851155c11190'
K3S_URL='https://xx.xx.xx.xx:6443'
  • master wg show
# wg show flannel-wg
interface: flannel-wg
  public key: Wxxxx
  private key: (hidden)
  listening port: 51820

peer: hldi2xxxx
  endpoint: xx.xx.xx.zz:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 25 seconds ago
  transfer: 11.72 MiB received, 6.53 MiB sent
  persistent keepalive: every 25 seconds

peer: Ap//Dxxx
  endpoint: 192.168.36.22:51820  # wrong: this is node-x86's private, NAT'd address
  allowed ips: 10.42.5.0/24
  transfer: 0 B received, 33.39 KiB sent
  persistent keepalive: every 25 seconds
  • node-x86 wg show
interface: flannel-wg
  public key: Ap//xxxx
  private key: (hidden)
  listening port: 51820

peer: hldi2xxx
  endpoint: xx.xx.xx.zz:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 28 seconds ago
  transfer: 1.52 KiB received, 3.16 KiB sent
  persistent keepalive: every 25 seconds

peer: Ww7xx
  endpoint: xx.xx.xx.xx:51820
  allowed ips: 10.42.0.0/24
  transfer: 0 B received, 30.06 KiB sent
  persistent keepalive: every 25 seconds
  • node-arm wg show
interface: flannel-wg
  public key: hldi26xxxx
  private key: (hidden)
  listening port: 51820

peer: Ww7xxxx
  endpoint: xx.xx.xx.xx:51820
  allowed ips: 10.42.0.0/24
  latest handshake: 8 seconds ago
  transfer: 6.53 MiB received, 15.16 MiB sent
  persistent keepalive: every 25 seconds

peer: Ap//xxxx
  endpoint: xx.xx.xx.yy:8598 # correct: the NAT-translated public endpoint
  allowed ips: 10.42.5.0/24
  latest handshake: 1 minute, 12 seconds ago
  transfer: 2.86 KiB received, 2.04 KiB sent
  persistent keepalive: every 25 seconds
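
For context on where that wrong endpoint comes from: flannel's kube subnet manager publishes each node's wireguard public key and advertised public IP as annotations on the Node object, and peers dial whatever is advertised there. A minimal sketch for inspecting this (the exact annotation names may differ between flannel versions):

# Compare the INTERNAL-IP / EXTERNAL-IP Kubernetes records for each node
kubectl get nodes -o wide
# Inspect the flannel annotations on the NAT'd node; the advertised
# public IP is what the other peers will dial as the wireguard endpoint
kubectl describe node node-x86 | grep -i flannel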

Expected Behavior

  • master wg show
# wg show flannel-wg
interface: flannel-wg
  public key: Wxxxx
  private key: (hidden)
  listening port: 51820

peer: hldi2xxxx
  endpoint: xx.xx.xx.zz:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 25 seconds ago
  transfer: 11.72 MiB received, 6.53 MiB sent
  persistent keepalive: every 25 seconds

peer: Ap//Dxxx
  endpoint: xx.xx.xx.yy:8598 # correct: the NAT-translated public endpoint
  allowed ips: 10.42.5.0/24
  transfer: 0 B received, 33.39 KiB sent
  persistent keepalive: every 25 seconds

Current Behavior

  • master wg show
# wg show flannel-wg
interface: flannel-wg
  public key: Wxxxx
  private key: (hidden)
  listening port: 51820

peer: hldi2xxxx
  endpoint: xx.xx.xx.zz:51820
  allowed ips: 10.42.2.0/24
  latest handshake: 25 seconds ago
  transfer: 11.72 MiB received, 6.53 MiB sent
  persistent keepalive: every 25 seconds

peer: Ap//Dxxx
  endpoint: 192.168.36.22:51820  # wrong: this is node-x86's private, NAT'd address
  allowed ips: 10.42.5.0/24
  transfer: 0 B received, 33.39 KiB sent
  persistent keepalive: every 25 seconds
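
A compact way to compare the expected and current state on each node is wg's endpoints listing, which prints only the peer-key/endpoint pairs:

# On the master this currently shows 192.168.36.22:51820 for node-x86
wg show flannel-wg endpoints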

Possible Solution

The master and the other nodes should consistently use the endpoint that WireGuard itself negotiated, i.e. the NAT-translated source address learned from the latest handshake, rather than the node's private --node-ip.
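
WireGuard already supports this via endpoint roaming: a peer's endpoint is updated to the source address of its latest authenticated handshake, which is exactly how node-arm learned xx.xx.xx.yy:8598. As an illustration only (flannel would likely overwrite it on its next sync), the negotiated endpoint could be pinned on the master by hand:

# Point the master at node-x86's NAT-translated endpoint as learned by
# node-arm; 'Ap//Dxxx' is node-x86's (redacted) public key from above
wg set flannel-wg peer 'Ap//Dxxx' endpoint xx.xx.xx.yy:8598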

Steps to Reproduce (for bugs)

Context

Your Environment

  • Flannel version:
  • Backend used (e.g. vxlan or udp):
  • Etcd version:
  • Kubernetes version (if used): k3s -v
    k3s version v1.28.6+k3s2 (k3s-io/k3s@c9f49a3)
    go version go1.20.13
  • Operating System and version:
  • Link to your project (optional):

My English is not very good; please refer to this issue for the specific details. Thank you.

@manuelbuil
Collaborator

Hey again! In your proposal you are talking about server-client communication, where the client knows the endpoint of the server but the server only knows the public key of the client. In this scenario, the client can communicate with the server, but the server can't communicate with the client until the client contacts it first, right?

The problem with the previous approach in Kubernetes is that the architecture is not server-client when it comes to pod-pod communication. We are creating a mesh of tunnels between the nodes. Imagine a cluster of 3 nodes (node1, node2 and node3); I see, for example, two problems:
1 - When node3 comes up, should it know the endpoint of node1 and node2? Or only node1? How do we decide that?
2 - Imagine it knows the endpoint of both node1 and node2, but node1 and node2 don't know the endpoint of node3. If I understand correctly, node1 and node2 can't communicate with node3 unless node3 tries to communicate with them first. That means pods on node1 and node2 won't be able to contact pods on node3, right?
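
To make problem 2 concrete in plain wg terms: a peer that is added by public key without an endpoint can answer handshakes, but this side has no address to initiate one to, so the tunnel stays down until that peer dials in first. An illustrative sketch (not flannel's actual code path; the key and pod CIDR are placeholders):

# node1 adds NAT'd node3 by public key only, since no endpoint is known.
# node1 can now accept node3's handshake but cannot initiate one, so pods
# on node1 cannot reach pods on node3 until node3 dials out first.
wg set flannel-wg peer '<node3-public-key>' allowed-ips 10.42.3.0/24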

@vast0906
Author

the client can communicate with the server, but the server can't communicate with the client until the client contacts it first, right?

yes

There is no conflict between server-client and pod-pod. The pod-pod network is a tunnel created through the server-client connection: pods can communicate only after the server-client connection is established and the tunnel is created.
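
(To see that pod-pod traffic really rides the node-node tunnel: on any node, the routes for the remote pod subnets, e.g. 10.42.2.0/24, all point at the wireguard interface, matching the allowed ips in the wg show output above.)

# List the routes that send remote pod subnets into the tunnel
ip route show dev flannel-wg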

@manuelbuil
Collaborator

the client can communicate with the server, but the server can't communicate with the client until the client contacts it first, right?

yes

There is no conflict between server-client and pod-pod. The pod-pod network is a tunnel created through the server-client connection: pods can communicate only after the server-client connection is established and the tunnel is created.

Right, but the server needs to wait for the client to contact it. What if the client never contacts the server?

@vast0906
Author

Right, but the server needs to wait for the client to contact it. What if the client never contacts the server?

WireGuard contacts the server when it starts up. If the client never contacts the server, that means the node is not ready.
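
For what it's worth, that startup contact is visible in the outputs above as "persistent keepalive: every 25 seconds": the NAT'd side keeps sending keepalives, so it always initiates the handshake and keeps its NAT mapping open. In plain wg terms the option is set per peer, roughly like this ('<server-public-key>' is a placeholder):

# Make this (client) side dial out and keep the NAT mapping alive
wg set flannel-wg peer '<server-public-key>' persistent-keepalive 25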

@manuelbuil
Copy link
Collaborator

Right, but the server needs to wait for the client to contact it. What if the client never contacts the server?

WireGuard contacts the server when it starts up. If the client never contacts the server, that means the node is not ready.

Imagine we have 2 nodes: one node is the k8s control plane and one node is a k8s agent behind a NAT (let's call it node1). In this case, I can see your suggestion working.

However, what happens if we add a new k8s agent node behind a NAT (let's call it node2)? We need to know the endpoint of node1 or node2 to create the tunnel between those two nodes, right?

@vast0906
Author

Imagine we have 2 nodes: one node is the k8s control plane and one node is a k8s agent behind a NAT (let's call it node1). In this case, I can see your suggestion working.

However, what happens if we add a new k8s agent node behind a NAT (let's call it node2)? We need to know the endpoint of node1 or node2 to create the tunnel between those two nodes, right?

I'm not sure whether the WireGuard "master" will synchronize all the endpoint information to the other nodes.
