
[provider-local] VPN tunnel check succeeds even if VPN is broken #9604

Open
timebertt opened this issue Apr 17, 2024 · 6 comments
Labels
area/networking (Networking related) · area/testing (Testing related) · kind/bug (Bug)

Comments

@timebertt
Member

timebertt commented Apr 17, 2024

How to categorize this issue?

/area networking testing
/kind bug

What happened:

In the provider-local HA setup (tested with single-zone but should also apply to multi-zone), kube-apiserver talks directly to the kubelet API instead of using the VPN connection.
Because of this, operations like kubectl logs and kubectl port-forward (for which kube-apiserver calls the kubelet API) work even if the VPN connection is broken.

As the VPN tunnel check performed by gardenlet uses a port-forward operation (code), the shoot can be reconciled successfully and be marked as healthy even if the VPN connection is broken.

This problem allows bugs and regressions in the VPN setup to go unnoticed.
E.g., in #9597 there was a problem in the HA VPN configuration (fixed in a later commit).
Nevertheless, most test cases of pull-gardener-e2e-kind-ha-{single,multi}-zone succeeded, i.e., shoot creations were successful although the VPN connection never worked.
The problem was only discovered by chance in the credentials rotation test case (ref).

What you expected to happen:

If the VPN connection cannot be established successfully:

  • shoot reconciliations should fail
  • shoot status should be set to unhealthy
  • e2e tests should fail accordingly

How to reproduce it (as minimally and precisely as possible):

  1. make kind-ha-single-zone-up gardener-ha-single-zone-up
  2. Apply the following patch to example/provider-local/shoot.yaml
--- a/example/provider-local/shoot.yaml
+++ b/example/provider-local/shoot.yaml
@@ -8,6 +8,10 @@ metadata:
     shoot.gardener.cloud/cloud-config-execution-max-delay-seconds: "0"
     authentication.gardener.cloud/issuer: "managed"
 spec:
+  controlPlane:
+    highAvailability:
+      failureTolerance:
+        type: node
   cloudProfileName: local
   secretBindingName: local # dummy, doesn't contain any credentials
   region: local
  3. kubectl apply -f example/provider-local/shoot.yaml
  4. Wait for the shoot to be reconciled successfully and healthy.
  5. Verify manually that the VPN connection works:
    • k -n kube-system logs deploy/metrics-server --request-timeout 2s
    • k -n kube-system port-forward svc/metrics-server 8443:443 --request-timeout 2s
    • k top no
  6. Break the VPN connection: k -n shoot--local--local scale sts vpn-seed-server --replicas 0
  7. Ensure there are no more open TCP connections from kube-apiserver to kubelet: k -n shoot--local--local delete po -l role=apiserver
  8. Repeat the VPN verification from step 5: logs and port-forward still work, while the connection to the metrics-server (k top no) doesn't.
  9. Observe that the shoot status is nevertheless healthy (one way to check this is sketched below).
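One way to observe the (incorrectly) healthy status is to look at the shoot's conditions in the garden cluster. This assumes the default shoot name local in project namespace garden-local from example/provider-local/shoot.yaml:

$ kubectl -n garden-local get shoot local
$ kubectl -n garden-local get shoot local -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\n"}{end}'

All conditions still report True even though the VPN tunnel is down.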

Anything else we need to know?:

This only applies to HA clusters, where routes to the shoot networks are configured explicitly in the kube-apiserver pods.
For non-HA clusters, there is an EgressSelectorConfiguration that connects to the envoy-proxy container in the vpn-seed-server using HTTPConnect instead of using explicitly configured IP routes.
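For illustration, a minimal sketch of what such an EgressSelectorConfiguration can look like; the service name, port, and certificate paths below are illustrative and not necessarily the exact values used by Gardener:

apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
- name: cluster                            # traffic from kube-apiserver to kubelets, pods, and services
  connection:
    proxyProtocol: HTTPConnect             # tunnel via HTTP CONNECT instead of relying on IP routes
    transport:
      tcp:
        url: https://vpn-seed-server:9443  # envoy-proxy in the vpn-seed-server pod (port illustrative)
        tlsConfig:
          caBundle: /etc/kubernetes/egress/ca.crt    # illustrative paths
          clientCert: /etc/kubernetes/egress/tls.crt
          clientKey: /etc/kubernetes/egress/tls.key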
E.g., these are the explicitly configured routes in the kube-apiserver pod of an HA shoot:

$ k -n shoot--local--local exec -it deploy/kube-apiserver -c vpn-path-controller -- sh
~ # ip r
default via 169.254.1.1 dev eth0
10.3.0.0/16 via 192.168.123.195 dev bond0 # shoot pod network
10.4.0.0/16 via 192.168.123.195 dev bond0 # shoot service network
192.168.123.0/26 dev tap0 proto kernel scope link src 192.168.123.9 # VPN network
192.168.123.64/26 dev tap1 proto kernel scope link src 192.168.123.72 # VPN network
192.168.123.192/26 dev bond0 proto kernel scope link src 192.168.123.237 # VPN network
169.254.1.1 dev eth0 scope link
~ # ip r get 10.1.54.75 # node IP
10.1.54.75 via 169.254.1.1 dev eth0 src 10.1.178.85 uid 0
    cache

Note that there is no route for the shoot node network. This is because Shoot.spec.networking.nodes is empty, as it would overlap with Seed.spec.networks.pods (provider-local starts pods in the seed as shoot nodes).
Hence, kube-apiserver can talk directly to the kubelet API via the seed pod network.

There are even multiple mechanisms that allow this direct communication path from kube-apiserver to kubelet, e.g., the allow-machine-pods NetworkPolicy and the machines Service (both removed in the verification steps below).
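They can be inspected in the seed namespace before removing them:

$ k -n shoot--local--local get netpol allow-machine-pods -o yaml
$ k -n shoot--local--local get svc machines -o yaml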

To verify that kube-apiserver of local HA shoots talks directly to the kubelet API, use the following steps:

  1. Create a HA shoot. Wait for the shoot to be reconciled successfully and healthy.
  2. k -n shoot--local--local delete netpol allow-machine-pods
  3. k -n shoot--local--local delete svc machines
  4. Ensure there are no more open TCP connections from kube-apiserver to kubelet: k -n shoot--local--local delete po -l role=apiserver
  5. Repeat the VPN verification from step 5 above: logs and port-forward no longer work (they don't use the VPN connection), while the connection to the metrics-server (k top no) works (it uses the intact VPN connection).
  6. Observe that the shoot status is unhealthy because the port-forward operation doesn't work.

Environment:

  • Gardener version: v1.93.0-dev
@gardener-prow gardener-prow bot added the area/networking, area/testing, and kind/bug labels on Apr 17, 2024
@ScheererJ
Contributor

This issue also affects non-HA scenarios in the local setup. As there is no node range defined for shoots in the local setup, the network path for the VPN check during reconciliation is the following:

kube-apiserver -> envoy proxy container of vpn-seed-server pod -> machine pod via seed cluster network -> kubelet

In real scenarios, it should be like this:

kube-apiserver -> envoy proxy container of vpn-seed-server pod -> local route to vpn device created by vpn-seed-server container in same pod -> vpn-shoot -> actual node -> kubelet

Fixing this can prevent regressions, but there are also validations in place preventing shoot/seed network overlaps, which may make this somewhat challenging.

@timebertt
Member Author

You're right. In the non-HA scenario, kube-apiserver will always connect to vpn-seed-server because of the EgressSelectorConfiguration. The vpn-seed-server, however, only routes pod and service IPs via the VPN; node IPs are routed via the seed's pod network (again, because Shoot.spec.networking.nodes is empty):

$ k -n shoot--local--local exec -it deploy/vpn-seed-server -c vpn-seed-server -- sh
~ # ip r
default via 169.254.1.1 dev eth0
10.3.0.0/16 via 192.168.123.2 dev tun0
10.4.0.0/16 via 192.168.123.2 dev tun0
169.254.1.1 dev eth0 scope link
192.168.123.0/24 dev tun0 proto kernel scope link src 192.168.123.1
~ # ip r get 10.1.130.210 # node IP
10.1.130.210 via 169.254.1.1 dev eth0 src 10.1.131.18 uid 0
    cache

We can verify this route by breaking the VPN connection on the shoot-side this time:

  1. k -n shoot--local--local annotate mr shoot-core-vpn-shoot resources.gardener.cloud/ignore=true
  2. k -n kube-system scale deploy vpn-shoot --replicas 0
  3. Ensure there are no more open TCP connections from kube-apiserver to kubelet: k -n shoot--local--local delete po -l role=apiserver
  4. kubectl logs and kubectl port-forward still work (direct route to the kubelet API without VPN), but k top no is broken as services are routed through the VPN.

@timebertt timebertt changed the title [provider-local] VPN tunnel check in HA setup succeeds even if VPN is broken [provider-local] VPN tunnel check succeeds even if VPN is broken Apr 18, 2024
@timebertt
Member Author

To summarize:

Problem

In the provider-local setup, the VPN tunnel check performed by gardenlet (port-forward check) does not detect a broken VPN tunnel, because either kube-apiserver (HA clusters) or vpn-seed-server (non-HA clusters) route requests to the kubelet API directly via the seed's pod network.
When the VPN connection is broken, kubectl port-forward and kubectl logs continue to work, while k top no (APIServices, Webhooks, etc.) is broken.

We should strive to resolve this discrepancy between the local setup and cloud setups regarding the VPN connection, so that e2e tests validate the real setup and catch such bugs early.

Proposal

1. Set Shoot.spec.networking.nodes

Setting Shoot.spec.networking.nodes ensures the VPN configures routes for the node network via the VPN tunnel.
However, this network must not overlap with Seed.spec.networks.pods.
Right now, the Seed.spec.networks.pods field is only used for API validation to prevent obvious misconfigurations. I don't see a problem if the seed pod network is larger than what is configured in Seed.spec.networks.pods.

Hence, we could split the seed pod network into one default subnet (configured in Seed.spec.networks.pods) and add a dedicated Calico IPPool for machine pods (configured in Seed.spec.networks.shootDefaults.pods -> Shoot.spec.networking.pods).
With this, the IP packets would still be routable to and between machine pods, but we would have disjoint networks to configure in the API objects and with this correct routes via the VPN tunnel.
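A minimal sketch of what such a dedicated pool could look like; the pool name, CIDR, and the annotation-based pool selection are illustrative assumptions, not an implemented design:

apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  name: machine-pods            # hypothetical pool reserved for machine pods
spec:
  cidr: 10.10.0.0/16            # illustrative; disjoint from the default seed pod CIDR
  ipipMode: Never
  vxlanMode: Never
  natOutgoing: false
  nodeSelector: all()

Machine pods could then be assigned to this pool, e.g., via Calico's cni.projectcalico.org/ipv4pools pod annotation, and the same range could be configured on the shoot so that the VPN sets up proper routes for it.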

2. Forbid direct communication of seed components with machine pods

We want to ensure that the VPN tunnel checks only succeed if the VPN is successfully established.
For this, we need to drop all NetworkPolicies allowing communication from seed components to machine pods.
E.g., when the VPN connection is broken in HA clusters, there is no route to the shoot node network, so packets fall back to the seed pod network; without an allowing NetworkPolicy, they are dropped there and the tunnel check fails as expected.
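The allow rules currently in place for this path can be inspected in the seed namespace, e.g.:

$ k -n shoot--local--local get netpol
$ k -n shoot--local--local get netpol allow-machine-pods -o yaml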

Additional Improvements

We might also consider augmenting gardenlet's tunnel check. In addition to testing an operation that talks to the node network for the kubelet API (e.g., port-forward), it could also test an operation targeting the pod/service network (e.g., metrics server).
This doesn't resolve the discrepancy in setups but might be useful in general to detect more failure cases in shoot health checks.
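A manual approximation of such a combined check, reusing metrics-server as in the steps above:

$ kubectl -n kube-system port-forward svc/metrics-server 8443:443 --request-timeout 2s   # exercises kube-apiserver -> kubelet (node network)
$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes --request-timeout 2s              # exercises kube-apiserver -> metrics-server (pod/service network)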

WDYT?

@timebertt
Member Author

/assign @rfranzke @timebertt
