FR: support routing strategy "failure zone" #543

Open
6 tasks
jkh52 opened this issue Nov 30, 2023 · 2 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@jkh52
Contributor

jkh52 commented Nov 30, 2023

Feature Request: add a new proxy strategy for k8s "failure zone" or similar.

Currently, konnectivity-server supports three --proxy-strategy values: default, destHost, and defaultRoute (code). The agent supports six --agent-identifiers types: ipv4, ipv6, host, cidr, uid, and default-route (code).

I see discussion of "failure zone" in the original PR adding proxy strategies (#144), but the reference implementation does not fully support it.

The rough task list as I imagine it:

  • API decisions
    • how does apiserver specify the hint (likely needs a dial protocol update)
    • how does konnectivity-agent specify zone
  • konnectivity-server changes (new proxy strategy)
  • konnectivity-agent changes (support new identifier)
  • apiserver changes (pass the dial hint)
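To make the konnectivity-server item more concrete, here is a minimal sketch of what zone-aware backend selection could look like. Everything in it is hypothetical: `Backend`, `ZoneBackendManager`, and `Pick` are illustrative names, not existing types in this repository, and the sketch assumes the zone hint has already reached the server by some means. It is not concurrency-safe and ignores load balancing within a zone.

```go
package main

import "errors"

// Backend is a hypothetical stand-in for an agent connection tracked by the server.
type Backend struct {
	AgentID string
	Zone    string // zone reported by the agent; empty if unknown
}

// ErrNoBackend is returned when no agent is connected at all.
var ErrNoBackend = errors.New("no backend available")

// ZoneBackendManager is a hypothetical index of backends keyed by zone.
type ZoneBackendManager struct {
	byZone map[string][]*Backend
	all    []*Backend
}

// NewZoneBackendManager returns an empty manager.
func NewZoneBackendManager() *ZoneBackendManager {
	return &ZoneBackendManager{byZone: make(map[string][]*Backend)}
}

// Add registers a backend, indexing it by zone when the zone is known.
func (m *ZoneBackendManager) Add(b *Backend) {
	m.all = append(m.all, b)
	if b.Zone != "" {
		m.byZone[b.Zone] = append(m.byZone[b.Zone], b)
	}
}

// Pick prefers a backend in the hinted zone and falls back to any
// connected backend when the hint is empty or no agent is in that zone.
func (m *ZoneBackendManager) Pick(zoneHint string) (*Backend, error) {
	if bs := m.byZone[zoneHint]; zoneHint != "" && len(bs) > 0 {
		return bs[0], nil
	}
	if len(m.all) > 0 {
		return m.all[0], nil
	}
	return nil, ErrNoBackend
}
```

The fallback to any backend mirrors the idea that a zone match is a preference rather than a hard requirement; whether that is the right behavior is one of the open API decisions.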

In our specific use case, in a given cluster we run a single Deployment of agents across GCE zones, with topologySpreadConstraints that use the well-known label topologyKey: topology.kubernetes.io/zone. Supporting that same value would be ideal, but I see that the Downward API does not currently support providing Node labels to Pods (kubernetes/kubernetes#40610).

UPDATES:

There is a lot of ambiguity here, and the above task list might not be the right approach. It may be difficult to make apiserver support the hint, since its dialers are often created far away from the associated resource(s). It may be more feasible to calculate "failure zone" from the dial IP address. Another question is whether to build all the logic into Konnectivity components, or put some responsibility on the cloud provider.

@jkh52
Contributor Author

jkh52 commented Nov 30, 2023

@cheftako, would you add anything?

@jkh52 jkh52 added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Nov 30, 2023
@jkh52
Contributor Author

jkh52 commented Dec 1, 2023

@andrewsykim, do you have any suggestions on approach?
