FR: Kubernetes Operator: support Gateway API to reduce the amount of LE resources used to expose cluster services over HTTPS #10656

Open
irbekrm opened this issue Dec 20, 2023 · 0 comments
Labels: fr (Feature request), kubernetes, L3 Some users (Likelihood), needs-triage-eng (Ready for triage by Engineering team), P5 Halts deployment (Priority level)

irbekrm commented Dec 20, 2023

What are you trying to do?

Currently a cluster resource can be exposed to a tailnet over HTTPS using an annotated Ingress resource.

A user can create a namespace-scoped Ingress resource with one or more Service backends in that namespace fronting workloads (in the same namespace) that they wish to expose to the tailnet.
We then create a cluster proxy, which is a tailnet node, and trigger LetsEncrypt cert issuance for the node's MagicDNS name. We do not currently support issuing wildcard certificates or users providing their own certs.

Ingress and LE certs

An Ingress resource is scoped to a namespace, see kubernetes/kubernetes#17088. (There is no way to specify a namespace for a backend in the Ingress spec.)
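To illustrate the limitation, here is a minimal sketch of such an Ingress. It assumes the `tailscale` ingress class from the operator documentation; the resource names, namespace, and port are placeholders that do not come from this issue. Note that the backend is identified by Service name and port only, so it always resolves within the Ingress's own namespace.

```yaml
# Sketch only: names, namespace, and port are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: store
  namespace: team-a          # the Ingress lives in this namespace...
spec:
  ingressClassName: tailscale  # assumed class name, per the operator docs
  defaultBackend:
    service:
      name: store-svc        # ...and the backend Service must live there too:
      port:                  # the Ingress spec has no field for a backend namespace
        number: 8080
  tls:
    - hosts:
        - store              # assumed to become the proxy's MagicDNS hostname label
```

Exposing a Service from `team-b` as well would need a second Ingress in `team-b`, and therefore a second proxy and a second cert.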

This means that users have to create an Ingress in each namespace that contains Kubernetes Services they want to expose to the tailnet over HTTPS. In large and/or ephemeral installations this leads to LetsEncrypt rate limit issues. Currently we create a new ACME account for each Ingress, which means that folks will likely hit the limit of 10 accounts per IP address. We could fix this by making it easier to cache the ACME account key. If we do that, the next limit to hit would be 300 orders per account, which can happen in installations with a large number of namespaces.

Gateway API

Gateway API has a model that allows a single Gateway to route traffic to backends in different namespaces; see the "multiple applications behind a single Gateway" use case.

How should we solve this?

Support Gateway API's Gateway resource alongside Ingress in a similar way ('tailscale' gateway class). Users could then specify backends for a single Gateway in different namespaces in the standard way recommended by the Gateway API.
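A rough sketch of what this could look like follows. The `tailscale` gateway class is the one proposed here and does not exist yet. All names and namespaces are placeholders, and the `tls.options` key is purely hypothetical, standing in for however the operator would attach the LetsEncrypt cert. The Gateway API fields themselves (`allowedRoutes`, `parentRefs`, `backendRefs`) are standard.

```yaml
# Sketch only: one shared Gateway, hence one tailnet proxy and one cert.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: infra
spec:
  gatewayClassName: tailscale        # proposed class, not implemented
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        options:
          example.invalid/cert-source: operator-managed  # hypothetical placeholder
      allowedRoutes:
        namespaces:
          from: All                  # accept routes from any namespace
---
# Owned by the team in team-a; nothing in the shared namespace is touched.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store
  namespace: team-a
spec:
  parentRefs:
    - name: shared-gateway
      namespace: infra               # attach to the Gateway in another namespace
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /store
      backendRefs:
        - name: store-svc            # Service in the route's own namespace
          port: 8080
```

A second team would add its own HTTPRoute in `team-b` with the same `parentRefs`, without a new proxy or cert being needed.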

The one downside is that Gateway CRDs are not part of a default cluster installation, so any functionality in our operator that looks at Gateway resources will have to be opt-in.

What is the impact of not solving this?

  • if we don't solve this, one LetsEncrypt cert has to be issued for each namespace that contains Service(s) that users want to expose over HTTPS. Installations that are large and/or ephemeral will then be unable to expose their Services to the tailnet over HTTPS because they will run into cert rate limits

  • large installations might be forced to choose solutions that put more strain on Tailscale's overall LetsEncrypt rate limit thus affecting other users

Anything else?

Alternatives:

  • we could look into supporting ExternalName Services as Ingress backends. A cross-namespace setup might then be achieved with an ExternalName Service that refers to a Service in a different namespace, as described in Cross-namespace Ingress kubernetes/kubernetes#17088 (comment). It is not clear whether this would work, and it would not benefit from the Gateway API model, where users scoped to a namespace can configure Services there to be exposed without touching resources in a shared namespace

  • (not thought through in depth) we could provide some alternative way for users to specify that a Service in one namespace is a backend for our Ingress in a different namespace; for example, we could watch annotated Services and add them to an existing Ingress. This, though, could be confusing and error-prone
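For reference, the ExternalName approach from the first alternative would rely on a Service like the sketch below, placed in the namespace of the shared Ingress. Names and namespaces are placeholders, and, as noted above, whether the operator's proxy could actually use such a Service as an Ingress backend is untested.

```yaml
# Sketch only: a Service in the shared namespace that is a DNS alias
# for a Service living in another namespace.
apiVersion: v1
kind: Service
metadata:
  name: store-proxy
  namespace: shared                  # same namespace as the shared Ingress
spec:
  type: ExternalName
  externalName: store-svc.team-a.svc.cluster.local  # target Service in team-a
```

The shared Ingress would then reference `store-proxy` as its backend, which still requires someone with access to the shared namespace to create and maintain it.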

irbekrm added the kubernetes, L3 Some users, P5 Halts deployment, fr, and needs-triage-eng labels on Dec 20, 2023