FAF K8S config

This repository implements the FAF architecture in Kubernetes. It’s intended to supersede the current FAF Docker-Compose stack.

Requirements & tools

  • k3s or k3d installed

    • The main node must be started with the following argument:
      --node-label=storage-id=main-01 (this determines the main storage node; see the sketch after this list)

  • jq installed (required for scripts)

  • Recommended: a Kubernetes UI such as Lens (GUI) or k9s (CLI)
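
A minimal sketch of how the node label could be set, assuming a bare-metal k3s installation that reads its flags from /etc/rancher/k3s/config.yaml (the path and key names are worth double-checking against the k3s docs for the version in use):

```yaml
# Sketch: /etc/rancher/k3s/config.yaml - k3s picks up its CLI flags from this file.
# The label marks this node as the main storage node referenced by the persistent volumes.
node-label:
  - "storage-id=main-01"
```

For a local k3d cluster the same argument can usually be forwarded at cluster creation, e.g. k3d cluster create faf --k3s-arg "--node-label=storage-id=main-01@server:0" (the exact flag syntax may differ between k3d versions).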

Motivation

Immediate goals

  • Allow more people access to logs, configuration and deployments for certain services without giving them full server access.

    • Role-based access control

    • Direct cluster access via kubectl for all authorized developers

    • Improved configuration and secret management

  • Advanced resource controls thanks to CPU and RAM limits (no app shall consume all CPU ever again)

  • Easier debugging on the test environment via port-forwarding of pods, without compromising the production config

Long-term goals

  • Better integration into CI pipelines with automated deployments

Non-Goals

  • We do not move to k8s because it is cool and fancy!

    • K8s is much more complex than our docker-compose stack.

    • We would avoid it if docker-compose could solve our problems (which it can’t).

    • It was a very conscious trade-off decision.

  • We do not move to k8s because we want to deploy FAF on a managed cloud provider.

    • Cloud providers are super expensive. We’d have nothing to gain here.

  • We do not move to k8s to become highly available!

    • High availability only works if all components are highly available. Most of our apps are not built in that way at all.

    • Deployments with less downtime might be a benefit for some services.

Decision Log

Distribution selection

  • We’ll use k3s.

    • It is fully supported by NixOS and is a simplified distribution which should be easier to maintain.

    • It also runs on developer machines.

    • It uses few resources.

  • Running the same distribution on prod and on local machines makes things more predictable and scripts more stable.

    • Minikube should be mostly compatible if some devs insist on using it.

Volume management

  • We’ll go with manually managed persistent volumes and claims, because we need predictable paths.

  • Predictable paths are a necessity for managing the volumes with ZFS.

  • Using the k3s local-path-provisioner we can define the prefix (in the configmap local-path-config) and the suffix (in the mount options of the pod), but in between there is a random UUID we can’t know beforehand.
    This breaks predefined setups and scripts.

  • A K8s built-in local volume with node affinity ensures all data of a volume is stored on a selected node (the node labelled storage-id=main-01); see the sketch below.
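
A minimal sketch of such a manually managed volume (name, size and path are placeholders, not the actual FAF values):

```yaml
# Sketch: a manually managed PersistentVolume pinned to the main storage node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-data                # placeholder name
spec:
  capacity:
    storage: 10Gi                   # placeholder size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # placeholder storage class
  local:
    path: /data/example             # predictable path, manageable as a ZFS dataset
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: storage-id
              operator: In
              values:
                - main-01           # matches the node label from the requirements section
```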

Traefik IngressRoute over default Ingress definitions

  • K3s comes with Traefik as Ingress controller by default.

  • The de-facto standard Ingress controller in the wider K8s world is nginx.

  • Traefik is well known to FAF, since we already use it extensively as reverse proxy in our faf-stack.

  • Traefik offers support for

    • classic Ingress definitions, which however require Ingress annotations to use the more advanced features (similar to the Traefik labels in our current docker-compose.yml)

    • custom IngressRoute definitions, which map the exact Traefik feature set into a YAML format (no annotations required)

We have to select which resource type we use, and we should stick to it consistently. As always, it’s a trade-off:

  • Pro classic Ingress

    • Classic Ingress has been stable for a (not so long) while now, while Traefik IngressRoutes are still marked as alpha (yet we have been using Traefik for quite a while and there were rarely changes, even from 1.x to 2.x).

    • Classic Ingress is a well-known syntax understood by most external K8s users, so the entry barrier for external contributions is lower. However, a lot of functionality would hide behind Traefik annotations, which people would still need to learn in order to understand it all.

    • Using classic Ingress would allow us to swap out Traefik at any time and still have a mostly working setup.

  • Pro Traefik IngressRoute

    • We (the Ops people responsible for FAF) consider Traefik superior to nginx (and moved from nginx to Traefik as reverse proxy years ago).

      • Thus we do not expect to move back.

    • We have an existing stack we need to migrate 1:1

    • Since we use Traefik features anyway, using IngressRoute reduces the overall YAML complexity, as we do not split logic between resources and annotations.

    • Traefik syntax seems easier to understand than regular Ingress, so using Traefik syntax might lower the barrier for external contributors who have never used classic Ingress.

Decision: We’ll use Traefik IngressRoutes.
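
For illustration, a minimal IngressRoute could look like the following sketch (host, service and entrypoint names are placeholders; the apiVersion depends on the Traefik release, traefik.containo.us/v1alpha1 being the one used by Traefik 2.x):

```yaml
# Sketch: a Traefik IngressRoute replacing a classic Ingress plus annotations.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: example-app
spec:
  entryPoints:
    - websecure                        # placeholder entrypoint
  routes:
    - match: Host(`app.example.com`)   # placeholder host rule
      kind: Rule
      services:
        - name: example-app            # placeholder k8s service
          port: 80
  tls:
    certResolver: letsencrypt          # ties in with the certificate decision below
```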

Certificate management & Let’s Encrypt

  • We could go with Traefik’s built-in certificate resolvers or use cert-manager.

  • cert-manager works with both classic Ingress definitions and Traefik-specific IngressRoutes

    • Needs additional software

    • Has a short support cycle (6 months per point release)

    • ⇒ More maintenance overhead

  • Traefik’s internal Let’s Encrypt resolver needs to be configured manually on the node

    • It stores certificates somewhere on disk

    • The easiest approach is a persistent volume on the main storage node

      • This effectively restricts Traefik to run on a single node

    • A more sophisticated approach is storing the certificates on a persistent remote/network volume

    • Once we have full Cloudflare access, we can use the Cloudflare DNS challenge with a Cloudflare token. Then Traefik does not need to issue one certificate per subdomain. It’s unclear, though, whether this makes persisting the certificates obsolete.

Decision: We’ll use the Traefik resolver as long as we don’t run into any problems, since it seems to be less of a maintenance burden. cert-manager can still be introduced later if required.
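
A sketch of how the resolver could be wired into the Traefik instance shipped with k3s, using a HelmChartConfig override (the chart value names such as additionalArguments and persistence.* should be checked against the bundled Traefik chart version; e-mail address and paths are placeholders):

```yaml
# Sketch: overriding the k3s-bundled Traefik deployment to enable a Let's Encrypt resolver.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    additionalArguments:
      - "--certificatesresolvers.letsencrypt.acme.email=ops@example.com"   # placeholder e-mail
      - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
      - "--certificatesresolvers.letsencrypt.acme.tlschallenge=true"
    persistence:
      enabled: true    # keeps acme.json across restarts (the single-node restriction mentioned above)
      path: /data
```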

RabbitMQ

For RabbitMQ there are 3 potential ways of implementing:

  • Manually define a single-node StatefulSet as a 1:1 copy of faf-stack.

  • A Helm chart from Bitnami

  • Deploying the RabbitMQ operator

Decision: We’ll go with the Bitnami Helm chart. It is highly configurable and can read our existing secrets, so the template can be configured exactly as we need it. This simplifies things compared to a manual StatefulSet. The RabbitMQ operator seems much more complex for now.
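
A hedged sketch of what the chart usage could look like; the value names (auth.existingPasswordSecret, replicaCount, persistence.*) follow the Bitnami chart documentation and should be verified against the chart version in use, and all names are placeholders:

```yaml
# Sketch: values.yaml for the Bitnami RabbitMQ chart, reading credentials from an existing secret.
auth:
  username: faf-rabbitmq                     # placeholder user
  existingPasswordSecret: rabbitmq-secrets   # placeholder secret holding the RabbitMQ password
replicaCount: 1                              # single node, mirroring the faf-stack setup
persistence:
  enabled: true
  storageClass: local-storage                # placeholder, ties in with the volume management above
```

Installed with something like helm install rabbitmq bitnami/rabbitmq -f values.yaml after adding the Bitnami chart repository.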

User access and RBAC

  • We want to give access to multiple people with potentially different permissions.

  • Handing out service account certificates is quite annoying.

  • An SSO login via OIDC is preferred and supported by K8s / K3s.

    • The preferred identity provider would be GitHub, as all developers are there and it is outside the system itself. Unfortunately GitHub only supports OAuth2 and not OIDC.

    • Google accounts would be an alternative, but we don’t want to force people on Google.

    • We’ll use FAF’s own login instead (see the sketch after this list).

    • As a fallback (in case the FAF login is broken) we still have the main service account.

  • RBAC t.b.d.
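
A sketch of how the OIDC login could be wired into the k3s API server via /etc/rancher/k3s/config.yaml; the kube-apiserver flags themselves are standard, while the issuer URL, client ID and claim names are placeholders:

```yaml
# Sketch: k3s config passing OIDC flags to the embedded kube-apiserver.
kube-apiserver-arg:
  - "oidc-issuer-url=https://login.example.com"   # placeholder for the FAF login's issuer URL
  - "oidc-client-id=kubernetes"                   # placeholder client ID registered at the IdP
  - "oidc-username-claim=sub"                     # assumed claim mapping
  - "oidc-groups-claim=groups"                    # groups can later be bound to roles via RBAC
```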

Developer environment & reproducibility

  • No service shall go live if its initial configuration or installation can’t be scripted.

  • Everything must be runnable on a single-node cluster.

  • Scripts shall be idempotent / re-runnable without fatal consequences. We will use K8s annotations to keep track of the state (see the sketch below).
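
One possible shape of the annotation-based state tracking, as a sketch (the annotation key and namespace are hypothetical): a setup script writes a marker annotation after a successful run and reads it back (kubectl ... -o json piped through jq) before re-running anything destructive.

```yaml
# Sketch: a namespace carrying a hypothetical state annotation maintained by the setup scripts.
apiVersion: v1
kind: Namespace
metadata:
  name: faf-apps                                         # placeholder namespace
  annotations:
    k8s-config.example.com/setup-state: "initialized"    # hypothetical key, set once the init script succeeded
```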
