davidB/sandbox_axum_observability

sandbox_axum_observability

!!! WIP !!!

Sandbox I use to experiment with axum and observability (for a target platform), with observability delegated to the infra as much as possible. The stack and frameworks selected:

App (Rust http service)

The setup of the app (microservice) is defined under /app. The goals of the app:

  • Use axum, async API, ...
  • Delegate collection of metrics, logs, ... to the infra as much as possible (e.g. HTTP status, rps, ...)
  • Try to be a cloud-native app and follow the 12-factor app recommendations via:
    • Platform- or stack-dependent configuration, overridable via environment variables (using clap)
    • Health check via a GET /health endpoint
    • Logs printed on standard output, in JSON format
    • Logs include the trace_id to easily link response, log and trace
      • on the first log of the span, when the incoming request has a trace_id
      • on following logs of the span, when the incoming request has a trace_id
      • on the first log of the span, when the incoming request has NO trace_id (implies starting a new one)
      • on following logs of the span, when the incoming request has NO trace_id
  • To simulate a multi-level microservice architecture, the service can call APP_REMOTE_URL (to be defined as itself in the infra)
  • Provide an endpoint GET /depth/{:depth} that waits for a duration, then calls the endpoint defined by APP_REMOTE_URL with the path parameter depth equal to the current depth - 1
    • depth: value between 0 and 10; if undefined, a random value is used
    • duration_level_max: duration in seconds; if undefined, a random value between 0.0 and 2.0 is used
    • the response of APP_REMOTE_URL is returned as a wrapped response
    • if depth is 0, it returns { "trace_id": "...." }
    • on failure, it returns { "err_trace_id": "...." }
    • calling GET / is like calling GET /depth/{:depth} with a random depth between 0 and 10
  • To simulate errors
    • GET /health can fail randomly via the configuration APP_HEALTH_FAILURE_PROBABILITY (value between 0.0 and 1.0)
    • GET /depth/{:depth} can fail randomly via the query parameter failure_probability (value between 0.0 and 1.0)
  • Add tests to validate and demo the features above
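
The depth and failure rules listed above can be sketched as a few pure functions. This is an illustration only, not the code under /app: the names should_fail, next_path and simulate are hypothetical, the HTTP call to APP_REMOTE_URL is replaced by plain recursion, and the random draw is passed in as `sample` so the logic stays deterministic.

```rust
/// Decide whether a request fails. `sample` stands in for a random draw in
/// [0.0, 1.0); the real app would presumably take it from a random number
/// generator (assumption).
fn should_fail(failure_probability: f64, sample: f64) -> bool {
    sample < failure_probability.clamp(0.0, 1.0)
}

/// Path to request on APP_REMOTE_URL for the next level,
/// or None when depth is 0 and the chain stops.
fn next_path(depth: u8) -> Option<String> {
    depth.checked_sub(1).map(|d| format!("/depth/{}", d))
}

/// Build the JSON body for one level. The remote call is replaced by
/// recursion so the sketch runs without a network.
fn simulate(depth: u8, trace_id: &str) -> String {
    match next_path(depth) {
        // depth 0: end of the chain, report the trace_id
        None => format!(r#"{{"simulation":"DONE","trace_id":"{}"}}"#, trace_id),
        // depth > 0: wrap the response of the next level
        Some(_) => format!(
            r#"{{"depth":{},"response":{}}}"#,
            depth,
            simulate(depth - 1, trace_id)
        ),
    }
}
```

For example, simulate(2, "abc") yields a body with two "depth" wrappers around the final {"simulation":"DONE","trace_id":"abc"} object, the same shape as the wrapped responses returned by GET /depth/2.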

Main components for the app

Usage on local shell

Launch the server

cd app
cargo run
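
At launch, settings such as APP_REMOTE_URL and APP_HEALTH_FAILURE_PROBABILITY can be overridden through environment variables. The app uses clap for this; the sketch below only illustrates the idea with std. The Settings struct, the settings_from function and the default of 0.0 are assumptions, not taken from the repository.

```rust
use std::collections::HashMap;

/// Hypothetical settings struct; field names are illustrative only.
#[derive(Debug, PartialEq)]
struct Settings {
    remote_url: Option<String>,
    health_failure_probability: f64,
}

/// Resolve settings from a map of environment variables.
/// Taking a map (instead of reading std::env directly) keeps it testable.
fn settings_from(env: &HashMap<String, String>) -> Result<Settings, String> {
    let health_failure_probability = match env.get("APP_HEALTH_FAILURE_PROBABILITY") {
        // assumed default: the health check never fails
        None => 0.0,
        Some(raw) => {
            let p = raw
                .trim()
                .parse::<f64>()
                .map_err(|e| format!("APP_HEALTH_FAILURE_PROBABILITY: {}", e))?;
            // the documented range is 0.0 to 1.0; NaN is rejected as well
            if !(0.0..=1.0).contains(&p) {
                return Err(format!("APP_HEALTH_FAILURE_PROBABILITY out of range: {}", p));
            }
            p
        }
    };
    Ok(Settings {
        remote_url: env.get("APP_REMOTE_URL").cloned(),
        health_failure_probability,
    })
}
```

In a real binary the map could be filled with std::env::vars().collect::<HashMap<String, String>>() before calling settings_from.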

Send http request from a curl client

# without client trace
# FIXME the log on the server includes an empty trace_id
❯ curl -i "http://localhost:8080/depth/0"
HTTP/1.1 200 OK
content-type: application/json
content-length: 67
access-control-allow-origin: *
vary: origin
vary: access-control-request-method
vary: access-control-request-headers
date: Sat, 21 May 2022 15:35:32 GMT

{"simulation":"DONE","trace_id":"522e44c536fec8020790c59f20560d1a"}⏎

# with client trace
# for traceparent see [Trace Context](https://www.w3.org/TR/trace-context/#trace-context-http-headers-format)
❯ curl -i "http://localhost:8080/depth/2" -H 'traceparent: 00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-00'
HTTP/1.1 200 OK
content-type: application/json
content-length: 113
access-control-allow-origin: *
vary: origin
vary: access-control-request-method
vary: access-control-request-headers
date: Sat, 21 May 2022 15:33:54 GMT

{"depth":2,"response":{"depth":1,"response":{"simulation":"DONE","trace_id":"0af7651916cd43dd8448eb211c80319c"}}}⏎
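
The traceparent header sent above follows the W3C Trace Context format version-traceid-parentid-flags, and the returned trace_id is the second field. As a rough illustration of that format (the app itself most likely relies on the OpenTelemetry propagator, so TraceParent and parse_traceparent are hypothetical names), a std-only parser for version 00 could look like this:

```rust
/// Parsed fields of a W3C `traceparent` header (version 00 only).
#[derive(Debug, PartialEq)]
struct TraceParent {
    trace_id: String,
    parent_id: String,
    sampled: bool,
}

/// True when `s` is exactly `len` lowercase hexadecimal characters.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Parse `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`.
fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let mut parts = header.trim().split('-');
    let (version, trace_id, parent_id, flags) =
        (parts.next()?, parts.next()?, parts.next()?, parts.next()?);
    // version 00 defines exactly four fields
    if version != "00" || parts.next().is_some() {
        return None;
    }
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(parent_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // all-zero ids are invalid per the specification
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        // bit 0 of the flags is the "sampled" flag
        sampled: (flags & 0x01) == 0x01,
    })
}
```

Parsing the header from the curl call above gives the trace_id 0af7651916cd43dd8448eb211c80319c with sampled set to false, since the flags field is 00.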

On the Jaeger web UI, the service example-opentelemetry should be listed and the trace should look like:

trace in jaeger

direct to Jaeger

Launch a local Jaeger (based on Jaeger > Getting Started > All in One)

## docker cli can be used instead of nerdctl
## to start jaeger (and auto remove on stop)
(nerdctl run --name jaeger --rm
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411
  -e COLLECTOR_OTLP_ENABLED=true
  -p 6831:6831/udp
  -p 6832:6832/udp
  -p 5778:5778
  -p 16686:16686
  -p 4317:4317
  -p 4318:4318
  -p 14250:14250
  -p 14268:14268
  -p 14269:14269
  -p 9411:9411
  jaegertracing/all-in-one:1.38
)

## Ctrl-C to stop it

Open the Jaeger web UI (http://localhost:16686 with the port mapping above)

Configure the exporter via environment variables (see the OpenTelemetry sdk-environment-variables documentation)

# replace let-env by "export" for bash...
let-env OTEL_EXPORTER_OTLP_PROTOCOL = "grpc"

# send trace via jaeger protocol to local jaeger (agent)
cargo run -- --tracing-collector-kind jaeger

Infra

Kubernetes

The setup of the infrastructure (cluster) is defined under /infra/kubernetes.

  • Try to resemble a target / live environment, so it requires more local resources than a "local dev" approach:
    • use distributed solutions (loki, tempo, ...)
    • use an S3 backend (minio)
  • no ingress or API gateway setup; access is via port forward
  • use wrapper/adapter helm charts to install components, as if they were deployed by a gitops (pull mode) system
  • keep components separated to allow partial reuse and to identify integration points

Main components for the infra

Infra setup

Required:

  • kubectl, helm v3 : to manage the k8s cluster

Optional:

  • nushell: to use tools.nu and to avoid too many manual commands
  • Lens / OpenLens / k9s / your favorite UI: to explore states of k8s cluster

# launch nushell
nu
# after launch of your local (or remote) cluster, configure kubectl to access it as current context
cd infra/kubernetes
use tools.nu
tools install_all_charts
# to uninstall stuff ;-)
tools uninstall_all_charts
# to have the list of subcommand
tools <tab>

  • manual creation of loki bucket

sample list of components

❯ kubectl get service -A
NAMESPACE                 NAME                                             TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                                                   AGE
default                   kubernetes                                       ClusterIP      10.43.0.1       <none>         443/TCP                                                   95m
kube-system               kube-dns                                         ClusterIP      10.43.0.10      <none>         53/UDP,53/TCP,9153/TCP                                    95m
kube-system               metrics-server                                   ClusterIP      10.43.155.46    <none>         443/TCP                                                   95m
kube-system               traefik                                          LoadBalancer   10.43.205.20    192.168.5.15   80:30073/TCP,443:31505/TCP                                95m
cert-manager              cert-manager-webhook                             ClusterIP      10.43.208.146   <none>         443/TCP                                                   94m
cert-manager              cert-manager                                     ClusterIP      10.43.60.191    <none>         9402/TCP                                                  94m
minio                     minio                                            ClusterIP      10.43.19.151    <none>         9000/TCP                                                  94m
grafana                   grafana                                          ClusterIP      10.43.171.106   <none>         80/TCP                                                    93m
kube-system               kube-prometheus-stack-kube-scheduler             ClusterIP      None            <none>         10251/TCP                                                 93m
kube-system               kube-prometheus-stack-coredns                    ClusterIP      None            <none>         9153/TCP                                                  93m
kube-system               kube-prometheus-stack-kube-proxy                 ClusterIP      None            <none>         10249/TCP                                                 93m
kube-system               kube-prometheus-stack-kube-controller-manager    ClusterIP      None            <none>         10257/TCP                                                 93m
kube-system               kube-prometheus-stack-kube-etcd                  ClusterIP      None            <none>         2379/TCP                                                  93m
kube-prometheus-stack     kube-prometheus-stack-alertmanager               ClusterIP      10.43.114.25    <none>         9093/TCP                                                  93m
kube-prometheus-stack     kube-prometheus-stack-operator                   ClusterIP      10.43.244.229   <none>         443/TCP                                                   93m
kube-prometheus-stack     kube-prometheus-stack-prometheus-node-exporter   ClusterIP      10.43.173.60    <none>         9100/TCP                                                  93m
kube-prometheus-stack     kube-prometheus-stack-kube-state-metrics         ClusterIP      10.43.147.90    <none>         8080/TCP                                                  93m
kube-prometheus-stack     kube-prometheus-stack-prometheus                 ClusterIP      10.43.139.178   <none>         9090/TCP                                                  93m
kube-system               kube-prometheus-stack-kubelet                    ClusterIP      None            <none>         10250/TCP,10255/TCP,4194/TCP                              93m
loki-distributed          loki-distributed-memberlist                      ClusterIP      None            <none>         7946/TCP                                                  93m
loki-distributed          loki-distributed-ingester-headless               ClusterIP      None            <none>         3100/TCP,9095/TCP                                         93m
loki-distributed          loki-distributed-query-frontend                  ClusterIP      None            <none>         3100/TCP,9095/TCP,9096/TCP                                93m
loki-distributed          loki-distributed-querier-headless                ClusterIP      None            <none>         3100/TCP,9095/TCP                                         93m
loki-distributed          loki-distributed-distributor                     ClusterIP      10.43.235.183   <none>         3100/TCP,9095/TCP                                         93m
loki-distributed          loki-distributed-querier                         ClusterIP      10.43.35.214    <none>         3100/TCP,9095/TCP                                         93m
loki-distributed          loki-distributed-gateway                         ClusterIP      10.43.245.76    <none>         80/TCP                                                    93m
loki-distributed          loki-distributed-ingester                        ClusterIP      10.43.168.198   <none>         3100/TCP,9095/TCP                                         93m
tempo-distributed         tempo-distributed-gossip-ring                    ClusterIP      None            <none>         7946/TCP                                                  93m
tempo-distributed         tempo-distributed-query-frontend-discovery       ClusterIP      None            <none>         3100/TCP,9095/TCP,16686/TCP,16687/TCP                     93m
tempo-distributed         tempo-distributed-query-frontend                 ClusterIP      10.43.85.84     <none>         3100/TCP,9095/TCP,16686/TCP,16687/TCP                     93m
tempo-distributed         tempo-distributed-ingester                       ClusterIP      10.43.242.5     <none>         3100/TCP,9095/TCP                                         93m
tempo-distributed         tempo-distributed-querier                        ClusterIP      10.43.20.61     <none>         3100/TCP,9095/TCP                                         93m
tempo-distributed         tempo-distributed-distributor                    ClusterIP      10.43.13.183    <none>         3100/TCP,9095/TCP,4317/TCP,55680/TCP                      93m
tempo-distributed         tempo-distributed-memcached                      ClusterIP      10.43.106.141   <none>         11211/TCP,9150/TCP                                        93m
tempo-distributed         tempo-distributed-compactor                      ClusterIP      10.43.10.39     <none>         3100/TCP                                                  93m
tempo-distributed         tempo-distributed-metrics-generator              ClusterIP      10.43.129.131   <none>         9095/TCP,3100/TCP                                         93m
opentelemetry-collector   opentelemetry-collector                          ClusterIP      10.43.15.153    <none>         6831/UDP,14250/TCP,14268/TCP,4317/TCP,4318/TCP,9411/TCP   93m
linkerd                   linkerd-dst                                      ClusterIP      10.43.126.243   <none>         8086/TCP                                                  92m
linkerd                   linkerd-dst-headless                             ClusterIP      None            <none>         8086/TCP                                                  92m
linkerd                   linkerd-sp-validator                             ClusterIP      10.43.41.57     <none>         443/TCP                                                   92m
linkerd                   linkerd-policy                                   ClusterIP      None            <none>         8090/TCP                                                  92m
linkerd                   linkerd-policy-validator                         ClusterIP      10.43.225.36    <none>         443/TCP                                                   92m
linkerd                   linkerd-identity                                 ClusterIP      10.43.136.50    <none>         8080/TCP                                                  92m
linkerd                   linkerd-identity-headless                        ClusterIP      None            <none>         8080/TCP                                                  92m
linkerd                   linkerd-proxy-injector                           ClusterIP      10.43.51.211    <none>         443/TCP                                                   92m
app                       app                                              ClusterIP      10.43.108.47    <none>         80/TCP                                                    91m
linkerd-viz               metrics-api                                      ClusterIP      10.43.179.165   <none>         8085/TCP                                                  67m
linkerd-viz               tap-injector                                     ClusterIP      10.43.71.201    <none>         443/TCP                                                   67m
linkerd-viz               tap                                              ClusterIP      10.43.191.138   <none>         8088/TCP,443/TCP                                          67m
linkerd-viz               web                                              ClusterIP      10.43.18.39     <none>         8084/TCP,9994/TCP                                         67m
linkerd-jaeger            jaeger-injector                                  ClusterIP      10.43.72.101    <none>         443/TCP                                                   67m

Use port forward to access UI and service

# access grafana UI on http://127.0.0.1:8040
kubectl port-forward -n grafana service/grafana 8040:80

# access minio UI on http://127.0.0.1:9009 (user/pass: minio/minio123)
kubectl port-forward -n minio service/minio 9009:9000

# access linkerd-viz UI on http://127.0.0.1:8084
kubectl port-forward -n linkerd-viz service/web 8084:8084

# On rancher-desktop only
# access traefik dashboard on http://127.0.0.1:9000/dashboard/#/
bash -c 'kubectl port-forward -n kube-system $(kubectl -n kube-system get pods --selector "app.kubernetes.io/name=traefik" --output=name) 9000:9000'

Setup the app and call it

kubectl port-forward -n app service/app 8080:80
curl -i "http://localhost:8080/depth/2"

log+loki+grafana

trace+tempo+grafana

node+tempo+grafana

However, requests sent through port-forward do not go through the linkerd proxy, so there is no monitoring of routes, ... (see "port-forward traffic skips the proxy · Issue #2352 · linkerd/linkerd2"). So if you don't have an ingress set up, send requests from inside the cluster:

kubectl run tmp-shell -n default --restart=Never --rm -i --tty --image curlimages/curl:7.84.0 -- curl -L -v http://app.app.svc.cluster.local/depth/3


# Or via an interactive shell if you want
kubectl run tmp-shell -n default --restart=Never --rm -i --tty --image curlimages/curl:7.84.0 -- sh
> curl -L -v http://app.app.svc.cluster.local/depth/3
...
> exit

kubectl delete pod tmp-shell -n default
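
The host name app.app.svc.cluster.local used in the commands above follows the Kubernetes service DNS pattern <service>.<namespace>.svc.<cluster-domain>: here both the service and its namespace are named app. A tiny sketch of that composition, where service_url is a hypothetical helper and cluster.local is assumed to be the cluster domain (it is the default, but a cluster can be configured differently):

```rust
/// Build the in-cluster URL of a Kubernetes service (hypothetical helper).
fn service_url(service: &str, namespace: &str, path: &str) -> String {
    // "cluster.local" is the default cluster domain (assumption for this sketch)
    format!("http://{}.{}.svc.cluster.local{}", service, namespace, path)
}
```

With these assumptions, service_url("app", "app", "/depth/3") gives the URL used with curl above.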

linkerd_route

pod_usage

Sample list of other dashboards pod_usage

Links & inspiration