dennyzhang/challenges-k8s-monitoring

1 Kubernetes Monitoring

Blog URL: https://kubernetes.dennyzhang.com/challenges-k8s-monitoring, Category: concept

1.1 Summary

| Name | Summary |
|------|---------|
| kube-state-metrics | Add-on agent to generate and expose cluster-level metrics |
| cAdvisor | A standalone container/node metrics collection and monitoring tool |
| cluster events | heapster, resource metrics |
| Metrics defined by the Metrics API | kubernetes/pkg/apis/metrics/v1beta1/types.go |
| Heapster | k8s add-on |
| Kubernetes API | Does not track metrics itself, but can return real-time metrics |

1.2 [#A] Questions

1.2.1 [#A] Which components collect what, and how? events, audit, cAdvisor, kube-state-metrics, metrics server, API server

1.2.2 Why does the community want to switch from Heapster to Metrics Server?

  • Heapster serves its API using the Go http library, which lacks a number of features that the Kubernetes API server provides, such as authentication/authorization and client generation. (link)
  • Heapster is not compatible with Prometheus. It assumes that the data store is a bare time-series database and allows a direct write path to it, whereas Prometheus uses a pull-based model.
  • Design problems make Heapster hard to maintain.

https://brancz.com/2018/01/05/prometheus-vs-heapster-vs-kubernetes-metrics-apis/

In Kubernetes 1.12, Heapster will tentatively be removed. In 1.13, it will be moved to the kubernetes-retired organization. (link)

Consider using metrics-server and a third party metrics pipeline to gather Prometheus-format metrics instead.

1.2.3 How can I send my pod/application metrics? Is Metrics Server not a good fit?
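One possible direction, given the pull-based Prometheus model noted above: the application exposes its own metrics over HTTP in the Prometheus text format and lets a separate pipeline scrape them, since Metrics Server is aimed at pod and node resource usage. The sketch below is a minimal illustration using only the Python standard library. The metric names, the `/metrics` path, and port 8000 are assumptions chosen for the example, not something prescribed by these notes.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real application would update these.
METRICS = {"myapp_requests_total": 0, "myapp_errors_total": 0}


def render_metrics(metrics):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counters on GET /metrics (path is an assumption)."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving metrics on the given port (8000 is arbitrary)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; a scraper would then be pointed at `http://<pod-ip>:8000/metrics`.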

1.2.4 Data storage for prometheus

1.3 k8s monitoring architecture

  • a per-node agent and a cluster-level aggregator (link)
  • Metrics: system metrics and service metrics. Furthermore, system metrics are divided into core metrics and non-core metrics. (link)

1.4 Metric Server

1.4.1 Basic Intro

  • Metrics Server is essentially a stripped-down version of Heapster.
  • The goal of the effort is to provide resource usage metrics for pods and nodes through the API server. (link)
  • It is a cluster-level component that periodically scrapes metrics from all Kubernetes nodes, served by the kubelet through the Summary API. The metrics are then aggregated, stored in memory (see Scalability limitations), and served in Metrics API format. (link)
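The scrape, aggregate, and keep-in-memory flow described above can be sketched roughly as follows. This is not Metrics Server's actual code: the payload is a heavily trimmed stand-in for a kubelet Summary API response (only the fields used here), and `InMemoryStore`, `extract_usage`, and the sample values are hypothetical names and data for illustration.

```python
import time


def extract_usage(summary):
    """Flatten one Summary-API-like response into node and per-pod usage."""
    node = summary["node"]
    node_usage = {
        "name": node["nodeName"],
        "cpu_nanocores": node["cpu"]["usageNanoCores"],
        "memory_bytes": node["memory"]["workingSetBytes"],
    }
    pod_usages = []
    for pod in summary.get("pods", []):
        containers = pod.get("containers", [])
        pod_usages.append({
            "namespace": pod["podRef"]["namespace"],
            "name": pod["podRef"]["name"],
            # Pod usage is the sum over its containers.
            "cpu_nanocores": sum(c["cpu"]["usageNanoCores"] for c in containers),
            "memory_bytes": sum(c["memory"]["workingSetBytes"] for c in containers),
        })
    return node_usage, pod_usages


class InMemoryStore:
    """Keeps only the latest sample per node and per pod; no history."""

    def __init__(self):
        self.nodes = {}
        self.pods = {}

    def update(self, summary, now=None):
        ts = time.time() if now is None else now
        node_usage, pod_usages = extract_usage(summary)
        self.nodes[node_usage["name"]] = {**node_usage, "timestamp": ts}
        for p in pod_usages:
            self.pods[(p["namespace"], p["name"])] = {**p, "timestamp": ts}


# Made-up sample standing in for one node's scrape result.
SAMPLE = {
    "node": {"nodeName": "node-1",
             "cpu": {"usageNanoCores": 500},
             "memory": {"workingSetBytes": 4096}},
    "pods": [{
        "podRef": {"namespace": "default", "name": "web"},
        "containers": [
            {"name": "app", "cpu": {"usageNanoCores": 100}, "memory": {"workingSetBytes": 200}},
            {"name": "sidecar", "cpu": {"usageNanoCores": 50}, "memory": {"workingSetBytes": 100}},
        ],
    }],
}
```

Because the store overwrites the previous sample on every update, it holds one data point per object, which is why memory use grows with cluster size rather than with time.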

1.4.2 Design Goals

  • The data for a given set of pods (defined either by pod list or label selector) should be accessible in a single request, for performance reasons. (link)

1.4.3 Limitations

  • Metrics Server supports up to 30 pods per cluster node. (From link)
  • It is assumed that up to 10 metrics are collected from each pod and node running in the cluster. (From link)
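Taken together, the two figures above give a rough upper bound on how many data points Metrics Server holds at once. The helper below is only back-of-the-envelope arithmetic under those stated assumptions; the function name and the 100-node example are made up for illustration.

```python
PODS_PER_NODE = 30        # limit quoted above
METRICS_PER_OBJECT = 10   # assumption quoted above


def max_datapoints(node_count):
    """Upper bound on stored data points for a cluster of node_count nodes."""
    objects = node_count * (PODS_PER_NODE + 1)  # pods plus the node itself
    return objects * METRICS_PER_OBJECT


print(max_datapoints(100))  # 100 * 31 * 10 = 31000
```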

1.4.4 TODO What is API aggregation layer in metric server?

https://github.com/kubernetes/apiserver

1.4.5 TODO try metric server in minikube

https://docs.giantswarm.io/guides/kubernetes-heapster/

http://192.168.99.102:30000/metrics

1.4.6 TODO How to query metric server manually
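A possible starting point for this TODO: the Metrics API is served under the `metrics.k8s.io/v1beta1` group, so node usage can be fetched with `kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes` and the JSON parsed by hand. The sketch below parses such a response. The sample payload is made up, and the quantity parsing covers only the suffixes listed in the code, which is a simplification of the full Kubernetes quantity syntax.

```python
import json

# Only a subset of Kubernetes quantity suffixes is handled (assumption).
CPU_SUFFIXES = {"n": 1e-9, "u": 1e-6, "m": 1e-3}
MEM_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_cores(quantity):
    """Convert a CPU quantity such as '250m' or '2' into cores."""
    if quantity and quantity[-1] in CPU_SUFFIXES:
        return float(quantity[:-1]) * CPU_SUFFIXES[quantity[-1]]
    return float(quantity)


def memory_bytes(quantity):
    """Convert a memory quantity such as '1024Ki' or '4096' into bytes."""
    for suffix, factor in MEM_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)


def node_usage(raw_json):
    """Map node name -> (cpu cores, memory bytes) from a NodeMetricsList."""
    doc = json.loads(raw_json)
    return {
        item["metadata"]["name"]: (
            cpu_cores(item["usage"]["cpu"]),
            memory_bytes(item["usage"]["memory"]),
        )
        for item in doc["items"]
    }


# Made-up response body in the shape of a NodeMetricsList.
SAMPLE = json.dumps({
    "kind": "NodeMetricsList",
    "apiVersion": "metrics.k8s.io/v1beta1",
    "items": [{
        "metadata": {"name": "minikube"},
        "window": "30s",
        "usage": {"cpu": "250m", "memory": "1024Ki"},
    }],
})
```

The same approach should work for `/apis/metrics.k8s.io/v1beta1/pods`, where each item carries a `containers` list instead of a single `usage` field.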

1.5 kube-state-metrics

  • Useful links
https://brancz.com/2017/11/13/kube-state-metrics-the-past-the-present-and-the-future/

1.6 heapster

Heapster is an add-on to Kubernetes that collects and forwards node, namespace, pod and container level metrics to one or more “sinks” (e.g. InfluxDB).

It also provides REST endpoints to gather those metrics. The metrics are constrained to CPU, filesystem, memory, network and uptime.

Heapster queries the kubelet for its data.

Today, heapster is the source of the time-series data for the Kubernetes Dashboard.

  • Useful links
https://brancz.com/2017/11/13/kube-state-metrics-the-past-the-present-and-the-future/

1.7 [#A] Prometheus

1.8 cAdvisor

cAdvisor monitors node and container core metrics in addition to container events. It natively provides a Prometheus metrics endpoint. The Kubernetes kubelet has an embedded cAdvisor that exposes only the metrics, not the events.
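To make the "Prometheus metrics endpoint" concrete, here is a small sketch that parses text in the Prometheus exposition format into (name, labels, value) tuples. The parser is deliberately simplified (it ignores timestamps and does not unescape label values), and the sample lines use made-up values and a reduced label set.

```python
import re

# name, optional {labels}, then the value; trailing timestamps are ignored.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (metric name, labels dict, float value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # skip anything this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Made-up scrape output for illustration.
EXAMPLE = """\
# TYPE container_cpu_usage_seconds_total counter
container_cpu_usage_seconds_total{id="/",cpu="total"} 12.5
container_memory_working_set_bytes{id="/"} 1048576
"""
```

Running `parse_metrics(EXAMPLE)` yields one tuple per sample line, which is enough to inspect what an endpoint exposes before wiring up a real scraper.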

1.9 More Resources

License: Code is licensed under MIT License.

https://github.com/kubernetes-incubator/metrics-server

https://github.com/kubernetes-incubator/metrics-server/tree/master/deploy/1.8%2B

https://blog.freshtracks.io/what-is-the-the-new-kubernetes-metrics-server-849c16aa01f4

https://blog.outlyer.com/monitoring-kubernetes-with-heapster-and-prometheus

https://www.outcoldman.com/en/archive/2017/07/09/kubernetes-monitoring-resources/

https://banzaicloud.com/blog/prometheus-application-monitoring/

2 TODO Blog: Which components collect what, and how? events, audit, cAdvisor, kube-state-metrics, metrics server, API server

https://github.com/GoogleCloudPlatform/click-to-deploy/blob/master/k8s/prometheus/resources/prometheus-grafana-architecture.png

2.1 basic usage

https://sematext.com/kubernetes/

Metrics:

  • Cluster: metrics aggregated over all nodes, displayed in the SPM overview
  • Host / node level: metrics aggregated per node
  • Pod level: metrics aggregated by pod name
  • Docker container level: metrics aggregated for a single container
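The aggregation levels listed above amount to rolling container-level samples up to pod, node, and cluster totals. A minimal sketch, assuming each sample is a hypothetical `(node, pod, container, value)` tuple for a single metric:

```python
from collections import defaultdict


def roll_up(samples):
    """Aggregate container-level samples to pod, node, and cluster totals."""
    per_pod = defaultdict(float)
    per_node = defaultdict(float)
    cluster = 0.0
    for node, pod, _container, value in samples:
        per_pod[pod] += value
        per_node[node] += value
        cluster += value
    return dict(per_pod), dict(per_node), cluster


# Made-up CPU samples: (node, pod, container, value).
SAMPLES = [
    ("node-1", "web", "app", 2.0),
    ("node-1", "web", "sidecar", 1.0),
    ("node-2", "db", "postgres", 4.0),
]
```

With the sample data, the pod `web` totals 3.0, `node-2` totals 4.0, and the cluster totals 7.0.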

2.2 TODO [#A] k8s events

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/events/event.go

https://blog.appdynamics.com/product/monitoring-kubernetes-events/

By monitoring these events, our extension enables enterprises to troubleshoot everything that goes wrong in the Kubernetes orchestration platform—from scaling up/scaling down, new deployments, deleting applications, creating new applications, and so on. If an event goes to a warning state, users can drill down into the warning to see where it occurred, making troubleshooting easier.
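As a rough illustration of drilling into warnings, the sketch below filters `Warning` events out of the JSON produced by `kubectl get events -o json` and counts them per object and reason. The sample payload is made up and trimmed to the fields used here; `warning_events` and `summarize` are hypothetical helper names.

```python
import json
from collections import Counter


def warning_events(event_list_json):
    """Return only the events whose type is Warning."""
    doc = json.loads(event_list_json)
    return [e for e in doc.get("items", []) if e.get("type") == "Warning"]


def summarize(events):
    """Count occurrences per (kind, object name, reason)."""
    counts = Counter()
    for event in events:
        obj = event.get("involvedObject", {})
        key = (obj.get("kind"), obj.get("name"), event.get("reason"))
        counts[key] += event.get("count", 1)
    return counts


# Made-up event list, trimmed to the fields used above.
SAMPLE = json.dumps({"items": [
    {"type": "Normal", "reason": "EnsuredLoadBalancer", "count": 1,
     "involvedObject": {"kind": "Service", "name": "prometheus-1-prometheus"}},
    {"type": "Warning", "reason": "BackOff", "count": 3,
     "involvedObject": {"kind": "Pod", "name": "web-0"}},
]})
```

The result of `summarize(warning_events(SAMPLE))` points straight at the object and reason behind each warning, which is the "where it occurred" part of the quoted description.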

kubernetes/kubernetes#36304 kubernetes/kubernetes#4796

kubectl get events -n wordpress

$ kubectl get events
No resources found.

$ kubectl get events -n wordpress
LAST SEEN   FIRST SEEN   COUNT     NAME                                       KIND      SUBOBJECT   TYPE      REASON                 SOURCE               MESSAGE
10m         10m          1         prometheus-1-prometheus.155bc314b145766f   Service               Normal    Type                   service-controller   ClusterIP -> LoadBalancer
10m         10m          1         prometheus-1-prometheus.155bc314b146689c   Service               Normal    EnsuringLoadBalancer   service-controller   Ensuring load balancer
9m          9m           1         prometheus-1-prometheus.155bc31d53e2d53c   Service               Normal    EnsuredLoadBalancer    service-controller   Ensured load balancer

$ kubectl get events -n elasticsearch
No resources found.

$ kubectl get ns
NAME            STATUS    AGE
default         Active    23h
elasticsearch   Active    17h
kube-public     Active    23h
kube-system     Active    23h
wordpress       Active    23h

$ kubectl get events -n default
No resources found.

/Users/mac/git_code/kubernets_community/kubernetes/:
find . \( -iname event\* \) -ls
88846683       16 -rw-r--r--    1 mac              staff                4382 Oct  8 22:37 api/swagger-spec/events.k8s.io.json
88846684      168 -rw-r--r--    1 mac              staff               82420 Oct  8 22:37 api/swagger-spec/events.k8s.io_v1beta1.json
88846944        8 -rw-r--r--    1 mac              staff                2253 Oct  8 22:37 cluster/addons/fluentd-gcp/event-exporter.yaml
88846952        8 -rw-r--r--    1 mac              staff                 427 Oct  8 22:37 cluster/addons/fluentd-gcp/podsecuritypolicies/event-exporter-psp-binding.yaml
88846953        8 -rw-r--r--    1 mac              staff                 356 Oct  8 22:37 cluster/addons/fluentd-gcp/podsecuritypolicies/event-exporter-psp-role.yaml
88846954        8 -rw-r--r--    1 mac              staff                1265 Oct  8 22:37 cluster/addons/fluentd-gcp/podsecuritypolicies/event-exporter-psp.yaml
88848125        0 drwxr-xr-x    3 mac              staff                 102 Oct  8 22:37 docs/api-reference/events.k8s.io
88848857        0 drwxr-xr-x    6 mac              staff                 204 Oct  8 22:37 pkg/api/events
88849333        8 -rw-r--r--    1 mac              staff                3909 Oct  8 22:37 pkg/apis/core/validation/events.go
88849334       24 -rw-r--r--    1 mac              staff                9846 Oct  8 22:37 pkg/apis/core/validation/events_test.go
88849338        0 drwxr-xr-x    7 mac              staff                 238 Oct  8 22:37 pkg/apis/events
88849763       16 -rw-r--r--    1 mac              staff                4689 Oct  8 22:37 pkg/client/clientset_generated/internalclientset/typed/core/internalversion/event.go
88849764       16 -rw-r--r--    1 mac              staff                7647 Oct  8 22:37 pkg/client/clientset_generated/internalclientset/typed/core/internalversion/event_expansion.go
88849807        0 drwxr-xr-x    3 mac              staff                 102 Oct  8 22:37 pkg/client/clientset_generated/internalclientset/typed/events
88849811        8 -rw-r--r--    1 mac              staff                2455 Oct  8 22:37 pkg/client/clientset_generated/internalclientset/typed/events/internalversion/events_client.go
88849984        8 -rw-r--r--    1 mac              staff                3544 Oct  8 22:37 pkg/client/informers/informers_generated/internalversion/core/internalversion/event.go
88850120        8 -rw-r--r--    1 mac              staff                2986 Oct  8 22:37 pkg/client/listers/core/internalversion/event.go
88850799        0 drwxr-xr-x    4 mac              staff                 136 Oct  8 22:37 pkg/controller/volume/events
88850801        8 -rw-r--r--    1 mac              staff                1269 Oct  8 22:37 pkg/controller/volume/events/event.go
88852078        0 drwxr-xr-x    4 mac              staff                 136 Oct  8 22:37 pkg/kubelet/events
88852080        8 -rw-r--r--    1 mac              staff                4032 Oct  8 22:37 pkg/kubelet/events/event.go
88852942        0 drwxr-xr-x    7 mac              staff                 238 Oct  8 22:37 pkg/registry/core/event
88853115        0 drwxr-xr-x    5 mac              staff                 170 Oct  8 22:37 pkg/registry/events
88853117        0 drwxr-xr-x    5 mac              staff                 170 Oct  8 22:37 pkg/registry/events/event
88854282        0 drwxr-xr-x   12 mac              staff                 408 Oct  8 22:37 plugin/pkg/admission/eventratelimit
88854287        0 drwxr-xr-x   11 mac              staff                 374 Oct  8 22:37 plugin/pkg/admission/eventratelimit/apis/eventratelimit
88854736        0 drwxr-xr-x    4 mac              staff                 136 Oct  8 22:37 staging/src/k8s.io/api/events
88856452        8 -rw-r--r--    1 mac              staff                1402 Oct  8 22:37 staging/src/k8s.io/apiserver/pkg/storage/etcd3/event.go
88856901        8 -rw-r--r--    1 mac              staff                3348 Oct  8 22:38 staging/src/k8s.io/client-go/informers/core/v1/event.go
88856915        0 drwxr-xr-x    5 mac              staff                 170 Oct  8 22:38 staging/src/k8s.io/client-go/informers/events
88856920        8 -rw-r--r--    1 mac              staff                3410 Oct  8 22:38 staging/src/k8s.io/client-go/informers/events/v1beta1/event.go
88857290       16 -rw-r--r--    1 mac              staff                4638 Oct  8 22:38 staging/src/k8s.io/client-go/kubernetes/typed/core/v1/event.go
88857291       16 -rw-r--r--    1 mac              staff                6661 Oct  8 22:38 staging/src/k8s.io/client-go/kubernetes/typed/core/v1/event_expansion.go
88857336        0 drwxr-xr-x    3 mac              staff                 102 Oct  8 22:38 staging/src/k8s.io/client-go/kubernetes/typed/events
88857340       16 -rw-r--r--    1 mac              staff                4705 Oct  8 22:38 staging/src/k8s.io/client-go/kubernetes/typed/events/v1beta1/event.go
88857341        8 -rw-r--r--    1 mac              staff                2490 Oct  8 22:38 staging/src/k8s.io/client-go/kubernetes/typed/events/v1beta1/events_client.go
88857639        8 -rw-r--r--    1 mac              staff                2938 Oct  8 22:38 staging/src/k8s.io/client-go/listers/core/v1/event.go
88857656        0 drwxr-xr-x    3 mac              staff                 102 Oct  8 22:38 staging/src/k8s.io/client-go/listers/events
88857659        8 -rw-r--r--    1 mac              staff                3005 Oct  8 22:38 staging/src/k8s.io/client-go/listers/events/v1beta1/event.go
88858013       32 -rw-r--r--    1 mac              staff               12492 Oct  8 22:38 staging/src/k8s.io/client-go/tools/record/event.go
88858014       56 -rw-r--r--    1 mac              staff               28106 Oct  8 22:38 staging/src/k8s.io/client-go/tools/record/event_test.go
88858015       32 -rw-r--r--    1 mac              staff               15353 Oct  8 22:38 staging/src/k8s.io/client-go/tools/record/events_cache.go
88858016       24 -rw-r--r--    1 mac              staff               10647 Oct  8 22:38 staging/src/k8s.io/client-go/tools/record/events_cache_test.go
88859610       16 -rw-r--r--    1 mac              staff                4865 Oct  8 22:38 test/e2e/common/events.go
88859797       16 -rw-r--r--    1 mac              staff                4429 Oct  8 22:38 test/e2e/node/events.go
88859820        8 -rw-r--r--    1 mac              staff                1188 Oct  8 22:38 test/e2e/scheduling/events.go
88861342        8 -rw-r--r--    1 mac              staff                1237 Oct  8 22:38 vendor/github.com/Azure/go-ansiterm/event_handler.go
88862868        8 -rw-r--r--    1 mac              staff                1762 Oct  8 22:38 vendor/github.com/coreos/etcd/store/event.go
88862869        8 -rw-r--r--    1 mac              staff                3012 Oct  8 22:38 vendor/github.com/coreos/etcd/store/event_history.go
88862870        8 -rw-r--r--    1 mac              staff                 925 Oct  8 22:38 vendor/github.com/coreos/etcd/store/event_queue.go
88863102        0 drwxr-xr-x    4 mac              staff                 136 Oct  8 22:38 vendor/github.com/docker/docker/api/types/events
88863104        8 -rw-r--r--    1 mac              staff                1767 Oct  8 22:38 vendor/github.com/docker/docker/api/types/events/events.go
88863211        8 -rw-r--r--    1 mac              staff                2224 Oct  8 22:38 vendor/github.com/docker/docker/client/events.go
88864097        0 drwxr-xr-x    4 mac              staff                 136 Oct  8 22:38 vendor/github.com/google/cadvisor/events
88866315       16 -rw-r--r--    1 mac              staff                4548 Oct  8 22:38 vendor/github.com/storageos/go-api/event.go
88866348        8 -rw-r--r--    1 mac              staff                1788 Oct  8 22:38 vendor/github.com/storageos/go-api/types/events.go
88866697       24 -rw-r--r--    1 mac              staff                8314 Oct  8 22:38 vendor/github.com/vmware/govmomi/simulator/esx/event_manager.go
88866709       24 -rw-r--r--    1 mac              staff               10076 Oct  8 22:38 vendor/github.com/vmware/govmomi/simulator/event_manager.go
88866892       24 -rw-r--r--    1 mac              staff               11104 Oct  8 22:38 vendor/github.com/xanzy/go-cloudstack/cloudstack/EventService.go
88867181       32 -rw-r--r--    1 mac              staff               12549 Oct  8 22:38 vendor/golang.org/x/net/trace/events.go
88867464        8 -rw-r--r--    1 mac              staff                 824 Oct  8 22:38 vendor/golang.org/x/sys/windows/eventlog.go
88867482        8 -rw-r--r--    1 mac              staff                 979 Oct  8 22:38 vendor/golang.org/x/sys/windows/svc/event.go

find finished at Mon Oct  8 22:40:15

2.3 HALF audit

https://kubernetes.io/docs/tasks/debug-application-cluster/audit/

Each request, at each stage of its execution, generates an event, which is then pre-processed according to a certain policy and written to a backend.
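The request, policy, backend flow can be sketched as a first-match rule lookup followed by a write. This is a simplified model, not the API server's implementation: real policy rules have a richer schema (resource groups, users, namespaces), the backend here is just a Python list, and treating an unmatched request as level `None` is an assumption of this sketch.

```python
def audit_level(rules, request):
    """Return the level of the first rule that matches the request."""
    for rule in rules:
        verbs = rule.get("verbs")
        resources = rule.get("resources")
        if verbs and request["verb"] not in verbs:
            continue
        if resources and request["resource"] not in resources:
            continue
        return rule["level"]
    return "None"  # assumption: nothing is logged when no rule matches


def record(rules, request, stage, backend):
    """Append an audit event to the backend unless the level is None."""
    level = audit_level(rules, request)
    if level == "None":
        return False
    backend.append({
        "level": level,
        "stage": stage,
        "verb": request["verb"],
        "resource": request["resource"],
    })
    return True


# Hypothetical, simplified policy: skip reads, log secrets at Metadata,
# log everything else at Request.
RULES = [
    {"level": "None", "verbs": ["get", "list", "watch"]},
    {"level": "Metadata", "resources": ["secrets"]},
    {"level": "Request"},
]
```

Rule order matters in this model: moving the catch-all `Request` rule to the top would shadow the other two.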

2.4 GKE 1.11

$ kubectl get pods --all-namespaces
NAMESPACE       NAME                                                  READY     STATUS    RESTARTS   AGE
elasticsearch   elasticsearch-1-elasticsearch-0                       1/1       Running   0          13h
elasticsearch   elasticsearch-1-elasticsearch-1                       1/1       Running   0          13h
kube-system     event-exporter-v0.2.1-5f5b89fcc8-vvtpc                2/2       Running   0          1d
kube-system     fluentd-gcp-scaler-7c5db745fc-zmpvl                   1/1       Running   0          1d
kube-system     fluentd-gcp-v3.1.0-79lc9                              2/2       Running   0          1d
kube-system     fluentd-gcp-v3.1.0-f9nmh                              2/2       Running   0          1d
kube-system     fluentd-gcp-v3.1.0-fx7w4                              2/2       Running   0          1d
kube-system     fluentd-gcp-v3.1.0-xcgpz                              2/2       Running   0          1d
kube-system     fluentd-gcp-v3.1.0-zbfxx                              2/2       Running   0          1d
kube-system     fluentd-gcp-v3.1.0-zlxvh                              2/2       Running   0          1d
kube-system     heapster-v1.5.3-85b85f4fbf-w2lfb                      3/3       Running   0          1d
kube-system     kube-dns-788979dc8f-fb2hp                             4/4       Running   0          1d
kube-system     kube-dns-788979dc8f-qs782                             4/4       Running   0          1d
kube-system     kube-dns-autoscaler-79b4b844b9-6858t                  1/1       Running   0          1d
kube-system     kube-proxy-gke-cluster-1-default-pool-36da1c6a-4356   1/1       Running   0          1d
kube-system     kube-proxy-gke-cluster-1-default-pool-36da1c6a-6wx8   1/1       Running   0          1d
kube-system     kube-proxy-gke-cluster-1-default-pool-36da1c6a-rbxc   1/1       Running   0          1d
kube-system     kube-proxy-gke-cluster-1-default-pool-36da1c6a-skkd   1/1       Running   0          1d
kube-system     kube-proxy-gke-cluster-1-pool-1-e95a10b3-5gl5         1/1       Running   0          1d
kube-system     kube-proxy-gke-cluster-1-pool-1-e95a10b3-jx2r         1/1       Running   0          1d
kube-system     l7-default-backend-5d5b9874d5-89xj5                   1/1       Running   0          1d
kube-system     metrics-server-v0.2.1-7486f5bd67-c6fxz                2/2       Running   0          1d

2.5 kube-state-metrics: Add-on agent to generate and expose cluster-level metrics.

https://github.com/kubernetes/kube-state-metrics

2.7 TODO events vs audit

2.8 cAdvisors -> kube-state-metrics -> metric server -> api server

2.9 heapster -> metric server

2.10 node exporter

2.11 TODO cAdvisors: http://localhost:8080/containers/

2.12 Controller manager metrics

https://kubernetes.io/docs/concepts/cluster-administration/controller-metrics/

2.13 useful link

https://sematext.com/kubernetes/

https://www.datadoghq.com/blog/how-to-collect-and-graph-kubernetes-metrics/

https://blog.freshtracks.io/what-is-the-the-new-kubernetes-metrics-server-849c16aa01f4

2.14 Kubernetes Integration: wavefront

https://docs.wavefront.com/kubernetes.html