
Gravity Monitoring & Alerts (for Gravity 5.5 and earlier)

Prerequisites

Docker 101, Kubernetes 101, Gravity 101.

Introduction

Note: This part of the training pertains to Gravity 5.5 and earlier.

Gravity Clusters come with a fully configured and customizable monitoring and alerting system by default. The system consists of various components which are automatically included in a Cluster Image built with the single tele build command.
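For reference, a cluster image is built from an application manifest; a minimal sketch, where the manifest and output file names are illustrative:

$ tele build -o cluster-image.tar app.yaml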

Overview

Before getting into Gravity’s monitoring and alerts capability in more detail, let’s first discuss the various components that are involved.

There are 4 main components in the monitoring system: InfluxDB, Heapster, Grafana, and Kapacitor.

InfluxDB

Is an open source time series database used as the main data store for monitoring time series data. It provides the Kubernetes service influxdb.monitoring.svc.cluster.local.

Heapster

Monitors Kubernetes components, collecting not only performance metrics about workloads, nodes, and pods, but also events generated by Clusters. The captured statistics are reported to InfluxDB.

Grafana

Is an open source metrics suite that provides the dashboards in the Gravity monitoring and alerts system. The dashboards visualize the information stored in InfluxDB. Grafana is exposed as the service grafana.monitoring.svc.cluster.local, and the generated credentials are placed into the grafana secret in the monitoring namespace.

Gravity ships with 2 pre-configured dashboards providing machine- and pod-level overviews of the installed cluster. Within the Gravity control panel, you can access the dashboards by navigating to the Monitoring page.

By default, Grafana is running in anonymous read-only mode. Anyone who logs into Gravity can view but not modify the dashboards.
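If you need the generated Grafana credentials (for example, to log into a separate, writable Grafana instance), you can inspect the secret mentioned above:

$ kubectl -nmonitoring get secrets/grafana -oyaml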

Kapacitor

Is the data processing engine for InfluxDB: it streams data from InfluxDB and sends alerts to the end user. It is exposed as the service kapacitor.monitoring.svc.cluster.local.

Metrics Overview

All monitoring components are running in the “monitoring” namespace in Gravity. Let’s take a look at them:

$ kubectl -nmonitoring get pods
NAME                         READY   STATUS    RESTARTS   AGE
grafana-8cb94d5dc-6dc2h      2/2     Running   0          10m
heapster-57fbfbbc7-9xtm6     1/1     Running   0          10m
influxdb-599c5f5c45-6hqmc    2/2     Running   1          10m
kapacitor-68f6d76878-8m26x   3/3     Running   0          10m
telegraf-75487b79bd-ptvzd    1/1     Running   0          10m
telegraf-node-master-x9v48   1/1     Running   0          10m

Most of the cluster metrics are collected by Heapster. Heapster runs as part of a Deployment, collects metrics from the cluster nodes, and persists them into the configured "sinks".

The Heapster pod collects metrics from the kubelets running on the cluster nodes, which in turn query the data from cAdvisor - a container resource usage collector integrated into kubelet that supports Docker containers natively. The cAdvisor agent running on each node discovers all running containers and collects their CPU, memory, filesystem and network usage statistics.

Both of these collectors operate on their own intervals - kubelet queries cAdvisor every 15 seconds, while Heapster scrapes metrics from all kubelets every minute.

Heapster by itself does not store any data - instead, it ships all scraped metrics to the configured sinks. In Gravity clusters the sink is an InfluxDB database that is deployed as a part of the monitoring application.

All metrics collected by Heapster are placed into the k8s database in InfluxDB. In InfluxDB the data is organized into "measurements". A measurement acts as a container for "fields" and a few other things. Applying a very rough analogy with relational databases, a measurement can be thought of as a "table" whereas the fields are "columns" of the table. In addition, each measurement can have tags attached to it which can be used to add various metadata to the data.

Each metric is stored as a separate "series" in InfluxDB. A series in InfluxDB is a collection of data that shares a retention policy, a measurement and a tag set. Heapster tags each metric with different labels, such as host name, pod name, container name and others, which become "tags" on the stored series. Tags are indexed, so queries on tags are fast.

When troubleshooting problems with metrics, it is sometimes useful to look at the Heapster container logs to see whether it is experiencing communication issues with the InfluxDB service or other problems:

$ kubectl -nmonitoring logs heapster-57fbfbbc7-9xtm6

In addition, any other apps that collect metrics should submit them to the same database so that the proper retention policies are enforced.

Exploring InfluxDB

As mentioned above, InfluxDB is exposed via the cluster-local Kubernetes service influxdb.monitoring.svc.cluster.local and serves its HTTP API on port 8086, so we can use it to explore the database from the CLI.

Let's enter the Gravity master container to make sure the services are resolvable and to get access to additional CLI tools:

$ sudo gravity shell

Let's ping the database to make sure it's up and running:

$ curl -s -I http://influxdb.monitoring.svc.cluster.local:8086/ping
# Should return a 204 response.

The InfluxDB API endpoint requires authentication, so to make actual queries to the database we need to determine the credentials first. The generated credentials are kept in the influxdb secret in the monitoring namespace:

$ kubectl -nmonitoring get secrets/influxdb -oyaml

Note that the credentials in the secret are base64-encoded, so you will need to decode them:

$ echo <encoded-password> | base64 -d
$ export PASS=xxx
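The two steps can also be combined into a single command; this sketch assumes the password is stored under a key named password in the secret:

# Assumes the secret key is named "password"; adjust to match the secret's contents.
$ export PASS=$(kubectl -nmonitoring get secrets/influxdb -o jsonpath='{.data.password}' | base64 -d)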

Once the credentials have been decoded (the username is root and the password is generated during installation), they can be supplied via a cURL command. For example, let's see what databases we currently have:

$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query --data-urlencode 'q=show databases' | jq

Now we can also see which measurements are currently being collected:

$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=show measurements' | jq

Finally, we can query specific metrics if we want to using InfluxDB's SQL-like query language:

$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=select * from uptime limit 10' | jq
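Since Heapster's labels become indexed tags on the stored series, you can also inspect which tags a measurement carries; for example:

$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=show tag keys from "uptime"' | jq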

Refer to the InfluxDB API documentation if you want to learn more about querying the database.

Metric Retention Policy & Rollups

Let's now talk about how long the measurements are stored. During initial installation, Gravity pre-configures InfluxDB with the following retention policies:

  • default = 24 hours - is used for high precision metrics.
  • medium = 4 weeks - is used for medium precision metrics.
  • long = 52 weeks - keeps metrics aggregated over even larger intervals.

We can use the same InfluxDB API to see the retention policies configured in the database:

$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=show retention policies' | jq

All metrics sent to InfluxDB by Heapster are saved using the default retention policy which means that all the high-resolution metrics collected are kept intact for 24 hours.

To provide a historical overview, some of the most commonly useful metrics (such as CPU/memory usage and network transfer rates) are rolled up to lower resolutions and stored using the longer retention policies mentioned above.

In order to provide such downsampled metrics, Gravity uses InfluxDB “continuous queries” which are programmed to run automatically and aggregate metrics over a certain interval.

The Gravity monitoring system allows two types of rollup configurations for collecting metrics:

  • medium = aggregates data over 5 minute intervals
  • long = aggregates data over 1 hour intervals

Each of the two rollups goes into its respective retention policy. For example, the long rollup aggregates data over 1-hour intervals and is stored using the long retention policy.
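The rollups are implemented as continuous queries, which you can list with the same HTTP API used earlier:

$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=show continuous queries' | jq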

The preconfigured rollups that Gravity clusters come with are stored in the rollups-default ConfigMap in the monitoring namespace:

$ kubectl -nmonitoring get configmaps/rollups-default -oyaml

The configuration of retention policies and rollups is handled by a "watcher" service that runs in a container as a part of the InfluxDB pod, so all of this configuration activity can be seen in its logs:

$ kubectl -nmonitoring logs influxdb-599c5f5c45-6hqmc watcher

Custom Rollups

In addition to the rollups pre-configured by Gravity, applications can downsample their own metrics (or create different rollups for standard metrics) by configuring their own rollups through ConfigMaps.

Custom rollup ConfigMaps should be created in the monitoring namespace and assigned a monitoring label with the value rollup.

An example ConfigMap with a custom metric rollup is shown below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: myrollups
  namespace: monitoring
  labels:
    monitoring: rollup
data:
  rollups: |
    [
      {
        "retention": "medium",
        "measurement": "cpu/usage_rate",
        "name": "cpu/usage_rate/medium",
        "functions": [
          {
            "function": "max",
            "field": "value",
            "alias": "value_max"
          },
          {
            "function": "mean",
            "field": "value",
            "alias": "value_mean"
          }
        ]
      }
    ]

The watcher process will detect the new ConfigMap and configure an appropriate continuous query for the new rollup:

$ kubectl -nmonitoring logs influxdb-599c5f5c45-6hqmc watcher
...
time="2020-01-24T05:40:13Z" level=info msg="Detected event ADDED for configmap \"myrollups\"" label="monitoring in (rollup)" watch=configmap
time="2020-01-24T05:40:13Z" level=info msg="New rollup." query="create continuous query \"cpu/usage_rate/medium\" on k8s begin select max(\"value\") as value_max, mean(\"value\") as value_mean into k8s.\"medium\".\"cpu/usage_rate/medium\" from k8s.\"default\".\"cpu/usage_rate\" group by *, time(5m) end"

Custom Dashboards

Along with the dashboards mentioned above, your applications can use their own Grafana dashboards by using ConfigMaps.

Similar to creating custom rollups, in order to use a custom dashboard the ConfigMap should be created in the monitoring namespace and assigned a monitoring label with the value dashboard.

The ConfigMap will be recognized and loaded when the application is installed. It is also possible to add new ConfigMaps at a later time: the watcher will pick them up and create the dashboards in Grafana. Similarly, if you delete a ConfigMap, the watcher will remove the corresponding dashboard from Grafana.

Dashboard ConfigMaps may contain multiple keys with dashboards; the key names are not significant.

An example ConfigMap is shown below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: mydashboard
  namespace: monitoring
  labels:
    monitoring: dashboard
data:
  mydashboard: |
    { ... dashboard JSON ... }
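Equivalently, the ConfigMap can be created directly from a dashboard JSON file with kubectl; the file name below is illustrative:

$ kubectl -nmonitoring create configmap mydashboard --from-file=mydashboard=mydashboard.json
$ kubectl -nmonitoring label configmap mydashboard monitoring=dashboard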

Note: since Grafana runs in read-only mode by default, a separate Grafana instance is required to create custom dashboards.

Default Metrics

The following are the default metrics captured by the Gravity Monitoring & Alerts system:

Heapster Metrics

Below is a list of the metrics captured by Heapster and exported to the backend:

| Metric group | Metric name | Description |
| --- | --- | --- |
| cpu | limit | CPU hard limit in millicores. |
| cpu | node_capacity | CPU capacity of a node. |
| cpu | node_allocatable | CPU allocatable of a node. |
| cpu | node_reservation | Share of CPU that is reserved on the node allocatable. |
| cpu | node_utilization | CPU utilization as a share of node allocatable. |
| cpu | request | CPU request (the guaranteed amount of resources) in millicores. |
| cpu | usage | Cumulative amount of consumed CPU time on all cores in nanoseconds. |
| cpu | usage_rate | CPU usage on all cores in millicores. |
| cpu | load | CPU load in milliloads, i.e. runnable threads * 1000. |
| ephemeral_storage | limit | Local ephemeral storage hard limit in bytes. |
| ephemeral_storage | request | Local ephemeral storage request (the guaranteed amount of resources) in bytes. |
| ephemeral_storage | usage | Total local ephemeral storage usage. |
| ephemeral_storage | node_capacity | Local ephemeral storage capacity of a node. |
| ephemeral_storage | node_allocatable | Local ephemeral storage allocatable of a node. |
| ephemeral_storage | node_reservation | Share of local ephemeral storage that is reserved on the node allocatable. |
| ephemeral_storage | node_utilization | Local ephemeral storage utilization as a share of ephemeral storage allocatable. |
| filesystem | usage | Total number of bytes consumed on a filesystem. |
| filesystem | limit | The total size of the filesystem in bytes. |
| filesystem | available | The number of available bytes remaining in the filesystem. |
| filesystem | inodes | The number of available inodes in the filesystem. |
| filesystem | inodes_free | The number of free inodes remaining in the filesystem. |
| disk | io_read_bytes | Number of bytes read from a disk partition. |
| disk | io_write_bytes | Number of bytes written to a disk partition. |
| disk | io_read_bytes_rate | Number of bytes read from a disk partition per second. |
| disk | io_write_bytes_rate | Number of bytes written to a disk partition per second. |
| memory | limit | Memory hard limit in bytes. |
| memory | major_page_faults | Number of major page faults. |
| memory | major_page_faults_rate | Number of major page faults per second. |
| memory | node_capacity | Memory capacity of a node. |
| memory | node_allocatable | Memory allocatable of a node. |
| memory | node_reservation | Share of memory that is reserved on the node allocatable. |
| memory | node_utilization | Memory utilization as a share of memory allocatable. |
| memory | page_faults | Number of page faults. |
| memory | page_faults_rate | Number of page faults per second. |
| memory | request | Memory request (the guaranteed amount of resources) in bytes. |
| memory | usage | Total memory usage. |
| memory | cache | Cache memory usage. |
| memory | rss | RSS memory usage. |
| memory | working_set | Total working set usage. Working set is the memory being used and not easily dropped by the kernel. |
| accelerator | memory_total | Memory capacity of an accelerator. |
| accelerator | memory_used | Memory used of an accelerator. |
| accelerator | duty_cycle | Duty cycle of an accelerator. |
| accelerator | request | Number of accelerator devices requested by container. |
| network | rx | Cumulative number of bytes received over the network. |
| network | rx_errors | Cumulative number of errors while receiving over the network. |
| network | rx_errors_rate | Number of errors while receiving over the network per second. |
| network | rx_rate | Number of bytes received over the network per second. |
| network | tx | Cumulative number of bytes sent over the network. |
| network | tx_errors | Cumulative number of errors while sending over the network. |
| network | tx_errors_rate | Number of errors while sending over the network per second. |
| network | tx_rate | Number of bytes sent over the network per second. |
| uptime | - | Number of milliseconds since the container was started. |
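For example, to look at recent memory usage samples (note that measurement names contain a slash, so they must be double-quoted in InfluxQL):

$ curl -s -u root:$PASS http://influxdb.monitoring.svc.cluster.local:8086/query?db=k8s --data-urlencode 'q=select * from "memory/usage" order by time desc limit 5' | jq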

Satellite

Satellite is an open-source tool prepared by Gravitational that collects health information related to the Kubernetes cluster. Satellite runs on each Gravity Cluster node and has various checks assessing the health of a Cluster.

Satellite collects several metrics related to cluster health and exposes them over the Prometheus endpoint. Among the metrics collected by Satellite are:

  • Etcd related metrics:
    • Current leader address
    • Etcd cluster health
  • Docker related metrics:
    • Overall health of the Docker daemon
  • Sysctl related metrics:
    • Status of IPv4 forwarding
    • Status of netfilter
  • Systemd related metrics:
    • State of various systemd units such as etcd, flannel, kube-*, etc.
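Satellite's health checks feed into the overall cluster status, which you can review on any node, for example:

$ sudo gravity status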

Telegraf

The nodes also run Telegraf - an agent for collecting, processing, aggregating, and writing metrics. Metrics from some of its system input plugins, such as those related to CPU and memory, are captured by default as well:

| Metric name | Description |
| --- | --- |
| load1 | (float) System load averaged over 1 minute. |
| load15 | (float) System load averaged over 15 minutes. |
| load5 | (float) System load averaged over 5 minutes. |
| n_users | (integer) Number of users. |
| n_cpus | (integer) Number of CPU cores. |
| uptime | (integer, seconds) Number of seconds since the system was started. |

In addition to the default metrics, Telegraf also queries the Satellite Prometheus endpoint described above and ships all metrics to the same “k8s” database in InfluxDB.

Telegraf configuration can be found here. The respective configuration files show which input plugins each Telegraf instance has enabled.

More about Kapacitor

As mentioned above, Kapacitor is the alerting component that streams data from InfluxDB and handles alerts sent to users. Kapacitor can also be configured to send email alerts, or customized with other alerts.

The following are the alerts that the Gravity Monitoring & Alerts system ships with by default:

| Component | Alert | Description |
| --- | --- | --- |
| CPU | High CPU usage | Warning at > 75% used, critical error at > 90% used. |
| Memory | High memory usage | Warning at > 80% used, critical error at > 90% used. |
| Systemd | Individual unit health | Error when a unit is not loaded/active. |
| Systemd | Overall systemd health | Error when systemd detects a failed service. |
| Filesystem | High disk space usage | Warning at > 80% used, critical error at > 90% used. |
| Filesystem | High inode usage | Warning at > 90% used, critical error at > 95% used. |
| System | Uptime | Warning when node uptime < 5 mins. |
| System | Kernel params | Error if a required kernel parameter is not set. |
| Etcd | Etcd instance health | Error when etcd master is down > 5 mins. |
| Etcd | Etcd latency check | Warning when follower <-> leader latency > 500 ms, error when > 1 sec over a period of 1 min. |
| Docker | Docker daemon health | Error when the Docker daemon is down. |
| InfluxDB | InfluxDB instance health | Error when InfluxDB is inaccessible. |
| Kubernetes | Kubernetes node readiness | Error when the node is not ready. |

Kapacitor Email Configuration

In order to configure email alerts via Kapacitor you will need to create Gravity resources of type smtp and alerttarget.

An example of the configuration is shown below:

kind: smtp
version: v2
metadata:
  name: smtp
spec:
  host: smtp.host
  port: <smtp port> # 465 by default
  username: <username>
  password: <password>
---
kind: alerttarget
version: v2
metadata:
  name: email-alerts
spec:
  email: triage@example.com # Email address of the alert recipient

Creating these resources will update and reload the Kapacitor configuration accordingly:

$ gravity resource create -f smtp.yaml

In order to view the current SMTP settings or alert target:

$ gravity resource get smtp
$ gravity resource get alerttarget

Only a single alert target can be configured. To remove the current alert target, delete the corresponding Gravity resource:

$ gravity resource rm alerttarget email-alerts

Testing Kapacitor Email Configuration

To test a Kapacitor SMTP configuration you can execute the following:

$ kubectl exec -n monitoring $POD_ID -c kapacitor -- /bin/bash -c "kapacitor service-tests smtp"

If the settings are set up appropriately, the recipient should receive an email with the subject “test subject”.

Kapacitor Custom Alerts

Creating new alerts is as easy as using another Gravity resource of type alert. The alerts are written in TICKscript and are automatically detected, loaded, and enabled by the Gravity Monitoring and Alerts system.

For demonstration purposes let’s define an alert that always fires:

kind: alert
version: v2
metadata:
  name: my-formula
spec:
  formula: |
    var period = 5m
    var every = 1m
    var warnRate = 2
    var warnReset = 1
    var usage_rate = stream
        |from()
            .measurement('cpu/usage_rate')
            .groupBy('nodename')
            .where(lambda: "type" == 'node')
        |window()
            .period(period)
            .every(every)
    var cpu_total = stream
        |from()
            .measurement('cpu/node_capacity')
            .groupBy('nodename')
            .where(lambda: "type" == 'node')
        |window()
            .period(period)
            .every(every)
    var percent_used = usage_rate
        |join(cpu_total)
            .as('usage_rate', 'total')
            .tolerance(30s)
            .streamName('percent_used')
        |eval(lambda: (float("usage_rate.value") * 100.0) / float("total.value"))
            .as('percent_usage')
        |mean('percent_usage')
            .as('avg_percent_used')
    var trigger = percent_used
        |alert()
            .message('{{ .Level}} / Node {{ index .Tags "nodename" }} has high cpu usage: {{ index .Fields "avg_percent_used" }}%')
            .warn(lambda: "avg_percent_used" > warnRate)
            .warnReset(lambda: "avg_percent_used" < warnReset)
            .stateChangesOnly()
            .details('''
    <b>{{ .Message }}</b>
    <p>Level: {{ .Level }}</p>
    <p>Nodename: {{ index .Tags "nodename" }}</p>
    <p>Usage: {{ index .Fields "avg_percent_used"  | printf "%0.2f" }}%</p>
    ''')
            .email()
            .log('/var/lib/kapacitor/logs/high_cpu.log')
            .mode(0644)

And create it:

$ gravity resource create -f formula.yaml

Custom alerts are monitored by another "watcher" service that runs inside the Kapacitor pod:

$ kubectl -nmonitoring logs kapacitor-68f6d76878-8m26x watcher
time="2020-01-24T06:18:10Z" level=info msg="Detected event ADDED for configmap \"my-formula\"" label="monitoring in (alert)" watch=configmap

We can confirm the alert is running by checking its log after a few seconds:

$ kubectl -nmonitoring exec -ti kapacitor-68f6d76878-8m26x -c kapacitor -- cat /var/lib/kapacitor/logs/high_cpu.log
{"id":"percent_used:nodename=10.0.2.15","message":"WARNING / Node 10.0.2.15 has high cpu usage: 15%","details":"\n\u003cb\u003eWARNING / Node 10.0.2.15 has high cpu usage: 15%\u003c/b\u003e\n\u003cp\u003eLevel: WARNING\u003c/p\u003e\n\u003cp\u003eNodename: 10.0.2.15\u003c/p\u003e\n\u003cp\u003eUsage: 15.00%\u003c/p\u003e\n","time":"2020-01-24T06:30:00Z","duration":0,"level":"WARNING","data":{"series":[{"name":"percent_used","tags":{"nodename":"10.0.2.15"},"columns":["time","avg_percent_used"],"values":[["2020-01-24T06:30:00Z",15]]}]},"previousLevel":"OK","recoverable":true}

To view the custom alert we just created, you can run:

$ gravity resource get alert my-formula

In order to remove a specific alert, delete the corresponding Gravity resource:

$ gravity resource rm alert my-formula

This concludes our monitoring training.