Monitoring metrics in OpenShift Streams for Apache Kafka

As a developer or administrator, you can view metrics in OpenShift Streams for Apache Kafka to visualize the performance and data usage for Kafka instances and topics that you have access to. You can view metrics directly in the Streams for Apache Kafka web console, or use the metrics API endpoint provided by Streams for Apache Kafka to import the data into your own metrics monitoring tool, such as Prometheus.

Supported metrics in Streams for Apache Kafka

OpenShift Streams for Apache Kafka supports the following metrics for Kafka instances and topics. In the Streams for Apache Kafka web console, the Dashboard page of a Kafka instance displays a subset of these metrics. To learn more about the limits associated with both trial and production Kafka instance types, see Red Hat OpenShift Streams for Apache Kafka Service Limits.

Cluster metrics
kafka_namespace:haproxy_server_bytes_in_total:rate5m

Number of incoming bytes per second for the cluster in the last five minutes. This ingress metric represents all the data that producers are sending to topics in the cluster.

The Kafka instance type determines the maximum incoming byte rate.

kafka_namespace:haproxy_server_bytes_out_total:rate5m

Number of outgoing bytes per second for the cluster in the last five minutes. This egress metric represents all the data that consumers are receiving from topics in the cluster.

The Kafka instance type determines the maximum outgoing byte rate.

kafka_namespace:kafka_server_socket_server_metrics_connection_count:sum

Number of current client connections to the cluster. Kafka clients use persistent connections to interact with brokers in the cluster. For example, a consumer holds a connection to each broker it is receiving data from and a connection to its group coordinator.

The Kafka instance type determines the maximum number of active connections.

kafka_namespace:kafka_server_socket_server_metrics_connection_creation_rate:sum

Number of client connection creations per second for the cluster. Kafka clients use persistent connections to interact with brokers in the cluster. A constant high number of connection creations might indicate a client issue.

The Kafka instance type determines the maximum connection creation rate.

kafka_topic:kafka_topic_partitions:count

Number of topics in the cluster. This metric does not include internal Kafka topics, such as __consumer_offsets and __transaction_state.

kafka_topic:kafka_topic_partitions:sum

Number of partitions across all topics in the cluster. This metric does not include partitions from internal Kafka topics, such as __consumer_offsets and __transaction_state.

The Kafka instance type determines the maximum number of partitions.

kas_broker_partition_log_size_bytes_top50

Sizes, in bytes, of the fifty largest topic partitions on each broker in the cluster. The total amount of storage being used by all topic partitions on a broker is shown by the kafka_broker_quota_totalstorageusedbytes broker metric. The total usage for a broker must stay below the kafka_broker_quota_softlimitbytes value to avoid throttling of producers.

kas_topic_partition_log_size_bytes

Size, in bytes, of each topic partition on each broker in the cluster. The total amount of storage being used by all topic partitions on a broker is shown by the kafka_broker_quota_totalstorageusedbytes broker metric. The total usage for a broker must stay below the kafka_broker_quota_softlimitbytes value to avoid throttling of producers.
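
For example, you could check how close the instance is to its partition limit with a simple PromQL comparison. This is a sketch, not part of the service documentation, and <max_partitions> is a placeholder for the limit of your instance type from the service limits documentation:

Example PromQL expression for partition usage against the instance limit
kafka_topic:kafka_topic_partitions:sum > (<max_partitions> * 0.9)

The expression returns a result only when partition usage exceeds 90 percent of the limit, which makes it suitable as an alert condition.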

Broker metrics
kafka_broker_quota_softlimitbytes

Maximum amount of storage, in bytes, for this broker before producers are throttled. When this limit is reached, the broker starts throttling producers to prevent them from sending additional data.

The Kafka instance type determines the maximum storage in the broker.

kafka_broker_quota_totalstorageusedbytes

Amount of storage, in bytes, that is currently used by partitions in the broker. The storage usage depends on the number and retention configurations of the partitions. This metric must stay below the kafka_broker_quota_softlimitbytes value.

kafka_controller_kafkacontroller_global_partition_count

Number of partitions in the cluster. Only the broker that is the current controller in the cluster reports this metric. Any other brokers report a value of 0. This count includes partitions from internal Kafka topics, such as __consumer_offsets and __transaction_state. This metric is similar to the kafka_topic:kafka_topic_partitions:sum cluster metric.

kafka_controller_kafkacontroller_offline_partitions_count

Number of partitions in the cluster that are currently offline. Offline partitions cannot be used by clients for producing or consuming data. Only the broker that is the current controller in the cluster reports this metric. Any other brokers report 0.

kubelet_volume_stats_available_bytes

Amount of disk space, in bytes, that is available in the broker.

kubelet_volume_stats_used_bytes

Amount of disk space, in bytes, that is currently used in the broker. This metric is similar to the kafka_broker_quota_totalstorageusedbytes broker metric.
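
For example, the following PromQL query (a sketch, not an official dashboard query) expresses each broker's current storage usage as a fraction of its soft limit, using the two quota metrics described above:

Example PromQL query for broker storage usage relative to the soft limit
kafka_broker_quota_totalstorageusedbytes / kafka_broker_quota_softlimitbytes

A result approaching 1 indicates that the broker is close to the point at which producers are throttled.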

Topic metrics
kafka_server_brokertopicmetrics_bytes_in_total

Number of incoming bytes to topics in the instance.

kafka_server_brokertopicmetrics_bytes_out_total

Number of outgoing bytes from topics in the instance.

kafka_server_brokertopicmetrics_messages_in_total

Number of messages received by one or more topics in the instance.

kafka_topic:kafka_server_brokertopicmetrics_bytes_in_total:rate5m

Number of incoming bytes per second to topics in the instance in the last five minutes.

kafka_topic:kafka_server_brokertopicmetrics_bytes_out_total:rate5m

Number of outgoing bytes per second from topics in the instance in the last five minutes.

kafka_topic:kafka_server_brokertopicmetrics_messages_in_total:rate5m

Number of messages per second received by one or more topics in the instance in the last five minutes.

kafka_topic:kafka_log_log_size:sum

Log size, in bytes, of each topic and replica across all brokers in the cluster.
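
For example, to rank topics by recent ingress throughput, you could aggregate the five-minute rate metric by its topic label. This query is a sketch and assumes the metric exposes a topic label, as the per-topic metrics above suggest:

Example PromQL query for per-topic incoming byte rates
sum by (topic) (kafka_topic:kafka_server_brokertopicmetrics_bytes_in_total:rate5m)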

Viewing metrics for a Kafka instance in Streams for Apache Kafka

After you produce and consume messages in your services using methods such as Kafka scripts, Kcat, or a Quarkus application, you can return to the Kafka instance in the web console and use the Dashboard page to view metrics for the instance and topics. The metrics help you understand the performance and data usage for your Kafka instance and topics.

Procedure
  • In the Kafka Instances page of the web console, click the name of the Kafka instance and select the Dashboard tab.

    When you create a Kafka instance and add new topics, the Dashboard page is initially empty. After you start producing and consuming messages in your services, you can return to this page to view related metrics. For example, to use Kafka scripts to produce and consume messages, see Configuring and connecting Kafka scripts with OpenShift Streams for Apache Kafka.
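
    As a minimal sketch, the following commands produce and consume messages with the Kafka scripts over SASL/PLAIN. The topic name, bootstrap server, and configuration file name are placeholders, and the linked guide remains the authoritative reference.

    Example Kafka script commands for producing and consuming messages
    # app-services.properties: SASL/PLAIN client configuration for the Kafka instance
    security.protocol=SASL_SSL
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<client_id>" password="<client_secret>";

    # Produce messages to a topic
    bin/kafka-console-producer.sh --topic my-topic --bootstrap-server <bootstrap_server> --producer.config app-services.properties

    # Consume the messages from the beginning of the topic
    bin/kafka-console-consumer.sh --topic my-topic --bootstrap-server <bootstrap_server> --from-beginning --consumer.config app-services.properties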

Note
In some cases, after you start producing and consuming messages, you might need to wait several minutes for the latest metrics to appear. You might also need to wait until your instance and topics contain enough data for metrics to appear.

Configuring metrics monitoring for a Kafka instance in Prometheus

As an alternative to viewing metrics for a Kafka instance in the OpenShift Streams for Apache Kafka web console, you can export your metrics to Prometheus and integrate the metrics with your own metrics monitoring platform. Streams for Apache Kafka provides a kafkas/{id}/metrics/federate API endpoint that you can configure as a scrape target for Prometheus to use to collect and store metrics. You can then access the metrics in the Prometheus expression browser or in a data-graphing tool such as Grafana.

This procedure follows the Configuration File method defined by Prometheus for integrating third-party metrics. If you use the Prometheus Operator in your monitoring environment, you can also follow the Additional Scrape Configuration method.

Prerequisites
  • You have access to a running Kafka instance that contains topics in Streams for Apache Kafka. For more information about access management in Streams for Apache Kafka, see Managing account access in OpenShift Streams for Apache Kafka.

  • You have the ID and the SASL/OAUTHBEARER token endpoint for the Kafka instance. To locate these values, select your Kafka instance in the Streams for Apache Kafka web console, open the options menu (three vertical dots), and click:

    • Details to locate the Kafka instance ID.

    • Connection to locate the SASL/OAUTHBEARER token endpoint URL.

  • You have the generated credentials for your service account that has access to the Kafka instance. To reset the credentials, use the Service Accounts page in the Application Services section of the Red Hat Hybrid Cloud Console.

  • You have installed a Prometheus instance in your monitoring environment. For installation instructions, see Getting Started in the Prometheus documentation.

Procedure
  1. In your Prometheus configuration file, add the following job to the scrape_configs section. Replace the variable values with your own Kafka instance and service account information.

    The <kafka_instance_id> is the ID of the Kafka instance. The <client_id> and <client_secret> are the generated credentials for your service account that you copied previously. The <token_url> is the SASL/OAUTHBEARER token endpoint for the Kafka instance.

    Required information for Prometheus configuration file
    - job_name: "kafka-federate"
      static_configs:
      - targets: ["api.openshift.com"]
      scheme: "https"
      metrics_path: "/api/kafkas_mgmt/v1/kafkas/<kafka_instance_id>/metrics/federate"
      oauth2:
        client_id: "<client_id>"
        client_secret: "<client_secret>"
        token_url: "<token_url>"

    The new scrape target becomes available after the configuration has reloaded.
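
    Prometheus does not reload its configuration file automatically. As a sketch, you can send the running process a SIGHUP or, if the server was started with the --web.enable-lifecycle flag, call the reload endpoint. The localhost address and default port here are assumptions:

    Example commands to reload the Prometheus configuration
    # Send SIGHUP to the running Prometheus process
    kill -HUP $(pgrep prometheus)

    # Alternatively, if Prometheus was started with --web.enable-lifecycle
    curl -X POST http://localhost:9090/-/reload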

  2. View your collected metrics in the Prometheus expression browser at http://<host>:<port>/graph, or integrate your Prometheus data source with a data-graphing tool such as Grafana. For information about Prometheus metrics in Grafana, see Grafana Support for Prometheus in the Grafana documentation.

    If you use Grafana with your Prometheus instance, you can import the predefined OpenShift Streams for Apache Kafka Grafana dashboard to set up your metrics display. For import instructions, see Importing a dashboard in the Grafana documentation.

When you create a Kafka instance and add new topics, the metrics are initially empty. After you start producing and consuming messages in your services, you can return to your monitoring tool to view related metrics. For example, to use Kafka scripts to produce and consume messages, see Configuring and connecting Kafka scripts with OpenShift Streams for Apache Kafka.

Note
In some cases, after you start producing and consuming messages, you might need to wait several minutes for the latest metrics to appear. You might also need to wait until your instance and topics contain enough data for metrics to appear.
Note
If you use the Prometheus Operator in your monitoring environment, you can alternatively create a kafka-federate.yaml file as an additional scrape configuration, store it in a Kubernetes secret, and reference that secret from your Prometheus custom resource, as shown in the following examples. For more information about this method, see Additional Scrape Configuration in the Prometheus documentation.

Example kafka-federate.yaml file
- job_name: "kafka-federate"
  static_configs:
  - targets: ["api.openshift.com"]
  scheme: "https"
  metrics_path: "/api/kafkas_mgmt/v1/kafkas/<kafka_instance_id>/metrics/federate"
  oauth2:
    client_id: "<client_id>"
    client_secret: "<client_secret>"
    token_url: "<token_url>"
Example command to create and apply a Kubernetes secret
kubectl create secret generic additional-scrape-configs --from-file=<~/kafka-federate.yaml> --dry-run=client -o yaml | \
kubectl apply -f - -n <namespace>
Example Prometheus custom resource with new secret
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  ...
spec:
  ...
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: kafka-federate.yaml

Configuring Prometheus alerts for Kafka instance limits

After you configure metrics monitoring for a Kafka instance in Prometheus, you can define alerting rules that notify you when the instance approaches its service limits, such as the broker storage soft limit.

Prerequisites
  • You have successfully configured metrics monitoring for a Kafka instance in Prometheus.

  • You use the Prometheus Operator in your monitoring environment.

  • You can define alerting rules in Prometheus and can deploy an Alertmanager cluster in Prometheus Operator.

Procedure
  1. Create a PrometheusRule custom resource with alerts defined for the capacity of your Kafka instance.

  2. Apply the PrometheusRule to the cluster that you are federating the metrics to.

Example PrometheusRule custom resource for a Kafka broker storage limit alert
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-instance-limits
spec:
  groups:
    - name: limits
      rules:
        - alert: KafkaBrokerStorageFillingUp
          expr: predict_linear(kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"data-(.+)-kafka-[0-9]+"}[1h], 4 * 24 * 3600) < 0
          labels:
            severity: <severity>
          annotations:
            summary: 'Broker PersistentVolume is filling up.'
            description: 'Based on recent sampling, the broker PersistentVolume claimed by {{ $labels.persistentvolumeclaim }} is expected to fill up within four days.'
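
As a sketch of step 2, save the rule to a file and apply it with kubectl. The file name is a placeholder:

Example command to apply the PrometheusRule custom resource
kubectl apply -f <kafka-limits-rule.yaml> -n <namespace>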