
Issue when compiling kube-prometheus-stack #983

Open
jyepesr1 opened this issue Apr 3, 2023 · 3 comments

jyepesr1 commented Apr 3, 2023

Describe the bug/feature
In the latest release, 0.31.0, building kube-prometheus-stack produces alert rules with invalid syntax.

Expected behavior
Running version 0.30, we get this correct output:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: kube-prometheus-stack
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 34.9.0
    chart: kube-prometheus-stack-34.9.0
    heritage: Helm
    prom-rule: kube-prometheus-stack
    release: kube-prometheus-stack
  name: kube-prometheus-stack-k8s.rules
  namespace: monitoring
spec:
  groups:
  - name: k8s.rules
    rules:
    - expr: "sum by (cluster, namespace, pod, container) (\n  rate(container_cpu_usage_seconds_total{job=\"\
        kubelet\", metrics_path=\"/metrics/cadvisor\", image!=\"\", container!=\"\
        POD\"}[5m])\n) * on (cluster, namespace, pod) group_left(node) topk by (cluster,\
        \ namespace, pod) (\n  1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=\"\
        \"})\n)"
      record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
    - expr: "container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\"\
        , image!=\"\", container!=\"POD\"}\n* on (namespace, pod) group_left(node)\
        \ topk by(namespace, pod) (1,\n  max by(namespace, pod, node) (kube_pod_info{node!=\"\
        \"})\n)"
      record: node_namespace_pod_container:container_memory_working_set_bytes

After upgrading to 0.31, we get this output:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: kube-prometheus-stack
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: kube-prometheus-stack
    app.kubernetes.io/version: 34.9.0
    chart: kube-prometheus-stack-34.9.0
    heritage: Helm
    prom-rule: kube-prometheus-stack
    release: kube-prometheus-stack
  name: kube-prometheus-stack-k8s.rules
  namespace: monitoring
spec:
  groups:
  - name: k8s.rules
    rules:
    - alert: {}
      expr: "sum by (cluster, namespace, pod, container) (\n  rate(container_cpu_usage_seconds_total{job=\"\
        kubelet\", metrics_path=\"/metrics/cadvisor\", image!=\"\", container!=\"\
        POD\"}[5m])\n) * on (cluster, namespace, pod) group_left(node) topk by (cluster,\
        \ namespace, pod) (\n  1, max by(cluster, namespace, pod, node) (kube_pod_info{node!=\"\
        \"})\n)"
      record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate
    - alert: {}
      expr: "container_memory_working_set_bytes{job=\"kubelet\", metrics_path=\"/metrics/cadvisor\"\
        , image!=\"\", container!=\"POD\"}\n* on (namespace, pod) group_left(node)\
        \ topk by(namespace, pod) (1,\n  max by(namespace, pod, node) (kube_pod_info{node!=\"\
        \"})\n)"
      record: node_namespace_pod_container:container_memory_working_set_bytes

A new `alert` key set to an empty map (`{}`) has been added to every rule, which causes the apply to fail.
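One plausible source of such empty maps is attribute access on an auto-vivifying dict: kadet's `BaseObj` wraps an addict-style `Dict`, and in some implementations merely *reading* a missing attribute inserts an empty child map. The `AutoDict` class below is a minimal illustrative stand-in for that behavior, not kadet's actual implementation:

```python
class AutoDict(dict):
    """Minimal auto-vivifying dict: reading a missing attribute
    inserts (and returns) an empty AutoDict. This mimics the
    observed behavior, not kadet's real Dict class."""

    def __getattr__(self, name):
        if name not in self:
            self[name] = AutoDict()  # side effect: the key appears
        return self[name]


rule = AutoDict(record="example:sum_rate", expr="sum(rate(x[5m]))")

# Merely reading rule.alert creates the key:
_ = rule.alert
# 'alert' is now a key in rule, mapped to an empty dict
```

If the Dict implementation in 0.31 persists keys on read like this, any script that checks `rule.alert` on a recording rule would inject `alert: {}` into it as a side effect.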

Any suggestions?


MatteoVoges (Contributor) commented

Hey @jyepesr1,
could you add some steps to reproduce this issue, maybe by sharing the inventory specification of the relevant file?
Note that there have been several issues regarding kube-prometheus-stack with PyYAML.
Is it just the empty `alert: {}` property, or did anything else change?

jyepesr1 (Author) commented

Hi @MatteoVoges, the most relevant issue, at least with kube-prometheus-stack, is this empty map. We haven't tested other components since we had to roll back to v0.30.0. We are using the following configuration:

 chart_name: kube-prometheus-stack
 chart_version: 34-9-0-kp-0-9

 - name: helm3-template-prometheus
   input_type: kadet
   output_path: ${prometheus:manifests_output_location}
   input_paths:
     - kadet_functions/helm3_template
   input_params:
     chart_location: shared/${prometheus:chart_name}/${prometheus:chart_version}
     clean_manifests: false
     helm_values_files:
       - ${prometheus:helm_values_files_location}/common.yml
       - ${prometheus:helm_values_files_location}/grafana.yml
     helm_values: ${prometheus:helm_values}
     helm_params:
       - --namespace ${prometheus:namespace}
       - --name-template ${prometheus:chart_name}
       - --api-versions 1.22.0
       - --include-crds

 - name: remove-prometheus-rules
   input_type: kadet
   output_path: .
   input_paths:
     - kadet_functions/remove_prometheus_rules
   input_params:
     alerting_rules_to_remove:
       - KubePersistentVolumeFillingUp # noisy alert that we will change in the future
       - TargetDown # too broad an alert; we don't control all pods in the cluster
     chart_compiled_dir: ${prometheus:manifests_output_location}/${prometheus:chart_name}

We realized the issue occurs in the kadet function we use to delete some Prometheus rules. As I mentioned, this script works as expected in 0.30.0, but the new version produces the output above. I'm not sure whether it's related to the PyYAML version you mentioned. This is the script content:

import os
from kapitan.inputs import kadet
import yaml
import sys

curr_dir = os.path.dirname(__file__)
sys.path.append(os.path.abspath(os.path.join(curr_dir, "..", "common")))
import common

inventory = kadet.inventory()


def is_prometheus_rule(resource):
    if resource.root.kind == "PrometheusRule":
        return True

    return False


def remove_alerting_rule_if_present(resource, alerting_rules):
    for g_idx, group in enumerate(resource.root.spec.groups):
        for r_idx, rule in enumerate(group.rules):
            if rule.alert in alerting_rules:
                resource.root.spec.groups[g_idx].rules.pop(r_idx)

    return resource


def main(input_params):
    if "chart_compiled_dir" in input_params:
        chart_compiled_dir = input_params.get("chart_compiled_dir")
    else:
        raise ValueError("'chart_compiled_dir' key not found in 'input_params'")

    if "alerting_rules_to_remove" in input_params:
        alerting_rules_to_remove = input_params.get("alerting_rules_to_remove")
    else:
        raise ValueError(
            "'alerting_rules_to_remove' key not found in 'input_params'"
        )

    # path where files have been compiled on this run
    compile_path = input_params.get("compile_path")

    all_objects = {}
    prometheus_sub_chart = os.path.join(chart_compiled_dir, "templates", "prometheus")
    manifest_files = common.get_compiled_manifest_files_for_component(
        compile_path, prometheus_sub_chart
    )

    for file in manifest_files:
        file_name_kadet_format = common.format_file_name_for_kadet(compile_path, file)
        with open(file) as fp:
            yaml_stream = yaml.safe_load_all(fp)
            objects_for_file = []
            for obj in yaml_stream:
                o = kadet.BaseObj.from_dict(obj)
                # ensure yaml object is valid k8s resource
                # cleans up any cruft from helm template
                if common.kadet_object_has_kind(o) is not True:
                    continue
                if is_prometheus_rule(o):
                    o = remove_alerting_rule_if_present(o, alerting_rules_to_remove)

                objects_for_file.append(o)
            all_objects.update({file_name_kadet_format: objects_for_file})

    output = kadet.BaseObj()
    for file_name, obj in all_objects.items():
        # TODO: output the object with the naming style we want
        # kind = obj.root.kind
        # name = obj.root.metadata.name
        output.root[file_name] = obj
    return output
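For what it's worth, `remove_alerting_rule_if_present` has two fragile spots independent of the Kapitan version: `rule.alert` is an attribute access (which can auto-vivify an empty `alert: {}` key on an addict-style object), and `pop(r_idx)` during iteration shifts later indices, skipping the rule after each removal. A defensive sketch of the same logic over plain dicts, not a tested patch to the script above, could look like this:

```python
def remove_alerting_rules(spec: dict, alerting_rules: list) -> dict:
    """Drop alerting rules whose name is in `alerting_rules`.

    Uses a membership-safe .get() instead of attribute access, so no
    empty `alert: {}` key is ever created, and rebuilds each rules
    list instead of popping indices mid-iteration (which would skip
    the element following every removal).
    """
    for group in spec.get("groups", []):
        group["rules"] = [
            rule
            for rule in group.get("rules", [])
            if rule.get("alert") not in alerting_rules
        ]
    return spec


spec = {
    "groups": [
        {
            "name": "k8s.rules",
            "rules": [
                {"record": "r1", "expr": "up"},          # recording rule: kept
                {"alert": "TargetDown", "expr": "up == 0"},  # removed
                {"alert": "KeepMe", "expr": "up == 1"},      # kept
            ],
        }
    ]
}
remove_alerting_rules(spec, ["TargetDown"])
```

Recording rules have no `alert` key, so `rule.get("alert")` returns `None` for them and they are always kept.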

github-actions bot commented Oct 2, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the Stale label Oct 2, 2023