Allow users/community to define healthy status conditions per Kubernetes resource, per version #4140

Open
Cajga opened this issue Nov 20, 2023 · 6 comments

@Cajga

Cajga commented Nov 20, 2023

Problem

Currently, Weave GitOps reports a "red" status instead of "green" in the graph view for several resource types even though the resource is in fact healthy.

Example:
HorizontalPodAutoscaler (apiVersion: autoscaling/v2)
It has 3 status conditions:

  • AbleToScale: should be True,
  • ScalingActive: should be True,
  • ScalingLimited: should be False

Solution

Define a way/procedure for how and where users or projects can define the healthy status conditions of a resource at a specific API version. This could be a configuration option, a PR against the project, etc.
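
For illustration only (this is not an existing Weave GitOps feature; all names below are made up), such a per-resource, per-version definition could look roughly like this in Go:

package health

// ExpectedCondition describes what one status condition should look like on a
// healthy resource, e.g. {Type: "AbleToScale", Status: "True"}.
type ExpectedCondition struct {
	Type   string
	Status string
}

// Rule maps one resource kind at one apiVersion to its healthy conditions.
type Rule struct {
	APIVersion string
	Kind       string
	Conditions []ExpectedCondition
}

// Condition is the minimal shape of an observed status condition.
type Condition struct {
	Type   string
	Status string
}

// IsHealthy reports whether the observed conditions satisfy a rule: every
// expected condition must be present with the expected status.
func IsHealthy(rule Rule, observed []Condition) bool {
	byType := map[string]string{}
	for _, c := range observed {
		byType[c.Type] = c.Status
	}
	for _, want := range rule.Conditions {
		if byType[want.Type] != want.Status {
			return false
		}
	}
	return true
}

// Example rule that users/community could contribute for autoscaling/v2 HPAs.
var hpaRule = Rule{
	APIVersion: "autoscaling/v2",
	Kind:       "HorizontalPodAutoscaler",
	Conditions: []ExpectedCondition{
		{Type: "AbleToScale", Status: "True"},
		{Type: "ScalingActive", Status: "True"},
		{Type: "ScalingLimited", Status: "False"},
	},
}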

Additional context

I would be willing to contribute the definitions for several resources if it were well defined how to do it.

@makkes
Member

makkes commented Nov 20, 2023

To determine the health of HPAs, Weave GitOps compares .status.currentReplicas with .status.desiredReplicas and checks each existing condition for a "Failed" or "Invalid" reason.
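
For clarity, here is a minimal sketch of the kind of check described above, assuming the k8s.io/api/autoscaling/v2 types. It is only an approximation, not the actual Weave GitOps code (in particular, whether the reason match is a prefix or substring check may differ):

package health

import (
	"strings"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
)

// hpaLooksHealthy approximates the check described above: the replica counts
// must agree, and no condition may carry a "Failed*" or "Invalid*" reason.
func hpaLooksHealthy(hpa *autoscalingv2.HorizontalPodAutoscaler) bool {
	if hpa.Status.CurrentReplicas != hpa.Status.DesiredReplicas {
		return false
	}
	for _, cond := range hpa.Status.Conditions {
		// Assumption: reasons such as "FailedGetResourceMetric" or
		// "InvalidSelector" indicate an unhealthy HPA.
		if strings.HasPrefix(cond.Reason, "Failed") || strings.HasPrefix(cond.Reason, "Invalid") {
			return false
		}
	}
	return true
}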

@Cajga would you mind posting the complete .status object of your HPA here?

kubectl get hpa -n NAMESPACE HPA -o jsonpath={.status}

@Cajga
Author

Cajga commented Nov 20, 2023

@makkes thanks for looking into this. Sure, here we are:

# kubectl get hpa -n istio-system istiod -o jsonpath={.status}|jq
{
  "conditions": [
    {
      "lastTransitionTime": "2023-11-17T15:15:50Z",
      "message": "recent recommendations were higher than current one, applying the highest recent recommendation",
      "reason": "ScaleDownStabilized",
      "status": "True",
      "type": "AbleToScale"
    },
    {
      "lastTransitionTime": "2023-11-17T15:16:20Z",
      "message": "the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)",
      "reason": "ValidMetricFound",
      "status": "True",
      "type": "ScalingActive"
    },
    {
      "lastTransitionTime": "2023-11-18T19:10:45Z",
      "message": "the desired count is within the acceptable range",
      "reason": "DesiredWithinRange",
      "status": "False",
      "type": "ScalingLimited"
    }
  ],
  "currentMetrics": [
    {
      "resource": {
        "current": {
          "averageUtilization": 0,
          "averageValue": "3m"
        },
        "name": "cpu"
      },
      "type": "Resource"
    }
  ],
  "currentReplicas": 1,
  "desiredReplicas": 1
}

NOTE: this is a default installation of Istio (with metrics-server) into production, which would scale automatically if needed.

@Cajga
Author

Cajga commented Nov 20, 2023

Let me drop another example here, also from Istio's default installation:

# kubectl get poddisruptionbudgets.policy -n istio-system istiod -o jsonpath={.status}|jq
{
  "conditions": [
    {
      "lastTransitionTime": "2023-11-17T15:15:45Z",
      "message": "",
      "observedGeneration": 1,
      "reason": "InsufficientPods",
      "status": "False",
      "type": "DisruptionAllowed"
    }
  ],
  "currentHealthy": 1,
  "desiredHealthy": 1,
  "disruptionsAllowed": 0,
  "expectedPods": 1,
  "observedGeneration": 1
}

While one could argue that this shows that disruption would not be allowed in this case, this is still a healthy installation of Istio, and the "red" status of the PodDisruptionBudget does not look very nice on the graph.
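
To make the difference concrete, here is a hedged sketch assuming the k8s.io/api/policy/v1 types (both functions are illustrative, not Weave GitOps' actual rules), contrasting a condition-based check, which would mark the PDB above as red, with a coverage-based check, which would treat it as healthy:

package health

import (
	policyv1 "k8s.io/api/policy/v1"
)

// pdbConditionGreen treats the PDB as healthy only when DisruptionAllowed is
// True; a rule like this marks the Istio PDB above (disruptionsAllowed: 0) red.
func pdbConditionGreen(pdb *policyv1.PodDisruptionBudget) bool {
	for _, c := range pdb.Status.Conditions {
		if c.Type == "DisruptionAllowed" {
			return c.Status == "True"
		}
	}
	return false
}

// pdbCoverageGreen compares healthy pod counts instead, which reports the same
// PDB (currentHealthy: 1, desiredHealthy: 1) as healthy.
func pdbCoverageGreen(pdb *policyv1.PodDisruptionBudget) bool {
	return pdb.Status.CurrentHealthy >= pdb.Status.DesiredHealthy
}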

@Cajga
Author

Cajga commented Nov 20, 2023

@makkes hmm... looking into the code, it seems your test data for the HPA does in fact not look healthy:

    message: the desired replica count is less than the minimum replica count
    reason: TooFewReplicas
    status: "True"
    type: ScalingLimited

I believe this basically means that the HPA would like to scale down but has reached minReplicas. You should take action and reduce minReplicas to allow it to scale down...
The Red Hat folks have good documentation about this.

@foot
Contributor

foot commented Nov 24, 2023

Thanks for raising the issue!

Sounds like HPA health checking could be improved.

  • Doing good health checking for built-in k8s resources, and at least Flux resources too, would be great to maintain ourselves, and we might not need an extensible system for this.
  • Having a more extensible system that allows declaring a red/green mapping for less common CustomResources would be neat, but needs some thought.

weave gitops reports "red" status instead of "green" at graph view for several resource types

Are the other resource types CustomResources or builtin k8s resources?

@Cajga
Author

Cajga commented Mar 13, 2024

Hi @foot,

Sorry for not coming back. We stopped using/evaluating weave-gitops as it does not support Flux multi-tenant configuration (more details in this ticket).

As far as I remember, there were a few more resources reported red in our environment, but unfortunately I cannot recall which ones.
