Labels kubernetes_node unavailable with kube-state-metrics #2695

Open
jgourmelen opened this issue Apr 6, 2024 · 2 comments
Labels
E2 Estimated level of Effort (1 is easiest, 4 is hardest) kubecost Relevant to Kubecost's downstream project needs-follow-up needs-triage opencost OpenCost issues vs. external/downstream P2 Estimated Priority (P0 is highest, P4 is lowest)

Comments

@jgourmelen

I'm running KSM and OpenCost alongside a Grafana Agent installation similar to the one shown in https://github.com/grafana/k8s-monitoring-helm, specifically the example at https://github.com/grafana/k8s-monitoring-helm/blob/main/examples/eks-fargate/metrics.river.

I'm seeing the following errors in the OpenCost logs:

2024-04-06T11:32:52.678207703Z INF GPU without cost found for fr-par-1,DEV1-XL, calculating...
2024-04-06T11:32:52.678263604Z INF GPU without cost found for fr-par-1,DEV1-XL, calculating...
2024-04-06T11:32:52.787530765Z ERR CostDataRange: Request Error: Prometheus communication error: 422 (Unprocessable Entity) Headers: { Server: [ nginx/1.25.4 ], Date: [ Sat, 06 Apr 2024 11:32:51 GMT ], Content-Type: [ application/json ], Content-Length: [ 660 ], Connection: [ keep-alive ], Vary: [ Accept-Encoding ] }, Body: {"status":"error","errorType":"execution","error":"found duplicate series for the match group {cluster=\"wizops\", namespace=\"monitoring\", persistentvolumeclaim=\"data-loki-write-2\"} on the left hand-side of the operation: [{cluster=\"wizops\", namespace=\"monitoring\", persistentvolumeclaim=\"data-loki-write-2\", storageclass=\"scw-bssd\", volumename=\"pvc-7204a498-4ecb-4b9b-8708-3c0e97e06113\"}, {cluster=\"wizops\", namespace=\"monitoring\", persistentvolumeclaim=\"data-loki-write-2\", storageclass=\"scw-bssd\", volumename=\"pvc-3a86c379-5abf-4a42-8857-0633c8c97bb3\"}];many-to-many matching not allowed: matching labels must be unique on one side"} Query: avg(avg(kube_persistentvolumeclaim_info{volumename != "", }) by (persistentvolumeclaim, storageclass, namespace, volumename, cluster, kubernetes_node)
    *
    on (persistentvolumeclaim, namespace, cluster, kubernetes_node) group_right(storageclass, volumename)
    sum(kube_persistentvolumeclaim_resource_requests_storage_bytes{}) by (persistentvolumeclaim, namespace, cluster, kubernetes_node, kubernetes_name)) by (persistentvolumeclaim, storageclass, namespace, cluster, volumename, kubernetes_node)
2024-04-06T11:32:52.787625966Z ERR CostDataRange: Parsing Error: Prometheus communication error: avg(avg(kube_persistentvolumeclaim_info{volumename != "", }) by (persistentvolumeclaim, storageclass, namespace, volumename, cluster, kubernetes_node)
    *
    on (persistentvolumeclaim, namespace, cluster, kubernetes_node) group_right(storageclass, volumename)
    sum(kube_persistentvolumeclaim_resource_requests_storage_bytes{}) by (persistentvolumeclaim, namespace, cluster, kubernetes_node, kubernetes_name)) by (persistentvolumeclaim, storageclass, namespace, cluster, volumename, kubernetes_node)
2024-04-06T11:32:52.787752907Z INF Error building cache [2024-04-05T11:31:47+0000, 2024-04-06T11:31:47+0000): Error Collection:
0) Errors:
  Request Error: Prometheus communication error: 422 (Unprocessable Entity) Headers: { Server: [ nginx/1.25.4 ], Date: [ Sat, 06 Apr 2024 11:32:51 GMT ], Content-Type: [ application/json ], Content-Length: [ 660 ], Connection: [ keep-alive ], Vary: [ Accept-Encoding ] }, Body: {"status":"error","errorType":"execution","error":"found duplicate series for the match group {cluster=\"wizops\", namespace=\"monitoring\", persistentvolumeclaim=\"data-loki-write-2\"} on the left hand-side of the operation: [{cluster=\"wizops\", namespace=\"monitoring\", persistentvolumeclaim=\"data-loki-write-2\", storageclass=\"scw-bssd\", volumename=\"pvc-7204a498-4ecb-4b9b-8708-3c0e97e06113\"}, {cluster=\"wizops\", namespace=\"monitoring\", persistentvolumeclaim=\"data-loki-write-2\", storageclass=\"scw-bssd\", volumename=\"pvc-3a86c379-5abf-4a42-8857-0633c8c97bb3\"}];many-to-many matching not allowed: matching labels must be unique on one side"} Query: avg(avg(kube_persistentvolumeclaim_info{volumename != "", }) by (persistentvolumeclaim, storageclass, namespace, volumename, cluster, kubernetes_node)
    *
    on (persistentvolumeclaim, namespace, cluster, kubernetes_node) group_right(storageclass, volumename)
    sum(kube_persistentvolumeclaim_resource_requests_storage_bytes{}) by (persistentvolumeclaim, namespace, cluster, kubernetes_node, kubernetes_name)) by (persistentvolumeclaim, storageclass, namespace, cluster, volumename, kubernetes_node)
  Parse Error: Prometheus communication error: avg(avg(kube_persistentvolumeclaim_info{volumename != "", }) by (persistentvolumeclaim, storageclass, namespace, volumename, cluster, kubernetes_node)
    *
    on (persistentvolumeclaim, namespace, cluster, kubernetes_node) group_right(storageclass, volumename)
    sum(kube_persistentvolumeclaim_resource_requests_storage_bytes{}) by (persistentvolumeclaim, namespace, cluster, kubernetes_node, kubernetes_name)) by (persistentvolumeclaim, storageclass, namespace, cluster, volumename, kubernetes_node)
for Query: avg(avg(kube_persistentvolumeclaim_info{volumename != "", }) by (persistentvolumeclaim, storageclass, namespace, volumename, cluster, kubernetes_node)
    *
    on (persistentvolumeclaim, namespace, cluster, kubernetes_node) group_right(storageclass, volumename)
    sum(kube_persistentvolumeclaim_resource_requests_storage_bytes{}) by (persistentvolumeclaim, namespace, cluster, kubernetes_node, kubernetes_name)) by (persistentvolumeclaim, storageclass, namespace, cluster, volumename, kubernetes_node)
2024-04-06T11:32:53.322864141Z INF caching 1d cluster costs for 11m0s

The core issue seems to be that KSM is not providing "kubernetes_node" labels in its metrics.
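
To make the failure mode concrete, here is a rough Python sketch of how the match groups collide. It is not how Prometheus or OpenCost implement matching; the helper names `match_key` and `duplicate_groups` are made up for illustration. The label sets and the `on (...)` labels are copied from the log above, and the sketch assumes a label missing from a series is compared as an empty value.

```python
from collections import defaultdict

# Labels from the query's on(...) clause, as shown in the log.
ON_LABELS = ("persistentvolumeclaim", "namespace", "cluster", "kubernetes_node")

# The two left-hand series listed in the error body, copied from the log.
LEFT_SERIES = [
    {
        "cluster": "wizops",
        "namespace": "monitoring",
        "persistentvolumeclaim": "data-loki-write-2",
        "storageclass": "scw-bssd",
        "volumename": "pvc-7204a498-4ecb-4b9b-8708-3c0e97e06113",
    },
    {
        "cluster": "wizops",
        "namespace": "monitoring",
        "persistentvolumeclaim": "data-loki-write-2",
        "storageclass": "scw-bssd",
        "volumename": "pvc-3a86c379-5abf-4a42-8857-0633c8c97bb3",
    },
]


def match_key(labels, on=ON_LABELS):
    """Build the matching key for one series; absent labels count as empty."""
    return tuple((name, labels.get(name, "")) for name in on)


def duplicate_groups(series, on=ON_LABELS):
    """Return the match groups that contain more than one series."""
    groups = defaultdict(list)
    for labels in series:
        groups[match_key(labels, on)].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    for key, members in duplicate_groups(LEFT_SERIES).items():
        print("duplicate match group:", dict(key))
        for labels in members:
            print("  volumename =", labels["volumename"])
```

With `group_right`, the left-hand side has to be unique per match group, and here both series land in the same group. The two series differ only in `volumename`, so a `kubernetes_node` label would separate them only if its value differed between the two.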

Any insights on how to navigate this issue?

@jgourmelen
Author

@mattray
Collaborator

mattray commented Apr 10, 2024

Reading through that KSM link, they suggest you get kubernetes_node from your container. What is "data-loki-write-2"? I see the 422 (Unprocessable Entity) and I'm curious if there's something that needs fixing here.
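
On the 422: as far as I know, the Prometheus HTTP API returns 422 when an expression cannot be executed, which fits the `"errorType":"execution"` in the body. The sketch below decodes an abbreviated copy of that body (the series list is shortened to `[...]`) and pulls out the match group, which shows that "data-loki-write-2" is the `persistentvolumeclaim` value. The function name `match_group` is only illustrative.

```python
import json
import re

# Abbreviated copy of the response body from the log; the series list is cut to "[...]".
BODY = r'''{"status":"error","errorType":"execution","error":"found duplicate series for the match group {cluster=\"wizops\", namespace=\"monitoring\", persistentvolumeclaim=\"data-loki-write-2\"} on the left hand-side of the operation: [...];many-to-many matching not allowed: matching labels must be unique on one side"}'''


def match_group(body_text):
    """Return (errorType, labels of the match group) from a Prometheus error body."""
    payload = json.loads(body_text)
    found = re.search(r"match group \{(.*?)\}", payload["error"])
    labels = dict(re.findall(r'(\w+)="([^"]*)"', found.group(1))) if found else {}
    return payload["errorType"], labels


if __name__ == "__main__":
    error_type, labels = match_group(BODY)
    print(error_type)
    print(labels)
```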

@mattray mattray added opencost OpenCost issues vs. external/downstream P2 Estimated Priority (P0 is highest, P4 is lowest) kubecost Relevant to Kubecost's downstream project E2 Estimated level of Effort (1 is easiest, 4 is hardest) labels Apr 10, 2024
3 participants