Logs enricher when use prometheus does not show the graph for node utilisation #1168

Open
antikilahdjs opened this issue Nov 14, 2023 · 10 comments
Labels
needs-triage This issue should be reviewed and tagged appropriately

Comments

@antikilahdjs

Describe the bug

Hey, I have been using Robusta for a few days with Prometheus and Alertmanager, but the Prometheus-based enrichers (memory or CPU) come back empty when the alert is sent to Teams. Only the pod memory usage graph appears; the node utilisation graph is not shown.

The image below demonstrates it:

image

To Reproduce
Steps to reproduce the behavior:
1 - Install using the official Helm charts
2 - Configure Prometheus and Alertmanager
3 - Configure the sink to use Teams
4 - Configure globalConfig as shown below

globalConfig:
  grafana_url: ""
  grafana_api_key: ""
  grafana_dashboard_uid: ""
  alertmanager_url: "http://alertmanager-operated.thanos:9093"
  prometheus_url: "http://thanos-query-frontend.thanos:9090"
  signing_key: ""
  account_id: 695c3053-0e56-xxxxxxxxxxxxxxxxxxxxxx
  custom_annotations: []

5 - My trigger and actions are:

- triggers:
  - on_pod_oom_killed:
      rate_limit: 3600
  actions:
  - pod_oom_killer_enricher: {}
  - logs_enricher: {}
  - pod_node_graph_enricher:
      resource_type: Memory
      display_limits: true
  - oomkilled_container_graph_enricher:
      resource_type: Memory
      display_limits: true
  stop: true

Expected behavior

The graph works for both node utilisation and pod utilisation.

Screenshots
See the image above.

Desktop (please complete the following information):

  • OS: Red Hat 8.5 and Ubuntu 20.04 LTS
  • Browser: Chrome
  • Version: 119

@pavangudiwada pavangudiwada added the needs-triage This issue should be reviewed and tagged appropriately label Nov 15, 2023
@saireddyb

Yes, I also get an empty node graph.

@wrbbz

wrbbz commented May 16, 2024

Same here on Robusta 0.12.0 without UI integration

@Bobses

Bobses commented May 16, 2024

> Same here on Robusta 0.12.0 without UI integration

Same here.

@arikalon1
Contributor

@Bobses @wrbbz do you see any exceptions in the robusta-runner pod logs?

@aantn
Collaborator

aantn commented May 17, 2024

Hi all, I believe this is because Robusta is using the recording rule instance:node_memory_utilisation:ratio, which isn't present in your environment.

If that is the case, we should be able to fix this by replacing instance:node_memory_utilisation:ratio with its definition, or possibly just with
container_memory_working_set_bytes{node="${node_name}", container!=""}

@aantn
Collaborator

aantn commented May 17, 2024

To help us get to the bottom of this, can each of you please verify that the metric instance:node_memory_utilisation:ratio is in fact missing from your environment?
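
One possible way to run this check, sketched here as an illustration rather than anything Robusta ships: send an instant query for the metric to the Prometheus HTTP API (`/api/v1/query`) and see whether any series come back. The helper names `build_query_url`, `has_series`, and `metric_exists` are hypothetical, and the URL in the usage comment is simply the `prometheus_url` from the globalConfig in the report.

```python
import json
import urllib.parse
import urllib.request

# Recording rule that the node graph is believed to depend on.
METRIC = "instance:node_memory_utilisation:ratio"


def build_query_url(prometheus_url: str, expr: str) -> str:
    """Build the instant-query URL for `expr` on a Prometheus-compatible API."""
    query = urllib.parse.urlencode({"query": expr})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query


def has_series(payload: dict) -> bool:
    """Return True if a successful query response contains at least one series."""
    results = payload.get("data", {}).get("result", [])
    return payload.get("status") == "success" and len(results) > 0


def metric_exists(prometheus_url: str, expr: str = METRIC, timeout: float = 10.0) -> bool:
    """Query the server and report whether `expr` currently returns any data."""
    url = build_query_url(prometheus_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return has_series(json.load(resp))


# Example usage (needs network access to the cluster; adjust the URL):
# print(metric_exists("http://thanos-query-frontend.thanos:9090"))
```

If `metric_exists` returns False while a query for `container_memory_working_set_bytes` returns True, that would be consistent with the recording rule being absent.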

@wrbbz

wrbbz commented May 17, 2024

Yeah. I can confirm that we do not have instance:node_memory_utilisation:ratio. We only have container_memory_working_set_bytes.

@Bobses

Bobses commented May 20, 2024

I confirm that we don't have that record.

So, I'll add the following record:

record: instance:node_memory_utilisation:ratio
expr: 1 - ((node_memory_MemAvailable_bytes{job="node-exporter"} or (node_memory_Buffers_bytes{job="node-exporter"} + node_memory_Cached_bytes{job="node-exporter"} + node_memory_MemFree_bytes{job="node-exporter"} + node_memory_Slab_bytes{job="node-exporter"})) / node_memory_MemTotal_bytes{job="node-exporter"})

Thank you!
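
As a rough illustration of what this rule is meant to compute, the arithmetic can be written out for a single node: one minus available memory divided by total memory, falling back to the sum of buffers, cache, free, and slab when MemAvailable is not reported. This is only a sketch of the formula, not how Prometheus evaluates it; the function name and parameters are hypothetical.

```python
from typing import Optional


def node_memory_utilisation(
    mem_total: float,
    mem_available: Optional[float] = None,
    *,
    buffers: float = 0.0,
    cached: float = 0.0,
    mem_free: float = 0.0,
    slab: float = 0.0,
) -> float:
    """Return the fraction of node memory in use (0.0 to 1.0)."""
    # Mirrors the PromQL `or`: prefer MemAvailable when the series exists,
    # otherwise approximate it from the other node-exporter metrics.
    if mem_available is None:
        mem_available = buffers + cached + mem_free + slab
    # The whole "available" term is divided by the total before subtracting.
    return 1.0 - mem_available / mem_total


# 16 GiB node with 4 GiB available
print(node_memory_utilisation(16.0, 4.0))  # 0.75
# No MemAvailable: fall back to buffers + cached + free + slab
print(node_memory_utilisation(8.0, buffers=1.0, cached=2.0, mem_free=1.0))  # 0.5
```

In PromQL, `/` binds more tightly than `or`, so the fallback group needs its own parentheses before the division for the expression to match this arithmetic.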

@aantn
Collaborator

aantn commented May 20, 2024

Yep, that will fix the problem. (Please confirm!)

I think we should also change this on our side to query the expr instead and not rely on that recording rule.

@wrbbz

wrbbz commented May 21, 2024

I've created a PR that uses the definitions instead of the recording rules.

Also, adding the record to the Prometheus instance solved the No Data error.
