Add Grafana dashboard #2537

zamazan4ik · 2022-11-27T19:44:42Z

Describe the feature request

We want to monitor an on-prem Unleash installation with Prometheus and view the collected metrics in a Grafana dashboard.

Background

No response

Solution suggestions

It would be awesome to have a ready-to-use Grafana dashboard. That would make it much easier to observe Unleash out of the box.

Tymek · 2022-11-28T14:01:30Z

I assume a ready-to-use dashboard is something to be used with Grafana's dashboard import/export. That seems like a common setup, and a template can definitely help. For hosted Unleash we have somewhat different needs, supporting many clients in multiple regions, so I don't think our dashboards would be helpful as a starter. Maybe we can share some queries and chart definitions. CC @chriswk

This could work as a blog post or snippet, because I don't feel like we're able to support it long term in the repository.

gastonfournier · 2022-12-26T08:28:24Z

Hi @zamazan4ik I'm trying to understand the use case: what kind of metrics are you interested in?

We do expose some application-level metrics (api docs) which could be scraped by Prometheus and charted in Grafana. Are these the kind of metrics you wanted? Or are you interested in other types of metrics (more operational, such as CPU and memory)?

zamazan4ik · 2022-12-26T08:37:33Z

@gastonfournier I am interested in both. Application-level metrics are interesting for more business-aligned stakeholders, I suppose. (Just a note: the description of the application-level metrics could be improved, I guess. It's not clear what each metric describes.)

Operational metrics are interesting for the people who maintain Unleash (admins, devops, etc.). Memory/CPU usage of the whole process (or of a set of processes/microservices; I am not familiar with the whole Unleash stack yet) would be useful for them. If you know of other metrics that are useful for Unleash admins, it would be awesome to add them to the dashboard too.

gastonfournier · 2022-12-26T09:46:17Z

We do build our dashboards for our operations based on metrics exposed by one endpoint. I just double-checked because I did not remember if it was open-sourced, and yes it is: https://docs.getunleash.io/reference/api/legacy/unleash/internal/prometheus

What we do is have a Prometheus instance scraping this endpoint, and we build Grafana dashboards based on that information. Recently, I've set this up to test some new metrics and this is the configuration I've used:

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'example'

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# Scrape configurations: Prometheus itself, the node exporter,
# and the local Unleash instance.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']

  - job_name: local_unleash
    metrics_path: /internal-backstage/prometheus
    static_configs:
      - targets: ['localhost:4242']

Of course, this is for local testing and how you configure it might vary depending on your environment.
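
Before building panels, it can help to see which metrics the endpoint actually exposes and what their help text says. The following is a minimal sketch, assuming Unleash is reachable at the same host, port, and path as in the `local_unleash` job above; `fetch_metrics_text` and `parse_help_lines` are illustrative names, not part of Unleash or Prometheus. It reads the standard `# HELP` and `# TYPE` comment lines of the Prometheus text exposition format.

```python
import urllib.request

# Assumption: same host, port, and path as the local_unleash job above.
METRICS_URL = "http://localhost:4242/internal-backstage/prometheus"


def fetch_metrics_text(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from the endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_help_lines(text: str) -> dict[str, dict[str, str]]:
    """Collect the '# HELP' and '# TYPE' comments for each metric name."""
    metrics: dict[str, dict[str, str]] = {}
    for line in text.splitlines():
        # Expected shape: '#', keyword, metric name, remainder of the line.
        parts = line.split(None, 3)
        if len(parts) < 3 or parts[0] != "#" or parts[1] not in ("HELP", "TYPE"):
            continue  # sample lines and other comments are ignored
        name = parts[2]
        value = parts[3] if len(parts) > 3 else ""
        metrics.setdefault(name, {})[parts[1].lower()] = value
    return metrics


def print_metric_catalog(url: str = METRICS_URL) -> None:
    """Print one line per metric: name, type, and help text."""
    for name, info in sorted(parse_help_lines(fetch_metrics_text(url)).items()):
        print(f"{name} ({info.get('type', 'untyped')}): {info.get('help', '')}")
```

Calling `print_metric_catalog()` against a running instance should list every metric name with its type, which gives a starting point for choosing what to chart.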

Now, on the type of dashboards you can have, it might be interesting to have a repository with community-maintained Grafana dashboards. Most of ours are built around multitenancy, so we'd have to strip out client-id variables and other infra-specific parts (such as our API gateway of choice or our cloud provider metrics) for them to be useful for on-prem installations.

I went over our operational dashboards and they need some work, but a good starting point could be a list of things to monitor. I can start with a first draft (feel free to suggest other metrics):

Total requests per second per url (line chart showing the change over time)
Error requests per second per url (line chart showing the change over time)
Process CPU usage (line chart showing the change over time)
Process memory usage (line chart showing the change over time)
Database connection pool (line chart showing the change over time)
Eventloop lag (99th percentile line chart showing the change over time)

Others that would require Prometheus node exporter:

Instance CPU utilization (line chart showing the change over time)
Instance load (line chart showing the change over time)
Instance memory usage (line chart showing the change over time)
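
As a rough illustration of how this list could turn into an importable dashboard, here is a minimal sketch that generates dashboard JSON with one time series panel per item. The metric names and labels in the queries are assumptions based on common Node.js (prom-client) and node exporter setups, not confirmed Unleash metric names, so check them against your own endpoint output. The database connection pool panel is left out because its metric name is not given in this thread. `build_dashboard` and `PANELS` are illustrative names, and the generated JSON may need adjusting for your Grafana version and data source.

```python
import json

# Assumption: metric and label names below are typical, not verified for
# Unleash. Replace them with the names your endpoint actually exposes.
PANELS = [
    ("Requests per second per URL",
     "sum by (path) (rate(http_request_duration_milliseconds_count[5m]))"),
    ("Error requests per second per URL",
     'sum by (path) (rate(http_request_duration_milliseconds_count{status=~"5.."}[5m]))'),
    ("Process CPU usage",
     'rate(process_cpu_seconds_total{job="local_unleash"}[5m])'),
    ("Process memory usage",
     'process_resident_memory_bytes{job="local_unleash"}'),
    ("Event loop lag (p99)",
     'nodejs_eventloop_lag_p99_seconds{job="local_unleash"}'),
    ("Instance CPU utilization",
     '1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))'),
    ("Instance load", "node_load1"),
    ("Instance memory usage",
     "1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes"),
]


def build_dashboard(title, panels, columns=2):
    """Return a dashboard dict with one time series panel per (title, query)."""
    width = 24 // columns  # Grafana lays panels out on a 24-unit-wide grid
    height = 8
    return {
        "title": title,
        "time": {"from": "now-6h", "to": "now"},
        "panels": [
            {
                "id": index + 1,
                "title": panel_title,
                "type": "timeseries",
                "gridPos": {
                    "x": (index % columns) * width,
                    "y": (index // columns) * height,
                    "w": width,
                    "h": height,
                },
                # No data source is set, so Grafana's default one is used.
                "targets": [{"refId": "A", "expr": expr}],
            }
            for index, (panel_title, expr) in enumerate(panels)
        ],
    }


# Print JSON that can be saved to a file and tried via Grafana's import dialog.
print(json.dumps(build_dashboard("Unleash (sketch)", PANELS), indent=2))
```

Keeping the queries in a plain list makes it easy to swap in the correct metric names or add panels without touching the layout code.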

Let me know if this helps. I'll bring this to the team when we get together next year.

zamazan4ik · 2022-12-26T10:21:23Z

> Let me know if this helps. I'll bring this to the team when we get together next year.

Yes, this helps a lot!

> I went over our operational dashboards and they need some work but a good starting point could be a list of things to monitor, I can start with the first draft (feel free to suggest other metrics)

That would be awesome. It's much easier for users to just download Grafana dashboards and import them into their local setup.

Thanks in advance!

gastonfournier · 2023-02-07T14:31:16Z

Maybe a simple bare-bones example covering the most important metrics could be added to https://grafana.com/grafana/dashboards/

rakshitgondwal · 2023-10-31T13:09:38Z

Is this open for grabs?

gastonfournier · 2023-11-03T08:52:28Z

Hi @rakshitgondwal, we haven't prioritized this yet, so any contribution is welcome.

ogunleye0720 · 2024-01-21T17:20:49Z

Hello @gastonfournier, I found your suggestions on the Grafana dashboard and on the types of metrics Prometheus can scrape, at both the application level and the operational (infrastructure) level, insightful. But I do think the data worth visualizing in Grafana depends heavily on the organization. There are commonly scraped operational metrics such as CPU, memory, and storage, but many other processes can be monitored depending on the organization's needs. I would suggest that @zamazan4ik carry out a survey within the organization to determine which data suits the business needs.

zamazan4ik added the enhancement label Nov 27, 2022

Tymek added the help wanted, good first issue, and ideas labels Nov 28, 2022

nunogois assigned chriswk Dec 6, 2022

gastonfournier assigned gastonfournier and unassigned chriswk Dec 26, 2022
