WIP: Create optional monitoring configuration for Mastodon using Grafana #12

ywwg · 2022-12-09T19:29:04Z

Hi all, this is a draft PR of a change to add optional Grafana+Prometheus monitoring to Mastodon instances deployed with this chart set. The goal is to enable instance admins to very quickly add monitoring with all of the tricky parts of wiring together the various scrapers and config files done automatically.

Includes automatic scraping of Kubernetes node_exporter, Mastodon StatsD output, and Postgres data.
Includes four default dashboards.
Includes ingress changes to create routes for grafana.hostname.

This is a work in progress, but it is functional. The main question I'd like to ask with this PR is whether this is a feature the Mastodon project is interested in including. I don't want to do the work of continuing to polish and refactor the config if it's not a direction you want to go. The main outstanding issue is the hardcoded values inside values.yaml, which I would like to replace with auto-generated URLs; I am having trouble figuring out how to do that in Helm, given the available ways to configure the services.

Major TODOs:

Make the default dashboards fully functional and usable. Right now some of the queries are incorrect.
Do more parameterization, especially where the release name and namespace are currently hard-coded.
Move as much boilerplate out of values.yaml as possible.
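
One possible way to avoid hard-coding the release name and namespace is to build the address inside a template instead of in values.yaml, since values.yaml itself is not rendered by Helm's template engine. The sketch below is illustrative only: the ConfigMap name and the `<release>-prometheus-statsd-exporter` Service name are assumptions, not taken from this PR, and the real Service name depends on how the subchart computes its fullname.

```yaml
# templates/configmap-statsd.yaml (sketch, hypothetical file and resource names)
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-statsd-env
  namespace: {{ .Release.Namespace }}
data:
  # Assumes the statsd exporter Service is named "<release>-prometheus-statsd-exporter"
  # and listens for StatsD traffic on port 9125.
  STATSD_ADDR: {{ printf "%s-prometheus-statsd-exporter.%s.svc:9125" .Release.Name .Release.Namespace | quote }}
```

Because the value is computed at render time, the same chart can be installed under any release name or namespace without editing values.yaml.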

ineffyble · 2022-12-14T00:01:53Z

@ywwg It looks like Helm is trying to template your txt file 🙃

ywwg · 2022-12-14T19:16:56Z

> @ywwg It looks like Helm is trying to template your txt file 🙃

Any chance you can re-trigger the workflow? I think I might have fixed it, but I can't reproduce the issue locally, so I'm not sure.

Seen here: prometheus-community/helm-charts#2723

ywwg · 2022-12-15T19:01:15Z

I am new to Helm, but it appears that NOTES.txt is supposed to be a plain-text file that Helm renders as a template.

I believe the version of Helm in the CI is too old: prometheus-community/helm-charts#2723

Updated Helm to 3.7 to see if that fixes it.

deepy · 2022-12-15T21:53:56Z

templates/ingress.yaml

@@ -67,5 +68,24 @@ spec:
            pathType: Exact
            {{- end }}
          {{- end }}
+    - host: {{ printf "grafana.%s" .host | quote }}


We shouldn't be setting up the ingress for Grafana; it's better to use their Values.ingress instead of having to maintain a copy of it.

ywwg · 2022-12-16

Good point, I'll work on that.
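
As a rough illustration of the suggestion above, the Grafana subchart's own ingress settings could be set from the parent chart's values.yaml instead of adding a host rule to templates/ingress.yaml. This assumes the Grafana chart is pulled in as a dependency named `grafana`; the hostname and secret name are placeholders.

```yaml
# values.yaml (fragment, sketch)
grafana:
  ingress:
    enabled: true
    path: /
    hosts:
      - grafana.example.com        # placeholder hostname
    tls:
      - secretName: grafana-tls    # hypothetical secret name
        hosts:
          - grafana.example.com
```

With this approach the Grafana chart renders and maintains its own Ingress resource, so the Mastodon chart does not need to keep a copy in sync.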

deepy · 2022-12-15T21:54:39Z

Doesn't it seem a little bit heavy to add 4 more subcharts here?

ywwg · 2022-12-16T18:19:00Z

> Doesn't it seem a little bit heavy to add 4 more subcharts here?

That's one reason I wanted to post the in-progress chart: to get a sense of what the goal of this chart is and how full-featured y'all want it to be. For instance, if this chart exists just to deploy Mastodon, then it's probably superfluous to include the monitoring stack. I was thinking this chart could be used by admins who want to deploy a solid Mastodon instance and are maybe less familiar with running an important service like this. I have seen a lot of messages in my feeds from admins who were surprised to run out of disk, or whose server went down overnight. In that light, this chart could help these less-experienced admins set up a more robust instance without having to worry about configuring monitoring themselves. (And those who don't need the monitoring stack or want to roll their own can disable it easily.)

To answer your question specifically, in order to have a functional monitoring stack we need at least three of the charts: Prometheus, which gathers and stores the metrics; the statsd exporter, which records metrics from Mastodon; and Grafana, which displays the metrics. We could easily drop the postgresql chart if we don't care about providing access to Postgres stats.
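
For reference, optional subcharts of this kind are usually declared in Chart.yaml with a `condition` so they can be switched off. The fragment below is a sketch, not the PR's actual Chart.yaml: the versions are placeholders and the condition keys are hypothetical. The Postgres-related chart is left out because its exact name is not given in this thread.

```yaml
# Chart.yaml (fragment, sketch)
dependencies:
  - name: prometheus
    repository: https://prometheus-community.github.io/helm-charts
    version: "x.y.z"                                 # placeholder, pin a real release
    condition: prometheus.enabled                    # hypothetical toggle
  - name: prometheus-statsd-exporter
    repository: https://prometheus-community.github.io/helm-charts
    version: "x.y.z"                                 # placeholder
    condition: prometheus-statsd-exporter.enabled    # hypothetical toggle
  - name: grafana
    repository: https://grafana.github.io/helm-charts
    version: "x.y.z"                                 # placeholder
    condition: grafana.enabled                       # hypothetical toggle
```

Setting any of these toggles to `false` in values.yaml would skip that subchart entirely at install time.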

It also might help to talk about what "heavy" means in this case -- do you mean the maintenance burden of keeping them functional?

renchap · 2023-02-15T11:09:25Z

Hi there, and first of all thanks @ywwg for this PR!

I am working on moving mastodon.social & mastodon.online to K8s. For now mastodon.online has been deployed using home-made K8s resources (we needed to move it fast) but we are looking into using the official chart for those servers.

We have our existing monitoring deployment, and we don't want to have the Prometheus stack deployed using this chart. But it would probably be a good idea to have a way to get the dashboards and the various Monitor resources created by the chart, so that an existing prometheus-operator deployed on the cluster would start scraping the resources.
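
A minimal sketch of what such an opt-in Monitor resource might look like, assuming a prometheus-operator is already installed on the cluster. The `metrics.serviceMonitor.enabled` value, the component label, and the port name are all hypothetical and would need to match the chart's real Services.

```yaml
{{- if .Values.metrics.serviceMonitor.enabled }}
# templates/servicemonitor.yaml (sketch). values.yaml must define
# metrics.serviceMonitor.enabled, otherwise rendering fails on the nil lookup.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ .Release.Name }}-statsd
  namespace: {{ .Release.Namespace }}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: {{ .Release.Name }}
      app.kubernetes.io/component: statsd-exporter   # hypothetical label
  endpoints:
    - port: metrics      # must match a named port on the selected Service
      interval: 30s
{{- end }}
```

The chart would then ship only the scrape definition, and the existing Prometheus would pick it up according to its own ServiceMonitor selector settings.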

Would this be a valid alternative to what you have done?

> I was thinking this chart could be used by admins who want to deploy a solid mastodon instance and are maybe less familiar with running an important service like this.

For this concern, I think we could have a "meta" chart that has more sub-chart dependencies and can (optionally) install all the parts of a Mastodon chart, including postgres, redis, prometheus… so people who want a fully-managed setup have this option.

On another topic, I see you chose to deploy the statsd_exporter as its own deployment. Did you consider having the statsd_exporter as a sidecar for Puma & Sidekiq pods, and all of those being scraped by Prometheus? This is the setup recommended in the exporter's README and what I deployed on mastodon.online, with good results so far.
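
To make the sidecar idea concrete, here is a hedged pod-template fragment. The container names are placeholders, the Mastodon container's image and other fields are omitted, and the ports shown are the statsd_exporter defaults written out explicitly.

```yaml
# Pod template fragment (sketch)
spec:
  containers:
    - name: web                        # Puma container; image and other fields omitted
      env:
        - name: STATSD_ADDR            # Mastodon sends StatsD metrics to this address
          value: "localhost:9125"
    - name: statsd-exporter            # sidecar sharing the pod's network namespace
      image: prom/statsd-exporter      # pin a specific tag in practice
      args:
        - --statsd.listen-udp=:9125
        - --web.listen-address=:9102
      ports:
        - name: metrics
          containerPort: 9102          # Prometheus scrapes this port on every pod
```

Since each pod carries its own exporter, Prometheus scrapes per-pod metrics directly instead of going through one shared exporter deployment.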

The last issue I am looking to solve is being able to host multiple Mastodon instances on the same cluster / Prometheus monitoring. This is important for us, so I am looking into adding a label to all Mastodon-related metrics to specify the Mastodon server/instance the metrics refer to, and patching the dashboard to be able to view metrics only from the selected instance. Does this make sense to you?
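
One way this per-instance label could be attached, sketched here under the assumption that scraping goes through a prometheus-operator ServiceMonitor. The label name `mastodon_instance`, the resource name, and the selector are hypothetical.

```yaml
# ServiceMonitor sketch: stamp every scraped series with the instance name
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mastodon-online                # hypothetical name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: mastodon-online   # hypothetical selector
  endpoints:
    - port: metrics
      relabelings:
        - targetLabel: mastodon_instance            # hypothetical label name
          replacement: mastodon.online
```

A dashboard could then define a Grafana variable with the query `label_values(mastodon_instance)` and filter each panel with `{mastodon_instance="$instance"}`, where `instance` is whatever name the variable is given.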

ywwg added 5 commits December 9, 2022 13:39

Provide link back to statsd mapping source at ipng.ch

ab19d70

Parameterize grafana datasources

81f5098

restore dashboardproviders, we do need it

9c9ef50

templatize statsd address setup

c5252a7

ywwg changed the title ~~Create optional monitoring configuration for Mastodon using Grafana~~ WIP: Create optional monitoring configuration for Mastodon using Grafana Dec 13, 2022

ywwg marked this pull request as ready for review December 13, 2022 16:42

Ensure subchart notes are not treated like templates

ec6d624

Update minimum helm version to fix prometheus chart issue

1527380

Seen here: prometheus-community/helm-charts#2723

deepy reviewed Dec 15, 2022
