WIP: Create optional monitoring configuration for Mastodon using Grafana #12

Status: Open, wants to merge 7 commits into base: main

@ywwg ywwg commented Dec 9, 2022

Hi all, this is a draft PR of a change to add optional Grafana+Prometheus monitoring to Mastodon instances deployed with this chart set. The goal is to enable instance admins to very quickly add monitoring with all of the tricky parts of wiring together the various scrapers and config files done automatically.

  • Includes automatic scraping of the Kubernetes node_exporter, Mastodon StatsD output, and Postgres data.
  • Includes four default dashboards.
  • Includes ingress changes to create routes for grafana.hostname.

This is a work in progress, but it is functional. The main question I'd like to ask with this PR is: is this a feature the Mastodon project is interested in including? I don't want to keep polishing and refactoring the config if it's not a direction you want to go. The major outstanding issue is the hardcoded values inside values.yaml, which I would like to replace with auto-generated URLs, but I am having trouble figuring out how to do that in Helm given the ways the services can be configured.

Major TODOs:

  • Make the default dashboards fully functional and usable. Right now some of the queries are incorrect.
  • Do more parameterization, especially where the release name and namespace are currently hard-coded.
  • Move as much boilerplate out of values.yaml as possible.

@ywwg ywwg changed the title Create optional monitoring configuration for Mastodon using Grafana WIP: Create optional monitoring configuration for Mastodon using Grafana Dec 13, 2022
@ywwg ywwg marked this pull request as ready for review December 13, 2022 16:42
@ineffyble
Member

@ywwg It looks like Helm is trying to template your txt file 🙃

@ywwg
Author

ywwg commented Dec 14, 2022

> @ywwg It looks like Helm is trying to template your txt file 🙃

Any chance you can re-trigger the workflow? I think I might have fixed it, but I can't reproduce the issue locally, so I'm not sure.

@ywwg
Author

ywwg commented Dec 15, 2022

I am new to Helm, but it appears that NOTES.txt is supposed to be a templated plain-text file that Helm processes.

I believe the version of Helm in the CI is too old: prometheus-community/helm-charts#2723

I updated Helm to 3.7 to see if that fixes it.

@@ -67,5 +68,24 @@ spec:
pathType: Exact
{{- end }}
{{- end }}
- host: {{ printf "grafana.%s" .host | quote }}
Contributor

We shouldn't be setting up the ingress for Grafana ourselves; it's better to use the Grafana chart's own Values.ingress instead of having to maintain a copy of it.
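
As a rough illustration of that suggestion, the parent chart's values could hand the ingress configuration to the Grafana subchart. This is only a sketch: it assumes the dependency is named `grafana` in Chart.yaml, and the hostname and secret name are placeholders, not values from this PR.

```yaml
# values.yaml of the parent chart (sketch, not the PR's actual config).
# Assumes the Grafana subchart is pulled in under the key "grafana".
grafana:
  ingress:
    enabled: true
    hosts:
      - grafana.example.com        # placeholder hostname
    path: /
    tls:
      - secretName: grafana-tls    # hypothetical TLS secret
        hosts:
          - grafana.example.com
```

With this approach the custom `grafana.<host>` rule in this chart's own ingress template would no longer be needed, since the Grafana chart renders its own Ingress resource.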

Author

Good point, I'll work on that.

@deepy
Contributor

deepy commented Dec 15, 2022

Doesn't it seem a little bit heavy to add 4 more subcharts here?

@ywwg
Author

ywwg commented Dec 16, 2022

> Doesn't it seem a little bit heavy to add 4 more subcharts here?

That's one reason I wanted to post the in-progress chart: to get a sense of what the goal of this chart is and how full-featured y'all want it to be. For instance, if this chart exists just to deploy Mastodon, then it's probably superfluous to include the monitoring stack. I was thinking this chart could be used by admins who want to deploy a solid Mastodon instance and are maybe less familiar with running an important service like this. I have seen a lot of messages in my feeds from admins who were surprised to run out of disk, or whose server went down overnight. In that light, this chart could help these less-experienced admins set up a more robust instance without having to worry about configuring monitoring themselves. (And those who don't need the monitoring stack, or want to roll their own, can disable it easily.)

To answer your question specifically, a functional monitoring stack needs at least three of the charts: prometheus, which gathers and stores the metrics; the statsd exporter, which records metrics from Mastodon; and grafana, which displays the metrics. We could easily drop the postgresql chart if we don't care about providing access to Postgres stats.
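
To make the subchart list concrete, here is a hedged sketch of how the dependencies could be declared so that each one can be switched off. The chart names and repositories are the public prometheus-community and Grafana ones; the `version` ranges are placeholders and the `condition` keys are hypothetical value names, not necessarily what this PR uses.

```yaml
# Chart.yaml (sketch). Versions are placeholders; pin real versions in practice.
dependencies:
  - name: prometheus
    repository: https://prometheus-community.github.io/helm-charts
    version: "*"
    condition: prometheus.enabled                    # hypothetical toggle
  - name: prometheus-statsd-exporter
    repository: https://prometheus-community.github.io/helm-charts
    version: "*"
    condition: prometheus-statsd-exporter.enabled    # hypothetical toggle
  - name: grafana
    repository: https://grafana.github.io/helm-charts
    version: "*"
    condition: grafana.enabled                       # hypothetical toggle
  # Optional: only needed for Postgres stats.
  - name: prometheus-postgres-exporter
    repository: https://prometheus-community.github.io/helm-charts
    version: "*"
    condition: prometheus-postgres-exporter.enabled  # hypothetical toggle
```

Setting any of these toggles to `false` in values.yaml would skip that subchart entirely.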

It also might help to talk about what "heavy" means in this case -- do you mean the maintenance burden of keeping them functional?

@renchap
Sponsor Member

renchap commented Feb 15, 2023

Hi there, and first of all thanks @ywwg for this PR!

I am working on moving mastodon.social & mastodon.online to K8s. For now mastodon.online has been deployed using home-made K8s resources (we needed to move it fast) but we are looking into using the official chart for those servers.

We have our existing monitoring deployment, and we don't want to have the Prometheus stack deployed by this chart. But it would probably be a good idea to have a way to get the dashboards and the various Monitor resources created by the chart, so that an existing prometheus-operator deployed on the cluster would start scraping the resources.
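
For illustration, a Monitor resource of that kind could look roughly like the ServiceMonitor below. This is a minimal sketch: the resource name, the labels, and the port name are assumptions, and the `release` label only matters if the operator's selector is configured to look for it.

```yaml
# ServiceMonitor picked up by an existing prometheus-operator (sketch).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mastodon-metrics                 # hypothetical name
  labels:
    release: kube-prometheus-stack       # assumption: matches the operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: mastodon   # hypothetical label on the metrics Service
  endpoints:
    - port: metrics                      # must match a named port on that Service
      interval: 30s
```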

Would this be a valid alternative to what you have done?

> I was thinking this chart could be used by admins who want to deploy a solid Mastodon instance and are maybe less familiar with running an important service like this.

For this concern, I think we could have a "meta" chart that has more sub-chart dependencies and can (optionally) install all the parts of a Mastodon setup, including postgres, redis, prometheus… so people who want a fully-managed setup have this option.

On another topic, I see you chose to deploy the statsd_exporter as its own deployment. Did you consider running the statsd_exporter as a sidecar in the Puma & Sidekiq pods, with all of them scraped by Prometheus? This is the setup recommended in the exporter's README and what I deployed on mastodon.online, with good results so far.
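
A sidecar layout along those lines might be sketched as the pod template fragment below. The container names and the Mastodon image are placeholders; the ports are the statsd_exporter defaults (9125/UDP for StatsD input, 9102 for the Prometheus metrics endpoint), and `STATSD_ADDR` is the environment variable Mastodon reads to find its StatsD target.

```yaml
# Fragment of a Deployment's pod template (sketch, applies equally to Puma and Sidekiq pods).
spec:
  containers:
    - name: mastodon-web                 # hypothetical container name
      image: your-mastodon-image         # placeholder image
      env:
        - name: STATSD_ADDR
          value: "localhost:9125"        # send StatsD packets to the sidecar
    - name: statsd-exporter
      image: prom/statsd-exporter
      ports:
        - name: statsd
          containerPort: 9125
          protocol: UDP
        - name: metrics
          containerPort: 9102            # scraped by Prometheus
```

Because both containers share the pod's network namespace, the StatsD traffic never leaves the pod, and each pod exposes its own metrics endpoint.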

The last issue I am looking to solve is being able to host multiple Mastodon instances on the same cluster and Prometheus monitoring setup. This is important for us, so I am looking into adding a label to all Mastodon-related metrics to specify the Mastodon server/instance the metrics refer to, and patching the dashboards to be able to view metrics only from the selected instance. Does this make sense to you?
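
One possible way to attach such a label, assuming the metrics are scraped through a prometheus-operator ServiceMonitor, is a static relabeling on the scrape endpoint. The label name `mastodon_server` and the resource names below are hypothetical; a different name than `instance` is used because Prometheus already sets `instance` to the scrape target address.

```yaml
# ServiceMonitor adding a per-server label to every scraped series (sketch).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mastodon-online-metrics          # hypothetical name, one per Mastodon server
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: mastodon-online   # hypothetical Service label
  endpoints:
    - port: metrics
      relabelings:
        - targetLabel: mastodon_server   # hypothetical label name
          replacement: mastodon.online   # static value for this server
```

A dashboard could then expose a template variable populated from `label_values(mastodon_server)` and filter each panel query on that label.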
