Increase monitoring and visibility of kafka/confluent #80

Open
sjahl opened this issue Jun 15, 2021 · 2 comments
Labels
quality Relating to quality control and/or validation efforts sre

Comments

@sjahl
Contributor

sjahl commented Jun 15, 2021

I think it's desirable, from both a cost-projection and a performance perspective, to spend some time figuring out how to monitor and track metrics for our Kafka streams in Confluent.

The best-case scenario is probably to get the metrics from Confluent plumbed into our GCP monitoring account, so that we can dashboard and alert from there alongside the rest of our app metrics. The next-best option is to turn on whatever monitoring and alerting capabilities Confluent itself offers, so that we're made aware of cost problems and performance issues without manually checking a dashboard.

@sjahl sjahl added the sre label Jun 15, 2021
@sjahl sjahl self-assigned this Jun 15, 2021
@theferrit32
Contributor

@sjahl the ccloud CLI can be used to list the clusters we have and the topics and partitions in each cluster. A small Python/Clojure program can then collect info for each topic, such as how many messages there are, the timestamps of the first and last messages, and anything else we might want to know that's not readily available in the Confluent UI or client. It could also sample from each topic to estimate the average message size. We could make it a Kubernetes job that runs once a day or so.
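
As a rough illustration of the per-topic summary described above, here is a minimal Python sketch. It only covers the arithmetic: it assumes some Kafka client has already fetched the low/high watermark offsets for each partition and a sample of messages that includes the first and last message of each partition. The names `TopicSummary` and `summarize_topic` are hypothetical, and the offset difference is only an upper bound on the message count (compaction and transaction markers can leave gaps).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean
from typing import Iterable, Optional, Tuple


@dataclass
class TopicSummary:
    """Hypothetical container for the per-topic numbers we want to track."""
    topic: str
    message_count: int                      # upper bound, from offsets
    first_timestamp: Optional[datetime]     # earliest timestamp in the sample
    last_timestamp: Optional[datetime]      # latest timestamp in the sample
    avg_message_bytes: Optional[float]      # mean size of sampled messages
    estimated_total_bytes: Optional[float]  # avg size * message count


def _to_utc(timestamp_ms: int) -> datetime:
    """Convert a Kafka-style epoch-millisecond timestamp to a UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def summarize_topic(
    topic: str,
    watermarks: Iterable[Tuple[int, int]],
    sample: Iterable[Tuple[int, int]],
) -> TopicSummary:
    """Summarize one topic.

    watermarks: one (low_offset, high_offset) pair per partition.
    sample:     one (timestamp_ms, size_bytes) pair per sampled message.
    """
    # high - low is the number of offsets in the partition, not necessarily
    # the exact number of retained messages.
    count = sum(max(high - low, 0) for low, high in watermarks)

    sample = list(sample)
    if not sample:
        return TopicSummary(topic, count, None, None, None, None)

    timestamps = [ts for ts, _ in sample]
    avg_size = mean(size for _, size in sample)
    return TopicSummary(
        topic=topic,
        message_count=count,
        first_timestamp=_to_utc(min(timestamps)),
        last_timestamp=_to_utc(max(timestamps)),
        avg_message_bytes=avg_size,
        estimated_total_bytes=avg_size * count,
    )
```

A daily job could call `summarize_topic` once per topic and write the results wherever the rest of the metrics go; the fetching side (client library, credentials, sampling strategy) is deliberately left out here.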

@sjahl
Contributor Author

sjahl commented Jun 15, 2021

@theferrit32 Thanks! I'll take a look.

Confluent does have a Metrics API, which might be easier to work with, depending on what format the ccloud CLI outputs metrics in: https://docs.confluent.io/cloud/current/monitoring/metrics-api.html

I also found this: https://github.com/Dabz/ccloudexporter, which exposes the metrics over an HTTP API suitable for Prometheus to scrape (and I think Prometheus can use Google monitoring as long-term storage for the metrics it collects). I'm considering deploying Prometheus anyway for other reasons, so this might be the way to go if that ends up happening.
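
For a sense of what querying the Metrics API might involve, here is a hedged Python sketch that only builds the request and does not send it. The endpoint URL, the `resource.kafka.id` filter field, the `metric.topic` grouping label, and the overall payload shape are assumptions based on one reading of the Metrics API docs and may differ between API versions, so check them against the page linked above. `build_query` and `build_request` are hypothetical helper names, and the cluster ID and credentials are placeholders.

```python
import json
import urllib.request
from base64 import b64encode

# Assumption: the v2 query endpoint; older docs describe a v1 path instead.
METRICS_QUERY_URL = "https://api.telemetry.confluent.cloud/v2/metrics/cloud/query"


def build_query(cluster_id: str, metric: str, start: str, end: str,
                granularity: str = "PT1H") -> dict:
    """Build a query body for one metric on one cluster, grouped by topic.

    start/end are ISO-8601 timestamps such as "2021-06-14T00:00:00Z".
    Field names here are assumptions, not confirmed by this thread.
    """
    return {
        "aggregations": [{"metric": metric}],
        "filter": {"field": "resource.kafka.id", "op": "EQ", "value": cluster_id},
        "granularity": granularity,
        "group_by": ["metric.topic"],
        "intervals": [f"{start}/{end}"],
    }


def build_request(api_key: str, api_secret: str, query: dict) -> urllib.request.Request:
    """Wrap the query in a POST request using HTTP basic auth (assumed scheme)."""
    token = b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        METRICS_QUERY_URL,
        data=json.dumps(query).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. The JSON response could then be reshaped into whatever GCP monitoring or Prometheus expects, which is the part that would decide whether this route is simpler than parsing CLI output.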

@KelseaChang5 KelseaChang5 added the quality Relating to quality control and/or validation efforts label Jan 13, 2022