-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Prometheus metrics to AMQP Operator #2
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In regards to the kopf index: As discussed offline, using a (dummy) resume handler to set the count for number of resources should work as according to the docs the index is complete before any resource handlers are called.
Please also add a paragraph to the README and mention that the prometheus operator is needed for metrics to work (because of the ServiceMonitor CRD).
Note: For the changes that are the same in all handlers I only commented on the first one, but they apply to all.
for value in index.values(): | ||
resources += value | ||
|
||
for resource in resources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this is done in two loops (collecting resources first, then couting them)?
) | ||
|
||
_HANDLER_NAME = "broker" | ||
_RESOURCE_TYPE = "AMQPBroker" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to take the name from the type object (k8s.AMQPBroker.kind), that way the name is only written in one place.
type: {{ .Values.service.type }} | ||
ports: | ||
- port: {{ .Values.prometheus.port }} | ||
targetPort: {{ .Values.prometheus.port }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is hardcoded in the python source, it makes no sense to parametrize it here.
interval: {{ .Values.prometheus.scrapeInterval }} | ||
path: {{ .Values.prometheus.path | default "/" }} | ||
--- | ||
apiVersion: v1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer not to have a separate Service object and just add a second port to the main service.
name: {{ .Release.Name }} | ||
namespace: {{ .Release.Namespace }} | ||
labels: | ||
name: {{ .Release.Name }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use the same labels as the deployment ({{- include "operator.labels" . | nindent 4 }}
).
@@ -77,3 +77,9 @@ affinity: {} | |||
|
|||
strategy: | |||
type: Recreate | |||
|
|||
prometheus: | |||
enabled: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default should be false, to not interfere with the quickstart installation that does not require prometheus.
_ACTION_NAME = ACTIONS.RESUME | ||
PROMETHEUS_HANDLER_CALLS_TOTAL_COUNTER.labels(handler=_HANDLER_NAME, action=_ACTION_NAME).inc() | ||
|
||
with (PROMETHEUS_HANDLER_EXCEPTION_COUNTER.labels(handler=_HANDLER_NAME, action=_ACTION_NAME)).count_exceptions(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to do this with a (custom) decorator function? Would be nicer to read
|
||
PROMETHEUS_HANDLER_CALLS_TOTAL_COUNTER = Counter( | ||
'handler_calls_total', | ||
'Amount of times a handler has been called', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick. Number of times sounds better than amount. Also for the other descriptions below.
'Amount of times a handler has been called', | |
'Number of times a handler has been called', |
["type"] | ||
) | ||
|
||
class ACTIONS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nicer as an enum.
@kopf.index(*k8s.AMQPTopicSubscription.kopf_on()) | ||
@kopf.index(*k8s.AMQPTopic.kopf_on()) | ||
def resource_index(namespace, body, **_): | ||
return { namespace: body.get("kind") } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the kind be the key with namespace+name as the value?
Something like return {body.get("kind"): f"{namespace}:{name}"}
This PR introduces prometheus metrics to the operator. The supported metrics are the following prometheus metrics:
While handler calls and exceptions are maintained manually using calls like
the created resources are monitored using Kopfs In Memory Index feature. This enables us to monitor resources of any kind without having to query the Kube API on a regular basis.
Note: right now, it seems like the in-memory indices are only populated when a new resource is applied , but not when the operator starts up. This is unexpected behavior, since the docs suggest otherwise here. Feel free to contribute or comment if you know more about this.
The metrics that are scraped by prometheus look like this in practice: