Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Prometheus metrics to AMQP Operator #2

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lieberlois
Copy link

This PR introduces prometheus metrics to the operator. The supported metrics are the following prometheus metrics:

PROMETHEUS_HANDLER_CALLS_TOTAL_COUNTER = Counter(
    'handler_calls_total', 
    'Amount of times a handler has been called', 
    ["handler", "action"]
)

PROMETHEUS_HANDLER_EXCEPTION_COUNTER = Counter(
    'handler_exceptions_total', 
    'Amount of exceptions thrown in kopf handlers', 
    ["handler", "action"]
)

PROMETHEUS_RESOURCES_CREATED_TOTAL_GAUGE = Gauge(
    'resources_created_total', 
    'Amount of resources created using the operator', 
    ["type"]
)

While handler calls and exceptions are maintained manually using calls like

PROMETHEUS_HANDLER_CALLS_TOTAL_COUNTER.labels(handler=_HANDLER_NAME, action=_ACTION_NAME).inc()

the created resources are monitored using Kopfs In Memory Index feature. This enables us to monitor resources of any kind without having to query the Kube API on a regular basis.

Note: right now, it seems like the in-memory indices are only populated when a new resource is applied , but not when the operator starts up. This is unexpected behavior, since the docs suggest otherwise here. Feel free to contribute or comment if you know more about this.

The metrics that are scraped by prometheus look like this in practice:
Bildschirmfoto 2022-12-28 um 11 39 54

Copy link
Collaborator

@swoehrl-mw swoehrl-mw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regards to the kopf index: As discussed offline, using a (dummy) resume handler to set the count for number of resources should work as according to the docs the index is complete before any resource handlers are called.

Please also add a paragraph to the README and mention that the prometheus operator is needed for metrics to work (because of the ServiceMonitor CRD).

Note: For the changes that are the same in all handlers I only commented on the first one, but they apply to all.

for value in index.values():
resources += value

for resource in resources:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why this is done in two loops (collecting resources first, then couting them)?

)

_HANDLER_NAME = "broker"
_RESOURCE_TYPE = "AMQPBroker"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to take the name from the type object (k8s.AMQPBroker.kind), that way the name is only written in one place.

type: {{ .Values.service.type }}
ports:
- port: {{ .Values.prometheus.port }}
targetPort: {{ .Values.prometheus.port }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is hardcoded in the python source, it makes no sense to parametrize it here.

interval: {{ .Values.prometheus.scrapeInterval }}
path: {{ .Values.prometheus.path | default "/" }}
---
apiVersion: v1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to have a separate Service object and just add a second port to the main service.

name: {{ .Release.Name }}
namespace: {{ .Release.Namespace }}
labels:
name: {{ .Release.Name }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use the same labels as the deployment ({{- include "operator.labels" . | nindent 4 }}).

@@ -77,3 +77,9 @@ affinity: {}

strategy:
type: Recreate

prometheus:
enabled: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default should be false, to not interfere with the quickstart installation that does not require prometheus.

_ACTION_NAME = ACTIONS.RESUME
PROMETHEUS_HANDLER_CALLS_TOTAL_COUNTER.labels(handler=_HANDLER_NAME, action=_ACTION_NAME).inc()

with (PROMETHEUS_HANDLER_EXCEPTION_COUNTER.labels(handler=_HANDLER_NAME, action=_ACTION_NAME)).count_exceptions():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to do this with a (custom) decorator function? Would be nicer to read


PROMETHEUS_HANDLER_CALLS_TOTAL_COUNTER = Counter(
'handler_calls_total',
'Amount of times a handler has been called',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick. Number of times sounds better than amount. Also for the other descriptions below.

Suggested change
'Amount of times a handler has been called',
'Number of times a handler has been called',

["type"]
)

class ACTIONS:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nicer as an enum.

@kopf.index(*k8s.AMQPTopicSubscription.kopf_on())
@kopf.index(*k8s.AMQPTopic.kopf_on())
def resource_index(namespace, body, **_):
return { namespace: body.get("kind") }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the kind be the key with namespace+name as the value?
Something like return {body.get("kind"): f"{namespace}:{name}"}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants