
Prometheus client metrics support #711

Open
wants to merge 40 commits into base: main

Conversation

libretto
Contributor

@libretto commented Sep 6, 2023

This code provides the option to use Prometheus for collecting statistics in Karapace. This branch is based on the karapace-metrics branch.

@libretto requested review from a team as code owners September 6, 2023 22:48
@libretto
Contributor Author

@jjaakola-aiven @aiven-anton could you review this PR too?

@aiven-anton
Contributor

aiven-anton commented Nov 8, 2023

Hello @libretto,

Sorry for the long wait for a response here. We've convened internally about how to move forward with your contribution, and I'll summarize it here.

There are three issues that must be addressed for this to be mergeable:

  • ~~Expose the Prometheus exporter under the existing server instead of starting a separate one. Remove the configuration options for Prometheus host and port.~~ _Let's leave this as-is._

  • Incorrect metric semantics. This has been iterated on a couple of times in review already. This means, for instance, that `request-size` needs to be changed into two separate metrics: `request_size_total` + `request_count`. This applies to other metrics in the PR as well.

  • Remove the psutil connection count probing. This should rather be gathered by an OS-level metric; we don't want the application probing the OS for this itself. If the application is to report this, it needs to be based on some application-level metric, like an actual internal counter of handled requests, and not on probing the OS for the number of process connections.

    Since the `psutil` and `schedule` dependencies were brought in for this purpose only, let's also skip adding those.

  • Let's skip the updates to README.md, so that we can let these changes be undocumented and subject to change while we're stabilizing it.

Since the branches have diverged, let's also close the original PR now, and make any future changes to this branch only.

@libretto
Contributor Author

Hello @aiven-anton, let me try to answer your comments:

> Hello @libretto,
>
> Sorry for the long wait for a response here. We've convened internally about how to move forward with your contribution, and I'll summarize it here.
>
> There are three issues that must be addressed for this to be mergeable:
>
> * ~~Expose the Prometheus exporter under the existing server instead of starting a separate one. Remove the configuration options for Prometheus host and port.~~ _Let's leave this as-is._
>
> * Incorrect metric semantics. This has been iterated on a couple of times in review already. This means, for instance, that `request-size` needs to be changed into two separate metrics: `request_size_total` + `request_count`. This applies to other metrics in the PR as well.

The metric semantics applied in this context are determined by the SchemaRegistry product, as detailed in their documentation. To achieve exact compatibility with SchemaRegistry, adherence to their specified metric names is necessary.

> * Remove the psutil connection count probing. This should rather be gathered by an OS-level metric; we don't want the application probing the OS for this itself. If the application is to report this, it needs to be based on some application-level metric, like an actual internal counter of handled requests, and not on probing the OS for the number of process connections.

Is there a method to determine the number of connections in the Karapace application? I haven't been able to find a way to access the list of connections within our app. Is it possible I overlooked something?

>   Since the `psutil` and `schedule` dependencies were brought in for this purpose only, let's also skip adding those.

> * Let's skip the updates to README.md, so that we can let these changes be undocumented and subject to change while we're stabilizing it.

Ok

> Since the branches have diverged, let's also close the original PR now, and make any future changes to this branch only.

Ok

@aiven-anton
Contributor

@libretto

> The metric semantics applied in this context are determined by the SchemaRegistry product, as detailed in their documentation. To achieve exact compatibility with SchemaRegistry, adherence to their specified metric names is necessary.

Metrics are not an area where we'll aim for 1-1 compatibility with Confluent Schema Registry.

> Is there a method to determine the number of connections in the Karapace application? I haven't been able to find a way to access the list of connections within our app. Is it possible I overlooked something?

It's possible that this would have to be developed. For now, I would just skip this and address it later. Do note that we have a long-term plan to switch from the homegrown web framework to FastAPI, so any such work might become moot.
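
For illustration, a minimal sketch of the application-level approach described above: an internal gauge incremented and decremented as requests enter and leave the application, instead of psutil probing. The middleware class and metric name here are hypothetical, not part of Karapace or this PR:

```python
from prometheus_client import Gauge

# Hypothetical metric; the name is illustrative, not defined in this PR.
ACTIVE_CONNECTIONS = Gauge("karapace_active_connections", "Currently open client connections")


class ConnectionCountingMiddleware:
    """Tracks in-flight requests via an internal counter; no OS probing needed."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, request):
        ACTIVE_CONNECTIONS.inc()  # one more connection being served
        try:
            return await self.app(request)
        finally:
            ACTIVE_CONNECTIONS.dec()  # released even if the handler raises
```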

@aiven-anton mentioned this pull request Nov 16, 2023
@libretto
Contributor Author

libretto commented Dec 21, 2023

> * Incorrect metric semantics. This has been iterated on a couple of times in review already. This means, for instance, that `request-size` needs to be changed into two separate metrics: `request_size_total` + `request_count`. This applies to other metrics in the PR as well.

@aiven-anton Do you mean it must be coded in the following way?

```python
def request(self, size: int) -> None:
    self.request_size_total += size
    self.request_count += 1
    if not self.is_ready or self.stats_client is None:
        return
    if not isinstance(self.stats_client, StatsClient):
        raise RuntimeError("no StatsClient available")
    self.stats_client.gauge("request-size-total", self.request_size_total)
    self.stats_client.gauge("request-count", self.request_count)
```


@aiven-anton
Contributor

@libretto Yes, with the exception that these should be counters, and not gauges. So I'd expect to see calls to `.increase()` instead of `.gauge()`.
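
A minimal sketch of that change, assuming `StatsClient.increase()` accepts a metric name and a delta as the comment above suggests (the exact signature is an assumption):

```python
def request(self, size: int) -> None:
    if not self.is_ready or self.stats_client is None:
        return
    if not isinstance(self.stats_client, StatsClient):
        raise RuntimeError("no StatsClient available")
    # Counters are monotonic: report the per-request delta and let the
    # backend accumulate it, instead of gauging a running total.
    self.stats_client.increase("request-size-total", size)
    self.stats_client.increase("request-count", 1)
```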

@libretto
Contributor Author

@aiven-anton Done, please review.

@libretto
Contributor Author

libretto commented Jan 11, 2024

BTW, is the usage of `.gauge()` and `.timing()` in the following functions acceptable?

```python
def are_we_master(self, is_master: bool) -> None:
    if not self.is_ready or self.stats_client is None:
        return
    if not isinstance(self.stats_client, StatsClient):
        raise RuntimeError("no StatsClient available")
    self.stats_client.gauge("master-slave-role", int(is_master))


def latency(self, latency_ms: float) -> None:
    if not self.is_ready or self.stats_client is None:
        return
    if not isinstance(self.stats_client, StatsClient):
        raise RuntimeError("no StatsClient available")
    self.stats_client.timing("latency_ms", latency_ms)
```

aiven-anton previously approved these changes Jan 25, 2024
Contributor

@eliax1996 left a comment


Can we have some tests?

@libretto
Copy link
Contributor Author

libretto commented May 1, 2024

> Can we have some tests?

We can certainly add some tests. However, the current version of Karapace Stats lacks test coverage, so we have no existing framework to guide us. At this point, it seems that only unit tests are feasible. I'm unsure how to implement integration tests for this part of the task.

@eliax1996
Contributor

> Can we have some tests?

> We can certainly add some tests. However, the current version of Karapace Stats lacks test coverage, so we have no existing framework to guide us. At this point, it seems that only unit tests are feasible. I'm unsure how to implement integration tests for this part of the task.

Let's kick off with the unit tests. I'll provide an example soon of how to execute an integration test. It's crucial to have a tangible test that ensures our output is compliant with the standard format. We need to verify that compliant consumers can effectively utilize and extract the metrics.

Without integration tests, even if the feature is technically sound, we lack a mechanism that prevents us from introducing code that might break those consumers.
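
As a starting point, a hedged sketch of what such an integration-style check could look like, using the text-format parser from `prometheus_client`. The endpoint URL is an assumption for illustration and is not defined anywhere in this PR:

```python
import urllib.request

from prometheus_client.parser import text_string_to_metric_families


def test_metrics_endpoint_is_parseable() -> None:
    # Hypothetical exporter address; adjust to wherever Karapace serves metrics.
    body = urllib.request.urlopen("http://localhost:9090/metrics").read().decode()
    # A standards-compliant parser raises on malformed exposition-format input,
    # so successfully iterating over the families is itself the compliance check.
    families = list(text_string_to_metric_families(body))
    assert families, "expected at least one metric family"
```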
