
#1340 better metric support #1343

Open
jelmd wants to merge 12 commits into master from #1340_better_metric_support
Conversation

@jelmd (Contributor) commented Dec 6, 2023

Fix #1340 - this PR provides live monitoring of the turnserver application via up-to-date metrics exposed over HTTP in the OpenMetrics/Prometheus exposition format, at relay-thread level by default or at relay-thread/session level if desired. For more details see docs/Metrics.md.

@jelmd (Contributor, Author) commented Dec 6, 2023

The build fails because CI tries to link against the wrong/buggy prometheus-client-c lib.
See 121b319

@jelmd jelmd force-pushed the #1340_better_metric_support branch from 99ff06b to 9969511 Compare December 7, 2023 15:28
@jelmd (Contributor, Author) commented Dec 7, 2023

These tests are an absolute nightmare! They seem to be pretty bogus. Please advise what is going wrong and what needs to be fixed.

@jelmd jelmd force-pushed the #1340_better_metric_support branch from 089ebdc to 6f737da Compare December 8, 2023 07:20
@jelmd jelmd force-pushed the #1340_better_metric_support branch from 26c7a72 to 9e49ddc Compare December 8, 2023 15:05
Jens Elkner added 2 commits December 8, 2023 16:35
Actually the s390x bloat never succeeded and a priori prevents
all PRs from being accepted. I have never heard that anybody is
running coturn on a non-{Linux,Solaris,BSD} platform.
Having tests for all possible OS/arches is one thing; actually
testing against all of them is a completely different thing - not
green at all and a big waste of resources. So IMHO the test
matrix should be revised to reflect what is really needed.
@jelmd jelmd force-pushed the #1340_better_metric_support branch from 9e49ddc to 6980c5b Compare December 8, 2023 15:36
@jelmd (Contributor, Author) commented Dec 8, 2023

Test bugs fixed (home-grown install wrt. the libprom libs and includes, and different clang-format versions depending on the distro in use).

@jelmd (Contributor, Author) commented Dec 8, 2023

fixes #1337, too.

@jelmd (Contributor, Author) commented Dec 16, 2023

Nobody there to review this PR?

@KangLin , @eakraly , @misi ?

It really helps to better understand coturn and get a feeling for it. E.g. the grafana dashboard I use:
[screenshot: turn-metrics Grafana dashboard]

@KangLin (Contributor) left a comment

Very good work!

Contributor:

The recommendation: define these options as a structure:

struct _PROMETHEUS {
  int prom;
  int prom_port;
  int prom_compact;
  int prom_realm;
  int prom_usernames;
  int prom_usid;
  int prom_rsid;
  vint prom_sid_retain;
  vint log_ip;
};

@@ -1,8 +1,8 @@
# Prometheus setup
# Metrics setup

Contributor:

Add a metrics introduction here:

- https://docs.victoriametrics.com/keyConcepts.html
- https://docs.victoriametrics.com/FAQ.html#what-is-high-cardinality
- https://docs.victoriametrics.com/FAQ.html#what-is-high-churn-rate

Contributor:

Add here:

Usage

E.g. add the Grafana dashboard configuration and usage documents.

Metrics setup

pms_t *sample_session_state;
pms_t *sample_stun_req;
pms_t *sample_stun_resp;
#endif
Contributor:

The recommendation: define these as a structure.

@eakraly (Collaborator) commented Dec 19, 2023

Thank you @jelmd for this monumental piece of work.

But I do not think we can accept this PR in its current form:

  1. It does not do what it claims it does - there is a bunch of changes all loosely related to the prometheus interface
  2. Yes, digitalocean/prometheus-client-c looks abandoned but I doubt it will serve anyone to replace it with another potentially undermaintained library. Not sure what other good candidates may be though.
  3. Realtime update of metrics: great idea but it has nothing to do with a different library. Should be separate.
  4. Removing support for platforms - why?
  5. clang-format: let's not reformat back and forth
  6. There is a lot more metrics collected which creates a problem, and introduces solutions to the problem ...
  7. Packages or sources stored on a university ftp server running openssl/1.0.2u (deprecated 4 years ago) will not be taken seriously by security conscious people. Which will introduce work in the future (to address those concerns) or people would just stop using coturn silently for lack of trust - speaking from experience here on both accounts.
  8. Additional labels like realm: good idea! But again, nothing to do with the library and should be separated out. Another thing is that this going toward using prometheus as debugging which it is not (IP address). But I can see how commercial users would benefit from that (while destroying performance obviously and introducing a way to break)
  9. Solutions to keep prometheus consumed memory in check: I need to dive more into this - it's a double edged sword and moves the problem upstream (and creates new problems that need solutions). Restarting counters is not a trivial thing.
  10. We should have new tests, especially for the new functionality

Though this list is long and negative - there are many changes we can adopt but as separate PRs. I'm trying to be practical - if something gets broken it is going to be very hard to track it and fix it (broken as in addition to the list above).

@eakraly eakraly self-requested a review December 19, 2023 06:37
@eakraly (Collaborator) left a comment

Sending back to author to split into separate PRs

@jelmd (Contributor, Author) commented Dec 20, 2023

Very good work!

* Why use https://github.com/jelmd/libprom instead of digitalocean/prometheus-client-c?

Basically the Readme gives you a coarse overview. Also, it exposes a string-buffer and a simple log interface, so that one does not need to re-invent the wheel just to add metrics and stuff to one's "project" (but can if needed).

So a way better user experience than the (machine-translated Go-to-C?) prometheus-client-c interface.

@jelmd (Contributor, Author) commented Dec 20, 2023

Thank you @jelmd for this monumental piece of work.

Thanx @eakraly for your time and having a look at it!

But I do not think we can accept this PR in its current form:

1. It does not do what it claims it does - there is a bunch of changes all losely related to prometheus interface

Not sure what you mean. Is it possible to pinpoint the sections which are a problem? Basically my focus is not Prometheus, but metrics. They finally get exposed in the OpenMetrics/Prometheus exposition format, yes - but I wouldn't insist on this (a lot of other formats are possible). The meat is that the metrics get stored somehow, and to not re-invent the wheel, libprom is good enough to store the samples and as a side effect can easily be used to expose the data. However, the OpenMetrics/Prometheus exposition format is so simple that one could generate a report simply as metric_name[{key="value"[,key="value"]...}] %g\n lines and would not need libprom at all (and libpromhttp is a really simple wrapper around libmicrohttpd, so not really needed either). It just helps the "do not want all the bloat" people, makes life easier.
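For illustration, a minimal sketch (not the PR's actual code) of how such a report could be generated from a plain key:value store; the metric names and the metric_t type are invented for this example, the output is the exposition format just described:

#include <stdio.h>

/* Hypothetical in-memory sample store - names and fields are illustrative only. */
typedef struct {
    const char *key;   /* e.g. "turn_sessions{tid=\"0\"}" */
    double value;
} metric_t;

/* Emit all samples in OpenMetrics/Prometheus exposition format: "name{labels} value". */
static void metrics_report(FILE *out, const metric_t *m, size_t n) {
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%s %g\n", m[i].key, m[i].value);
}

int main(void) {
    metric_t store[] = {
        { "turn_sessions{tid=\"0\"}", 12 },
        { "turn_traffic_rcvb{tid=\"0\"}", 345678 },
    };
    metrics_report(stdout, store, sizeof store / sizeof store[0]);
    return 0;
}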

2. Yes, digitalocean/prometheus-client-c looks abandoned but I doubt it will serve anyone to replace it with another potentially undermaintained library. Not sure what other good candidates may be though.

Yes, if no-one uses it, vendors certainly are not going to include it in their distros (is there really a difference between un- and under-maintained?). I'm not sure what "potentially undermaintained" means in this context. Do you mean it needs to be bloated every month or year to give people the impression that some stuff really got changed and to urge them to update ASAP? Hmm, I maintain stuff for free, so I update on demand and not to earn money for questionable stuff. However, if there is a good sponsor, I could increment the version on demand each month ;-).

3. Realtime update of metrics: great idea but it has nothing to do with a different library. Should be separate.

What? The problem right now is that "metric" data only get updated when an allocation gets closed. So one can really forget it - the data are worth nothing at all, just paperweight, and do not provide what a provider really needs. Similar with the redis stuff - besides not providing lifetime data but only coarse-grained samples, it is probably more resource-demanding than the prom stuff (IIRC I've seen some issues regarding high CPU usage wrt. redis). So right now coturn really lacks a proper monitoring facility which allows operators to identify leakages and bottlenecks, monitor bandwidth and resource usage, and size their environments as needed.

4. Removing support for platforms - why?

You mean dropping the s390x stuff? Well, actually I just ditched it because the tests always failed on it: not because the software had an error, but because the test platform was so unreliable that the tests couldn't even be started. =8-(

A very, very bad test environment, which prevents PRs from succeeding just because the test platform cannot be kept in a stable state to start any test at all. This furthermore raises the question: for what purpose does the software get tested on every known platform on earth? IMHO not green at all, a huge waste of resources for nothing but "compiles on every platform on earth". Who cares? Who needs it? x86_64 is sufficient for our purposes; if it works on arm64 or sparc64 and friends, fine. Everything else: please give a hint!

5. clang-format let's not reformat back and forth

The PR rules want make lint to succeed. So there is not really an option, unless one approves PRs which do not pass ..., which probably happened in the past (or a different linter had been used and a new one got introduced without testing the current sources) =8-( . E.g. tests were made against Ubuntu 16.04 (BTW unsupported since April 2021) and were ok; however, the make lint tests were made on Ubuntu 20.04 (or 22.04) and failed - on old, unchanged code!!! So either the test under 20.04 failed, or if fixed for that one, it failed for 16.04. Hmmm, yes, the current test env seems to be more kindergarten chaos than a proper CI env.

6. There is a lot more metrics collected which creates a problem, and introduces solutions to the problem ...

Ehmm, nope. Can you elaborate, please? Right now, as said, the metrics are completely useless. They do not even deserve the term monitoring; it is just a junk yard of meaningless data. With this PR one gets for the first time the opportunity to really monitor the coturn application on the fly and to turn the knobs as needed. Depending on how much you want to play with your knobs, you may monitor coarse-grained at relay-thread level or more fine-grained at session/allocation level. So everyone's mileage may vary, but the default level should be sufficient for production envs and the overhead negligible (I would worry much more if redis is enabled). Of course, if one uses one's old C64, all bets are off ;-).

7. Packages or sources stored on a university ftp server running openssl/1.0.2u (deprecated 4 years ago) will not be taken seriously by security conscious people. Which will introduce work in the future (to address those concerns) or people would just stop using coturn silently for lack of trust - speaking from experience here on both accounts.

Thanks for nitpicking this one :). Actually, paranoid people would download the source via GitHub using HTTPS and compile it by themselves. Really paranoid people would download, review and probably file an issue if they discover something that is probably not ok (whatever that means). All others do not really care at all and would be happy even with HTTP (my experience so far) - they just want to get it to work. Anybody can host/provide it for their customers/clients. So "lack of trust" or "stop using coturn" is IMHO a really questionable assumption without any proof or relation to reality (usually distro vendors provide packages, not the users themselves, and in turn, if they do not trust their vendors ...), especially because it is open source! Last but not least: yes, people trust software much more just because they can download it via OpenSSL 1000+.x.y instead of 1.0.2? Yepp, that's for sure, this software must be safe ;-).

8. Additional labels like realm: good idea! But again, nothing to do with the library and should be separated out. Another thing is that this going toward using prometheus as debugging which it is not (IP address). But I can see how commercial users would benefit from that (while destroying performance obviously and introducing a way to break)

Ehhhm, realm is not my idea, but I can see a use for it (e.g. one may group metrics by realm). Therefore I just kept it, but only as a session_state attribute - it would otherwise blow up the metrics numbers if it really gets used with different values (threadID:sessionID is already unique, so other labels are just additional info - noise - one usually does not need). Wrt. performance: I doubt such general assumptions and prefer some numbers. Actually I think that if one disables redis and just uses the prom stuff, it would be much, much better performance-wise and less resource-demanding. Unfortunately I haven't had time to waste on redis - just had a look at the code :(.

9. Solutions to keep prometheus consumed memory in check: I need to dive more into this - it's a double edged sword and moves the problem upstream (and creates new problems that need solutions). Restarting counters is not a trivial thing.

I'm not sure whether I understand what you mean. For each metric you need to keep counters somewhere. If one does it file-based, no one needs to wonder about performance. So the obvious thing to do is to keep the counters in memory. Simple apps do it by maintaining global vars; others do it by using a lib like libprom to keep the counters (key:value pairs alias samples) in a hash map and provide an API which makes it thread-safe (thread-safety is not really an issue for coturn, because the related counters are always touched by the same thread anyway). So basically one could completely ditch libprom et al. if one bloats one's own code with counters (like int server->metricXY), updates them in a thread-safe manner and generates a report as needed. So I can't see the double-edged sword and would be happy to understand what you mean.
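A minimal sketch of the "just keep counters in memory" alternative described above - all names are hypothetical, and since each per-relay-thread counter set is only touched by its owning thread, no locking is used here:

#include <stdio.h>
#include <stdint.h>

#define MAX_RELAY_THREADS 16   /* hypothetical upper bound, for illustration only */

/* One counter set per relay thread; each set is updated by its own thread only. */
struct relay_metrics {
    uint64_t rcvd_bytes;
    uint64_t sent_bytes;
};

static struct relay_metrics relay_stats[MAX_RELAY_THREADS];

/* Called from relay thread `tid` whenever traffic has been relayed. */
void account_traffic(int tid, size_t rcvd, size_t sent) {
    relay_stats[tid].rcvd_bytes += rcvd;
    relay_stats[tid].sent_bytes += sent;
}

/* Dump the whole store in exposition format when /metrics is requested. */
void dump_metrics(FILE *out, int nthreads) {
    for (int t = 0; t < nthreads; t++) {
        fprintf(out, "relay_rcvd_bytes{tid=\"%d\"} %g\n", t, (double)relay_stats[t].rcvd_bytes);
        fprintf(out, "relay_sent_bytes{tid=\"%d\"} %g\n", t, (double)relay_stats[t].sent_bytes);
    }
}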

10. We should have new tests, especially for the new functionality

What tests? Right now it is by far better than before. Wondering whether there are any tests for the current useless metrics implementation (or even the redis things?).

Though this list is long and negative - there are many changes we can adopt but as separate PRs. I'm trying to be practical - if something gets broken it is going to be very hard to track it and fix it (broken as in addition to the list above).

Not really, and I'm not sure whether it makes any sense to break it into separate PRs - actually I think it would make the feature's implementation much harder to understand. But let's see, I'm open to any suggestions :)

@KangLin (Contributor) left a comment

@jelmd

  • prometheus-client-c is pre-distributed in most systems
  • It took me two days to look at the source code (prometheus-client-c and your libprom) and learn prometheus.
    • I think the interface of prometheus-client-c is well defined. Simple to use. Although its implementation is a bit buggy, that does not affect our use.
    • IMHO, the interface defined by libprom is a bit messy (machine-translated). Some of the internal interfaces are exposed. If used in a project, it can cause a performance penalty.

So I recommend going ahead with prometheus-client-c.
If you fix bugs, please submit a pull request to prometheus-client-c.

// clang-format off
// different clang-format versions have different opinions on this and prevent PRs to succeed
enum /* permitted values for its `has_arg' field... */
{ no_argument = 0, /* option never takes an argument */
Contributor:

Typo?

Contributor Author:

Ehmm, the PR failed because 'make lint' was not happy with it when it got linted on Ubuntu 20.04. If fixed for the linter there, 'make lint' on Ubuntu 22.04 failed. So a deadlock for PRs (wondering how all the other PRs got through ;-)).

@KangLin (Contributor) commented Dec 21, 2023

@jelmd About http server:

  • Use prometheus-client-c's built-in promhttp directly in the program. Currently it works in the admin service thread.
  • In practice, I think that in most cases, nginx, apache etc http servers will be used. So I think there needs to be a plugin as well. prometheus-client-c already provides a simple and easy-to-use interface 😄

@jelmd (Contributor, Author) commented Dec 21, 2023

@jelmd

* prometheus-client-c is pre-distributed in most systems

Really? Is any software actually using it? I didn't find any when I decided to fork (but obviously that was not the only reason to fork). Really strange that this gets bundled. But package maintainers do not necessarily understand the software they package or have a closer look at it; sometimes they just volunteer to package it because someone asked for it ... ;-)

* It took me two days to look at the source code(prometheus-client-c and your libprom) and learn prometheus.

Thanx a lot :)

  * I think the interface of prometheus-client-c is well defined. Simple to use.

cluttered, sometimes not really smart/inefficient, nightmare to build (just look at the bloat coturn CI needs to get something usable out of it), buggy wrt. histograms and even tests, ...

Although its implementation is a bit bug, but it does not affect our use.
* IMHO, the interface defined by libprom is a bit messy(machine translated).

Messy? Plz give a hint.

Some of the internal interfaces are exposed.

That's exactly the point. IMHO reinventing the wheel does not solve any problem - it is a problem. I wrote some metric exporters, enhanced apps with them, and always found it boring and unproductive to copy-and-paste all the boilerplate code like "loggers" and/or string buffers etc. Therefore I exposed them intentionally, so that one may focus on the real task (data collection and metric export) and does not need to waste a lot of time finding full-blown logger, string buffer, etc. libs, which are actually not needed because they are already there. So the lib should not be understood as a data collection/formatting thing, but as a utility that helps to get the more or less pretty simple job done. Finally, an interface is an interface, and the maintainer decides whether to expose it or to keep it hidden because it is unstable or buggy or just a bad hack, whatever ...

If used in a project, it can cause a performance penalty.

How? The purpose is exactly to have something which is safe and fast enough to do the job. If someone has to write a new string buffer implementation, I'm not sure whether it will be bug-free and as fast as the one in libprom ;-) Same thing wrt. a simple logger. Last but not least, as for any lib, one does not need to use all the features/functions it provides (e.g. libapr, libexpat, libssl, ...) - it just offers functions to make life easier, there is no "you have to use ...".

So I recommend going ahead with prometheus-client-c. You fix bug. please pull request to prometheus-client-c.

Sorry, but nope. There was a reason to fork it, and if you have a look at the diffs and at its Issues and PRs, it is pretty obvious that a) the project is dead, b) nobody would review a PR, and thus c) it would be a huge waste of time (which I do not have, BTW) to create a PR which sits in the review queue forever. Actually, these are some of the reasons why GitHub encourages people to fork ;-)

@jelmd (Contributor, Author) commented Dec 22, 2023

@jelmd About http server:

* Use prometheus-client-c's built-in promhttp directly in the program.

What do you mean? Please name it! Do you mean promhttp_start_daemon()? Otherwise I do not know what the "built-in promhttp" should be.

Currently it works in the admin service thread.

Everything needs to be started somewhere, no change here BTW. It gets started at the same place as before, which is ok - all other threads get started there, too. And for completeness: promhttp_start_daemon() is just a simple wrapper around MHD_start_daemon(), which in turn just creates some worker threads which listen for requests on the specified port and call the related handler if a request comes in - really, really simple stuff - just have a look at the source.

So no, it does not "work in the admin service thread" (wouldn't it block it, BTW, if it did ;-)); only the serving threads get started there.

see also: docs/Metrics.md.
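For reference, a hedged sketch of what starting such a listener looks like when using libmicrohttpd directly - roughly what promhttp_start_daemon() wraps. render_metrics() is a hypothetical helper returning the malloc'ed exposition text, and the exact flag/return types differ slightly between libmicrohttpd versions:

#include <stdlib.h>
#include <string.h>
#include <microhttpd.h>

extern char *render_metrics(void);   /* hypothetical: returns malloc'ed exposition text */

static enum MHD_Result handle_req(void *cls, struct MHD_Connection *conn,
        const char *url, const char *method, const char *version,
        const char *upload_data, size_t *upload_size, void **con_cls) {
    (void)cls; (void)version; (void)upload_data; (void)upload_size; (void)con_cls;
    if (strcmp(method, "GET") != 0 || strcmp(url, "/metrics") != 0) {
        /* answer anything but GET /metrics with Bad Request */
        static char empty[] = "";
        struct MHD_Response *err = MHD_create_response_from_buffer(0, empty, MHD_RESPMEM_PERSISTENT);
        enum MHD_Result r = MHD_queue_response(conn, MHD_HTTP_BAD_REQUEST, err);
        MHD_destroy_response(err);
        return r;
    }
    char *body = render_metrics();
    struct MHD_Response *resp = MHD_create_response_from_buffer(strlen(body), body, MHD_RESPMEM_MUST_FREE);
    enum MHD_Result ret = MHD_queue_response(conn, MHD_HTTP_OK, resp);
    MHD_destroy_response(resp);
    return ret;
}

/* Spawn the internal listener thread(s); older libmicrohttpd uses MHD_USE_SELECT_INTERNALLY instead. */
struct MHD_Daemon *start_metrics_http(unsigned short port) {
    return MHD_start_daemon(MHD_USE_INTERNAL_POLLING_THREAD, port,
                            NULL, NULL, &handle_req, NULL, MHD_OPTION_END);
}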

* In practice, I think that in most cases, nginx, apache etc http servers will be used.

OMG. Why would one do that? I get goose bumps ;-). But of course, if someone's life is not hard enough yet, nothing prevents them from doing such strange things. Can you perhaps explain what the advantage would be and what it actually has to do with the implementation?

So I think there needs to be a plugin as well.

LOL - no, metrics are an integral part of the software. Not only for performance reasons does the app need to collect the data itself: only the app [authors] know where, when, why and what to collect, and are able to do it at the right place. Using an external plugin for that does not make any sense at all. If you refer to the HTTP part - of course there needs to be an interface where the collected data are provided to the "public". This part gets done using the OpenMetrics/Prometheus exposition format and providing the "snapshots" via HTTP. There is no need at all to clutter the app with "plugins" which are able to format data in 1000 different formats or to expose it via SNMP or InfluxDB or whatever (and describe it via an XML schema :)) - just paperweight, because the OpenMetrics format is so simple that it can easily be transformed by the clients themselves into whatever is needed.

So IMHO, plugin === bad idea.

prometheus-client-c already provides a simple and easy-to-use interface 😄

Can you please be specific here? What do you mean by "simple and easy-to-use interface"? Somehow this sounds like the opposite of your plugin idea ;-)

@KangLin (Contributor) commented Dec 22, 2023

About HTTP servers. There are several solutions:

  1. The HTTP service is used as a thread for the turnserver. Currently in use.
  • Pros: Easy to develop
  • Cons:
    • Stability is determined by the developer's capabilities
    • A turnserver must contain the full functionality of an HTTP server (turnserver must pay for the resource overhead of the HTTP server, such as CPU, memory, program size, etc.)
  2. The HTTP service is used as a child process of the turnserver.
  • Pros:
    • High stability. Even if the HTTP service crashes, it doesn't affect turnserver
  • Cons:
    • A turnserver must contain the full functionality of an HTTP server (turnserver must pay for the resource overhead of the HTTP server, such as CPU, memory, program size, etc.)
    • Difficult to develop. The problem of inter-process communication needs to be solved.
  3. The HTTP server is used as a separate program (as a push gateway). Develop the appropriate plugins. For example: nginx, apache, etc. HTTP servers; push gateway
  • Pros:
    • High stability. Even if the HTTP service crashes, it doesn't affect turnserver
    • turnserver doesn't need to include an HTTP server. Reduces the resource overhead of the HTTP server, such as CPU, memory, and program size.
    • Easy to use. One HTTP server can serve multiple turnservers
  • Cons: Difficult to develop. Need to:
    • Inter-process communication.
    • Cross-machine communication.

We are currently talking about option 1.

From the perspective of Prometheus design and stability, it is recommended to use Option 2.

In O&M, option 3 is also needed.

Functional requirements for Prometheus:

  • Provides operations on metric types (counter, gauge, and histogram).
  • Provide conversion of metric types (counter, Gauge, Histogram) to data model (string format)
  • Provide a full HTTP service function.(used as a thread)

Well, prometheus-client-c provides just that.

Implementation of option 1 (a hedged sketch follows after this comment):

  • Start a sub-thread in the main thread of turnserver that calls promhttp_start_daemon() to provide the HTTP service. It already contains the metric-type conversion operations (hidden from the user).
  • Define and document the metrics to be monitored
  • Initialize the metrics
  • Call the metric manipulation functions where needed

@jelmd @eakraly
Let's discuss option 1 first, and if we still have the energy, we will do other options.
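For concreteness, a hedged sketch of option 1 using the upstream prometheus-client-c API as I understand it (treat the signatures as assumptions; the metric name, label and port handling are invented for illustration, not taken from the PR):

#include <microhttpd.h>
#include <prom.h>
#include <promhttp.h>

static prom_counter_t *relayed_packets;   /* hypothetical example metric */

/* Register the metric(s) and start the scrape endpoint in its own MHD thread(s). */
int start_metrics(unsigned short port) {
    prom_collector_registry_default_init();
    relayed_packets = prom_counter_new("turn_relayed_packets_total",
                                       "Relayed packets per relay thread",
                                       1, (const char *[]){"tid"});
    prom_collector_registry_must_register_metric(relayed_packets);
    promhttp_set_active_collector_registry(NULL);   /* NULL means: use the default registry */
    return promhttp_start_daemon(MHD_USE_SELECT_INTERNALLY, port, NULL, NULL) ? 0 : -1;
}

/* Called wherever the relay loop accounts a packet. */
void count_packet(const char *tid) {
    prom_counter_inc(relayed_packets, (const char *[]){tid});
}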

@jelmd (Contributor, Author) commented Dec 22, 2023

About HTTP servers. There are several solutions:

Ehmm, 1st thing to note: libmicrohttpd as used by libprom is an HTTP server (it even supports HTTPS). It is not named nginx, apache httpd, or whatever, but it is an HTTP server, pretty stable and used in several other projects (e.g. flutter). But if one doesn't trust it, one can try to provide one's own implementation, like coturn does for the web admin interface, or acme-redirect, etc. ...

1. The HTTP service is used as a thread for the turnserver. Currently in use.

* Pros: Easy to develop
* Cons:
  
  * Stability is determined by the developer's capabilities

... and by the other projects and the community using it. BTW: I have used it for several years and never had an issue with it.

  * A turnserver must contain the full functionality of an HTTP server (turnserver must pay for the resource overhead of the HTTP server, such as CPU, memory, program size, etc.)

Please define "full functionality of an HTTP server". The metrics interface speaks HTTP as far as needed, and that's it. Nobody claims that it needs to provide all the features apache httpd, nginx, etc. do - so if you mean that, the answer has to be: plain wrong, it does not need to ... The protocol to get the data is simply HTTP, and here the interface just needs to "print out" all metrics when a correct request for the path /metrics has been received. That's it. Everything else should be answered with Bad Request, unless the implementation decides to provide additional data via other URLs.

2. The HTTP service is used as a child process of turnserver.

You mean a forked fat process?

* Pros:
  
  * High stability. Even if the HTTP service crashes,

What happens if a thread dies ;-) How should coturn control the forked process?

it doesn't affect turnserver

How do the metrics get transferred to the forked process/its threads? I would be happy to see a "beam me up, Scotty" implementation, but as long as that isn't usable on earth, I doubt the "doesn't affect" claim wrt. any problems and especially wrt. performance.

* Cons:
  
  * A turnserver must contain the full functionality of an HTTP server,

Again, please define "functionality of an HTTP server". Is it possible to make a list of what you think libmicrohttpd is missing? As already said, in general a metrics provider should be able to answer HTTP requests and provide the metrics via the /metrics path - that's all.

(turnserver must pay for the resource overhead of the HTTP server, such as CPU, memory, program size, etc.)

Where is the difference? If you want to answer an HTTP request, you always need a request listener (no matter whether you hide it in a cascade of other servers spawned as threads or fat processes, or process requests directly). And you certainly need a request processor which answers the request. No matter where it sits, or what one names the cat, the resources are always needed. However, one can do it in an efficient way (right now, similar to what coturn does with the relay threads), or by introducing a lot of overhead and complexity (perhaps IPC, or file-based exchange ;-) ), which would simply be a nightmare. I prefer KISSes.

BTW: Especially in the coturn context I always hear "performance, performance, performance", but then actually read the opposite, incl. a lot of unfounded assumptions. Where does this come from?

  * Difficult to develop. The problem of inter-process communication needs to be solved.

Indeed. It does not make any sense in this context.

3. The HTTP server is used as a programe(as push gateway). Develop the appropriate plugins. For example: nginx, apache etc http servers; [push gateway](https://github.com/prometheus/pushgateway)

A dangerous misconception, a flawed design. Again, just because a service provides data via the HTTP protocol, it does not need to provide all the features all other known HTTP servers provide. The purpose is key. If someone wants to put a proxy in front of it, for whatever strange reason, they can do it. However, if one does that, the env certainly already has other severe problems regarding security, performance and stability - basically everything you claim it should actually solve ;-). At that stage it is better to do the homework first, and to consider providing services once that has been done.

* Pros: 
  * High stability. Even if the HTTP service crashes, it doesn't affect turnserver

If you make such an assumption, please provide some proof for it. I think libmicrohttpd users/developers would answer back: "The opposite is the case". Just the much higher complexity (the features they provide, the third-party SW they use) is an indicator that they are affected by many more bugs, and thus "high stability" in this context is a really questionable term. One may also check the Issues and PR pages of the related projects to get more indicators of how "stable" they actually are. Last but not least, google CVEs for the mentioned servers and check the results - apache httpd e.g. wrt. libapr/libexpat/libxslt/..., nginx for a lot of other things which are completely unrelated to the metrics interface, ... ;-)

Deduced from the apps/utilities I have had in use for several years and from what I've read so far, including the mentioned web pages, I can say: libmicrohttpd and its use is at least as stable as all the other thingies you mentioned, and probably much more reliable.
If your request processor is flawed/buggy (and dies), then it doesn't matter what HTTP implementation gets used - one should really not blame it for one's own faults.

  * turnserver doesn't need to include an HTTP server.

Doesn't it already? What about src/apps/relay/http_server.c ;-)

Reduces the resource overhead of the HTTP server, such as CPU, memory, and program size.

Nope, the opposite is the case.

  * Easy to use.

Huhh? Please define "easy to use".

One HTTP server can serve multiple turnservers

And why should the metrics provider in coturn implement such a feature? This is completely unrelated and a clear indicator that your idea about infrastructure setup/metric providers is severely flawed. See above.

* Cons: Difficult to develop. Need to:
  
  * Inter-process communication.
  * Cross-machine communication.

We are currently talking about option 1.

From the perspective of Prometheus design and stability, it is recommended to use Option 2.

Not at all. It is the opposite of what you claim it should be.

In O&M. Option 3. is also needed.

Never ever. This is not a coturn but an operator problem. E.g. consider the speedometer of your car being a metrics provider. It provides the data to its operator aka driver, so that they might be able to avoid dangerous situations and getting a ticket for being too fast. No normal human being would mount a 65+" OLED display on the roof of their car to show the current speed to the public. Total nonsense, hurts performance, and implies that the affected living things are able to recognize and decipher the shown stuff. Sound and experience are usually sufficient for Papa and Mama Tomato to know when they need to urge baby tomato to catch up ;-).

Functional requirements for Prometheus:

Ehmm, absolutely confusing. Are we talking about coturn or one of its potential consumers? Please do not mix up everything.

* Provides operations on metric types (counter, gauge, and histogram).
* Provide conversion of metric types (counter, Gauge, Histogram) to data model (string format)
* Provide a full HTTP service function.(used as a thread)

You're totally missing the point. Actually, one does not need any of it - neither coturn nor any metrics consumer! What a metrics provider simply needs is a key:value store and a way to answer HTTP requests for /metrics. If such a request comes in, it is as simple as print "HTTP/1.1 200 OK\r\n\r\n"; foreach (key:val in $store) { printf "$key $val\n" }. Not more (the key has to have the format $metric_name[{$attr_name="$attr_value"[,$attr_name="$attr_value"]...}] and the value needs to be a double - most implementations use printf's "%g"; I think I've already mentioned it, but you are free to study the OpenMetrics format specs if in doubt).
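To make the described key format concrete, a couple of hypothetical exposition lines (metric and label names invented for illustration):

turn_session_traffic_rcvb{tid="2",sid="17"} 48211
turn_session_traffic_sndb{tid="2",sid="17"} 50342
turn_allocations_total 3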

Well, prometheus-client-c provides just that.

As libprom does. But as said, that's not really the point.

* Provides operations on metric types (counter, gauge, and histogram).
  * counter: https://github.com/digitalocean/prometheus-client-c/blob/master/prom/include/prom_counter.h
  * gauge: https://github.com/digitalocean/prometheus-client-c/blob/master/prom/include/prom_gauge.h

Technically - under the hood - this doesn't matter. Both are just counters; the API merely makes sure that a counter can only be incremented. Again, a simple key:value store accomplishes the same.

  * Histogram: https://github.com/digitalocean/prometheus-client-c/blob/master/prom/include/prom_histogram.h

Key:value pairs as well; they just mimic one value sliced into several pieces. So a key of the form $key{idx="$bucketIdx"} would be an alternative to use.
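For example, the standard exposition of a histogram already is just such a set of sliced key:value pairs - hypothetical metric name, with "le" being the conventional bucket label:

turn_alloc_lifetime_bucket{le="60"} 4
turn_alloc_lifetime_bucket{le="600"} 9
turn_alloc_lifetime_bucket{le="+Inf"} 11
turn_alloc_lifetime_sum 3120
turn_alloc_lifetime_count 11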

* Provides [prom_collector_registry_bridge](https://github.com/digitalocean/prometheus-client-c/blob/c57034d196582d99267d027abb52a05a55dc07f6/prom/include/prom_collector_registry.h#L107)
  to implement the conversion function.

Not at all. That's an implementation detail of the libraries. It is a bit odd to say "we need library XY" and then enumerate all the functions it provides as requirements. I have pointed out several times what a metrics provider needs. If in doubt, read the [OpenMetrics](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md) or Prometheus exposition format docs. BTW: for good reason they do not say how the provider has to implement it, just that the metrics need to be exposed via HTTP and how to format them. And please try to understand this first, before making any further claims and misleading/wrong assumptions.

* Has implemented an HTTP server in one thread as option 1. and it provide pipe too(It can be used as an option to achieve Option 2). [promhttp_start_daemon()](https://github.com/digitalocean/prometheus-client-c/blob/c57034d196582d99267d027abb52a05a55dc07f6/promhttp/src/promhttp.c#L62)

Plain wrong again.

  Starting a thread in the turnserver main() to call promhttp_start_daemon() is a simple HTTP service

Again, requiring the opposite of what you actually want does not help. Or do you ask for an additional thread to start the relay threads as well? That simply makes no sense at all and would, BTW, be a very bad server design (it should fail early ...).

Implementation of option 1

* Start a sub-thread in the main thread of turnserver to call promhttp_start_daemon() to simply complete the HTTP service. And it already contains Metric types conversion operations (hidden from the user).

* Define and document the metrics to be monitored

What about docs/Metrics.md - did you really read/review the whole PR?

* Initialize the metric

What do you think src/apps/relay/prom_server.c:init_metrics() does?

* Call the function of the manipulative indicator where needed

LOL - isn't this obvious? If you do not collect data, what metrics do you expect to get out of it ;-)

@jelmd @eakraly Let's discuss option 1 first, and if we still have the energy, we will do other options.

As said, the only option to discuss is IMHO whether to implement the metrics provider ourselves. libpromhttp is really simple and can be replaced without any effort by using libmicrohttpd directly; replacing libprom with a proper key:value store seems to require much more work and testing, because it needs to be dynamic wrt. tid:sid. If one really does it, I guess it would probably end up with a hash-map-like thing again - and thus reinvent the wheel [which might be quadratic].

So there are other options, but not really good ones, especially if you ride the performance and stability horse again and again.

@@ -350,9 +350,9 @@ void str_buffer_append_sz(struct str_buffer *sb, size_t sz) {
str_buffer_append(sb, ssz);
}

void str_buffer_append_sid(struct str_buffer *sb, turnsession_id sid) {
void str_buffer_append_sid(struct str_buffer *sb, turnsession_id sid, uint32_t rsid) {
char ssz[129];
Contributor:

is this enough space to hold the new data in all cases?

Contributor Author:

Yes. Actually 20 + 1 + 10 + 1 = 32 would be sufficient. Not sure why the original author chose such a big buffer (doesn't make any sense to me).
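A hedged sketch of the sizing argument (the exact format string is an assumption): an unsigned 64-bit session id needs at most 20 decimal digits, the separator 1, an unsigned 32-bit rsid at most 10, plus the terminating NUL:

#include <stdio.h>
#include <stdint.h>

typedef uint64_t turnsession_id;

/* 20 (u64 digits) + 1 ('.') + 10 (u32 digits) + 1 (NUL) = 32 bytes suffice. */
void format_sid(char out[32], turnsession_id sid, uint32_t rsid) {
    snprintf(out, 32, "%llu.%u", (unsigned long long)sid, rsid);
}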

static char Usage[] =
"Usage: turnserver [options]\n"
"Options:\n"
" -d, --listening-device <device-name> Listener interface device (NOT RECOMMENDED. Optional, Linux "
"only).\n"
" -p, --listening-port <port> TURN listener port (Default: 3478).\n"
" -p, --listening-port <port> TURN listener port. Default: " _STRVAL(DEFAULT_STUN_PORT) ".\n"
Contributor:

The changes to stamping the port number into the help text would be lovely to have as a separate PR

Same with disabling prometheus at compile time, though that's a much bigger mess.

Contributor Author:

Hmmm, maybe. But hardcoding the port stuff and then rewriting it again takes too much time w/o any benefit. Not sure what is meant by "disabling prometheus at compile time".

@@ -45,6 +45,7 @@ static pthread_barrier_t barrier;

////////////// Auth Server ////////////////

uint32_t relay_servers_in_use = 0;
Contributor:

should this be a static variable?

Contributor Author:

It is needed in src/apps/relay/ns_ioalib_engine_impl.c and is a "global var". So no, unless there is a special collection of global vars somewhere else...
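For what it's worth, the usual pattern for such a shared counter (a sketch; the file and header names are assumptions) is a single definition in one translation unit plus an extern declaration in a shared header:

/* shared header, e.g. ns_turn_utils.h (name assumed for illustration) */
extern uint32_t relay_servers_in_use;

/* one .c file, e.g. netengine.c (name assumed) - the single definition */
uint32_t relay_servers_in_use = 0;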

#ifdef POOL_TRACE
#define _TRACE TURN_LOG_FUNC
#else
// dirty talk - is for developers, only
Contributor:

I'm not really understanding this comment

Contributor Author:

Would "log trashing" be better?

TURN_LOG_LEVEL_INFO, "session %018llu: realm <%s> user <%s>: incoming packet %s processed, success\n",
(unsigned long long)(ss->id), (const char *)(ss->realm_options.name), (const char *)(ss->username), method);
TURN_LOG_FUNC(TURN_LOG_LEVEL_INFO,
"session %012llu.%d: realm <%s> user <%s>: incoming packet %s processed, success\n",
Contributor:

You replace %018llu with %012llu in a bunch of different places.

Maybe it would be better to make a #define for this string, so it can be reused globally?

Contributor Author:

I assume the original author had a reason for not using a macro. Perhaps to avoid clashes when the format gets changed.

@@ -40,7 +40,10 @@ extern "C" {

//////////// defines //////////////

#define TURN_SESSION_ID_FACTOR (1000000000000000LL)
// Even if a single threaded app handles on average about 1 session/s
Contributor:

I suspect 1 session per second is on the low side. My production instances handle many thousands of concurrent users per thread.

Not that your analysis is wrong in the general sense, just that maybe the explanation should be a bit more cautious.

Contributor Author:

Actually the unit (k) is missing (1k * 1G = 1T, i.e. ~12 zeros; 1 Gs ≈ 32 years). Even if a box handles 50k sessions/s, the session ID would only wrap after ~231 days: unlikely that all sessions last that long ...
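The arithmetic behind the ~231 days, assuming the per-thread ID space is 10^12 (i.e. TURN_SESSION_ID_FACTOR reduced from 10^15 to 10^12, as the "12 zeros" above suggests):

10^12 ids per relay thread / 50,000 sessions/s = 2 * 10^7 s ≈ 231 days
at ~1,000 sessions/s: 10^12 / 10^3 = 10^9 s (1 Gs) ≈ 31.7 years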

@KangLin (Contributor) commented Jan 22, 2024

@jelmd OK! I select prometheus-client-c; its interface is defined well.
It is recommended that you only fix bugs and add Windows support. Don't modify the interface.
The metrics are OK! I hope a metric for dropped packets gets added.

@jelmd jelmd force-pushed the #1340_better_metric_support branch from d0acd7a to ed41ff2 Compare February 6, 2024 12:08
Successfully merging this pull request may close these issues.

better metric support
4 participants