-
Hi there, wouldn't it be better in the future to have only parents and no Netdata agents? Regards
Replies: 5 comments 1 reply
-
Hi @StefanSa, thanks for posting. This is an interesting topic. We have actually been recommending parents more and more, for a variety of reasons (high availability, scalability, central config, and probably a few others I'm missing).

Your idea of "parents only" certainly makes sense for some use cases, e.g. if you are only interested in a very specific subset of metrics, or the thing you are monitoring just does not fit well into the "nodes" abstraction. We have seen a fair few users take this approach, especially with SNMP devices, and also in some APM use cases where all that is being emitted is metrics to some Prometheus endpoint that can basically just be a Netdata parent; similarly for StatsD metrics. For example, we do something similar to monitor our Apache Airflow instance, which does some ETL work: essentially it emits StatsD metrics to a parent.

I know @ralphm is also actively looking at OpenTelemetry and what it would mean to say we are "OpenTelemetry compatible" or "fully supported". We also have ongoing work to bring more log-based capabilities into Netdata; mega PR is here that focuses on that for background.

I work on ML, so that is my main area of expertise, and I'm not the most informed on the overall architecture side of things, but I'm sure @ralphm, @Ferroin and @ktsaou have some opinions too. My understanding is that a "parents only" approach like you suggest would just be another totally valid topology or deployment strategy. We are mostly trying to support as many different approaches as make sense for users, and to share best practices and the pros and cons of each.
We recently put down some of our latest thinking on this here: https://learn.netdata.cloud/docs/architecture/deployment-strategies

One thing I'm less sure about is the difference between "parent only" and what we call "standalone" in the link above (maybe it is more about emphasis on the scale and variety of metrics that a parent might receive in your scenario). Essentially, I think a parent-only approach can be considered a flavor of standalone, and if you wanted to do active-active, that would be two parents sharing metrics.
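To make the "emit StatsD metrics to a parent" idea above concrete, here is a minimal Python sketch of what an application-side emitter could look like. It is not how our Airflow setup is actually wired. The metric name `etl.rows_loaded`, the helper names, and the host are hypothetical, and it assumes the parent's StatsD listener is on the conventional UDP port 8125. Only the plain `name:value|type` line format with the most common types is used.

```python
import socket

# Common subset of StatsD metric types: counter, gauge, timer (ms), set.
VALID_TYPES = {"c", "g", "ms", "s"}


def format_statsd(name: str, value, metric_type: str = "g") -> bytes:
    """Build one StatsD datagram of the form <name>:<value>|<type>."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_statsd(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram over UDP; fire-and-forget, no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

An ETL job would then call something like `send_statsd(format_statsd("etl.rows_loaded", 42, "c"), host="parent.example")`, with `parent.example` standing in for wherever the parent runs. Since this is UDP, the job never blocks on the monitoring side, which is part of why this pattern suits a "parents only" setup.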
-
I'd start with
What is top-heavy, and why is Netdata too top-heavy? I don't think it is: Netdata is not heavy in general and can be configured to use fewer CPU, memory, and I/O resources (e.g. by changing the memory mode, disabling some collectors, changing the default data collection frequency, or disabling components such as ML and health).
-
Hi Andrew, hi Ilay,
-
Hi @StefanSa and thank you for your message! OTEL is great and we are working in the direction you suggest. But even OTEL has limitations we are trying to overcome.

One of the biggest differences in Netdata is that we automate most of the monitoring configuration. Netdata automatically picks what is useful to monitor and, behind the scenes, makes a lot of very important decisions. Take errors, for example: error counters are usually zero. Netdata collects many thousands of "usually zero" metrics behind the scenes (hardware errors, operating system errors, etc.), and as long as they are zero, it ignores most of them. It doesn't do anything at all about them. But it will automatically inject charts and alerts as soon as a collected error counter becomes non-zero. Expect 2-3 times more metrics per node if this didn't happen.

Another issue is the OTEL data model. At Netdata we need fully automated visualization. For this to work we need to make a few important decisions: group metrics together into meaningful charts, decide what exactly each metric monitors, identify (and name) the specific instance of each component monitored, create a tree structure of metrics, give titles to all charts, and many more. None of these exist in OTEL, and in many cases OTEL is fuzzy (you can't really tell what exactly is monitored, or more than one component is monitored at the same time). Of course Netdata could receive OTEL data and enrich it, but this would involve a lot of guesswork to derive the metadata we need, which can easily turn into a nightmare as the data models of OTEL and Netdata change over time.

One of the things I want us to do is document how OTEL data should look for Netdata to work without heuristics when OTEL is the primary data source. Unfortunately, we haven't found the time to do it yet.

Another thing you would miss is Netdata Functions. Netdata Functions are exposed by collectors to allow us to interact with the data source in a way that is not just metrics.
The first function is

We currently have another function that we will release this week:

We also plan to build a "database slow queries" function, a trace of operating system calls per PID, and even the ability to restart a systemd service or reboot a server. Our goal is to reach a point that complex applications like

So, although your idea is very nice, I can see a future like this in Netdata, and we are working to make Netdata fully OTEL compatible so that your idea is supported, it may not be as practical as you expect.
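The "usually zero" error counter behavior described earlier in this reply can be sketched in a few lines of Python. This is an illustration of the idea only, not Netdata's actual implementation. The class name, method name, and metric name are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ErrorCounterTracker:
    """Hide counters that have only ever been zero; expose one once it is non-zero."""

    # Counters that have been non-zero at least once, with their latest value.
    visible: Dict[str, int] = field(default_factory=dict)

    def observe(self, name: str, value: int) -> bool:
        """Record a sample; return True if this sample made the counter visible."""
        if name in self.visible:
            # Already exposed: keep updating it, even if it drops back to zero.
            self.visible[name] = value
            return False
        if value == 0:
            # Never seen non-zero: collect it, but create no chart or alert.
            return False
        # First non-zero sample: this is where a chart and alert would be injected.
        self.visible[name] = value
        return True
```

Feeding this a counter such as `disk.sda.io_errors` with a long run of zeros leaves `visible` empty. The first non-zero sample makes `observe` return `True`, and from then on the counter stays visible. A generic OTEL pipeline would instead carry every one of those zero series all the time, which is where the extra metrics per node come from.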
-
@ktsaou Now to the topic. I have been a seeker all these years, from MRTG, Nagios, Zabbix, Influx, Elastic, etc., and now Netdata :) Many greetings from Germany.
thanks!
The opposite actually. We want it badly..
This is the problem. And also that monitoring should be architecturally somewhat different from what is common today. We really need zero configuration, zero-touch ML, automated dashboards, easy scalability, and more structure and information in the tool so that monitoring is ea…