New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
network.c: allow notification forwarding #4183
base: main
Are you sure you want to change the base?
Conversation
75044ff
to
7b58655
Compare
0c64891
to
124b67d
Compare
Tagged as WIP: checks do not build |
High level feedback: the problem we need to solve here is when two instances of collectd are configured to send notifications to each other. Because we can't currently tell whether we've seen a notification before, collectd would potentially bounce a notification back and forth indefinitely. To get around that, we need to "remember" notifications somehow. We could, for example, keep a copy of the notifications of the last 60 seconds and discard notifications that are older than that. Or we keep the last 100 notifications, which would make it possible to implement a "give me the last n notifications" API. Or a combination of the two. |
Sorry, probably more context is needed for this PR. The goal of this PR is to allow notifications forwarding, replicating the send behavior of "values": a notification will be sent when:
In complex network configurations with collectd forwarders, this PR makes "notifications" arrive destination as values do. In the case of two instances of collectd configured to send notifications (and values) to each other, "forwarding" should be "false" and only local notifications (and values) would be send to the other instance. Regarding build checks, it seems that is needed to include plugin.c in network_test.c (where all metadata functions for notifications are implemented) but this do not worked for me. |
ceeb88f
to
fcc531b
Compare
51b4339
to
13f3d84
Compare
All checks fail due to network test changes:
|
2460994
to
3ddffbd
Compare
3ddffbd
to
6cecd6b
Compare
Finally I manage to run checks. Removing WIP tag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @carlospeon, thanks again for your work!
In the case of two instances of collectd configured to send notifications (and values) to each other, "forwarding" should be "false" and only local notifications (and values) would be send to the other instance.
I know several people who have configured their collectd servers this way, and I'm sure that's just the tip of the iceberg. These setups would be massively disrupted with the PR in its current form.
In order to forward notifications, we need a way to deduplicate them. E.g. add notifications to a list of notifications of the last 300 seconds (with some upper bound) and dismiss received notifications if they are in that list.
In the interest of avoiding miscommunication, let me be super direct: Duplicate detection is not optional. Without it, I won't merge this PR.
Hello @octo, thank you for your review!
In my experience if you have a configuration with a loop, values are resent continuously and collectd crashs, so there should not be such setups working properly. I checked the code and although there is a de-duplication detection for values: collectd/src/daemon/utils_cache.c Line 350 in 4cc3426
the return value of uc_update is not checked: Line 2162 in 4cc3426
and duplicated values are dispatched to the write plugins. Or maybe I'm missing something. We are running this code for a year without issues, our rate of notifications is something above a thousand per day and with a proper configuration we don't have any specific issue regarding loops.
I appreciate that, fair enough. This solution -maybe suboptimal- "works for me" and I just want to share it with the community. In this particular moment I'm not ready to code and test a more complex/complete solution. |
ChangeLog: network.c allow notification forwarding.
Allow notification forwarding checking notification metadata to avoid loops.