Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

network.c: allow notification forwarding #4183

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

carlospeon
Copy link
Contributor

ChangeLog: network.c allow notification forwarding.

Allow notification forwarding checking notification metadata to avoid loops.

@carlospeon carlospeon requested a review from a team as a code owner December 5, 2023 13:41
@carlospeon carlospeon force-pushed the check-notif-loops branch 2 times, most recently from 75044ff to 7b58655 Compare December 5, 2023 15:23
@carlospeon carlospeon requested a review from a team as a code owner December 5, 2023 15:23
@carlospeon carlospeon force-pushed the check-notif-loops branch 8 times, most recently from 0c64891 to 124b67d Compare December 5, 2023 16:47
@carlospeon carlospeon changed the title network.c: allow notification forwarding [WIP] network.c: allow notification forwarding Dec 5, 2023
@carlospeon
Copy link
Contributor Author

Tagged as WIP: checks do not build

@octo
Copy link
Member

octo commented Dec 5, 2023

High level feedback: the problem we need to solve here is when two instances of collectd are configured to send notifications to each other. Because we can't currently tell whether we've seen a notification before, collectd would potentially bounce a notification back and forth indefinitely.

To get around that, we need to "remember" notifications somehow. We could, for example, keep a copy of the notifications of the last 60 seconds and discard notifications that are older than that. Or we keep the last 100 notifications, which would make it possible to implement a "give me the last n notifications" API. Or a combination of the two.

@carlospeon
Copy link
Contributor Author

Sorry, probably more context is needed for this PR.

The goal of this PR is to allow notifications forwarding, replicating the send behavior of "values": a notification will be sent when:

  • Forwarding is "true"
  • Forwarding is "false" but the notification is local, that is, do not have metadata "network:received" set to "true".

In complex network configurations with collectd forwarders, this PR makes "notifications" arrive destination as values do.

In the case of two instances of collectd configured to send notifications (and values) to each other, "forwarding" should be "false" and only local notifications (and values) would be send to the other instance.

Regarding build checks, it seems that is needed to include plugin.c in network_test.c (where all metadata functions for notifications are implemented) but this do not worked for me.

src/network.c Outdated Show resolved Hide resolved
@eero-t
Copy link
Contributor

eero-t commented Dec 21, 2023

All checks fail due to network test changes:

 In file included from src/network_test.c:25:
src/daemon/plugin.c: In function 'set_thread_name':
src/daemon/plugin.c:622:16: error: implicit declaration of function 'pthread_setname_np'; did you mean 'pthread_setcanceltype'? [-Werror=implicit-function-declaration]
  622 |   int status = pthread_setname_np(tid, n);
      |                ^~~~~~~~~~~~~~~~~~
      |                pthread_setcanceltype

@carlospeon carlospeon force-pushed the check-notif-loops branch 3 times, most recently from 2460994 to 3ddffbd Compare December 21, 2023 11:26
@carlospeon
Copy link
Contributor Author

Finally I manage to run checks. Removing WIP tag.

@carlospeon carlospeon changed the title [WIP] network.c: allow notification forwarding network.c: allow notification forwarding Dec 21, 2023
Copy link
Member

@octo octo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @carlospeon, thanks again for your work!

In the case of two instances of collectd configured to send notifications (and values) to each other, "forwarding" should be "false" and only local notifications (and values) would be send to the other instance.

I know several people who have configured their collectd servers this way, and I'm sure that's just the tip of the iceberg. These setups would be massively disrupted with the PR in its current form.

In order to forward notifications, we need a way to deduplicate them. E.g. add notifications to a list of notifications of the last 300 seconds (with some upper bound) and dismiss received notifications if they are in that list.

In the interest of avoiding miscommunication, let me be super direct: Duplicate detection is not optional. Without it, I won't merge this PR.

@carlospeon
Copy link
Contributor Author

Hello @octo, thank you for your review!

I know several people who have configured their collectd servers this way, and I'm sure that's just the tip of the iceberg. These setups would be massively disrupted with the PR in its current form.

In my experience if you have a configuration with a loop, values are resent continuously and collectd crashs, so there should not be such setups working properly. I checked the code and although there is a de-duplication detection for values:

if (ce->last_time >= vl->time) {

the return value of uc_update is not checked:

uc_update(ds, vl);

and duplicated values are dispatched to the write plugins. Or maybe I'm missing something.

We are running this code for a year without issues, our rate of notifications is something above a thousand per day and with a proper configuration we don't have any specific issue regarding loops.

In the interest of avoiding miscommunication, let me be super direct: Duplicate detection is not optional. Without it, I won't merge this PR.

I appreciate that, fair enough. This solution -maybe suboptimal- "works for me" and I just want to share it with the community. In this particular moment I'm not ready to code and test a more complex/complete solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants