Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threaded dests: add reopen() with exponential backoff strategy #4825

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

alltilla
Copy link
Collaborator

@alltilla alltilla commented Feb 9, 2024

No description provided.

Copy link
Contributor

github-actions bot commented Feb 9, 2024

This Pull Request introduces config grammar changes

syslog-ng/12763388b6a838f8a03e769af139ad6f49664b6b -> alltilla/exponential-backoff-threaded-dest

--- a/destination
+++ b/destination

 amqp(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 bigquery(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 example-destination(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 http(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 java(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 kafka-c(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 loki(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 mongodb(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 mqtt(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 opentelemetry(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 python(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 redis(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 riemann(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 smtp(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 sql(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 stomp(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

 syslog-ng-otlp(
+    reopen(
+        <empty>
+        initial-seconds(<nonnegative-float>)
+        maximum-seconds(<nonnegative-float>)
+        multiplier(<positive-float>)
+    )
 )

Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
@alltilla alltilla force-pushed the exponential-backoff-threaded-dest branch from 1635fcf to fbf1ebc Compare February 9, 2024 09:58
@kira-syslogng
Copy link
Contributor

Build FAILURE

alltilla added a commit to alltilla/syslog-ng that referenced this pull request Feb 12, 2024
Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
alltilla added a commit to alltilla/syslog-ng that referenced this pull request Feb 12, 2024
Signed-off-by: Attila Szakacs <attila.szakacs@axoflow.com>
#include "syslog-ng.h"

#define EXPONENTIAL_BACKOFF_OPTIONS_INITIAL_SECONDS_DEFAULT (0.1)
#define EXPONENTIAL_BACKOFF_OPTIONS_MAXIMUM_SECONDS_DEFAULT (1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very low in my opinion compared to the old default. I think we shouldn't go lower than 10 seconds by default. If we are planning to introduce this for traditional syslog/network drivers too, I would also consider going with different defaults there (or just keep the old 60 seconds as maximum?).

Note: connection attempts are logged internally on a high log level. For example, using the HTTP produces 2 error messages with each reopen attempt:

curl: error sending HTTP request; url='...', error='Couldn\'t connect to server', worker_index='0', driver='#anon-destination0#1', location='...'
Server disconnected while preparing messages for sending, trying again; driver='#anon-destination0#1', location='...', worker_index='0', reopen='0.00', batch_size='1'

TLDR: Do we really want 1 as maximum? If so, then we should do something about internal logging connection attempts as we are currently spamming 2-3 error messages every second per worker.
In my opinion, we should rather choose a higher default (60 or 10 would be my suggestion) and keep the logging as-is. The exponential retry mechanism will make sure that we reconnect instantly in case of momentary errors, and we even have 5 more attempts before reaching 1 second of wait time and 10 attempts before reaching 60.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed this was the "default" for max seconds, unless specified otherwise.

static gdouble
_get_next_reopen_seconds(LogThreadedDestWorker *self)
{
if (self->time_reopen != -1)
Copy link
Collaborator

@MrAnno MrAnno Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is set explicitly by the following drivers, disabling the exponential backoff feature:

  • Kafka
  • Python

@@ -333,6 +359,7 @@ static void
_process_result_success(LogThreadedDestWorker *self)
{
_accept_batch(self);
exponential_backoff_reset(self->exponential_backoff);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a double-reset here due to the function responsible for "explicit ack management".
It doesn't cause any problems, so I guess it's okay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants