omfwd: implement native load balancing - phase 1 #5369
base: master
Conversation
Force-pushed from fce6d35 to 60d1ada
Force-pushed from 8fec56f to 0425a5d
As a future enhancement, if a DNS lookup returns multiple IP addresses, use this to RR between them.
We still want to have rebindInterval for people who have other load balancers.
I guess you looked at the refactoring commit. This just moves the functionality out of a helper library into the main omfwd module. It does NOT disable the functionality. It's a code cleanup, as omfwd is the only module using it. Side note: while doing the enhancement work, I noticed that rebind interval has caused some mild message duplication for quite some time. This went unnoticed. To address that efficiently, rebindInterval will in the future be considered once per batch. That means up to (maxBatchSize - 1) messages may be transmitted beyond the rebind interval. That's the cleanest mode of operation and should not make any difference for real deployments.
That's a good idea, but let's get the core functionality in place first. Most importantly, that request requires many more changes to helper objects. Please note that the load balancing, as well as other upcoming work, is a sponsored opportunity, so I need to concentrate on the sponsor's needs first. I am very glad we found someone to help make this important enhancement finally happen. Note that the code change looks small, but there is a lot to consider "in the background" to implement load balancing in a useful way in the engine. There are lots of subtle issues. The current approach IMHO addresses them best.
This patch implements a very simple round-robin load balancer for omfwd. Its intent is to provide a solid base on which more elaborate functionality can be built. That said, the load balancing basics will be in place, including the handling of failing load-balancing targets and improved and adapted pstats counters. The new functionality is fully backwards compatible with previous configuration settings.

New action() config params:
* pool.resumeinterval

New/"changed" pstats counters
Each target receives its own set of pstats counters. Most importantly, this is the case for byte counts. That counter retains the same naming, but there may now be multiple of these counters, one for each (target ip, port) tuple.

New pstats message count to target
Among others, this can be used to check that the load balancer works as intended. The byte count emitted so far does not provide a clear indication of how many messages the targets have actually processed. For obvious reasons, this message count makes most sense in advanced load-balancing scenarios, but it also provides additional insight into round-robin. Non-matches indicate that targets went offline, and we can now evaluate the impact this had on processing.

Re-designed rebind functionality
This now works at the transaction level and causes a rebind of all pool members. The previous code has not worked 100% correctly for a couple of years now (since output batching was integrated). As cleanup, rebindInterval support has been removed from tcpClt, because omfwd is its only user. This permits a cleaner code path. We also noticed a bug with rebindInterval: it caused some mild message duplication for quite some time. This went unnoticed. To address that efficiently, rebindInterval will in the future be considered once per batch. That means up to (maxBatchSize - 1) messages may be transmitted beyond the rebind interval. That's the cleanest mode of operation and should not make any difference for real deployments.
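To make the described behaviour concrete, here is a minimal sketch of round-robin selection with failed-target handling. All names (pool_target_t, pool_select, the fields) are illustrative assumptions, not the actual omfwd implementation, and pool.resumeinterval is modeled as a per-target retry delay in seconds:

```c
#include <stddef.h>
#include <time.h>

/* Illustrative sketch only -- not actual omfwd code. */
typedef struct {
    const char *ip;
    int port;
    int suspended;          /* nonzero while the target is considered down */
    time_t suspended_since; /* when the target was suspended */
} pool_target_t;

typedef struct {
    pool_target_t *targets;
    size_t n_targets;
    size_t next;            /* round-robin cursor */
    int resume_interval;    /* pool.resumeinterval: retry delay, seconds */
} pool_t;

/* Pick the next usable target in round-robin order; a suspended target
 * is retried once resume_interval seconds have passed. */
static pool_target_t *pool_select(pool_t *p, time_t now)
{
    for (size_t i = 0; i < p->n_targets; ++i) {
        pool_target_t *t = &p->targets[p->next];
        p->next = (p->next + 1) % p->n_targets;
        if (t->suspended && now - t->suspended_since >= p->resume_interval)
            t->suspended = 0; /* give the target another chance */
        if (!t->suspended)
            return t;
    }
    return NULL; /* all targets down: caller must suspend the action */
}
```

The key property is that a failing target is skipped without stalling the rotation, and the pool only fails as a whole when every member is suspended.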
Force-pushed from 0425a5d to 7cb2f61
I was actually looking at the code change and noticed that rebindinterval was
removed and did not appear to be added back anywhere else.
As long as the functionality is still there.
As for message duplication, I was under the impression that the rebind would
only happen between batches: one batch would be completely sent, then the
connection closed and reopened, and the next batch started.
How does this duplicate messages anywhere?
David Lang
On Tue, 7 May 2024, Rainer Gerhards wrote:
> We still want to have rebindInterval for people who have other load balancers

I guess you looked at the refactoring commit. This just moves the functionality out of a helper library into the main omfwd module. It does NOT disable the functionality. It's a code cleanup, as omfwd is the only module using it.

Side note: while doing the enhancement work, I noticed that rebind interval has caused some mild message duplication for quite some time. This went unnoticed. To address that efficiently, rebindInterval will in the future be considered once per batch. That means up to (maxBatchSize - 1) messages may be transmitted beyond the rebind interval. That's the cleanest mode of operation and should not make any difference for real deployments.
On Tue, 7 May 2024, Rainer Gerhards wrote:

> That's a good idea, but let's get the core functionality in place first. Most importantly, that request requires many more changes to helper objects. Please note that the load balancing, as well as other upcoming work, is a sponsored opportunity, so I need to concentrate on the sponsor's needs first. I am very glad we found someone to help make this important enhancement finally happen. Note that the code change looks small, but there is a lot to consider "in the background" to implement load balancing in a useful way in the engine. There are lots of subtle issues. The current approach IMHO addresses them best.
Yes, core functionality (especially if it's sponsored) first. That's why I
tagged it as a future feature.
At some point I'd be interested to hear about the background considerations
(especially the subtle issues), but it's not urgent.
David Lang
It happens at the message level, not the batch level. That's also where potential duplication comes in. The new mode (found in omfwd.c) works on batches, and here we can ensure that a complete "transaction" is written to the remote target before a rebind is done. The issue, I think, appeared some years ago when we introduced the new transactional interface vs. the old-style method. It's not a big issue, so it went unnoticed until my current work.
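The batch-level check described here can be sketched as follows. This is an illustrative assumption about the control flow, not the actual omfwd.c code; the point is that checking only between completed batches means the counter can overshoot rebindInterval by up to maxBatchSize - 1 messages, but never splits a transaction:

```c
/* Illustrative sketch only -- not actual omfwd code. */
typedef struct {
    unsigned msgs_since_rebind;
    unsigned rebind_interval;   /* rebind after this many messages; 0 = off */
} conn_state_t;

/* Called once per COMPLETED batch: the whole transaction is already
 * written to the remote target, so a rebind can never split it. */
static int need_rebind_after_batch(conn_state_t *c, unsigned batch_size)
{
    c->msgs_since_rebind += batch_size;
    if (c->rebind_interval != 0 &&
        c->msgs_since_rebind >= c->rebind_interval) {
        c->msgs_since_rebind = 0;
        return 1; /* close and re-open the connection(s) now */
    }
    return 0;
}
```

With an interval of 100 and batches of 60 messages, the rebind triggers after 120 messages: 20 beyond the configured interval, which is the bounded overshoot described above.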
Just the most important point first: there was the choice between a) re-writing large parts of the rsyslog core (well, not a complete rewrite, but a large set of changes) while working with the existing interfaces, or b) a kind of emulation of the same behaviour for a pool as for a single target. After a lot of thinking, I went for b), which also seems to be the simpler and highly efficient method. So far, the results look very good.
Force-pushed from ddb48f2 to 21a6fae
Force-pushed from 238aeb2 to eb87e02
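As an illustration of the per-target counter idea from the PR description: each (ip, port) tuple gets its own byte and message counters, with the tuple embedded in the counter name so same-named counters stay distinguishable. The actual rsyslog/impstats naming scheme may differ; this sketch is an assumption:

```c
#include <stdio.h>
#include <stddef.h>

/* Illustrative sketch only -- per-target stats for a pool member. */
typedef struct {
    char name[64];            /* e.g. "192.0.2.1:514" */
    unsigned long bytes_sent;
    unsigned long msgs_sent;  /* new message count: compare across targets
                               * to see how offline periods shifted load */
} target_stats_t;

static void target_stats_init(target_stats_t *s, const char *ip, int port)
{
    snprintf(s->name, sizeof(s->name), "%s:%d", ip, port);
    s->bytes_sent = 0;
    s->msgs_sent = 0;
}

static void target_stats_record(target_stats_t *s, size_t nbytes)
{
    s->bytes_sent += nbytes;
    s->msgs_sent += 1;
}
```

Comparing msgs_sent across pool members then gives the "non-match" signal the description mentions: a target whose count lags its peers was offline for part of the run.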