New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add load_balance_hosts
parameter to db config
#736
base: master
Are you sure you want to change the base?
Conversation
src/loader.c
Outdated
int pool_size = -1; | ||
int min_pool_size = -1; | ||
int res_pool_size = -1; | ||
int max_db_connections = -1; | ||
int dbname_ofs; | ||
int pool_mode = POOL_INHERIT; | ||
int host_strategy = ROUND_ROBIN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int host_strategy = ROUND_ROBIN; | |
enum HostStrategy host_strategy = ROUND_ROBIN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution, it looks useful, postgres connection strings work in a similar way so it makes sense to support that as well.
I left some comments. And can you please rebase/merge this PR because there's a merge conflict.
src/loader.c
Outdated
@@ -236,6 +241,11 @@ bool parse_database(void *base, const char *name, const char *connstr) | |||
res_pool_size = atoi(val); | |||
} else if (strcmp("max_db_connections", key) == 0) { | |||
max_db_connections = atoi(val); | |||
} else if (strcmp("host_strategy", key) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs an entry in config.md
to describe its usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation is in. Feedback welcome.
src/objects.c
Outdated
if (!server->pool->last_connect_failed && server->pool->db->host_strategy == LAST_SUCCESSFUL) | ||
server->pool->rrcounter = server->pool->last_successful_rrcounter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current PR seems to have the desired effect based on the tests you wrote, but I have the feeling the same could probably be achieved without the two new fields you added: server->rrcounter
and pool->last_successful_rrcounter
.
How about we simply increase the pool->rr_counter
whenever a connection attempt fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I simplified the logic as you suggest and it looks better to me.
6087208
to
5c93a29
Compare
@JelteF Touching base. Happy to incorporate any additional feedback on the new implementation. |
src/loader.c
Outdated
@@ -299,6 +309,7 @@ bool parse_database(void *base, const char *name, const char *connstr) | |||
db->res_pool_size = res_pool_size; | |||
db->pool_mode = pool_mode; | |||
db->max_db_connections = max_db_connections; | |||
db->host_strategy = host_strategy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the host_strategy
of a database can change, should we tag it as changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good call. Yes, I think we should tag it so behavior changes after a reload. I will make that change. Thanks for the feedback!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
host_strategy
is now reloadable.
show databases
/show pools
also now include host_strategy
in their output, which makes this testable and is useful on its own.
b35c9a6
to
5d3104f
Compare
I think the code in the PR is fine now (apart from the merge conflict that occured due to another merged PR). But after playing with these changes a bit I found one other issue, and I'm wondering how useful this feature is without also addressing this issue: @dpirotte How are you using this change, that it is useful in its current form? |
Hi @dpirotte, github notified me that you rebased this on a branch of your fork. I think a rebased version of this can be merged pretty easily. Could you still explain how you use this in practice? Because when I played with it, another bug in pgbouncer was causing serious issues for me. |
Hi @JelteF. Yeah, I’m working on rebasing this and another patch on top of 1.19. The code works but I haven’t updated to your new (much improved!) test framework. The purpose here is to support faster failovers in environments where 1/N hosts in the list will be the primary and (N-1)/N hosts will be replicas and where DNS updates are potentially too slow. RDS/Aurora is one such environment, and I believe some Patroni configurations are as well. Given the following configurations:
After a failover, A key part here is the I have another patch that adds lightweight Another option would be to keep the round-robin behavior and make I’ll bump once the tests are ported to pytest and we’ll go from there. |
PGBouncer's default behavior is to round-robin between comma-separated hosts on each new connection attempt. This is desirable when all hosts are similar, such as load balancing over multiple replicas, but undesirable when only one host out of the list is expected to successfully connect and login. In this scenario, every other connection attempt would hit an invalid host and fail. The `host_strategy` parameter controls this host selection behavior. The default behavior is labeled `round_robin` and a new strategy called `last_successful` is introduced. This new strategy instructs PGBouncer to prefer the host with most recent successful connections. When that host fails, new connections will round-robin through the list until a successful connection occurs.
In order to maintain behavior that connections always start from the left-most host in the host list, increment rrcounter either before choosing a host (in the case of last_successful) or after choosing a host (in the case of round_robin). (This incorporates PR feedback from @JelteF.)
Previously, host_strategy could only be set at pgbouncer startup, which is unintuitive and inconsistent with other per-database configuration. Also, add host_strategy to both `show databases` and `show pools` output for visibility into the current setting, and to facilitate testing that the configuration properly reloads.
@JelteF Rebased |
Previously, the host_strategy tests used 127.0.0.2,127.0.0.1 had a "bad IP" (127.0.0.2) as the first entry to force the first backend server connection to fail and verify that `host_strategy=last_successful` directs new connections to the second "good" entry. MacOS doesn't treat 127.0.0.2 in the same manner as Linux by default, so we need a different way to fail the first host in the list. Switching to an unresolvable DNS hostname works fine on both Linux and MacOS.
doc/config.md
Outdated
@@ -1060,6 +1060,20 @@ they are logged but ignored otherwise. | |||
Set the pool mode specific to this database. If not set, | |||
the default `pool_mode` is used. | |||
|
|||
### host_strategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to change the name of the option to load_balance_hosts
. This option got added to libpq for PG16 (by me). It seems nice to use consistently named options across projects. The last_successful
option name should then be changed to disable
. And lets change round_robin
to round-robin
, to be consistent with other arguments like verify-full
and read-write
.
PS. I especially think consistency is nice here, because you mentioned you also want to add target_session_attrs
to PgBouncer. Since target_session_attrs
and load_balance_hosts
can be used nicely together, I think it's especially useful to have the same names for both Postgres and PgBouncer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, adding a random
as an option would be a nice future improvement to make sure PgBouncer supports the same as libpq. But that should definitely be another follow up PR. We can merge this PR easily without the random functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make sense to change the name of the option to load_balance_hosts. This option got added to libpq for PG16 (by me).
Ha, good call. I noticed that you contributed this feature in the release notes. +100 to consistent naming with libpq. I'll take care of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the initial naming. It seems clear. load_balance_hosts seems to be a misnomer, since no "load" and no "balancing" is involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the initial naming. It seems clear.
I think the initial name was a fine name too. But I think being in line with naming of libpq, jdbc and npgsql is a big advantage of the load_balance_hosts naming.
load_balance_hosts seems to be a misnomer, since no "load" and no "balancing" is involved?
The "load" is the number of connection that are made by pgbouncer to the database, and those connections are "balanced" across the different servers in a round-robin fashion (or random fashion when implemented in the future). And you can enable this balancing by using disable
which will select a working host and send all traffic there.
IMHO, the old naming describes how pgbouncer chooses hosts. The new naming describes what the user will see happening to their setup by changing the setting. I think I personally like names that describe the effects instead of the method.
I incorrectly ported these tests from bash to pytest and so they were not actually verifying that subsequent connections were reusing the last successful host in the list. The tests now verify behavior correctly.
This feature does almost the same thing as the upcoming libpq param called `load_balance_hosts`, so use that name for consistency and rename the configuration options accordingly (`last_successful` => `disable`)
host_strategy
parameter to db configload_balance_hosts
parameter to db config
@JelteF Updated naming per discussion above. The macOS build passes now. The mingw{32,64} builds failed a couple times but seem to pass now, so those might have been random failures. The most recent build failures look like Cirrus issues: |
@@ -30,6 +30,9 @@ authdb = port=6666 host=127.0.0.1 dbname=p1 auth_user=pswcheck | |||
|
|||
hostlist1 = port=6666 host=127.0.0.1,::1 dbname=p0 user=bouncer | |||
hostlist2 = port=6666 host=127.0.0.1,127.0.0.1 dbname=p0 user=bouncer | |||
hostlist_good_first = port=6666 host=127.0.0.1,127.0.0.3 dbname=p0 user=bouncer load_balance_hosts=disable | |||
hostlist_bad_first = port=6666 host=unresolvable-hostname,127.0.0.1 dbname=p0 user=bouncer load_balance_hosts=disable | |||
load_balance_hosts_update = port=6666 host=127.0.0.1,127.0.0.3 dbname=p0 user=bouncer load_balance_hosts=disable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The load_balance_hosts_update database isn't used (afaict). Did you have a test in mind for it?
Lost in translation from bash tests to pytest
When a comma-separated list is specified in `host`, `load_balance_hosts` controls | ||
which entry is chosen for a new connection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if this same logic would be used when receiving multiple IP adresses from DNS. The easiest way to test this is to create lines in /etc/host
. Libpq its load_balance_hosts version does this too. So if we don't implement this, we should at least make it explicit in the docs that this setting currently doesn't impact DNS based load balancing (but that this might change in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I'm wondering if this feature also includes #448 or at least to have it in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I didn't see that PR before (I guess I didn't go more than 2 years into PR history when joining the project). Right now #448 is completely distinct from this PR:
- This PR does not implement the
disable
mode for DNS load balancing. It still uses round robin even ifdisable
is used. - This PR does not implement the
random
mode. PR add server_shuffle_hosts #448 implementsrandom
mode for DNS load balancing only, not for host lists (certainly because host lists were not available in PgBouncer in the time that PR was written).
Th design of the config option would allow for implementing both 1 and 2 though. I think 2 isn't required to implement for this PR. A completely missing option isn't very confusing. But I think 1 is quite confusing because the existing option does not apply to all the places that you would expect, so preferably this PR would implement 1 too. If that's hard for some reason, then I think a note in the docs is also sufficient.
|
||
disable | ||
: A new connection continues using the same host entry until a connection | ||
fails, after which the next host entry is chosen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to mention that it is recommended to set server_login_retry
lower than the default to ensure fast retries, because of issue #872
PGBouncer's default behavior is to round-robin between comma-separated
hosts on each new connection attempt. This is desirable when all hosts
are similar, such as load balancing over multiple replicas, but
undesirable when only one host out of the list is expected to
successfully connect and login. In this scenario, every other connection
attempt would hit an invalid host and fail.
The
host_strategy
parameter controls this host selection behavior. Thedefault behavior is labeled
round_robin
and a new strategy calledlast_successful
is introduced. This new strategy instructs PGBouncerto prefer the host with most recent successful connections. When that
host fails, new connections will round-robin through the list until a
successful connection occurs.