
Redesign shutdown logic to support rolling restarts using so_reuseport #902

Merged
merged 1 commit into pgbouncer:master from rolling-restart May 6, 2024

Conversation

Member

@JelteF JelteF commented Jul 24, 2023

In 1.20 we announced deprecation of online restarts using the -R flag and recommended that people use so_reuseport instead. But the safe shutdown logic that we executed when receiving a SIGINT signal didn't actually allow for zero-downtime rolling restarts.

This PR changes the behaviour of our shutdown signals to allow for rolling restarts. The changes that this PR makes are:

  1. SIGTERM does not do "immediate shutdown" anymore, but instead triggers a newly introduced "super safe shutdown". This "super safe shutdown" waits for all clients to disconnect before shutting down the server. This can be used for rolling restarts as described in the newly added documentation. A second SIGTERM will cause an "immediate shutdown".
  2. SIGINT mostly keeps doing what it was doing previously: waiting until all servers are released before shutting down PgBouncer. But it incorporates a few improvements from the new SIGTERM logic:
    1. It also stops listening for new connections while PgBouncer is waiting for servers to be released. This does have the downside that SIGUSR2 cannot be used to cancel a safe shutdown anymore, but that was quite a weird/niche feature anyway.
    2. Clients that have released their server and then do another query will be disconnected, instead of putting them in pause mode until either query_wait_timeout triggers or PgBouncer shuts down. This way they can reconnect quicker to another PgBouncer, because they were never going to get a new server anyway.
    3. A second SIGINT will trigger an "immediate shutdown".
  3. SIGQUIT handling is introduced and now causes an immediate shutdown.

Since this changes the existing behaviour of SIGTERM and SIGINT, this can be considered a breaking change. But the amount of breakage should be minimal, since most of the time people would not want an immediate shutdown, and the new SIGINT behaviour is closer to what most people would expect. A sketch of the resulting signal semantics follows below.

Related to #894
Related to #900

Since this changes SIGTERM to not do a fast shutdown anymore, this also indirectly addresses #806 and #770.

Fixes #806
Fixes #770
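
For illustration, here is a minimal, hypothetical C sketch of the signal semantics described above. This is not PgBouncer's actual handler code (which runs inside a libevent loop, as the diff excerpts below show); the name shutdown_in_progress is a placeholder:

```c
#include <signal.h>
#include <unistd.h>

/* Illustrative stand-in for PgBouncer's shutdown state. */
static volatile sig_atomic_t shutdown_in_progress = 0;

static void handle_sigterm(int sig)
{
	(void) sig;
	if (shutdown_in_progress)
		_exit(0);	/* a second SIGTERM upgrades to an immediate shutdown */
	shutdown_in_progress = 1;	/* first SIGTERM: stop listening, wait for clients */
}

static void handle_sigquit(int sig)
{
	(void) sig;
	_exit(0);	/* SIGQUIT always means an immediate shutdown */
}

int main(void)
{
	signal(SIGTERM, handle_sigterm);
	signal(SIGQUIT, handle_sigquit);
	/* SIGINT (not shown) instead waits until all server connections
	 * are released, with a second SIGINT also upgrading to an
	 * immediate shutdown. */
	for (;;) {
		if (shutdown_in_progress /* && no clients remain connected */)
			break;
		pause();	/* stand-in for the real libevent loop */
	}
	return 0;
}
```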

src/main.c Outdated
@@ -481,8 +481,14 @@ static void handle_sigint(evutil_socket_t sock, short flags, void *arg)
die("takeover was in progress, going down immediately");
if (cf_pause_mode == P_SUSPEND)
die("suspend was in progress, going down immediately");
cf_pause_mode = P_PAUSE;
cf_shutdown = 1;
if (cf_so_reuseport) {
Member Author

@JelteF JelteF Jul 24, 2023

Point of discussion: Do we want to piggy-back on SIGINT? Or should we use a new signal for this behaviour.

Member

Prefer new signal, see above.

@JelteF JelteF force-pushed the rolling-restart branch 2 times, most recently from 767b4b9 to d53fd21 Compare August 16, 2023 14:34
JelteF added a commit to JelteF/pgbouncer that referenced this pull request Aug 16, 2023
In pgbouncer#902 I want to introduce a 4th shutdown value. So I started using an
enum for this to improve readability. I think this refactor from using
integer values to enum values stands on its own. So to make reviewing
easier, and to not have it be blocked on adding tests and some discussions
for pgbouncer#902, this PR includes only that refactor.
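
As a rough sketch of what that refactor might look like: only SHUTDOWN_WAIT_FOR_CLIENTS is confirmed by the diff excerpts later in this thread; the other value names here are illustrative assumptions:

```c
/*
 * Sketch of turning the old integer cf_shutdown values into an enum.
 * Only SHUTDOWN_WAIT_FOR_CLIENTS appears verbatim in this PR's diffs;
 * the other names are assumptions.
 */
enum ShutdownState {
	SHUTDOWN_NONE = 0,		/* running normally */
	SHUTDOWN_WAIT_FOR_SERVERS,	/* wait until all server connections
					 * are released (old SIGINT behaviour) */
	SHUTDOWN_WAIT_FOR_CLIENTS,	/* the new 4th value: wait until all
					 * clients have disconnected */
	SHUTDOWN_IMMEDIATE,		/* shut down right away */
};
```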
JelteF added a commit to JelteF/pgbouncer that referenced this pull request Aug 16, 2023
JelteF added a commit that referenced this pull request Aug 18, 2023
@JelteF JelteF force-pushed the rolling-restart branch 2 times, most recently from beb2693 to 9a5a806 Compare August 18, 2023 08:27
@JelteF JelteF requested review from petere and eulerto August 18, 2023 08:27
@JelteF JelteF requested review from petere and eulerto August 18, 2023 09:16
JelteF added a commit to JelteF/pgbouncer that referenced this pull request Sep 6, 2023
We were using the full 64 bits of the BackendKeyData message as random
bytes. This turned out to be arguably incorrect, because the first 32
bits are used by PostgreSQL as a Process ID (and this is part of the
[protocol documentation too][1]). Actual PIDs are always positive, but
we also put a random bit in the sign bit so were setting it to negative
numbers half of the time. For most cases this does not matter, but it
turned out that [`pg_basebackup` relies on the PID part to actually be
positive][2].

While it seems semi-likely that `pg_basebackup` will be fixed to support
negative Process IDs, it still seems good to adhere to this implicit
requirement on positive numbers in case other clients also depend on it.

Since this change requires changing the bytes of the cancel key in which
we encode the `peer_id`, this change breaks cancellations in peered
clusters of different PgBouncer versions. This seems like a minor
enough problem that we should not care about it. In practice this
should only happen during a rolling upgrade, which we currently don't
support well anyway (see pgbouncer#902 for improvements on that). And even if we
did, breaking cancellations for a few minutes in this transitional stage
doesn't seem like a huge deal.

[1]: https://www.postgresql.org/docs/current/protocol-message-formats.html
[2]: https://www.postgresql.org/message-id/flat/CAGECzQQOGvYfp8ziF4fWQ_o8s2K7ppaoWBQnTmdakn3s-4Z%3D5g%40mail.gmail.com
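
A hypothetical sketch of the approach this commit message describes: copy the random bytes into the 8-byte cancel key, then clear the sign bit of the 32-bit half that clients parse as a PID (make_cancel_key is an illustrative name, not PgBouncer's actual function):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill the 8-byte BackendKeyData cancel key from random bytes, but clear
 * the sign bit of the first 32 bits so that clients which parse that half
 * as a Process ID see a non-negative number. (A real implementation would
 * presumably also want to avoid a PID part of exactly 0.)
 */
static void make_cancel_key(uint8_t key[8], const uint8_t random_bytes[8])
{
	memcpy(key, random_bytes, 8);
	key[0] &= 0x7f;	/* the PID field is sent big-endian, so the sign
			 * bit lives in the first byte */
}

int main(void)
{
	const uint8_t rnd[8] = {0xff, 0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde};
	uint8_t key[8];

	make_cancel_key(key, rnd);
	printf("first byte with sign bit cleared: 0x%02x\n", key[0]); /* 0x7f */
	return 0;
}
```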
JelteF added a commit to JelteF/pgbouncer that referenced this pull request Sep 6, 2023
petere previously requested changes Oct 4, 2023

Member

@petere petere left a comment

This all looks like a very good direction to me. We need to work out how to handle the signals.


If you have more than two processes, repeat steps 2, 3 and 4 for each of
them.

Member

This procedure somehow emphasizes two processes, and then says that you can have more than two at the end. But the second process isn't used during the procedure. So I think this could be written like this:

  1. Have two or more PgBouncer processes running using so_reuseport and peering. Pick one process to upgrade, let's call it A.
  2. Send SIGINT to process A.
  3. Cause all clients to reconnect. [...] Once all clients have reconnected, process A will exit automatically, because no clients are connected to it anymore.
  4. Start process A again.
  5. Repeat all steps for the remaining PgBouncer processes, one at a time.

Also, is peering required for this, or just recommended? (Using so_reuseport is required, as the code is currently written.) It might be good to be precise about that.
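
For completeness, a hedged C sketch of what one step of this procedure could look like from a supervisor's point of view under the finally merged design, where SIGTERM triggers the wait-for-clients shutdown (the PID handling and restart step are placeholders):

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Wait until the process is gone; kill(pid, 0) only probes for
 * existence. Assumes we have permission to signal the process. */
static void wait_for_exit(pid_t pid)
{
	while (kill(pid, 0) == 0)
		sleep(1);
	if (errno != ESRCH)
		perror("kill");
}

static void rolling_restart_step(pid_t process_a)
{
	/* Trigger the safe shutdown: process A stops listening and exits
	 * once every client has disconnected, while the remaining
	 * so_reuseport processes keep accepting new connections. */
	kill(process_a, SIGTERM);
	wait_for_exit(process_a);
	/* ... restart process A here, then repeat for the next one ... */
}

int main(int argc, char **argv)
{
	if (argc != 2)
		return 1;
	rolling_restart_step((pid_t) atoi(argv[1]));
	return 0;
}
```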

Member Author

I improved the description of the steps.

: Safe shutdown. If `so_reuseport` is set to 0, then this is the same as
issuing **PAUSE** and **SHUTDOWN** on the console. If `so_reuseport` is set
to 1 then this is the same as issuing **SHUTDOWN WAIT_FOR_CLIENTS** on
the console.

Member

Hmm, we should use a separate signal for this. Otherwise, it's too much magic.

I think the best way would be:

  • Keep SIGINT as is.
  • Move the current SIGTERM behavior to SIGQUIT.
  • This new behavior goes to SIGTERM.

It's a compatibility break, but it has some advantages. It restores the intended progression that SIGTERM is "softer" than SIGINT. And it creates some consistency with the postgres/pg_ctl shutdown modes.

We could even have a setting like `signal_compat` that keeps the old signal assignments. Not sure if it's needed, but it would be possible pretty easily.

Member Author

@JelteF JelteF Oct 12, 2023

While I understand the desire to be in line with how Postgres uses signals, I don't think breaking compatibility in such a big way is worth it. Changing SIGTERM would suddenly change the behaviour of a lot of Docker and systemd stop commands. I don't think that's something we should be doing.

The reason I suggested changing SIGINT is that, when so_reuseport is used and multiple processes are listening, the current SIGINT behaviour is actively harmful: the PgBouncer process is still accepting new connections only to then send an error on them, while one of the other listening processes might still be accepting connections.
Although I guess it can still make sense if you send SIGINT to all of the processes at the same time.

Member Author

I just checked:

  • pgdg apt systemd service file uses SIGINT
  • pgdg rpm systemd service file uses SIGINT

So changing behaviour of any of them seems pretty likely to cause problems in production for some users. I think assigning the new behaviour to SIGQUIT is the least disruptive.

It would be possible to add some config compat flag to change the meaning of the signals to match Postgres's intent. I think that's only likely to cause more confusion, because now you have to know what the config is to know how to stop PgBouncer in the way that you want.

Member

> I just checked:
>
> • pgdg apt systemd service file uses SIGINT
> • pgdg rpm systemd service file uses SIGINT

I didn't find the source code for this, but if it's true, then I think it's possible that it's a mistake, possibly exactly because pgbouncer does SIGTERM and SIGINT backwards.

> So changing behaviour of any of them seems pretty likely to cause problems in production for some users.

My proposal would leave SIGINT unchanged, and almost everybody should be using SIGINT normally.

I'm not saying I'm 100% comfortable, but it doesn't seem too bad.

Member Author

@JelteF JelteF Feb 12, 2024

Okay, I implemented this proposal now, and if we put a warning in the release notes I think we should be fine.

The only issue I have is that Windows does not have SIGQUIT, so now you cannot do an immediate shutdown on Windows anymore except by using the SHUTDOWN command. Which might be fine, but since sending SIGTERM will now not allow you to connect anymore (due to it closing the listening sockets), there's no way to "upgrade" the shutdown behaviour to something like SIGQUIT.

One option would be to leave the signal behaviour for Windows unchanged.

Member Author

@JelteF JelteF Feb 15, 2024

Okay, I think I found (and implemented) a rather elegant solution to the Windows problem: if a shutdown attempt is already in progress and a new SIGTERM signal is received, then we trigger an immediate shutdown.

Member Author

Hmm, turns out Windows doesn't actually send SIGTERM signals. So this was a non-issue. Still, it seems quite nice to have this "immediate" shutdown happen on repeated signals. I think I'd actually go so far as also wanting to implement this for SIGINT. Having a double Ctrl+C do an immediate shutdown seems really nice for interactive usage.

Member Author

@JelteF JelteF May 6, 2024

I implemented "double SIGINT" == "immediate shutdown"

*
* This allows for a rolling restart in combination with so_reuseport.
*/
SHUTDOWN_WAIT_FOR_CLIENTS,
Member

I suppose "wait for clients" kind of includes "wait for servers"? Not sure if that is worth writing down, just thinking out loud.

Member Author

I suppose "wait for clients" kind of includes "wait for servers"?

This wasn't the case before, but with my most recent changes it does. I updated the comment to reflect this.

src/pooler.c Outdated (resolved)
src/pooler.c Outdated (resolved)

src/admin.c Outdated
/* avoid surprise later if cf_shutdown stays set */
if (cf_shutdown) {
/*
* Avoid surprises later if cf_shutdown stays set to
Member

Maybe, if we're rewriting this comment anyway, we can use more precise language than "avoid surprises"? I for one don't know what surprise we are trying to avoid.

Member Author

It was meant to avoid a future pause causing the server to be shut down after RESUME was sent during a safe shutdown. I removed this code (and thus also the comment), because resuming from a shutdown is impossible now that we're closing the listening sockets at the start of the shutdown.

src/admin.c Outdated (resolved)
JelteF added a commit to JelteF/pgbouncer that referenced this pull request Oct 5, 2023
JelteF added a commit to JelteF/pgbouncer that referenced this pull request Oct 5, 2023
JelteF added a commit that referenced this pull request Oct 5, 2023
We were using the full 64 bits of the BackendKeyData message as random
bytes. This turned out to be arguably incorrect, because the first 32
bits are used by PostgreSQL as a Process ID (and this is part of the
[protocol documentation too][1]). Actual PIDs are always positive, but
we also put a random bit in the sign bit so were setting it to negative
numbers half of the time. For most cases this does not matter, but it
turned out that [`pg_basebackup` relied on the PID part to actually be
positive][2].

While `pg_basebackup` is now fixed to support negative Process IDs, it
still seems good to adhere to this implicit requirement on positive
numbers in case other clients also depend on it.

Since this change requires changing the bytes of the cancel key in which
we encode the `peer_id`, this change breaks cancellations in peered
clusters of different PgBouncer versions. This seems like a minor
enough problem that we should not care about it. In practice this
should only happen during a rolling upgrade, which we currently don't
support well anyway (see #902 for improvements on that). And even if we
did, breaking cancellations for a few minutes in this transitional stage
doesn't seem like a huge deal.

[1]: https://www.postgresql.org/docs/current/protocol-message-formats.html
[2]: https://www.postgresql.org/message-id/flat/CAGECzQQOGvYfp8ziF4fWQ_o8s2K7ppaoWBQnTmdakn3s-4Z%3D5g%40mail.gmail.com
@JelteF JelteF mentioned this pull request Feb 2, 2024
@JelteF JelteF force-pushed the rolling-restart branch 3 times, most recently from 85f15d3 to df78847 Compare February 12, 2024 16:32
@JelteF JelteF force-pushed the rolling-restart branch 2 times, most recently from efe4a6a to 70d6414 Compare February 20, 2024 21:50
@JelteF JelteF changed the title Better support for rolling restarts using so_reuseport Support for rolling restarts using so_reuseport May 6, 2024
@JelteF JelteF force-pushed the rolling-restart branch 3 times, most recently from a7c34dd to f866b57 Compare May 6, 2024 12:51
@JelteF JelteF changed the title Support for rolling restarts using so_reuseport Redesign shutdown logic to support rolling restarts using so_reuseport May 6, 2024
@JelteF JelteF force-pushed the rolling-restart branch 6 times, most recently from d762825 to c2ec342 Compare May 6, 2024 14:21
@JelteF JelteF dismissed petere’s stale review May 6, 2024 14:26

All feedback is now addressed

@JelteF JelteF enabled auto-merge (squash) May 6, 2024 14:31
@JelteF JelteF merged commit de33cd0 into pgbouncer:master May 6, 2024
7 checks passed