Quorum commit feature #672

Open · wants to merge 13 commits into master

Conversation

@ants (Collaborator) commented May 1, 2018

I did some work on the subject; I did not have enough time to get it finished, but it is far enough along to solicit some feedback.

Description

The feature generalizes synchronous_mode to multiple nodes being in sync. The trade-off between synchronization overhead and fault tolerance is user-selectable via the replication_factor configuration parameter, which specifies how many nodes must contain a commit before it is acknowledged.

The cluster retains its self-healing capability if up to replication_factor - 1 nodes fail within one HA loop interval. For example, with replication_factor = 3 every acknowledged commit is present on at least three nodes, so after any two simultaneous failures at least one node with the latest acknowledged commit survives.

The old synchronous_commit corresponds to replication_factor = 2.

On PostgreSQL 10 the quorum commit ANY k (n) feature is used. On 9.6 the k (n) variant is used, picking the first nodes. On even older versions the maximum replication_factor is 2. However, this is still an improvement over the status quo, as it delegates sync standby selection to PostgreSQL, reducing commit latency on sync standby failure.
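To make the version-dependent behaviour concrete, here is a minimal sketch (illustrative only, not code from this PR; the function and names are made up) of how synchronous_standby_names could be derived from replication_factor:

def build_synchronous_standby_names(major_version, replication_factor, standbys):
    # `standbys` is a list of standby application_names, e.g. ['node2', 'node3', 'node4'].
    # The leader itself counts towards replication_factor, hence numsync = replication_factor - 1.
    numsync = replication_factor - 1
    listed = ', '.join('"%s"' % name for name in standbys)
    if major_version >= 100000:
        # PostgreSQL 10+: quorum commit, any `numsync` of the listed standbys may acknowledge
        return 'ANY %d (%s)' % (numsync, listed)
    if major_version >= 90600:
        # 9.6: priority-based `k (n)`, the first `numsync` connected standbys are synchronous
        return '%d (%s)' % (numsync, listed)
    # Older versions support only a single synchronous standby, i.e. replication_factor <= 2
    return '"%s"' % standbys[0] if standbys else ''

For example, with replication_factor = 3 and standbys ['node2', 'node3', 'node4'] this would yield ANY 2 ("node2", "node3", "node4") on PostgreSQL 10.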

DCS sync state is changed from {"leader": .., "sync_standby": ..} to {"leader": .., "quorum": .., "members": [..]}, where leader is kept mainly as an informative field.
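For illustration, the old and new shapes of the sync key could look like this (member names made up; the structure matches the example that appears later in this thread):

# Old format: a single designated synchronous standby
old_sync_state = {"leader": "node1", "sync_standby": "node2"}
# New format: a quorum value plus the list of members eligible to vote on it
new_sync_state = {"leader": "node1", "quorum": 2, "members": ["node1", "node2", "node3"]}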

State

The main quorum management engine should be in a reasonable state although I haven't tested it yet.

Open issues:

  • Is replication_factor a reasonable UI for this? Do we want to keep synchronous_commit as the main switch for this functionality.
  • Nodes that aren't actually taking part in synchronous replication can still promote themselves if they can verify their wal position with enough nodes that are taking part. Does anyone see a problem with this? It causes minor extra complexity when promoting such node, but on the other hand simplifies health checks.
  • I could not figure out a way how to find out from PostgreSQL when synchronous_standby_names becomes active. Currently there is a small race condition there. I'm not even sure if the previous check of pg_stat_replication showing walsender as sync was enough as a cursory review of postgres code hints that a walsender might think it is sync while a backend hasn't gotten the message that sync replication was enabled. More study of the code is needed.
  • Quorum engine manages transitioning DCS and sync state between "safe" states. If somebody manually modifies state into something we don't think of as valid, there currently isn't any code to automatically recover. Should be simple enough, I just didn't get to it.
  • State handling when promoting could be improved. I think we want to keep synchronous replication running with mostly the same settings as before failover, but remove any nodes that we know are down based on DCS information. Speeds up being able to accept commits.
  • Testsuite is obviously broken. Needs fixing up of old tests and new tests to exercise corner cases of the new stuff.
  • patronictl UI needs updating. How do we want to display the information?
  • There is an impedance mismatch between quorum engine and synchronous_standby_names in that quorum engine thinks of leader as just one additional synchronous replica. Currently the boundary transition between two worlds is in postgresql.py. Maybe some other place would work better.
  • flake8 is probably going to have a fit on this code. A bunch of cleanup is needed.

@CyberDem0n (Collaborator) left a comment:

I've spent a few hours trying to understand how QuorumStateResolver works. Not much success :(
It definitely deserves a few sentences explaining how it is supposed to work.

if name != self._synchronous_standby_names:
if name is None:
if state == 'sync' or (state == 'potential' and current is None):
current = member.name

Collaborator:

This branch seems to be unused.

Collaborator Author:

Oops. That should be sync_state, and it looks like I forgot to finish the code that makes sure that if we can only have one sync standby (version <= 9.5) we don't switch the standby around. Will fix.

else:
assert self.name in sync
sync_standbys = sync.difference([self.name])
standby_list = ", ".join(sorted(sync_standbys)) if sync_standbys else "*"

Collaborator:

Lost quote_ident.

Collaborator Author:

Thanks for catching this. I will add it back in.

current = cluster.sync.sync_standby
current = current.lower() if current else current
sync_standby_names = self.query("SHOW synchronous_standby_names")
result = parse_sync_standby_names(sync_standby_names)

Collaborator:

Why do we need to parse it all the time? Is it some kind of protection from unexpected changes from outside?

Collaborator Author:

Basically yes. It helps us automatically resolve the situation if due to a bug or administrator intervention our understanding of the state goes out of sync. If we aren't too worried about that we could cache the state and avoid spamming the database with unnecessary queries. What do you think?

Collaborator:

The problem is that it could be changed with 'ALTER SYSTEM' which takes precedence over postgresql.conf :(

Collaborator Author:

We could detect that from pg_settings. Now the question is: what should the system do in that situation? Override the administrator with ALTER SYSTEM RESET synchronous_standby_names, make the quorum state match the synchronization state, just complain in the logs where nobody is looking anyway, or panic with sys.exit(1)?

Collaborator:

Looks like executing ALTER SYSTEM RESET synchronous_standby_names is the best we can do.
In addition to that we can write a warning with the previous value into the log.
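A minimal sketch of that idea (assuming a DB-API style cursor and Python's standard logging; not code from this PR): detect from pg_settings that the current value comes from postgresql.auto.conf, log the previous value, and reset it.

import logging

logger = logging.getLogger(__name__)

def reset_alter_system_override(cursor):
    # Find out where the current value of synchronous_standby_names comes from.
    cursor.execute("SELECT setting, sourcefile FROM pg_settings"
                   " WHERE name = 'synchronous_standby_names'")
    setting, sourcefile = cursor.fetchone()
    # ALTER SYSTEM writes the value into postgresql.auto.conf, which overrides postgresql.conf.
    if sourcefile and sourcefile.endswith('postgresql.auto.conf'):
        logger.warning("synchronous_standby_names was overridden via ALTER SYSTEM "
                       "(previous value: %r), resetting it", setting)
        # ALTER SYSTEM cannot run inside a transaction block, so the connection
        # should be in autocommit mode.
        cursor.execute("ALTER SYSTEM RESET synchronous_standby_names")
        cursor.execute("SELECT pg_reload_conf()")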

# First get to requested replication factor
increase_numsync_by = self.sync_wanted - self.numsync
if increase_numsync_by:
    add = set(sorted(to_add)[:increase_numsync_by])

Collaborator:

This one I don't really understand.

It is responsible for https://github.com/zalando/patroni/pull/672/files#diff-57e7f41a6358e844d5245ea7b8409311R12

+        self.assertEquals(list(QuorumStateResolver(1, set("a"), 1, set("a"), active=set("abcde"), sync_wanted=3)), [
+            ('sync', 3, set('abc')),
+            ('quorum', 3, set('abcde')),
+            ('sync', 3, set('abcde')),
+        ])

Why does it first add only 'b' and 'c'? Why not all 4 at the same time? Something like:

+            ('sync', 3, set('abcde')),
+            ('quorum', 3, set('abcde')),

Collaborator Author:

Yes, that would be a correct optimization. Currently the algorithm doesn't take into account that the master is special and is always guaranteed to be one of the synced nodes. This may lead to extra steps when adding multiple nodes at the same time. On the plus side, this allows for promotion before we update the sync state in DCS. I need to think a bit about what the algorithm would look like if we consider the master to be special. If it makes it considerably more complicated than it already is, then it might not be worth the minor improvement in efficiency. I will also write down how and why QuorumStateResolver works.

return
logger.info("Synchronous standby status assigned to %s", picked)
elif transition == 'sync':
logger.info("Setting synchronous replication to %d of %d (%s)",

Collaborator:

This message is misleading when num < min_sync.

Collaborator Author:

Will correct.

sync_wanted=sync_wanted):
if transition == 'quorum':
logger.info("Setting quorum to %d of %d (%s)", num, len(nodes), ", ".join(sorted(nodes)))
if not self.dcs.write_sync_state(leader=self.state_handler.name, quorum=num, members=list(nodes),

Contributor:

write_sync_state - this method should also be changed, no?
Here you are passing leader, quorum, members and index as parameters, but the signature of the method wasn't changed in this pull request.

def write_sync_state(self, leader, sync_standby, index=None):

@anikin-aa (Contributor):

@ants Is there any chance you are getting back to work on this feature?
I am pretty interested in this one; if you need any help, feel free to contact me.

"""
current = cluster.sync.sync_standby
current = current.lower() if current else current
sync_standby_names = self.query("SHOW synchronous_standby_names")

Contributor:

As I can see here https://github.com/zalando/patroni/blob/master/patroni/postgresql.py#L1509,
self.query will return a cursor, not the result of the query execution. And after that you are passing the cursor to
sync_rep_parser_re.finditer, which is not acceptable.

@ants (Collaborator Author) commented Aug 3, 2018 via email

@anikin-aa (Contributor):

@ants @CyberDem0n

I have made some changes and this feature is working as a prototype, but without using the new variables replication_factor and minimum_replication_factor; it simply accepts into the quorum all applications from pg_stat_replication (everyone should be in sync in the quorum).

Also, it's pretty hard to understand the purpose of the QuorumStateResolver, so I just commented out this part.

Is my solution acceptable? Or should I just fix everything in the current PR?

This was referenced Aug 22, 2018
@ants (Collaborator Author) commented Nov 23, 2018

Getting closer to something I am happy with. Work still to be done:

  • Fill the couple of holes in test coverage.
  • Review and polish configuration, API and patronictl interfaces to this feature. Some known issues:
    • Currently the sync standby state display is based on DCS state, not on information from the master. It would be neat if we could get the contents of synchronous_standby_names from the master and use it to show the actual state.
    • It would be great if patronictl list showed a warning when the cluster is read-only because there are not enough nodes to satisfy minimum_replication_factor.
  • Handle standby picking in pre-9.6 versions as it was before this feature. Add tests to verify this.
  • Feature tests.
  • Run real-life torture tests to identify any holes in testing.
  • Improve comments until other people besides me can understand how this works.

I should be able to pick this up sometime next week, publishing what I have so far to solicit feedback.

@anikin-aa (Contributor):

Guys, any updates on this PR?

@CyberDem0n (Collaborator) left a comment:

I also found a small corner case: when I started a new single-node cluster with {synchronous_mode: true, replication_factor: 2, minimum_replication_factor: 2} in bootstrap.dcs, it didn't set synchronous_standby_names.

@@ -309,15 +309,15 @@ def is_failover_possible(self, cluster, leader, candidate, action):
if leader and (not cluster.leader or cluster.leader.name != leader):
return 'leader name does not match'
if candidate:
if action == 'switchover' and cluster.is_synchronous_mode() and cluster.sync.sync_standby != candidate:
return 'candidate name does not match with sync_standby'
if action == 'switchover' and cluster.is_synchronous_mode() and cluster.sync.matches(candidate):

Collaborator:

and not cluster.sync.matches(candidate):

@@ -47,6 +47,8 @@ class Config(object):
'master_start_timeout': 300,
'synchronous_mode': False,
'synchronous_mode_strict': False,
'replication_factor': 2,

Collaborator:

I don't really understand the status of synchronous_mode and synchronous_mode_strict. On the one hand they seem redundant, but on the other hand they are still used in ha.py and postgresql.py.

@property
def use_multiple_sync(self):
    return self._major_version >= 90600

Collaborator:

Line 304 references synchronous_mode_strict, which effectively becomes deprecated.

#
# Regardless of voting, if we observe a node that can become a leader and is ahead, we defer to that node.
# This can lead to failure to act on quorum if there is asymmetric connectivity.
quorum_votes = 1 if self.state_handler.name in voting_set else 0

Collaborator:

It would also be good to add some info output here about the current quorum and voting set; it would make potential investigations easier.
Right now such info output is only available on the master when it processes topology changes, but it might happen that the logs from the master are unrecoverable.


def quorum_update(self, quorum, voters):
    if quorum < 1:
        raise QuorumError("Quorum %d < 0 of (%s)" % (quorum, voters))

Collaborator:

Exception message doesn't match the condition: quorum < 1 and %d < 0


def check_invariants(self):
    if self.quorum and not (len(self.voters | self.sync) < self.quorum + self.numsync):
        raise QuorumError("Quorum and sync not guaranteed to overlap: nodes %d >= quorum %d + sync %d" %

Collaborator:

When I removed the sync key from the DCS, the master crashed:

$ etcdctl rm /service/batman/sync
PrevNode.Value: {"leader":"postgresql2","members":["postgresql1","postgresql0","postgresql2"],"quorum":2}

2019-02-10 09:51:33,719 INFO: Lock owner: postgresql2; I am postgresql2
2019-02-10 09:51:33.720 CET [10143] LOG:  received fast shutdown request
2019-02-10 09:51:33.725 CET [10143] LOG:  aborting any active transactions
2019-02-10 09:51:33.725 CET [10178] FATAL:  terminating connection due to administrator command
2019-02-10 09:51:33.730 CET [10143] LOG:  background worker "logical replication launcher" (PID 10193) exited with exit code 1
2019-02-10 09:51:33.731 CET [10146] LOG:  shutting down
2019-02-10 09:51:33.794 CET [10143] LOG:  database system is shut down
2019-02-10 09:51:33,813 INFO: Lock owner: postgresql2; I am postgresql2
Traceback (most recent call last):
  File "./patroni.py", line 6, in <module>
    main()
  File "/home/akukushkin/git/ants/patroni/patroni/__init__.py", line 182, in main
    return patroni_main()
  File "/home/akukushkin/git/ants/patroni/patroni/__init__.py", line 148, in patroni_main
    patroni.run()
  File "/home/akukushkin/git/ants/patroni/patroni/__init__.py", line 119, in run
    logger.info(self.ha.run_cycle())
  File "/home/akukushkin/git/ants/patroni/patroni/ha.py", line 1370, in run_cycle
    info = self._run_cycle()
  File "/home/akukushkin/git/ants/patroni/patroni/ha.py", line 1343, in _run_cycle
    msg = self.process_healthy_cluster()
  File "/home/akukushkin/git/ants/patroni/patroni/ha.py", line 966, in process_healthy_cluster
    'promoted self to leader because i had the session lock'
  File "/home/akukushkin/git/ants/patroni/patroni/ha.py", line 566, in enforce_master_role
    self.process_sync_replication()
  File "/home/akukushkin/git/ants/patroni/patroni/ha.py", line 426, in process_sync_replication
    sync_wanted=sync_wanted):
  File "/home/akukushkin/git/ants/patroni/patroni/quorum.py", line 84, in __iter__
    transitions = list(self._generate_transitions())
  File "/home/akukushkin/git/ants/patroni/patroni/quorum.py", line 95, in _generate_transitions
    self.check_invariants()
  File "/home/akukushkin/git/ants/patroni/patroni/quorum.py", line 62, in check_invariants
    (len(self.voters | self.sync), self.quorum, self.numsync))
patroni.quorum.QuorumError: Quorum and sync not guaranteed to overlap: nodes 3 >= quorum 1 + sync 2
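For context, the invariant being checked is quorum + numsync > len(voters | sync): by the pigeonhole principle this guarantees that any quorum-sized subset of the voters overlaps any numsync-sized set of synced nodes, so every quorum contains at least one node with the latest acknowledged commit. In the state above there are 3 nodes with quorum 1 and sync 2, and 1 + 2 is not greater than 3, so the check correctly fails; the real problem is that a missing /sync key should be handled gracefully instead of raising. A tiny illustration of the condition (assumed semantics, not the PR's code):

def overlap_guaranteed(quorum, voters, numsync, sync):
    # If quorum + numsync exceeds the number of distinct nodes, two disjoint
    # subsets of those sizes cannot exist, so any quorum must include a sync node.
    return quorum + numsync > len(voters | sync)

nodes = {'postgresql0', 'postgresql1', 'postgresql2'}
print(overlap_guaranteed(2, nodes, 2, nodes))                            # True: healthy state
print(overlap_guaranteed(1, nodes, 2, {'postgresql0', 'postgresql1'}))   # False: the state above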

@CyberDem0n (Collaborator) left a comment:

Looks like a lot of corner-cases are not covered in the QuorumStateResolver :(

# First get to requested replication factor
logger.debug("Adding nodes: %s", to_add)
increase_numsync_by = self.sync_wanted - self.numsync
if increase_numsync_by:

Collaborator:

increase_numsync_by could become negative. For example:

QuorumStateResolver(quorum=2, voters=set('abcdef'),
                    numsync=5, sync=set('abcdef'),
                    active=set('abcdefg'), sync_wanted=4)


safety_margin = self.quorum + self.numsync - len(self.voters | self.sync)
if safety_margin > 1:
    logger.debug("Case 3: replication factor is bigger than needed")

Collaborator:

This branch is not covered by tests, so I tried to trigger it:

QuorumStateResolver(quorum=3, voters=set('abcdef'),
                    numsync=5, sync=set('abcdef'),
                    active=set('abcdef'), sync_wanted=3)

and got an exception QuorumError: Quorum and sync not guaranteed to overlap: nodes 6 >= quorum 3 + sync 3 from sync_update()

@Jan-M (Member) commented Mar 6, 2019

This PR is suddenly more interesting. Could this pull request be made simpler by dropping support for PG 9.6? I expected this to be less complicated given that Postgres now supports quorum mode.

@CyberDem0n CyberDem0n added this to the 2.0 milestone Feb 7, 2020
We can then handle any node that is ahead but unable to promote.
Downside is that we might potentially need to rewind more and/or lose
a couple more unsynchronized transactions.
if SyncState is empty.
:param quorum: number of servers from synchronous set we need to see to know we see the latest
commit.
:param members: set of member names that participate in determining quorum.

Member:

SyncState uses the word "members" while everywhere else in the algorithm the term voters is used. Can you please name it consistently?

from patroni.quorum import QuorumStateResolver, QuorumError

class QuorumTest(unittest.TestCase):
    def test_1111(self):

Member:

It is better to explicitly spell out how test names map to the content of the state variable; it is not obvious at first glance

logger.info("Removing synchronous privilege from %s", current)
if not self.dcs.write_sync_state(self.state_handler.name, None, index=self.cluster.sync.index):
sync_state = self.state_handler.current_sync_state(self.cluster)
numsync = min(sync_state['numsync'], len(sync_state['sync']))

Member:

Can this line be moved into self.state_handler.current_sync_state(self.cluster)?

# Active set matches state
self.assertEqual(list(QuorumStateResolver(*state, active=set("ab"), sync_wanted=3)), [
])
# Add node by increasing quorum

Member:

I find this description a bit too short; it would be better to have something like:

add a sync standby without increasing the replication factor; must increase quorum to avoid losing commits on failover

('quorum', 2, set('abc')),
('sync', 2, set('abc')),
])
# Add node by increasing sync

Member:

same here:

add a sync standby and increase the replication factor in the corner case where before and after adding a sync standby all nodes of a PG cluster must ack the commit

('quorum', 3, set('abcde')),
('sync', 3, set('abcde')),
])
# Master is alone

Member:

Is that removing a sync standby from a 2-node cluster?

logger.debug("Case 1: synchronous_standby_names subset of DCS state")
# Case 1: quorum is superset of sync nodes. In the middle of changing quorum.
# Evict from quorum dead nodes that are not being synced.
remove_from_quorum = self.voters - (self.sync | self.active)

Member:

Can it happen at this point that self.sync still contains some node that has ceased to be active and therefore must be evicted from voters? In other words, why do we subtract the union here and not self.sync & self.active, or simply self.active?

candidates = []
# Pick candidates based on who has flushed WAL farthest.
# TODO: for synchronous_commit = remote_write we actually want to order on write_location

for app_name, state, sync_state in self.query(

Member:

sync_state is no longer used in the proposed change

if candidates:
    return candidates[0], False
return None, False
active.append(member.name)

Member:

Just to double check: a standby with pg_stat_replication.sync_state in ('async', 'potential') is still considered active for the purposes of QuorumStateResolver?

if remove_from_sync:
    yield self.sync_update(
        numsync=min(self.sync_wanted, len(self.sync) - len(remove_from_sync)),
        sync=self.sync - remove_from_sync)

Member:

Why not:

if remove_from_sync:
    remaining_sync = self.sync - remove_from_sync
    yield self.sync_update(
        numsync=min(self.sync_wanted, len(remaining_sync)),
        sync=remaining_sync)

# Case 3: quorum or replication factor is bigger than needed. In the middle of changing replication factor.
if self.numsync > self.sync_wanted:
    # Reduce replication factor
    new_sync = clamp(self.sync_wanted, min=len(self.voters) - self.quorum + 1, max=len(self.sync))

Member:

sync denotes a set everywhere else, so here just for the sake of consistency, new_sync should be new_numsync

- A node that is not the leader or current synchronous standby is not allowed to promote itself automatically.

Patroni will only ever assign one standby to ``synchronous_standby_names`` because with multiple candidates it is not possible to know which node was acting as synchronous during the failure.
When in synchronous mode Patroni maintains synchronization state in the DCS, containing the latest primary, the number of nodes required for quorum and the nodes currently eligible to vote on quorum. In steady state the nodes voting on quorum are the leader and all synchronous standbys. This state is updated with strict ordering constraints with regard to node promotion and ``synchronous_standby_names`` to ensure that at all times any subset of voters that can achieve quorum is guaranteed to contain at least one node with the latest successful commit.

Member:

In steady state the nodes voting on quorum are the leader and all synchronous standbys.

Judging from this line, steady state also means updates to quorum and/or replication factor have completed (as opposed to simply having self.voters == self.sync).

@ksarabu1 (Contributor):

We are looking into implementing Patroni in-house, and this feature is one of the shortfalls holding us back right now. We are very much interested in this enhancement and are wondering whether we can be of any help in getting this PR moving. We are open to collaborating and contributing. Thanks.

@ksarabu1 (Contributor) commented Jun 4, 2020

We utilize a mix of synchronous hot standbys (with synchronous_commit set to remote_apply for read consistency between master and sync standby) and async standby databases where the applications tolerate staleness.

With the quorum-based (ANY) implementation, the databases that receive the change synchronously from the list may be random (unless we set sync_num to match the number of replicas).

My suggestion would be to extend the current single-synchronous-standby implementation to support a precise number of guaranteed sync standbys based on a maximum/minimum replication factor. The synchronous standby members list can be updated by Patroni as members join/leave to maintain the maximum/minimum limit on the number of sync standbys.

This functionality can be further extended to make it generic, using new parameter(s) to provide additional support for priority-based (FIRST n) synchronous replication and quorum-based (ANY n) synchronous replication.

Please let me know your thoughts.

Thanks

@hughcapet hughcapet removed this from the 2.0 milestone Nov 1, 2022
CyberDem0n added a commit that referenced this pull request Jan 24, 2023
When the `synchronous_standby_names` GUC is changed, PostgreSQL almost immediately starts reporting the corresponding walsenders as synchronous, while in fact they may not have reached this state yet. To mitigate this problem we memorize the current flush LSN on the primary right after the change of `synchronous_standby_names` becomes visible and use it as an additional check for walsenders.
A walsender will be counted as truly "sync" only when its write/flush/replay LSN has reached the memorized LSN and its `application_name` is known to be part of `synchronous_standby_names`.

The size of the PR is mostly due to refactoring and moving the code responsible for working with `synchronous_standby_names` and `pg_stat_replication` into a dedicated file.
The `parse_sync_standby_names()` function was mostly copied from #672.
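A rough sketch of the mechanism described in that commit message (illustrative helper, not the actual commit code): remember the primary's flush LSN right after synchronous_standby_names changes, and only count a walsender as truly synchronous once it is both listed in the GUC and has replicated past that point.

def parse_lsn(lsn):
    # pg_lsn values come back as text such as '0/16B3748'
    hi, lo = lsn.split('/')
    return (int(hi, 16) << 32) + int(lo, 16)

def truly_sync_walsenders(cursor, ssn_members, remembered_lsn):
    # ssn_members: application_names currently listed in synchronous_standby_names.
    # remembered_lsn: flush LSN captured on the primary right after the GUC change
    # (e.g. from pg_current_wal_flush_lsn() on PostgreSQL 10+).
    cursor.execute("SELECT application_name, flush_lsn FROM pg_stat_replication")
    threshold = parse_lsn(remembered_lsn)
    return [name for name, flush_lsn in cursor.fetchall()
            if name in ssn_members and flush_lsn is not None
            and parse_lsn(flush_lsn) >= threshold]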
@hughcapet hughcapet linked an issue Dec 3, 2023 that may be closed by this pull request

Successfully merging this pull request may close these issues.

RFC: design for supporting synchronous quorum commit in Patroni
7 participants