DB is corrupted if slave is on different timeline from master, but same timeline ID #1251
Comments
Hi @Vanav,

Yes, this is a problem. When both instances are on the same timeline, pg_rewind will not do anything (even on pg12):

```
$ /usr/lib/postgresql/12/bin/pg_rewind -D data/pg12.1 -n -P --source-server="dbname=postgres port=5433"
pg_rewind: connected to server
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
```

Patroni knows this and therefore doesn't even try to execute pg_rewind. If you look into the history files, it becomes clear that the timelines are diverging:

```
$ diff -u data/pg12.{1,2}/pg_wal/00000002.history
--- data/pg12.1/pg_wal/00000002.history 2019-10-25 08:11:30.438195187 +0200
+++ data/pg12.2/pg_wal/00000002.history 2019-10-25 08:10:53.758962843 +0200
@@ -1 +1 @@
-1 0/3011A10 no recovery target specified
+1 0/30000D8 no recovery target specified
```

In theory, we could implement a workaround in Patroni, i.e. always compare history files and advance to a new timeline on the node which must be rewound. But I am not sure that we should do that, because you got into this situation artificially, by stopping Patroni on db2 and manually promoting postgres. Your actions were not based on quorum, therefore you must also take care of fencing the old primary (db1). That wasn't done, and you ended up in this weird situation.
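The history-file comparison proposed above could be sketched roughly like this. This is an illustrative sketch in Python (Patroni's implementation language), not Patroni's actual code; `parse_history` and `diverged` are hypothetical helper names, and the sample file contents are taken from the diff in this comment:

```python
import re

def parse_history(text):
    """Parse a pg_wal/*.history file into [(parent_timeline_id, switch_lsn), ...].

    Each line has the form: <parent timeline id> <switch LSN> <reason...>.
    Blank lines and '#' comment lines are skipped; the reason text is ignored.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = re.match(r"(\d+)\s+([0-9A-Fa-f]+/[0-9A-Fa-f]+)", line)
        if m:
            entries.append((int(m.group(1)), m.group(2)))
    return entries

def diverged(history_a, history_b):
    """True if the two nodes record different switch LSNs for any shared timeline."""
    a = dict(parse_history(history_a))
    b = dict(parse_history(history_b))
    return any(a[tl] != b[tl] for tl in a.keys() & b.keys())

# Contents of each node's 00000002.history, from the diff above:
db1_history = "1\t0/3011A10\tno recovery target specified\n"
db2_history = "1\t0/30000D8\tno recovery target specified\n"

# Both nodes are on timeline ID 2, yet the branch points differ:
assert diverged(db1_history, db2_history)
```

A check like this would flag the divergence even though a bare timeline-ID comparison (what pg_rewind short-circuits on) sees no difference.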
It is OK. But I think it is a good idea to add timeline comparison to Patroni, like pg_rewind does (just skip the line of code where it compares the ID first, and do a full comparison of the timelines as in the following code). I know that I had to do manual disaster recovery and that I'm responsible for all of this, but I'd like Patroni to help me in recovery and automate some stages. At the very least it should fail, but not corrupt the DB. I can imagine a second, slightly different scenario:
Maybe create another issue with a (non-critical) feature request and close this one. The title is scary and the issue has been open for too long. @Vanav should confirm that he forced Patroni to skip quorum and created the DB corruption accidentally.
I agree with you. This issue is not critical. But I have real-life cases where I need to skip quorum for disaster recovery and then start Patroni, which leads to corruption. I suggest adding an extra safety check that compares the full timelines (not just the TimelineID).
Steps to reproduce

1. `db1` is master, `db2` is slave. They are on timeline ID 1 (db1).
2. Split brain happens. Network links are degraded, the cluster loses majority on both `db1` and `db2` and is not accessible. `db1` demotes to read-only.
3. I'm required to do a manual failover to `db2`. I stop Patroni on `db2` and manually promote `db2` to the new master. New timeline ID 2 (db2) is created. I need to stop Patroni on `db2` because I don't want Patroni to convert it back to a slave later, after the connection is restored.
4. The connection is restored and the cluster becomes available. `db1` is promoted back to master. New timeline ID 2 (db1) is created. Now I have two masters with different timelines but the same timeline ID: `db2` is the new real master after the manual failover, and `db1` now needs to be converted to a slave.
5. I stop Patroni on `db1` (to release cluster leader), start Patroni on `db2` (it grabs cluster leader), and start Patroni on `db1` (it will become a slave).
6. Patroni on `db1` compares timeline IDs, sees no difference, starts replication, and corrupts db1. This happens because the timelines really are different, but Patroni and Postgres can't detect it.

Workaround
On step 5, before starting Patroni on `db1`, I need to manually demote `db1` and promote it again to create a new timeline ID 3 (db1). Now Patroni notices that the timelines are different and runs `pg_rewind`, which correctly synchronizes `db1`.

Questions
1. Is there anything that can be improved by Patroni on step 6? Is there a way to differentiate timeline ID 2 (db1) and timeline ID 2 (db2)?
2. Do I need to stop Patroni on `db2` on step 2? Will the cluster accept the new master `db2`, or will it force keeping the old master `db1`?
3. Is there a better way to start a new timeline manually (see workaround)?
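The history-file diff earlier in the thread suggests one answer to question 1: the switch LSN recorded for timeline 1 differs between the nodes, even though the timeline ID is the same, and LSNs can be compared numerically. A minimal sketch (the `parse_lsn` helper is an illustration, not part of Patroni or Postgres; the LSN values are the ones from the diff):

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN string like '0/3011A10' into an integer.

    The part before '/' is the high 32 bits, the part after is the low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Switch points for timeline 1 -> 2 as recorded in each node's 00000002.history:
db1_switch = parse_lsn("0/3011A10")  # from db1's history file
db2_switch = parse_lsn("0/30000D8")  # from db2's history file

# Same timeline ID 2 on both nodes, but different branch points:
assert db1_switch != db2_switch
assert db1_switch > db2_switch  # db1 branched later on timeline 1
```

Because db1 branched at a later LSN, it holds WAL on timeline 1 beyond db2's branch point, which is exactly the divergence that plain timeline-ID comparison misses.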
May be related to #890.
Patroni 1.6.0, Postgres 9.6.
Logs and details

`db2` timelines:

`db1` timelines:

`pg_rewind` on `db1` on step 6:

`pg_rewind` on `db1` after manual timeline creation (workaround):

Error messages of corrupted `db1`:

Comparison of `pg_controldata` on `db2` and `db1` on step 4 (only differences):
on step 4 (only differences):The text was updated successfully, but these errors were encountered: