Problem with restore from backup created from replica. #616

artemsafiyulin opened this issue Dec 7, 2023 · 2 comments

Comments

artemsafiyulin commented Dec 7, 2023

Hello! I have a problem restoring backups that were made from a standby server. My configuration is as follows:

A master-replica pair managed by Patroni.

A backup server for creating and testing backups.

If I create a backup from the master, it restores correctly, but if I create a backup from the replica, then after restoring it and trying to start PostgreSQL I get the following error:

2023-12-07 08:08:10 INFO: Starting postgresql
waiting for server to start....2023-12-07 08:08:11.079 GMT [148] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-12-07 08:08:11.103 GMT [148] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-07 08:08:11.122 GMT [148] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-12-07 08:08:11.205 GMT [149] LOG:  database system was interrupted; last known up at 2023-12-07 07:26:47 GMT
2023-12-07 08:08:11.205 GMT [149] LOG:  creating missing WAL directory "pg_wal/archive_status"
.2023-12-07 08:08:12.148 GMT [149] FATAL:  requested timeline 30 does not contain minimum recovery point 572E/FB416508 on timeline 0
2023-12-07 08:08:12.149 GMT [148] LOG:  startup process (PID 149) exited with exit code 1
2023-12-07 08:08:12.149 GMT [148] LOG:  aborting startup due to startup process failure
2023-12-07 08:08:12.162 GMT [148] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
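
(The timeline mismatch from the FATAL message can be inspected directly in the restored data directory with pg_controldata; a diagnostic sketch, assuming the PGDG binary path and the restore target from the command at the end of this message:)

/usr/pgsql-11/bin/pg_controldata /var/lib/pgsql/11/data | grep -i -E 'timeline|recovery'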

Both PostgreSQL servers have identical configuration:

postgres=# select version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 11.22 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)

postgres=# \q
-bash-4.2$ pg_probackup-11 --version
pg_probackup-11 2.5.12 (PostgreSQL 11.14)
-bash-4.2$ cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

-bash-4.2$

Can you please help me with this problem?

P.S. The command used for creating the backup:

pg_probackup-11 backup -B /backup/my_instance-11 --instance=my_instance-11 -b FULL -j 4 --pghost=master_or_replica.domain.com --stream --archive-timeout=1200 --delete-wal --delete-expired

The command used for restoring the backup:

pg_probackup-11 restore -B /backup/my_instance-11/ --instance=my_instance-11 -D /var/lib/pgsql/11/data -j 4 --remote-proto=none
artemsafiyulin (Author) commented:

Hello again!

I found the root cause of the error I get when restoring a backup made from the replica. The reason is the pg_control file.
When I made a backup from the replica, the pg_control in the backup files had the following pg_controldata content:

pg_control version number:            1100
Catalog version number:               201809051
Database system identifier:           6848477509408503714
Database cluster state:               in production
pg_control last modified:             Mon Jan 15 06:42:42 2024
Latest checkpoint location:           5806/B15A6510
Latest checkpoint's REDO location:    5806/A4BE5E90
Latest checkpoint's REDO WAL file:    0000001E00005806000000A4
Latest checkpoint's TimeLineID:       30
Latest checkpoint's PrevTimeLineID:   30
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          2:969059430
Latest checkpoint's NextOID:          8751767
Latest checkpoint's NextMultiXactId:  176014
Latest checkpoint's NextMultiOffset:  847578
Latest checkpoint's oldestXID:        771379076
Latest checkpoint's oldestXID's DB:   16401
Latest checkpoint's oldestActiveXID:  969059420
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16401
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Mon Jan 15 06:29:12 2024
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     5806/BC3E1C48
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on
max_connections setting:              1600
max_worker_processes setting:         16
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
Mock authentication nonce:            05e779c6bfeca4f438c6584fba46cb70b15bcdb6a3679b4cc227f6565d9d0ad9

This file looks like it was copied from the master server, not from the replica. I checked this theory on another of my servers, and it was confirmed (when I made a backup from a replica in another cluster, the pg_control found in the backup files was the one copied from the replica).
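
(The pg_controldata outputs above and below can be obtained from the backup catalog and from a live data directory; a sketch, where the backup id, catalog layout and data directory path are assumptions:)

/usr/pgsql-11/bin/pg_controldata /backup/my_instance-11/backups/my_instance-11/<BACKUP_ID>/database   # pg_control stored inside the backup
/usr/pgsql-11/bin/pg_controldata /var/lib/pgsql/11/data                                               # pg_control of a running data directory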

Then I tried copying pg_control manually from the replica after the backup started. Here is the content of that file:

pg_control version number:            1100
Catalog version number:               201809051
Database system identifier:           6848477509408503714
Database cluster state:               in archive recovery
pg_control last modified:             Mon Jan 15 05:26:23 2024
Latest checkpoint location:           5806/54E1C128
Latest checkpoint's REDO location:    5806/415E9B68
Latest checkpoint's REDO WAL file:    0000001E0000580600000041
Latest checkpoint's TimeLineID:       30
Latest checkpoint's PrevTimeLineID:   30
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          2:968675663
Latest checkpoint's NextOID:          8751767
Latest checkpoint's NextMultiXactId:  176014
Latest checkpoint's NextMultiOffset:  847578
Latest checkpoint's oldestXID:        768678501
Latest checkpoint's oldestXID's DB:   16401
Latest checkpoint's oldestActiveXID:  968675663
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 18073
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Mon Jan 15 04:59:12 2024
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     5806/637B1BF8
Min recovery ending loc's timeline:   30
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                on
max_connections setting:              1600
max_worker_processes setting:         16
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
Mock authentication nonce:            05e779c6bfeca4f438c6584fba46cb70b15bcdb6a3679b4cc227f6565d9d0ad9

And when I replaced the "original" backed-up pg_control (see the first code block in this message) with the manually copied pg_control (see the second code block in this message), the backup restored and started correctly.
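
(A sketch of that manual workaround: copying pg_control from the running replica while the backup is in progress, then putting it into the restored data directory before starting the server. The replica host name, SSH user and data directory paths are assumptions:)

# grab pg_control from the replica's data directory during the backup window
scp postgres@replica.domain.com:/var/lib/pgsql/11/data/global/pg_control /tmp/pg_control.replica
# after restore, replace pg_control in the restored data directory before pg_ctl start
cp /tmp/pg_control.replica /var/lib/pgsql/11/data/global/pg_control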

Also, after restoring the backup I checked that the data is not corrupted. Checksums are not enabled in this cluster, but I ran pg_dumpall to check that all data can be read and is not corrupted.
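
(A sketch of that readability check, assuming the restored server listens on 127.0.0.1:5432 as in the startup log above:)

/usr/pgsql-11/bin/pg_dumpall -h 127.0.0.1 -p 5432 -U postgres > /dev/null && echo 'all data readable'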

Could you please help me find the reason why I get an incorrect pg_control when I make a backup? And how can I fix it?

funny-falcon (Collaborator) commented:

There is no way inside pg_probackup to copy a single file from another host.

I believe all the data files were copied from the master, while WAL was streamed from the replica.

That is because --pghost was specified with a dynamic domain name, but no --remote-host was specified, so the SSH connection was made to the host that was used with pg_probackup add-instance.
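
(A sketch of a backup command adjusted along those lines, so that the data files are copied over SSH from the same node the backup is taken from; the replica host name and SSH user are assumptions:)

pg_probackup-11 backup -B /backup/my_instance-11 --instance=my_instance-11 -b FULL -j 4 --pghost=replica.domain.com --remote-proto=ssh --remote-host=replica.domain.com --remote-user=postgres --stream --archive-timeout=1200 --delete-wal --delete-expired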
