creating replica not completed #747

Open
zpavlig opened this issue Jul 21, 2018 · 4 comments

Comments

zpavlig commented Jul 21, 2018

Hello

One node of our Patroni cluster was down. After it came back up, its replication lag ("Lag in MB") was 1 GB and nothing changed for a long time.
I then ran: patronictl -c /opt/app/patroni/etc/postgresql.yml reinit. The node went into the "creating replica" state, but 24 hours have passed and the replica is still not finished.
Running "df -h" shows the free space shrinking, but once about 27 GB has been copied (the database on the leader is 27 GB), everything starts over from the beginning, again and again.

@mrquokka

I am just a newbie, but if you create a second Patroni node config on this machine and start it, does it create a replica?

@alexeyklyukin
Contributor

It looks like something happens during the basebackup that forces Patroni to start over. Depending on your create_replica_method, Patroni does nothing more than call a specific script to initialize the new replica; by default it calls pg_basebackup. Look for error-like messages in the Patroni or PostgreSQL logs on the replica being initialized.
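
As a rough illustration of the log check suggested above, the following Python sketch filters log lines for error-like messages. The keyword list and the helper name find_error_lines are assumptions made for illustration and are not part of Patroni; the sample lines are shortened from a log posted later in this thread.

```python
import re

# Assumed markers for "error-like" lines; adjust them to your own log format.
ERROR_PATTERN = re.compile(r"\b(error|fatal|panic)\b", re.IGNORECASE)


def find_error_lines(lines):
    """Return the log lines that look like errors (hypothetical helper)."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    sample = [
        "2023-05-08 11:40:47,347 INFO: bootstrap from leader 'PATRONIDB1' in progress",
        'pg_basebackup: error: could not create directory "/dati/pgdata": File exists',
        "2023-05-08 11:40:54,850 ERROR: Error when fetching backup: pg_basebackup exited with code=1",
    ]
    for line in find_error_lines(sample):
        print(line)
```

In practice the lines would come from the Patroni or PostgreSQL log file on the replica rather than from a hard-coded list.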

@CyberDem0n
Collaborator

Hmm, I think we can do a better job of verifying the state of the data directory.
Most of the (good) backup tools will restore global/pg_control as the last step.
A missing pg_control can only mean that there is no way to fix the cluster other than restoring it from a basebackup.
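
To make the idea concrete, here is a minimal Python sketch of such a check. It is not how Patroni implements it; the function names and the three-way classification are assumptions for illustration. The only fact taken from the comment above is that the control file lives at global/pg_control inside the data directory.

```python
import os
import tempfile


def has_pg_control(data_dir):
    """Return True if data_dir contains the file global/pg_control."""
    return os.path.isfile(os.path.join(data_dir, "global", "pg_control"))


def describe_data_dir(data_dir):
    """Classify a data directory in a deliberately simplified way (assumption)."""
    if not os.path.isdir(data_dir) or not os.listdir(data_dir):
        return "empty"
    if has_pg_control(data_dir):
        return "has pg_control"
    # Files are present but the control file is not: likely an unfinished restore.
    return "non-empty but pg_control is missing"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        print(describe_data_dir(data_dir))
        os.makedirs(os.path.join(data_dir, "base"))
        print(describe_data_dir(data_dir))
        os.makedirs(os.path.join(data_dir, "global"))
        with open(os.path.join(data_dir, "global", "pg_control"), "wb"):
            pass
        print(describe_data_dir(data_dir))
```

A directory in the third state would be a candidate for a fresh basebackup rather than an attempt to start PostgreSQL on it.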

@bj-pop

bj-pop commented May 8, 2023

Hi, I encountered the same issue after our network had some problems on the switch.

May 8 11:40:27 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:27,345 INFO: bootstrap from leader 'PATRONIDB1' in progress
May 8 11:40:37 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:37,340 INFO: Lock owner: PATRONIDB1; I am PATRONIDB3
May 8 11:40:37 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:37,345 INFO: bootstrap from leader 'PATRONIDB1' in progress
May 8 11:40:47 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:47,341 INFO: Lock owner: PATRONIDB1; I am PATRONIDB3
May 8 11:40:47 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:47,347 INFO: bootstrap from leader 'PATRONIDB1' in progress
May 8 11:40:50 PATRONIDB3 patroni[101246]: pg_basebackup: error: could not create directory "/dati/pgdata": File exists
May 8 11:40:50 PATRONIDB3 patroni[101246]: pg_basebackup: removing contents of data directory "/dati"
May 8 11:40:54 PATRONIDB3 patroni[101246]: pg_basebackup: changes to tablespace directories will not be undone
May 8 11:40:54 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:54,850 ERROR: Error when fetching backup: pg_basebackup exited with code=1
May 8 11:40:54 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:54,851 ERROR: failed to bootstrap from leader 'PATRONIDB1'
May 8 11:40:54 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:54,851 INFO: Removing data directory: /dati
May 8 11:40:54 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:54,852 ERROR: Could not remove data directory /dati

From googling, it seems to me that the problem is that my tablespaces are inside the data directory:
https://stackoverflow.com/questions/27125686/pg-basebackup-fails-with-message-could-not-create-directory

root@PATRONIDB3:/var/log# systemctl status patroni
● patroni.service - PostgreSQL high-availability manager
Loaded: loaded (/lib/systemd/system/patroni.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2023-05-08 09:17:36 CEST; 2h 23min ago
Main PID: 85889 (patroni)
Tasks: 7 (limit: 28683)
Memory: 46.2M
CGroup: /system.slice/patroni.service
├─ 85889 /usr/bin/python3 /usr/bin/patroni /etc/patroni/patroni.yml
└─104042 /usr/lib/postgresql/14/bin/pg_basebackup --pgdata=/dati -X stream --dbname=dbname=postgres user=replicator host=10.x.x.x port=5432

May 08 11:40:54 PATRONIDB3 patroni[85889]: File "/usr/lib/python3.8/shutil.py", line 720, in rmtree
May 08 11:40:54 PATRONIDB3 patroni[85889]: os.rmdir(path)
May 08 11:40:54 PATRONIDB3 patroni[85889]: PermissionError: [Errno 13] Permission denied: '/dati'
May 08 11:40:54 PATRONIDB3 patroni[85889]: During handling of the above exception, another exception occurred:
May 08 11:40:54 PATRONIDB3 patroni[85889]: Traceback (most recent call last):
May 08 11:40:54 PATRONIDB3 patroni[85889]: File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 1046, in move_data_directory
May 08 11:40:54 PATRONIDB3 patroni[85889]: os.rename(self._data_dir, new_name)
May 08 11:40:54 PATRONIDB3 patroni[85889]: PermissionError: [Errno 13] Permission denied: '/dati' -> '/dati_2023-05-08-11-40-54'
May 08 11:40:57 PATRONIDB3 patroni[85889]: 2023-05-08 11:40:57,340 INFO: Lock owner: PATRONIDB1; I am PATRONIDB3

It then kept retrying by itself in a loop but was never able to complete the replica.
root@PATRONIDB3:/var/log# systemctl status patroni
● patroni.service - PostgreSQL high-availability manager
Loaded: loaded (/lib/systemd/system/patroni.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2023-05-08 09:17:36 CEST; 6h ago
Main PID: 85889 (patroni)
Tasks: 7 (limit: 28683)
Memory: 43.0M
CGroup: /system.slice/patroni.service
├─ 85889 /usr/bin/python3 /usr/bin/patroni /etc/patroni/patroni.yml
└─130139 /usr/lib/postgresql/14/bin/pg_basebackup --pgdata=/dati -X stream --dbname=dbname=postgres user=replicator host=10.x.x.x port=5432

May 08 16:11:17 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:17,340 INFO: Lock owner: PATRONIDB1; I am PATRONIDB3
May 08 16:11:17 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:17,410 INFO: bootstrap from leader 'PATRONIDB1' in progress
May 08 16:11:27 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:27,340 INFO: Lock owner: PATRONIDB1; I am PATRONIDB3
May 08 16:11:27 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:27,346 INFO: bootstrap from leader 'PATRONIDB1' in progress
May 08 16:11:37 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:37,340 INFO: Lock owner: PATRONIDB1; I am PATRONIDB3
May 08 16:11:37 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:37,346 INFO: bootstrap from leader 'PATRONIDB1' in progress
May 08 16:11:47 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:47,340 INFO: Lock owner: PATRONIDB1; I am PATRONIDB3
May 08 16:11:47 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:47,345 INFO: bootstrap from leader 'PATRONIDB1' in progress
May 08 16:11:57 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:57,340 INFO: Lock owner: PATRONIDB1; I am PATRONIDB3
May 08 16:11:57 PATRONIDB3 patroni[85889]: 2023-05-08 16:11:57,346 INFO: bootstrap from leader 'PATRONIDB1' in progress

Will moving the tablespace really solve this issue?
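
Whether moving the tablespace fixes the loop is not confirmed anywhere in this thread. To at least check the premise that a tablespace lives inside the data directory, a small Python sketch like the one below could help. The helper names are hypothetical, and the list of tablespace locations is assumed to have been collected separately, for example with pg_tablespace_location(oid) over pg_tablespace on the leader. The path /dati/pgdata is taken from the pg_basebackup error above; the other paths are made up.

```python
import os


def is_inside(path, parent):
    """Return True if path is parent itself or lies somewhere below it."""
    path = os.path.realpath(path)
    parent = os.path.realpath(parent)
    return os.path.commonpath([path, parent]) == parent


def tablespaces_inside_data_dir(data_dir, tablespace_locations):
    """Return the tablespace locations nested under data_dir (hypothetical helper)."""
    return [loc for loc in tablespace_locations if is_inside(loc, data_dir)]


if __name__ == "__main__":
    # "/srv/tablespaces/ts1" is an invented location used only for contrast.
    locations = ["/dati/pgdata", "/srv/tablespaces/ts1"]
    for loc in tablespaces_inside_data_dir("/dati", locations):
        print("tablespace inside data directory:", loc)
```

The comparison uses os.path.commonpath rather than a string prefix test so that a sibling such as /dati_2023-05-08-11-40-54 is not mistaken for a child of /dati.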

My config is:

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.x..x.x:5432
  data_dir: "/dati"
  bin_dir: "/usr/lib/postgresql/14/bin"

thank you
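
One possible reading of the PermissionError lines in the traceback above, offered as an assumption rather than a diagnosis: on POSIX systems, removing or renaming a directory needs write and search permission on its parent directory, and the parent of /dati is /. The Python sketch below checks that condition for the current user. The function names are hypothetical, and the check ignores details such as sticky-bit directories.

```python
import os


def parent_of(data_dir):
    """Return the directory that contains data_dir."""
    return os.path.dirname(os.path.abspath(data_dir))


def can_replace_directory(data_dir):
    """Return True if the current user may remove or rename data_dir itself.

    Simplified check (assumption): write and search permission on the parent.
    """
    return os.access(parent_of(data_dir), os.W_OK | os.X_OK)


if __name__ == "__main__":
    data_dir = "/dati"  # value of data_dir from the config above
    print("parent directory:", parent_of(data_dir))
    print("can remove or rename:", can_replace_directory(data_dir))
```

Run as the user that Patroni runs under; a False result would be consistent with the failed os.rmdir and os.rename calls. If that is the cause, pointing data_dir at a subdirectory owned by that user is one option to consider.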
