WAL-E clone GCS issue #935

Open
ggramal opened this issue Oct 10, 2023 · 1 comment

ggramal commented Oct 10, 2023

Hello everyone. First, I want to thank you for your great PostgreSQL HA solutions and the k8s operator. Unfortunately, we have an issue with restoring (cloning) from WAL-E backups in GCS.

Environment
Spilo image - ghcr.io/zalando/spilo-15:3.0-p1
Postgres operator - registry.opensource.zalan.do/acid/postgres-operator:v1.10.1

Postgres crd

kind: postgresql
metadata:
  name: test
spec:
  clone:
    uid: "<UID>"
    cluster: "prod"
    timestamp: "2023-10-05T18:06:52+00:00"
 .....

When the container starts, the logs show these errors:

2023-10-10 16:24:59,174 INFO: No PostgreSQL configuration items changed, nothing to reload.
2023-10-10 16:24:59,180 INFO: Lock owner: None; I am test
2023-10-10 16:24:59,213 INFO: trying to bootstrap a new cluster
2023-10-10 16:24:59,213 INFO: Running custom bootstrap script: envdir "/run/etc/wal-e.d/env-clone-prod" python3 /scripts/clone_with_wale.py --recovery-target-time="2023-10-05T18:06:52+00:00"
2023-10-10 16:24:59,422 INFO: Trying gs://somebucket/spilo/prod/<UID>/wal/15/ for clone
wal_e.main   INFO     MSG: starting WAL-E
        DETAIL: The subcommand is "backup-list".
        STRUCTURED: time=2023-10-10T16:24:59.724227-00 pid=204
2023-10-10 16:25:00,304 ERROR: Clone failed
Traceback (most recent call last):
  File "/scripts/clone_with_wale.py", line 185, in main
    run_clone_from_s3(options)
  File "/scripts/clone_with_wale.py", line 166, in run_clone_from_s3
    backup_name, update_envdir = find_backup(options.recovery_target_time, env)
  File "/scripts/clone_with_wale.py", line 153, in find_backup
    backup = choose_backup(backup_list, recovery_target_time)
  File "/scripts/clone_with_wale.py", line 74, in choose_backup
    if last_modified < recovery_target_time:
TypeError: can't compare offset-naive and offset-aware datetimes

We analyzed the Spilo source code a bit and found the root cause.
The script clone_with_wale.py runs the wal-e backup-list command and parses its output to get each backup's timestamp. The output is returned in this format:

name	last_modified	expanded_size_bytes	wal_segment_backup_start	wal_segment_offset_backup_start	wal_segment_backup_stop	wal_segment_offset_backup_stop
base_00000005000000000000001B_00000040	2021-06-23 01:00:14.498000+00:00		00000005000000000000001B	00000040

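So the timestamp here should be 2021-06-23 01:00:14.498000+00:00, but only the first part (2021-06-23) is used when it is compared to the recovery timestamp. The line.split() call in fix_output splits on any whitespace, so the space inside the timestamp breaks it into two fields. The date on its own yields an offset-naive datetime, while the recovery target time carries a +00:00 offset, and this error happens: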

TypeError: can't compare offset-naive and offset-aware datetimes
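
The failure can be reproduced outside Spilo with a few lines. This is only a sketch: it uses datetime.fromisoformat as a stand-in for whatever parser clone_with_wale.py actually uses (an assumption), and the row is the sample line from the backup-list output above.

```python
from datetime import datetime

# Sample row from the `wal-e backup-list` output above (tab-separated).
row = ("base_00000005000000000000001B_00000040\t"
       "2021-06-23 01:00:14.498000+00:00\t\t"
       "00000005000000000000001B\t00000040")

# Splitting on any whitespace also cuts the timestamp at its inner space.
fields = row.split()

# Only the date part is left in the second field, so it parses as offset-naive.
# fromisoformat is an assumed stand-in for the script's real parser.
last_modified = datetime.fromisoformat(fields[1])
recovery_target_time = datetime.fromisoformat("2023-10-05T18:06:52+00:00")

try:
    last_modified < recovery_target_time
    error = None
except TypeError as exc:
    error = str(exc)

print(fields[1], "->", error)
```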

We fixed this issue by making a custom image of spilo and applying this patch

diff --git a/postgres-appliance/bootstrap/clone_with_wale.py b/postgres-appliance/bootstrap/clone_with_wale.py
index e8d3196..e6c6b12 100755
--- a/postgres-appliance/bootstrap/clone_with_wale.py
+++ b/postgres-appliance/bootstrap/clone_with_wale.py
@@ -62,7 +62,7 @@ def fix_output(output):
             if started:
                 line = line.replace(' modified ', ' last_modified ')
         if started:
-            yield '\t'.join(line.split())
+            yield '\t'.join(line.split('\t'))


 def choose_backup(backup_list, recovery_target_time):
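
To illustrate what the patch changes, here is a minimal sketch that applies the tab-only split to the same sample row. As before, datetime.fromisoformat is an assumed stand-in for the parser used by the real script.

```python
from datetime import datetime

# Same sample row as in the backup-list output above (tab-separated).
row = ("base_00000005000000000000001B_00000040\t"
       "2021-06-23 01:00:14.498000+00:00\t\t"
       "00000005000000000000001B\t00000040")

# Splitting on tabs only keeps the timestamp whole and preserves the
# empty expanded_size_bytes column.
fields = row.split('\t')

# The full timestamp includes +00:00, so the result is offset-aware.
last_modified = datetime.fromisoformat(fields[1])
recovery_target_time = datetime.fromisoformat("2023-10-05T18:06:52+00:00")

# Both sides are offset-aware now, so the comparison no longer raises.
is_older = last_modified < recovery_target_time
print(fields[1], is_older)
```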

We can make a PR to fix the issue in the original image, but we are not sure that

  • this repo is still maintained
  • this will not break the S3 WAL-E backups (I guess there should be tests in CI that check that)
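
Regarding the second point, one possible way to reduce the risk is a splitter that only falls back to whitespace splitting when a line has no tabs. This is a hypothetical sketch, not Spilo code: split_backup_row is a made-up helper, and the space-aligned row below is invented for illustration, since we have not checked what the S3 output actually looks like.

```python
def split_backup_row(line):
    """Hypothetical helper: split on tabs if present, else on any whitespace."""
    if '\t' in line:
        return line.split('\t')
    return line.split()


# Tab-separated row, as seen in the GCS backup-list output above.
tab_row = ("base_00000005000000000000001B_00000040\t"
           "2021-06-23 01:00:14.498000+00:00\t\t"
           "00000005000000000000001B\t00000040")

# Invented space-aligned row (an assumption, not real WAL-E output).
space_row = ("base_00000005000000000000001B_00000040  "
             "2021-06-23T01:00:14.498Z  "
             "00000005000000000000001B  00000040")

print(split_backup_row(tab_row)[1])    # timestamp stays in one field
print(split_backup_row(space_row)[1])  # whitespace fallback still splits columns
```

A plain line.split('\t') would return the invented space-aligned row as a single field, which is the kind of regression a CI test could catch.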

ggramal commented Oct 10, 2023

I guess other people are hitting the same issue too: #301 (comment)
#301 (comment)
