
[Feature]: Archive shutdown checkpoint WAL file after demotion #4586

Open · 4 tasks done · gbartolini opened this issue May 18, 2024 · 7 comments · May be fixed by #4411

@gbartolini (Contributor) commented May 18, 2024

Is there an existing issue already for this bug?

  • I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

  • I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

  • I have checked, and I am running a currently supported version of CloudNativePG.

Contact Details

No response

Version

1.23.1

What version of Kubernetes are you using?

1.29

What is your Kubernetes environment?

Cloud: Amazon EKS

How did you install the operator?

YAML manifest

What happened?

With 1.23.1, when demoting the primary, the WAL file containing the shutdown checkpoint was not archived:

```console
$ kubectl exec jeeg-eu-central-1-1 -- pg_controldata | grep REDO
Defaulted container "postgres" out of: postgres, bootstrap-controller (init)
Latest checkpoint's REDO location:    0/24000028
Latest checkpoint's REDO WAL file:    000000010000000000000009
```
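As an aside, the segment that needs archiving can be extracted programmatically from `pg_controldata` output. A minimal Go sketch (the helper name `redoWALFile` is hypothetical, not part of the operator):

```go
package main

import (
	"fmt"
	"strings"
)

// redoWALFile extracts the value of the "Latest checkpoint's REDO WAL file"
// line from pg_controldata output: the segment that holds the shutdown
// checkpoint and, per this report, is never archived after demotion.
func redoWALFile(controldata string) string {
	const key = "Latest checkpoint's REDO WAL file:"
	for _, line := range strings.Split(controldata, "\n") {
		if strings.HasPrefix(line, key) {
			return strings.TrimSpace(strings.TrimPrefix(line, key))
		}
	}
	return ""
}

func main() {
	out := "Latest checkpoint's REDO location:    0/24000028\n" +
		"Latest checkpoint's REDO WAL file:    000000010000000000000009\n"
	fmt.Println(redoWALFile(out)) // prints 000000010000000000000009
}
```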

Log reports:

```
{"level":"info","ts":"2024-05-17T15:54:51Z","logger":"wal-archive","msg":"Archived WAL file","logging_pod":"jeeg-eu-central-1-1","walName":"pg_wal/000000010000000000000008","startTime":"2024-05-17T15:54:50Z","endTime":"2024-05-17T15:54:51Z","elapsedWalTime":1.093924041}
{"level":"info","ts":"2024-05-17T15:56:46Z","logger":"postgres","msg":"record","logging_pod":"jeeg-eu-central-1-1","record":{"log_time":"2024-05-17 15:56:46.897 UTC","process_id":"20","session_id":"66477c68.14","session_line_num":"14","session_start_time":"2024-05-17 15:48:56 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"55P02","message":"parameter \"archive_mode\" cannot be changed without restarting the server","backend_type":"postmaster","query_id":"0"}}
```

Steps to reproduce are detailed here: https://github.com/gbartolini/postgres-kubernetes-playground/tree/main/aws-eks/examples/jeeg#simulating-a-data-center-switchover

This prevents the other cluster from being promoted without a gap.

Cluster resource

No response

Relevant log output

See above

Code of Conduct

  • I agree to follow this project's Code of Conduct
@gbartolini gbartolini added the triage Pending triage label May 18, 2024
@gbartolini gbartolini self-assigned this May 18, 2024
@gbartolini gbartolini added bug 🐛 Something isn't working and removed triage Pending triage labels May 18, 2024
@gbartolini (Contributor, Author)

Blocking #4678

@gbartolini (Contributor, Author)

Upon the transition of `replica.enabled` from `false` to `true`, we could archive the latest WAL file containing the shutdown checkpoint (making sure we add the `.partial` suffix to the name). Thoughts?
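For illustration, a minimal Go sketch of what such a step could look like. The names `partialName` and `archivePartial` are hypothetical, and the local file copy stands in for whatever archive command the operator actually invokes; this is a sketch of the `.partial` naming convention, not CloudNativePG code:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// partialName returns the archive name for a WAL segment that must be
// uploaded before it is completed, following the ".partial" suffix
// convention also used by pg_receivewal.
func partialName(walPath string) string {
	return filepath.Base(walPath) + ".partial"
}

// archivePartial copies the WAL segment into destDir under its .partial
// name. A real implementation would go through the configured WAL
// archiver instead of a plain local copy.
func archivePartial(walPath, destDir string) error {
	src, err := os.Open(walPath)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := os.Create(filepath.Join(destDir, partialName(walPath)))
	if err != nil {
		return err
	}
	defer dst.Close()

	_, err = io.Copy(dst, src)
	return err
}

func main() {
	fmt.Println(partialName("pg_wal/000000010000000000000009")) // prints 000000010000000000000009.partial
}
```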

@gbartolini (Contributor, Author)

This is an excerpt of PostgreSQL source code (line 6570 of `src/backend/access/transam/xlog.c`):

```c
/*
 * If archiving is enabled, rotate the last XLOG file so that all the
 * remaining records are archived (postmaster wakes up the archiver
 * process one more time at the end of shutdown). The checkpoint
 * record will go to the next XLOG file and won't be archived (yet).
 */
if (XLogArchivingActive())
    RequestXLogSwitch(false);

CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
```

I'd be interested in exploring whether it makes sense to achieve a similar goal in Postgres itself by archiving the file as `.partial`.

@gbartolini gbartolini added this to the 1.23.2 milestone May 28, 2024
@mnencia (Member) commented May 28, 2024

> This is an excerpt of PostgreSQL source code (line 6570 of `src/backend/access/transam/xlog.c`):
>
> ```c
> /*
>  * If archiving is enabled, rotate the last XLOG file so that all the
>  * remaining records are archived (postmaster wakes up the archiver
>  * process one more time at the end of shutdown). The checkpoint
>  * record will go to the next XLOG file and won't be archived (yet).
>  */
> if (XLogArchivingActive())
>     RequestXLogSwitch(false);
>
> CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
> ```
>
> I'd be interested in exploring whether it makes sense to achieve a similar goal in Postgres itself by archiving the file as `.partial`.

I wonder why these two operations are done in this order; there's no hint of the reason in the code. Unless there is a good reason for the current order, swapping the two operations would be the easiest and safest solution.

@mnencia (Member) commented May 29, 2024

> Upon the transition of `replica.enabled` from `false` to `true`, we could archive the latest WAL file containing the shutdown checkpoint (making sure we add the `.partial` suffix to the name). Thoughts?

With the current Postgres behavior, uploading the last WAL file as `.partial` is the best way to make the needed WAL segment reachable from the other clusters.

@gbartolini (Contributor, Author)

I tried your suggested approach, Marco. See: postgres/postgres@master...gbartolini:postgres:archive-shutdown-checkpoint-wal

However, the change is logically too invasive:

```
2024-05-29 14:31:52.571 CEST [79113] FATAL:  cannot make new WAL entries during recovery
```

Let's proceed for the moment with archiving a partial WAL file, while I try to involve some Postgres developers here.

@mnencia mnencia removed the bug 🐛 Something isn't working label May 29, 2024
@mnencia mnencia changed the title [Bug]: Shutdown checkpoint WAL file doesn't get archived after demotion [Feature]: Shutdown checkpoint WAL file doesn't get archived after demotion May 29, 2024
@mnencia mnencia changed the title [Feature]: Shutdown checkpoint WAL file doesn't get archived after demotion [Feature]: Archive shutdown checkpoint WAL file after demotion May 30, 2024
@gbartolini gbartolini modified the milestones: 1.23.2, 1.24.0 Jun 4, 2024
@gbartolini (Contributor, Author)

This feature seems to be strictly tied to the checkpoint token patch. I am moving it to 1.24 for now, unless we see that we can actually extract it and make it a backportable patch.
