[Feature]: Archive shutdown checkpoint WAL file after demotion #4586
Comments
Blocking #4678 |
Upon the transition of |
This is an excerpt of PostgreSQL source code (line 6570 of src/backend/access/transam/xlog.c):

    /*
     * If archiving is enabled, rotate the last XLOG file so that all the
     * remaining records are archived (postmaster wakes up the archiver
     * process one more time at the end of shutdown). The checkpoint
     * record will go to the next XLOG file and won't be archived (yet).
     */
    if (XLogArchivingActive())
        RequestXLogSwitch(false);

    CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);

I'd be interested in exploring if it makes sense to achieve a similar goal in Postgres itself by archiving the file as |
I wonder why these two operations are done in this order. There's no hint of the reason in the code. Unless there is a good reason to do it, swapping the two operations is the easiest and safest solution. |
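To make the ordering question concrete, here is a toy model of the two steps in the excerpt. It is only a sketch: `ToyWal`, `switch`, `write`, and `shutdown` are hypothetical names, not PostgreSQL APIs, and the model ignores everything that could make reordering unsafe in the real server. It only shows which segment ends up holding the shutdown checkpoint record and whether that segment was handed to the archiver.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """Toy model of a WAL stream: numbered segments holding record labels."""
    archiving: bool = True
    current: int = 1  # segment currently being written
    segments: dict = field(default_factory=lambda: {1: ["data"]})
    archived: list = field(default_factory=list)

    def write(self, record: str) -> int:
        """Append a record to the current segment and return that segment's number."""
        self.segments.setdefault(self.current, []).append(record)
        return self.current

    def switch(self) -> None:
        """Close the current segment, hand it to the archiver, and open the next one."""
        if self.archiving:
            self.archived.append(self.current)
        self.current += 1
        self.segments.setdefault(self.current, [])


def shutdown(wal: ToyWal, switch_first: bool) -> int:
    """Run the two shutdown steps in either order; return the checkpoint's segment."""
    if switch_first:
        # Order used in the quoted excerpt: switch, then checkpoint.
        wal.switch()
        return wal.write("shutdown checkpoint")
    # Hypothetical swapped order: checkpoint, then switch.
    segment = wal.write("shutdown checkpoint")
    wal.switch()
    return segment


current_order = ToyWal()
ckpt_segment = shutdown(current_order, switch_first=True)
print(ckpt_segment in current_order.archived)  # False: checkpoint sits in an unarchived segment

swapped_order = ToyWal()
ckpt_segment = shutdown(swapped_order, switch_first=False)
print(ckpt_segment in swapped_order.archived)  # True: checkpoint segment was archived
```

In this simplified model, the current order leaves the checkpoint record in a segment the archiver never sees, which matches the behavior described by the source comment.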
With the current Postgres behavior, uploading the last WAL as |
I tried your suggested approach, Marco. See: postgres/postgres@master...gbartolini:postgres:archive-shutdown-checkpoint-wal However, the change is logically too invasive:
Let's proceed for the moment with archiving a partial WAL file, while I try to involve some Postgres developers here. |
This feature seems to be strictly tied to the checkpoint token patch. I am moving it to 1.24 for now, unless we see that we can actually extract it and make it a backportable patch. |
Is there an existing issue already for this bug?
I have read the troubleshooting guide
I am running a supported version of CloudNativePG
Contact Details
No response
Version
1.23.1
What version of Kubernetes are you using?
1.29
What is your Kubernetes environment?
Cloud: Amazon EKS
How did you install the operator?
YAML manifest
What happened?
With 1.23.1, when demoting the primary, the WAL file following the shutdown checkpoint wasn't archived:
Log reports:
Steps to reproduce are detailed here: https://github.com/gbartolini/postgres-kubernetes-playground/tree/main/aws-eks/examples/jeeg#simulating-a-data-center-switchover
This prevents the other cluster from being promoted without a gap.
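As a rough illustration of what "a gap" means here, the sketch below lists the WAL segments that are missing from an archive up to a segment the replica needs. It is not part of CloudNativePG: `missing_segments` and its helpers are hypothetical, the file names are made up, and the code assumes a single timeline and the default 16 MiB `wal_segment_size` (256 segments per log file number).

```python
SEGMENT_SIZE = 16 * 1024 * 1024                 # assumed default wal_segment_size
SEGMENTS_PER_LOG = 0x100000000 // SEGMENT_SIZE  # 256 with 16 MiB segments


def parse(name: str) -> tuple[int, int]:
    """Split a 24-hex-digit WAL file name into (timeline, absolute segment number)."""
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg


def fmt(timeline: int, segno: int) -> str:
    """Build the WAL file name for a timeline and absolute segment number."""
    return f"{timeline:08X}{segno // SEGMENTS_PER_LOG:08X}{segno % SEGMENTS_PER_LOG:08X}"


def missing_segments(archived: list[str], required_last: str) -> list[str]:
    """List segments absent from the archive, from its oldest file up to required_last."""
    timeline, last = parse(required_last)
    # Only consider archived files on the same timeline as the required segment.
    numbers = {segno for tl, segno in map(parse, archived) if tl == timeline}
    first = min(numbers, default=last)
    return [fmt(timeline, n) for n in range(first, last + 1) if n not in numbers]


# Made-up example: the segment holding the shutdown checkpoint was never archived.
archive = ["000000010000000000000005", "000000010000000000000006"]
print(missing_segments(archive, "000000010000000000000007"))
# ['000000010000000000000007']
```

An empty result would mean the archive is contiguous up to the required segment, at least under the assumptions stated above.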
Cluster resource
No response
Relevant log output