
[Feature]: Archive shutdown checkpoint WAL file after demotion #4586

Open · 4 tasks done · gbartolini opened this issue May 18, 2024 · 7 comments · May be fixed by #4411

@gbartolini (Contributor) commented May 18, 2024

Is there an existing issue already for this bug?

  • I have searched for an existing issue, and could not find anything. I believe this is a new bug.

I have read the troubleshooting guide

  • I have read the troubleshooting guide and I think this is a new bug.

I am running a supported version of CloudNativePG

  • I have checked, and I am running a currently supported version of CloudNativePG.

Contact Details

No response

Version

1.23.1

What version of Kubernetes are you using?

1.29

What is your Kubernetes environment?

Cloud: Amazon EKS

How did you install the operator?

YAML manifest

What happened?

With 1.23.1, when demoting the primary, the WAL file containing the shutdown checkpoint was not archived:

```console
$ kubectl exec jeeg-eu-central-1-1 -- pg_controldata | grep REDO
Defaulted container "postgres" out of: postgres, bootstrap-controller (init)
Latest checkpoint's REDO location:    0/24000028
Latest checkpoint's REDO WAL file:    000000010000000000000009
```
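As an aside, the segment that needs archiving can be extracted programmatically from `pg_controldata` output. A minimal Go sketch (the helper name `redoWALFile` is hypothetical, not part of the operator):

```go
package main

import (
	"fmt"
	"strings"
)

// redoWALFile extracts the value of the "Latest checkpoint's REDO WAL file"
// line from pg_controldata output: the segment that holds the shutdown
// checkpoint and, per this report, is never archived after demotion.
func redoWALFile(controldata string) string {
	const key = "Latest checkpoint's REDO WAL file:"
	for _, line := range strings.Split(controldata, "\n") {
		if strings.HasPrefix(line, key) {
			return strings.TrimSpace(strings.TrimPrefix(line, key))
		}
	}
	return ""
}

func main() {
	out := "Latest checkpoint's REDO location:    0/24000028\n" +
		"Latest checkpoint's REDO WAL file:    000000010000000000000009\n"
	fmt.Println(redoWALFile(out)) // prints 000000010000000000000009
}
```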

Log reports:

```
{"level":"info","ts":"2024-05-17T15:54:51Z","logger":"wal-archive","msg":"Archived WAL file","logging_pod":"jeeg-eu-central-1-1","walName":"pg_wal/000000010000000000000008","startTime":"2024-05-17T15:54:50Z","endTime":"2024-05-17T15:54:51Z","elapsedWalTime":1.093924041}
{"level":"info","ts":"2024-05-17T15:56:46Z","logger":"postgres","msg":"record","logging_pod":"jeeg-eu-central-1-1","record":{"log_time":"2024-05-17 15:56:46.897 UTC","process_id":"20","session_id":"66477c68.14","session_line_num":"14","session_start_time":"2024-05-17 15:48:56 UTC","transaction_id":"0","error_severity":"LOG","sql_state_code":"55P02","message":"parameter \"archive_mode\" cannot be changed without restarting the server","backend_type":"postmaster","query_id":"0"}}
```

Steps to reproduce are detailed here: https://github.com/gbartolini/postgres-kubernetes-playground/tree/main/aws-eks/examples/jeeg#simulating-a-data-center-switchover

This prevents the other cluster from being promoted without a gap.

Cluster resource

No response

Relevant log output

See above

Code of Conduct

  • I agree to follow this project's Code of Conduct
@gbartolini gbartolini added the triage Pending triage label May 18, 2024
@gbartolini gbartolini self-assigned this May 18, 2024
@gbartolini gbartolini added bug 🐛 Something isn't working and removed triage Pending triage labels May 18, 2024
@gbartolini (Contributor, Author)

Blocking #4678

@gbartolini (Contributor, Author)

Upon the transition of `replica.enabled` from `false` to `true`, we could archive the latest WAL file containing the shutdown checkpoint (making sure we add the `.partial` suffix to the name). Thoughts?
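For illustration, a minimal Go sketch of what such a step could look like. The names `partialName` and `archivePartial` are hypothetical, and the local file copy stands in for whatever archive command the operator actually invokes; this is a sketch of the `.partial` naming convention, not CloudNativePG code:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// partialName returns the archive name for a WAL segment that must be
// uploaded before it is completed, following the ".partial" suffix
// convention also used by pg_receivewal.
func partialName(walPath string) string {
	return filepath.Base(walPath) + ".partial"
}

// archivePartial copies the WAL segment into destDir under its .partial
// name. A real implementation would go through the configured WAL
// archiver instead of a plain local copy.
func archivePartial(walPath, destDir string) error {
	src, err := os.Open(walPath)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := os.Create(filepath.Join(destDir, partialName(walPath)))
	if err != nil {
		return err
	}
	defer dst.Close()

	_, err = io.Copy(dst, src)
	return err
}

func main() {
	fmt.Println(partialName("pg_wal/000000010000000000000009")) // prints 000000010000000000000009.partial
}
```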

@gbartolini (Contributor, Author)

This is an excerpt of PostgreSQL source code (line 6570 of `src/backend/access/transam/xlog.c`):

```c
/*
 * If archiving is enabled, rotate the last XLOG file so that all the
 * remaining records are archived (postmaster wakes up the archiver
 * process one more time at the end of shutdown). The checkpoint
 * record will go to the next XLOG file and won't be archived (yet).
 */
if (XLogArchivingActive())
    RequestXLogSwitch(false);

CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
```

I'd be interested in exploring whether it makes sense to achieve a similar goal in Postgres itself by archiving the file as `.partial`.

@gbartolini gbartolini added this to the 1.23.2 milestone May 28, 2024
@mnencia (Member) commented May 28, 2024

> This is an excerpt of PostgreSQL source code (line 6570 of `src/backend/access/transam/xlog.c`):
>
> ```c
> /*
>  * If archiving is enabled, rotate the last XLOG file so that all the
>  * remaining records are archived (postmaster wakes up the archiver
>  * process one more time at the end of shutdown). The checkpoint
>  * record will go to the next XLOG file and won't be archived (yet).
>  */
> if (XLogArchivingActive())
>     RequestXLogSwitch(false);
>
> CreateCheckPoint(CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_IMMEDIATE);
> ```
>
> I'd be interested in exploring whether it makes sense to achieve a similar goal in Postgres itself by archiving the file as `.partial`.

I wonder why these two operations are done in this order; there's no hint of the reason in the code. Unless there is a good reason for the current order, swapping the two operations would be the easiest and safest solution.

@mnencia (Member) commented May 29, 2024

> Upon the transition of `replica.enabled` from `false` to `true`, we could archive the latest WAL file containing the shutdown checkpoint (making sure we add the `.partial` suffix to the name). Thoughts?

With the current Postgres behavior, uploading the last WAL file as `.partial` is the best way to make the needed WAL segment reachable from the other clusters.

@gbartolini (Contributor, Author)

I tried your suggested approach, Marco. See: postgres/postgres@master...gbartolini:postgres:archive-shutdown-checkpoint-wal

However, the change is logically too invasive:

```
2024-05-29 14:31:52.571 CEST [79113] FATAL:  cannot make new WAL entries during recovery
```

Let's proceed for the moment with archiving a partial WAL file, while I try to involve some Postgres developers here.

@mnencia mnencia removed the bug 🐛 Something isn't working label May 29, 2024
@mnencia mnencia changed the title [Bug]: Shutdown checkpoint WAL file doesn't get archived after demotion [Feature]: Shutdown checkpoint WAL file doesn't get archived after demotion May 29, 2024
@mnencia mnencia changed the title [Feature]: Shutdown checkpoint WAL file doesn't get archived after demotion [Feature]: Archive shutdown checkpoint WAL file after demotion May 30, 2024
@gbartolini gbartolini modified the milestones: 1.23.2, 1.24.0 Jun 4, 2024
@gbartolini (Contributor, Author)

This feature seems to be strictly tied to the checkpoint token patch. I am moving it to 1.24 for now, unless we see that we can actually extract it and make it a backportable patch.
