Replace json state file as a source of truth #458

Open
rdunklau opened this issue Aug 11, 2021 · 0 comments

Hello,

The JSON state file is used as the source of truth for the last_flushed_lsn when using the walreceiver.
This is prone to errors, as the JSON file is persisted only locally, and asynchronously.
This means that if we were to stop the walreceiver process on one machine and start it up elsewhere, we would lose the WAL generated in between.
I think we should use the object storage as the source of truth instead.
In that particular case, we could probably get away with:

  • reading the value from the json_state_file if it exists. It might be stale, in which case we would simply re-archive WALs that we have already archived. That's not ideal, but it should be fine unless those already-archived WAL files have been discarded from the server.
  • when we don't have a last_flushed_lsn from the json_state_file, list xlogs from the object storage, find the latest one and compute the LSN associated with the one that would follow. It's not clear to me what the implications of a timeline switch are at this point.
    A problem with this is that listing the full xlog directory on the object storage might be expensive. Having a concept similar to pgbackrest's manifest could help with that: persisting a file on the object storage that acts as global metadata for the whole backup site.
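
To make the second bullet more concrete, here is a rough sketch of computing the LSN to resume from, given a listing of WAL segment names. It assumes the standard 24-hex-digit segment naming and the default 16 MiB segment size; the function names are made up for illustration, and picking the highest timeline is only a guess given the open question about timeline switches.

```python
import re

# Assumption: PostgreSQL's default WAL segment size (16 MiB).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024

# Complete segments are named TTTTTTTTXXXXXXXXYYYYYYYY (timeline, log, segment).
WAL_NAME_RE = re.compile(r"^[0-9A-F]{24}$")


def parse_wal_name(name, seg_size=WAL_SEGMENT_SIZE):
    """Return (timeline, absolute segment number) for a WAL segment name."""
    if not WAL_NAME_RE.match(name):
        raise ValueError("not a WAL segment name: %r" % name)
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    segs_per_log = 0x100000000 // seg_size
    return timeline, log * segs_per_log + seg


def next_start_lsn(names, seg_size=WAL_SEGMENT_SIZE):
    """Return (timeline, lsn) for the segment following the latest one listed.

    Names that are not complete segments (.partial, .history, ...) are skipped.
    Returns None when no segment is found.
    """
    parsed = [parse_wal_name(n, seg_size) for n in names if WAL_NAME_RE.match(n)]
    if not parsed:
        return None
    # Hypothetical policy: highest timeline first, then highest segment.
    timeline, segno = max(parsed)
    lsn = (segno + 1) * seg_size
    return timeline, "{:X}/{:X}".format(lsn >> 32, lsn & 0xFFFFFFFF)
```

With a single archived segment `000000010000000000000003`, this would resume streaming at `0/4000000` on timeline 1.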

Another approach would be to mandate the use of a replication slot, and require permission to a "maintenance" db to fetch the restart_lsn of the replication slot (as it's not possible to get it from a replication connection, although a patch has been proposed for that). This is quite invasive as it would require giving "regular" connection permission to the backup user.
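
For reference, the lookup itself would be small. The sketch below assumes a regular (non-replication) DB-API connection such as psycopg2, with its `%s` parameter style and the LSN returned as text; `fetch_restart_lsn` and `lsn_to_int` are hypothetical helper names.

```python
def lsn_to_int(lsn):
    """Convert a textual LSN such as '1/A0' to an integer byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def fetch_restart_lsn(cursor, slot_name):
    """Return the slot's restart_lsn as text, or None if it is unknown.

    `cursor` is assumed to be a DB-API cursor on a regular connection
    (for example one created by psycopg2).
    """
    cursor.execute(
        "SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = %s",
        (slot_name,),
    )
    row = cursor.fetchone()
    if row is None or row[0] is None:
        # The slot does not exist, or has not reserved any WAL yet.
        return None
    return row[0]
```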

I would personally be more inclined to implement the first solution, but I'm curious to have other opinions on the subject.
