
References for PITR / Continuous Archiving implementation? #139

Open
ccrvlh opened this issue Nov 8, 2022 · 1 comment


ccrvlh commented Nov 8, 2022

Hi.

I've had some issues with Zalando, and now I'm looking for a simpler operator. Kubegres seems to fit the bill, and my experience deploying a cluster was great. I have a custom image set up to run pg_dump and pg_restore scripts, CronJobs for the dumps, and an on-demand Job for the restoration process. This is really simple and works well, but with restrictions: it won't work for larger databases, it's slow, and the RPO is very high.
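For context, the dump setup described above can be sketched roughly like this (the schedule, image tag, Secret, and PVC names are all hypothetical placeholders, not my actual manifest):

```yaml
# Sketch of a nightly pg_dump CronJob; all names and the image are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-dump
spec:
  schedule: "0 2 * * *"            # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:15   # assumed image
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -Fc -h "$PGHOST" -U "$PGUSER" "$PGDATABASE" > /backup/db-$(date +%F).dump
              envFrom:
                - secretRef:
                    name: pg-credentials     # hypothetical Secret with PGHOST/PGUSER/PGPASSWORD/PGDATABASE
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: pg-backup-pvc     # hypothetical PVC
```

The restore side is symmetric: an on-demand Job running pg_restore against the same volume.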

I've been looking at strategies to implement PITR and continuous backup. Zalando had this baked in using pg_basebackup and WAL-G (I think). Outside the k8s world, I've read a lot about pgBackRest, Barman, WAL-G and a couple of other solutions. But those don't look all that simple to set up when the DB is running in containers (they might be, but I can't find much information on it beyond one or two repos). I know Timescale runs pgBackRest as a sidecar, Zalando runs a custom image with WAL-G/WAL-E + pg_basebackup, Percona also uses pgBackRest (not sure about the architecture), Crunchy's PGO also uses pgBackRest, and StackGres is, I think, a custom solution, though I'm not sure.

I tried running a separate container for pgBackRest, so I changed the VolumeClaim policy to ReadWriteMany (so that pgBackRest could connect directly to the data directory), but I ran into quite a few issues along the way and couldn't make it work (yet? I'll keep trying).
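One thing worth noting: the sidecar pattern (as in Timescale's setup) sidesteps the ReadWriteMany requirement entirely, because containers in the same pod share volumes even with ReadWriteOnce. A generic sketch of the idea (image and volume names are hypothetical, and Kubegres may not expose a way to inject such a sidecar into the pods it manages):

```yaml
# Sketch: pgBackRest as a sidecar sharing the data volume within one pod.
# No ReadWriteMany needed -- both containers mount the same RWO volume.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-with-pgbackrest
spec:
  containers:
    - name: postgres
      image: postgres:15               # assumed image
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
    - name: pgbackrest
      image: my-registry/pgbackrest    # hypothetical image
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: pgdata-pvc          # hypothetical PVC
```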

I understand Kubegres is not particularly going in this direction at the moment, but I wonder if this could be an option for the future. There has been a brief discussion about it here, but it stopped at pg_dump. StackGres has an interesting approach with several CRDs; although this looks complex at first, having multiple CRDs also allows for more flexibility. Zalando's approach tries to put everything into the cluster definition and/or configuration file, so things are not always trivial to grasp. (I'll follow up in a bit with potential implementations.)

I imagine this is a common requirement for folks deploying PostgreSQL to k8s, so even if it's not in Kubegres's plans for the future, the pain still exists. I was wondering if there are any examples, references, or other material for implementing this kind of solution with Kubegres, or any experience people could share.

Thanks a lot!

@alazycoder101

I have the same kind of request as well.
PITR would be very useful. Before using Kubegres, we were backing up with the following strategy:

  • a base backup of the DB with pg_basebackup every Monday;
  • WAL copied continuously to object storage;
  • obsolete WAL files cleaned up after each base backup.
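For anyone unfamiliar, the steps above map onto stock PostgreSQL settings roughly like this (the archiving tool is an assumption; we use WAL-G here just as one example, and the target time is illustrative):

```conf
# postgresql.conf -- continuous archiving sketch (values are illustrative)
archive_mode = on
archive_command = 'wal-g wal-push %p'   # assumes WAL-G is configured for your object store
archive_timeout = 60                    # force a WAL segment switch at least every 60s

# For a PITR restore (PostgreSQL 12+), roughly:
#   restore_command = 'wal-g wal-fetch %f %p'
#   recovery_target_time = '2022-11-07 12:00:00+00'
# plus an empty recovery.signal file in the data directory.
```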

Unfortunately, the config file in Kubegres is shared by both the primary and the replicas, and we need WAL archiving only on the primary.

I understand Kubegres is just keeping things as simple as possible so that you can customize as you want.

Maybe the first step is to separate the config file between primary and replicas?
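One thing that may soften the shared-config issue, if I understand the PostgreSQL docs correctly: with `archive_mode = on` (as opposed to `always`), the archiver only runs while the server is a primary, so the same config file can be shipped to replicas without them pushing WAL:

```conf
# Shared postgresql.conf for primary and replicas (sketch):
archive_mode = on    # 'on' archives only on the primary; servers in recovery
                     # skip archiving ('always' would archive on replicas too)
archive_command = 'wal-g wal-push %p'   # assumed tooling, as an example
```

Separating the config files would still help for other primary-only settings, though.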
