
Using Benji? Please tell us! #67

Open
elemental-lf opened this issue Jan 13, 2020 · 14 comments

@elemental-lf
Owner

If a tool like Benji is running fine and without a hitch (even though it's still pre-release software) we generally don't hear about it. So this issue is for collecting success stories and related information.

Tell us something about your use case and your environment:

  • What version(s) of Benji are you using?
  • What are you backing up?
  • How much data is it?
  • Have you written custom tooling around Benji?
  • Using compression or encryption?
  • What kind of storage are you using? S3? B2? File system?

Also, if you've mentioned Benji (or seen it mentioned) in a talk or presentation, or if you've written or found a forum or blog post discussing Benji, please report it here.

On the road to a stable release we need reports of Benji running successfully, especially as Benji's user base is still small. All feedback counts and also helps us identify new features for future development.

For bugs, ideas, feature requests, or just questions, please open a new separate issue — so that other developers and users can find it later.

Thank you for your time!

@elemental-lf elemental-lf self-assigned this Jan 13, 2020
@elemental-lf elemental-lf pinned this issue Jan 13, 2020
@wech71
Contributor

wech71 commented Jan 21, 2020

We are currently using Benji v0.8 on two Proxmox clusters with Ceph storage.
We are backing up the Ceph object storage on both clusters.

Cluster 1 has about 10 TB of data.

Cluster 2 has about 20 TB of data, but we don't back up everything there: a FreeNAS VM replicates its ZFS pool to another machine, so we don't have to back up those disks, and there are some Rancher nodes which don't hold any valuable information.

Upgrading the database schema took around 4 hours on every node. Cluster 1's Postgres DB is 15 GB, with 66 million block rows for 4622 versions (every backup also includes the qemu conf).
Backup retention is set to "latest31,days31,months6".

We use a backup script, originally based on the backy2 script, as we used backy2 earlier. The script runs on every node every night and sends reports by e-mail and errors to our company chat system. After every backup an additional metadata export is done.

We use a ZFS pool shared through an NFS share as local backup block storage (storage type: "file"). We don't compress or encrypt the backups here.
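
A minimal sketch of what such a nightly run might look like (hypothetical: the image name, paths, and mail address are placeholders, and the exact benji backup / metadata export arguments depend on your setup - check the respective --help output):

#!/bin/bash
# back up one RBD image, then export the version metadata and report by mail
set -euo pipefail
IMAGE="vm-100-disk-0"
if benji backup "rbd:rbd/${IMAGE}" "${IMAGE}"; then
    benji metadata-export > "/backup/metadata/${IMAGE}-$(date +%F).json"
    echo "backup of ${IMAGE} succeeded" | mail -s "benji report" ops@example.com
else
    echo "backup of ${IMAGE} FAILED" | mail -s "benji FAILURE" ops@example.com
fi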

The metadata database is a PostgreSQL 11 server.

Additionally, we are currently starting to create weekly off-site backups to an S3 storage provided by Strato (S3 HiDrive).

For this we use a different benji.yaml with a separate (PostgreSQL) database, zstd compression, and AES-256 encryption.

Hope this helps.

Thank you for this great tool!

@elemental-lf
Owner Author

@wech71 thank you for the detailed feedback. The database upgrade to 0.8.0 is slow; this was expected. With the next release the number of rows in the blocks table will go down for most users as sparse blocks won't be represented in the database any more. This should reduce the size of the database and speed up some things like the initial cloning of a version from an older snapshot. This change is already present in current master and Oliver has been testing it with his Ceph cluster with good results so far.

@elemental-lf
Owner Author

One additional note: Compression will probably save you a lot of space. Oliver has been measuring ratios between 10 and over 100. See his presentation https://indico.cern.ch/event/765214/contributions/3517132/, page 16. I wouldn't recommend a compression level of 22, but something between 1 and 3 (inclusive), as performance degrades quickly with higher levels. See https://raw.githubusercontent.com/facebook/zstd/master/doc/images/DCspeed5.png. The graph is for version 0.6.0 of zstandard; it'll probably be faster now.
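
If you want to gauge the impact of levels 1-3 on your own data before changing anything, the standalone zstd CLI has a built-in benchmark mode (illustrative; the sample file path is a placeholder):

# benchmark compression levels 1 through 3 on a representative sample file
zstd -b1 -e3 /tmp/sample-block-data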

@wech71
Contributor

wech71 commented Jan 21, 2020

Yes, this is definitely the case. I see this in the off-site backup where I use zstandard compression (level 1 currently).

I just hesitate to change the configuration for the (local) production backups, because I'm unsure if the information about compression and encryption is persisted to the stored blocks.

What would happen if I added compression to the current configuration, so that one backup set ends up containing both compressed and uncompressed blocks?

Also, I could not find information on how to configure an additional storage (offline storage) in the same benji.yaml and database.

Therefore I used a separate benji.yaml and database. I just wasn't completely sure whether two different storages would be supported by one database, and I didn't want to risk a block ending up on only one of the storages, which could result in missing blocks during restore if one of the two backup storage locations were lost.

I guess it would be handled correctly, as there is a column in benji ls giving the storage used, but I wanted to be 100% safe ;-)

@adambmedent

We have successfully been using Benji for a bit over 6 months now.

We have a dedicated 8-node Ceph cluster consisting entirely of Proxmox hosts. This cluster is dedicated to Ceph storage and doesn't have any VMs running on the front ends; we have other clusters with dedicated front ends which handle that.

We are now backing up 45 RBD block devices and adding more each day. In total they have about 5 TB of real data on them.

We currently have 4049 versions and have a retention of "enforce latest1,days525". Our database is roughly 34 GB in size and took well over 24 hours to update to v0.8.0.

We use some of the provided scripts along with some custom ones to automate the backups and to control how many run concurrently.
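
One simple, hypothetical way to bound that concurrency (the image list, pool name, and parallelism value are placeholders, not the scripts mentioned above):

# back up every RBD image listed in images.txt, at most 4 at a time
xargs -P 4 -I{} benji backup "rbd:rbd/{}" "{}" < images.txt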

The backup storage is a simple ZFS array which is presented over NFS to our ceph VM.

I did make a thread over on the Proxmox forums for some attention. Not a ton of replies, but so far it has seen 150 views.

https://forum.proxmox.com/threads/any-proxmox-ceph-users-interested-in-helping-test-benji.63027/

This project is awesome; not only that, but the developers are quick to react and offer as much help as they can. I probably owe Elemental at least a six-pack!

@blodone

blodone commented Feb 2, 2020

We have also been running Benji for more than a year now, in a split setup with two independent backup destination servers saving files locally. We recently upgraded to the latest version of Benji; the migration took about 2 hours. I have also now enabled compression at level 3.

The database size is about 5 GB for each server.
It runs very well and fast. We built Django-based operations tooling around it and back up a Ceph system holding VM images (XenServer).

But now to one problem which is getting bigger at the moment:
We are running out of backup space.

The backups shown in benji with bytes_written sum up to 3.4 TB (server-1) and 2.4 TB (server-2).
But the absolute space used in the data directory is 8.8 TB (server-1) and 7.1 TB (server-2).

It seems like there are some kind of lost blocks left in this directory, and I could not find out how to diagnose or clean them up. On one server we're running batch-scrub and I'll try the cleanup afterwards, but I don't think this will solve the problem. Do you have any advice on how to manage that situation?

Is there any check that looks for files not in the database?

@wech71
Contributor

wech71 commented Feb 3, 2020

Did you run
benji enforce "list of retention specifications"
before
benji cleanup

cleanup only removes blocks which are marked as unused by benji enforce.
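
A minimal illustrative sequence, reusing the retention string from an earlier comment (exact arguments may differ depending on your Benji version and on which versions the rules should apply to):

# mark versions outside the retention rules for removal, then free their blocks
benji enforce latest31,days31,months6
benji cleanup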

@blodone

blodone commented Feb 3, 2020

Yes, we run this daily before cleanup.

The summed size of every backup available (bytes_written) in the benji DB is about 3x smaller than the actual disk size used. How can that be? There must be some kind of data in the folder that does not belong to the backups anymore, e.g. from failures in previous backups or failures during previous cleanups.

@elemental-lf
Owner Author

Thanks for the feedback, @blodone! I've moved the discussion to issue #69.

Everyone, please open new issues for any problems or questions and try to keep discussions in this issue to a minimum. Thanks!

@DevOpsAmph

Hi,
we are using Benji to back up a Ceph cluster via RBD storage access. Runs fine - Benji runs on two physical boxes. Block optimization and compression are doing a great job - 130 TB fit into 50 TB of backup storage. We run on CentOS 7 with 20 Gbit/s interfaces.
"Preparation phases" last longer than the data transmission at our usual change volume. A big issue is MS Windows virtual machines with a shadow copy cycle activated - the change volume is massive due to this.

Great tool!

@allenporter
Contributor

Hello,

  • Installed Benji via the Helm chart (so using whatever version is latest there)
  • Backing up Kubernetes PVCs. This is for a pretty small home cluster built following the patterns established by the https://k8s-at-home.com/ community.
  • Less than 1 TB of data
  • No custom tooling, but I do have configuration for automated pushes using Flux, based on toboshii's configuration (home-cluster)
  • The PVCs are backed by Ceph via rook-ceph, and the target storage is NFS

Everything is working smoothly, though I did hit speed bumps getting the Bitnami PostgreSQL chart working right, figuring out all the commands to exercise a manual backup (via running the cron job), and working out how to do PVC restores. My install, configuration, and docs for manual testing are tracked and documented here.
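
For reference, one way to trigger a one-off run from an existing CronJob (a hypothetical sketch; the CronJob name and namespace are placeholders, not values from the chart):

# create a manual Job from the backup CronJob installed by the chart
kubectl create job --from=cronjob/benji-backup benji-manual-run -n benji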

@MannerMan

Hello,

I work for Fortnox AB, a Swedish SaaS company with a large customer base. We use Benji as one of our DB backup systems, primarily due to its excellent performance. We have a special data model that results in many millions of small files on each of our DB servers, and 'native' DB backup tools generally struggle with this. Using Benji's block-level backup, we can do very fast backup and recovery operations.

  • What version(s) of Benji are you using?
    v0.14.1
  • What are you backing up?
    Database servers
  • How much data is it?
    34TB combined, spread over 180 servers.
  • Have you written custom tooling around Benji?
    Yes, we have integrated Benji into a web portal used by operations and our technical support, from which we can do very easy recovery operations. Our technical support restores copies of databases almost on a daily basis to investigate various issues in our system. We combine the 'base backup' recovered from Benji with the archived transaction log from the DB to achieve quick point-in-time restores (see the sketch at the end of this comment); usually this takes around 20 minutes. We have had this system in use for a few months now and it has worked out great!
  • Using compression or encryption?
    We use zstd in Benji; encryption is done at rest.
  • What kind of storage are you using? S3? B2? File system?
    We back up to a geo-separated S3 instance (MinIO).

We are very satisfied with Benji and can highly recommend it. No issues besides some small things: repository cleanup taking some time (~36h, although not really a problem for us) and benji storage-statistics not working in our large setup (it eats all memory - I'll get around to creating an issue about that one at some point).
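
A rough sketch of that point-in-time restore flow (hypothetical; the version UID, target path, and the engine-specific log replay are placeholders, not Fortnox's actual tooling):

# restore the base backup of the DB volume to a local image file
benji restore <version-uid> file:/restore/db-volume.img
# attach the restored image, then let the database engine replay the archived
# transaction logs up to the desired point in time (engine-specific)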

@elemental-lf
Owner Author

@allenporter and @MannerMan thank you very much for the detailed description of your setups.

@allenporter
Contributor

Since posting this, numerous other folks from the k8s-at-home community/Discord have installed Benji for its simplicity compared to other backup products.
