Allow skipping files/folders using an xattr and merge snapshots #3648

Open
gmgigi96 opened this issue Feb 14, 2022 · 1 comment

@gmgigi96

Output of restic version

restic 0.12.0 compiled with go1.15.8 on linux/amd64

What should restic do differently? Which functionality do you think we should add?

Currently, when backing up data, restic compares the size, mtime, ctime and inode of a file against the previous snapshot instead of fully reading it. Some storage platforms (such as Ceph and EOS) recursively propagate updates to certain extended attributes from a changed file up through its parent folders to the root. This could be exploited to skip whole folders: if the attribute on a folder is unchanged, we are 100% sure that the folder (and every file in it, no matter how deeply nested) did not change since the previous snapshot. So it would be nice to have an option (like the one proposed in #2902) for comparing an xattr specified by the user, which is checked against folders as well. This functionality has two main advantages:

  1. skipping folders that we already know were not updated since the last snapshot saves a lot of time, especially if you have millions of resources in that folder
  2. it reduces the load on production storage instances
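
The comparison described above could look roughly like the following sketch. It is not restic code: `ParentDir`, `can_skip_dir` and the attribute name `user.example.tree_mtime` are hypothetical names chosen for illustration, and the xattr reader is passed in as a function so the logic can be tried without a filesystem that supports extended attributes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ParentDir:
    """Hypothetical stand-in for a directory entry stored in the parent snapshot."""
    xattrs: Dict[str, bytes] = field(default_factory=dict)


def can_skip_dir(
    path: str,
    parent: Optional[ParentDir],
    attr_name: str,
    read_xattr: Callable[[str, str], bytes],
) -> bool:
    """Return True if the folder's xattr equals the value saved in the parent snapshot."""
    if parent is None:
        return False  # folder is new: nothing to compare against
    old_value = parent.xattrs.get(attr_name)
    if old_value is None:
        return False  # parent snapshot did not record the attribute
    try:
        current_value = read_xattr(path, attr_name)
    except OSError:
        return False  # attribute missing or unreadable: fall back to a full walk
    return current_value == old_value


# Usage with a fake reader (assumption: on Linux, os.getxattr could be passed instead).
ATTR = "user.example.tree_mtime"  # hypothetical attribute name
current = {"/data/unchanged": b"100", "/data/changed": b"205"}


def fake_reader(path: str, name: str) -> bytes:
    if path not in current:
        raise OSError("no such attribute")
    return current[path]


parent_entry = ParentDir(xattrs={ATTR: b"100"})
skip_unchanged = can_skip_dir("/data/unchanged", parent_entry, ATTR, fake_reader)
skip_changed = can_skip_dir("/data/changed", parent_entry, ATTR, fake_reader)
```

Any doubt (no parent entry, no recorded value, read error) resolves to "do not skip", so the sketch only ever skips a folder when both values are present and equal.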

In addition, because restic currently stores only the processed files in a new snapshot (so skipping a folder by an extended attribute means omitting that subtree from the new snapshot), it would be nice to have a merge option that merges the new snapshot with a parent one. This would help when restoring a full folder, or a part of it, from a snapshot, without having to check whether previous snapshots contain additional files/folders that were skipped in the snapshot being restored. It would also be useful because a prune could delete older snapshots that contain useful resources not included in the newer snapshots. This has already been mentioned in the forum, for example in https://forum.restic.net/t/merge-restic-snapshots/4364 and https://forum.restic.net/t/backup-parent-behavior/3286.
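
As a rough illustration of the requested merge, the sketch below models a snapshot as a nested dictionary (folders map to dictionaries, files map to a content ID string) and copies the skipped subtrees from the parent into the new snapshot. The data model, the function `merge_skipped` and the idea that the backup records the list of skipped paths are all assumptions made for this example, not restic's repository format or API.

```python
import copy
from typing import Dict, Iterable, Tuple, Union

# Hypothetical model: a folder is a dict, a file is a content ID string.
Tree = Dict[str, Union[str, "Tree"]]


def merge_skipped(parent: Tree, new: Tree, skipped: Iterable[Tuple[str, ...]]) -> Tree:
    """Return a copy of `new` with every skipped folder filled in from `parent`."""
    merged = copy.deepcopy(new)  # leave the input snapshot untouched
    for path in skipped:
        if not path:
            raise ValueError("a skipped path must have at least one component")
        # Locate the skipped subtree in the parent snapshot.
        src = parent
        for name in path:
            src = src[name]  # KeyError if the parent does not contain the path
        # Create intermediate folders in the merged tree as needed.
        dst = merged
        for name in path[:-1]:
            dst = dst.setdefault(name, {})
        dst[path[-1]] = copy.deepcopy(src)
    return merged


parent_snapshot: Tree = {
    "home": {"alice": {"a.txt": "id1"}, "bob": {"b.txt": "id2"}},
}
# "home/alice" was skipped during the new backup, so it is absent here.
new_snapshot: Tree = {
    "home": {"bob": {"b.txt": "id3", "c.txt": "id4"}},
}
merged_snapshot = merge_skipped(parent_snapshot, new_snapshot, [("home", "alice")])
```

Only explicitly skipped paths are taken from the parent. A plain union of both trees would also resurrect files that were deleted between the two backups, which is why the sketch asks for the skipped paths instead.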

What are you trying to do? What problem would this solve?

This would save a lot of time when backing up really big folders and reduce the load on instances used every day by thousands of users.

Did restic help you today? Did it make you happy in any way?

Restic is a fantastic tool that we use at CERN to back up ~40K home and project directories of our users every day, resulting in >300M new files every day, for a total of 4PB of data backed up.

@MichaelEischer
Member

MichaelEischer commented May 14, 2022

Allowing the archiver part of the backup command to skip directories based on an extended attribute would be fairly easy to implement. However, I currently see two problems that would have to be solved:

  • The backup command consists of a scanner, which just counts how many files have to be backed up (only for statistics!), and the actual archiver component, which backs up everything. The scanner currently does not use the parent snapshot in any way, so it also won't be able to skip directories based on an extended attribute. It would probably be possible to add that functionality to the scanner, but we'd have to ensure that it doesn't regress performance when extended attributes are not used.
  • When directories are skipped, the statistics won't add up. That would probably be acceptable. Reconstructing the statistics based on the directory metadata from the backup repository is probably overkill.
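
To make the two points above concrete, here is a small sketch of a file-counting scanner with an optional skip predicate. It is not restic's scanner: `count_files` and `skip_dir` are hypothetical names, and the predicate stands in for whatever xattr comparison would be used. When no predicate is given, the loop does no extra work per directory.

```python
import os
from typing import Callable, Optional


def count_files(root: str, skip_dir: Optional[Callable[[str], bool]] = None) -> int:
    """Count files below `root`, optionally pruning directories via `skip_dir`."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if skip_dir is not None:
            # Editing dirnames in place stops os.walk from descending into them.
            dirnames[:] = [
                d for d in dirnames if not skip_dir(os.path.join(dirpath, d))
            ]
        total += len(filenames)
    return total
```

With a predicate in place, the returned count covers only the directories that were actually walked, so it is smaller than the number of files that end up in the snapshot. This is the mismatch in the statistics mentioned in the second point.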

> could be nice to have a merge option, that will merge the new snapshot with a parent one

I've seen #3405, which I plan to look at whenever I get around to looking at snapshot rewriting and similar functionality. But I can't make any promises about when that will happen.
