Use of fallocate in restore to prevent extreme fragmentation of (large) files #2675
Comments
The restorer code has already been rewritten since 0.9.6 (#2195). You should recheck whether the file still gets this fragmented.
I had to use a recent daily build (see description) for the restore, which AFAIU contains said updated restore code. Also AFAIU, the updated restore code is the actual cause of the fragmentation problem.
Excuse me, I had misread the version number. I am afraid that using fallocate would conflict with writing sparse files (files with holes), as in #2601, so one of them would need an option to turn it on/off.
What if restic primarily tried to append to existing files first, instead of doing random writes?
Single-threaded append-only restore is very slow (see #1719). Multi-threaded append-only restore needs a significant amount of memory to work efficiently and is pretty complex overall (#2195). And files will probably still be fragmented after a multi-threaded restore (and after a single-threaded one too, if other processes write to the filesystem during the restore). Personally, I am in favour of using
It could also be possible to reduce the fragmentation a lot by processing the pack files in the order in which they are first referenced by a file in the backup. The current iteration order over the pack files is completely random, whereas processing the pack files in order of first appearance quite likely closely resembles consecutive file accesses.
Allocating up front is a more localized change to the code, though.
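The pack-ordering idea from the comment above can be sketched roughly as follows. This is only an illustration, not restic's actual restorer code: the `blob` and `file` types and the `packsInFirstAppearanceOrder` function are hypothetical names made up for this example.

```go
package main

import "fmt"

// blob is a hypothetical stand-in for one chunk of a file and the pack that stores it.
type blob struct {
	packID string
}

// file is a hypothetical stand-in for a file to restore: an ordered list of blobs.
type file struct {
	name  string
	blobs []blob
}

// packsInFirstAppearanceOrder walks the files in restore order and returns each
// pack ID exactly once, in the order in which it is first referenced.
func packsInFirstAppearanceOrder(files []file) []string {
	seen := make(map[string]bool)
	var order []string
	for _, f := range files {
		for _, b := range f.blobs {
			if !seen[b.packID] {
				seen[b.packID] = true
				order = append(order, b.packID)
			}
		}
	}
	return order
}

func main() {
	files := []file{
		{name: "a.img", blobs: []blob{{"p3"}, {"p1"}, {"p3"}}},
		{name: "b.img", blobs: []blob{{"p1"}, {"p2"}}},
	}
	fmt.Println(packsInFirstAppearanceOrder(files)) // prints [p3 p1 p2]
}
```

Processing packs in this order means the early parts of each file tend to be written before the later parts, which is the property the comment hopes will reduce fragmentation.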
I've taken a closer look at this, using a test file of just 2GB in size. That file consisted of 1GB data read from
Accessing the pack files in order of first appearance (just remember the order in which the filerestorer encounters a new pack file
restic/internal/restorer/filerestorer.go Line 125 in 0fed6a8
With I'm currently working on a PR which includes both changes. As fallocate or a similar command is not available on every platform, it is probably a good idea to have a fallback that reduces the fragmentation even without such a command. (It would of course be possible to start by sequentially zeroing the files, but that creates a lot of overhead.)
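As a rough illustration of preallocation with a fallback, here is a minimal sketch assuming Linux, where the Go standard library exposes `syscall.Fallocate`. The helper names `preallocate` and `preallocatedTempFileSize` are hypothetical and this is not the code from the PR mentioned above. Note that the `Truncate` fallback only sets the file length and leaves the file sparse, so it does not reduce fragmentation by itself.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// preallocate tries to reserve size bytes for f with fallocate(2) (Linux only).
// If that fails, for example on a filesystem without fallocate support, it falls
// back to Truncate, which only sets the length and leaves the file sparse.
// The first return value reports whether blocks were actually reserved.
func preallocate(f *os.File, size int64) (bool, error) {
	if size <= 0 {
		return false, nil // nothing to reserve; fallocate rejects a zero length
	}
	if err := syscall.Fallocate(int(f.Fd()), 0, 0, size); err == nil {
		return true, nil
	}
	return false, f.Truncate(size)
}

// preallocatedTempFileSize creates a temporary file, preallocates it and
// returns the resulting file size. It exists only to demonstrate preallocate.
func preallocatedTempFileSize(size int64) (int64, error) {
	f, err := os.CreateTemp("", "restore-sketch-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	if _, err := preallocate(f, size); err != nil {
		return 0, err
	}
	fi, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return fi.Size(), nil
}

func main() {
	size, err := preallocatedTempFileSize(1 << 20)
	if err != nil {
		fmt.Println("preallocation failed:", err)
		return
	}
	fmt.Println("file size after preallocation:", size)
}
```

Because preallocation fills the holes that a sparse restore would want to keep, a real implementation would probably need a switch to choose between the two behaviours, as discussed earlier in the thread.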
Output of restic version
restic 0.9.6 (v0.9.6-148-gc03bc88b) compiled with go1.13.5 on linux/amd64
(A beta version was needed because of a problem restoring large files in the release version.)
What should restic do differently? Which functionality do you think we should add?
fallocate a file before restoring its contents, at least optionally.
What are you trying to do?
Get files with the least possible fragmentation and contiguously allocated space.
AFAIU: The current version creates the file as sparse and then restores the content "chunk by chunk" in essentially random order.
The resulting file is thus a) extremely fragmented and b) discontiguous.
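To illustrate the write pattern described above, here is a small sketch of chunks being written at their final offsets in a non-sequential order. It is not restic's restorer code; the `chunk` type and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// chunk is a hypothetical piece of file content with its target offset.
type chunk struct {
	offset int64
	data   []byte
}

// writeChunks writes each chunk at its own offset, in whatever order the
// chunks arrive. Regions not yet written stay as holes on filesystems that
// support sparse files.
func writeChunks(f *os.File, chunks []chunk) error {
	for _, c := range chunks {
		if _, err := f.WriteAt(c.data, c.offset); err != nil {
			return err
		}
	}
	return nil
}

// restoreOutOfOrder writes the later chunk first and returns the file content.
func restoreOutOfOrder() ([]byte, error) {
	f, err := os.CreateTemp("", "random-order-*")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	chunks := []chunk{
		{offset: 6, data: []byte("world")},  // end of the file arrives first
		{offset: 0, data: []byte("hello ")}, // start of the file arrives last
	}
	if err := writeChunks(f, chunks); err != nil {
		return nil, err
	}
	return os.ReadFile(f.Name())
}

func main() {
	content, err := restoreOutOfOrder()
	if err != nil {
		fmt.Println("restore failed:", err)
		return
	}
	fmt.Printf("%s\n", content) // prints hello world
}
```

The content ends up correct regardless of write order, but the filesystem allocates blocks in the order the writes happen, which is how the on-disk layout can become scattered.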
For example:
Yesterday I restored a 1.4TB VM image file (Ploop with ext4 inside).
Just mounting that restored file took 7 minutes.
Then I rewrote said file:
rsync -avP --preallocate restored-file new-file
Mounting the rewritten file now takes only 4 seconds, as the layout of the rewritten file is "optimal".