Invalid: Restic extremely slow on machines with many logical volumes #4755
Comments
There are multiple major problems with your theory about the root cause of the performance problems:
Thus, the problem must be caused by something else. Please provide the information I've asked for in #3041 (comment).
Thanks @MichaelEischer: Your clear words helped to free my mind of some weird theories. Before raising wrong flags here (again), just a short update: I think I actually found the issue (and how I, and not restic, was causing it), but it will take me a few more days to fully prove the solution. I will likely need to completely reformulate the issue, including the title... 🙈
current state of affairs:
I compared the environments (as listed by
With the left side (called from a shell as
With the right side (called from
My setup uses the cache predictably now, independent of how it is called, because I am setting
So the only thing that remains to complain about is this: I seem to remember having seen a warning about "very slow without cache" on only very few occasions, and I would have wished that restic were more informative when something went wrong (well, what is "wrong"?) with finding a cache. Why not print a message whenever an empty cache is initialized and when no cache is used?
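One way to make the cache location independent of the calling environment is to pass it explicitly from the wrapper. The following is only a minimal sketch under assumptions: the exact setting used above is not shown in this thread, so the example uses restic's `RESTIC_CACHE_DIR` environment variable as one possible option. The helper name `restic_env` and the directory `/var/cache/restic` are hypothetical.

```python
import os


def restic_env(cache_dir, base_env=None):
    """Return a copy of the environment with an explicit restic cache directory.

    base_env defaults to the current process environment; it is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # RESTIC_CACHE_DIR is read by restic; the directory passed in is the caller's choice.
    env["RESTIC_CACHE_DIR"] = cache_dir
    return env


if __name__ == "__main__":
    # Hypothetical cache location, chosen only for illustration.
    env = restic_env("/var/cache/restic")
    print(env["RESTIC_CACHE_DIR"])
    # A wrapper could then start restic with this environment, for example:
    # subprocess.run(["restic", "backup", "/home/backup/myhost"], env=env, check=True)
```

Setting the variable in the wrapper itself, rather than relying on `HOME` or `XDG_CACHE_HOME`, means an interactive shell and a scheduled job should end up with the same cache directory.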
More information: My "primary" (
I (wrongly) assumed that my "primary" program persists all metadata, but unfortunately the
As the
With defaults
With ignore-ctime
@MichaelEischer I started this ticket with a false assumption, even the title is wrong (now that I know). Do you want me to
I have two intentions:
|
There's too much unrelated information in this issue. Let's create a new one; just make sure to include a link back to this one.
As far as I remember, a user space program cannot modify
It would be possible to let restic try whether
There should already be an error that gets printed to stderr, see Line 514 in 2173c69
|
Closing the issue here. Too many assumptions, sorry for the noise. The real issue is now in #4775.
Output of restic version
What backend/service did you use to store the repository?
Infomaniak "Swiss Backup" S3 compatible
Problem description / Steps to reproduce
I started using restic a few months ago. I already have a working backup host in place that backs up about 25 clients (each into its own subdirectory) with an rsync-based solution called "dirvish". I use LVM extensively: every client has its own LV on the backup host so that I can manage disk space individually. This gives 25 LVs for restic to back up, which are found during lvscan and mounted on startup.
Dirvish creates a new timestamp-named directory on the backup host for each backup run. I use restic to mirror the latest run of each client to cloud storage. As restic does not prune the full path, I apply a bind mount on top that mounts the latest directory to bindlatest, so that it is always the same path for restic.
The backup host is powered off after each backup and woken up via Wake-on-LAN daily.
The restic command is as follows (the ignore parameters are probably not needed any more after this issue is solved):
The prima vista symptom was that
whenever I run my backup script manually, it is quite fast (less than 30 minutes for all hosts),
but when I let the system run it automatically, it takes 16 hours. See this comment of mine. However, this turned out not to be the whole truth. I found out later:
As long as my backup server stays powered, the allocation of minor device numbers is stable, i.e. /dev/mapper/VG_LV always has the same minor number, even after a reboot, e.g.:
But when I power cycle the machine, the minor numbers change wildly as they are dynamically assigned during the activation of the LVs.
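To check whether the device numbers really differ between two runs, they can be recorded and compared. The sketch below is an illustration under assumptions, not part of the original setup: `device_numbers` reads the major/minor pair of a device node via `os.stat`, and `changed_devices` compares two snapshots. Both function names are hypothetical, and `/dev/mapper/VG_LV` is the placeholder used in the text above.

```python
import os


def device_numbers(node):
    """Return (major, minor) of a device node such as /dev/mapper/VG_LV."""
    st = os.stat(node)
    # For a device node, the numbers of the device itself are in st_rdev.
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def changed_devices(previous, current):
    """Return the sorted names whose (major, minor) differs between snapshots.

    Values may be tuples or lists, so a snapshot loaded from JSON also works.
    Names that are missing from the previous snapshot are not reported.
    """
    return sorted(
        name
        for name, numbers in current.items()
        if name in previous and tuple(previous[name]) != tuple(numbers)
    )


if __name__ == "__main__":
    nodes = ["/dev/mapper/VG_LV"]  # placeholder name taken from the text
    current = {n: device_numbers(n) for n in nodes if os.path.exists(n)}
    print(current)
```

Storing the `current` mapping after each run (for example as JSON) and passing the stored copy as `previous` on the next run would show which volumes received new numbers after a power cycle.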
So, to rephrase my learnings: Using restic with dynamically changing major/minor device ids of the source file systems is to be avoided!
The full output of my restic backup wrapper (after a change of the minor device number): 2h15min
And without a change of the major/minor (and no change in the source data): 1m30sec
Expected behavior
When I back up the same path (e.g. /home/backup/myhost), I expect restic to understand that this is the same data.
Actual behavior
When the same path has a new major/minor number, restic does not trust that the local and remote data are equal and runs the full backup process, only to add a few bytes (if any) after a few hours.
Do you have any idea what may have caused this?
As written above, restic is very sensitive to the device_id. I am just about to try to control the major/minor number of my logical volumes as follows:
lvchange --persistent y --minor $MINOR MV_VG/MY_LV
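With about 25 logical volumes, the per-volume commands can be generated rather than typed by hand. The following is a dry-run sketch only: it builds the argument lists for the `lvchange` call shown above and prints them without executing anything. The function name `persistent_minor_commands` and the volume names are hypothetical, and whether a volume has to be deactivated first is not covered here.

```python
def persistent_minor_commands(volumes, first_minor=32):
    """Build one lvchange argument list per volume with consecutive minor numbers.

    Volumes are sorted so that the same input set always yields the same numbers.
    """
    commands = []
    for offset, volume in enumerate(sorted(volumes)):
        commands.append([
            "lvchange", "--persistent", "y",
            "--minor", str(first_minor + offset),
            volume,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical volume names in VG/LV notation.
    for cmd in persistent_minor_commands(["MY_VG/client01", "MY_VG/client02"]):
        print(" ".join(cmd))
```

Because the numbers depend on the sorted order, adding a volume later can shift the numbers of the volumes sorted after it. Keeping an explicit name-to-minor mapping would avoid that, at the cost of maintaining the mapping by hand.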
The Red Hat docs were helpful. I started with minor 32 for the persistent allocation so that there is enough room below for dynamically allocated ones (like the root or swap file system).
The bind mount mounts an already mounted device again. Therefore the device id of the device under the bind mount is the same as the one under the original mount.
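The statement about bind mounts can be checked by comparing the `st_dev` field that `os.stat` reports for both mount points. This is a small sketch with a hypothetical helper name, `same_filesystem`; the two paths are supplied on the command line.

```python
import os
import sys


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same device id (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Usage: pass the original mount point and the bind mount as two arguments.
    if len(sys.argv) == 3:
        print(same_filesystem(sys.argv[1], sys.argv[2]))
```

Note that `st_dev` describes the file system a path lives on, whereas the numbers of a device node such as /dev/mapper/VG_LV are reported in `st_rdev`.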
I will be able to confirm if this really was the solution to my issues in about two days, but I am quite optimistic.
Did restic help you today? Did it make you happy in any way?
Restic is fantastic (when you use it the right way, see above). In the first week of using it, I accidentally destroyed the local copy of some data but could easily recover it via restic.