restic backup should refuse to run without cache #4775

Open
nis65 opened this issue Apr 21, 2024 · 1 comment

nis65 commented Apr 21, 2024

Output of restic version

restic 0.16.4 compiled with go1.21.6 on linux/amd64

What backend/service did you use to store the repository?

Infomaniak "Swiss Backup" S3 compatible

Problem description / Steps to reproduce

Short story: Now that I have found out, it is easy to describe the problem: When restic backup fails to identify the cache directory, a message stating this is printed to stderr and the backup continues. The backup works, but is incredibly slow. When restic is run from a script that only captures stdout and ignores stderr, it is impossible to see the reason for the slowness.

I would prefer if restic backup would abort without cache (and maybe a new option --ignore-no-cache would be implemented to get back the old behavior). IMHO, running restic backup without cache is something that should be avoided. Read on to understand where I am coming from.

Long story

  • I run a dedicated backup server. This backup server uses dirvish (basically an rsync wrapper) to back up remote hosts. Dirvish uses the --link-dest feature of rsync for file-based deduplication (unchanged files are hard linked between backup generation directories instead of being transferred and stored a second time). The server is woken via Wake-on-LAN daily, and the backup job, which runs all backups (including the offsite ones), is started by systemd as the last task of the boot sequence. The last statement of the backup job shuts down the machine, ready for the next Wake-on-LAN.

  • I introduced restic as "secondary offsite backup", i.e. I wanted restic to copy the latest dirvish backup generation to the cloud, so I simply added a few lines to my existing backup script: Do a bind mount so that restic sees the same base directory name for the recent dirvish backup (which has a timestamp as directory name), run restic backup and umount the bind mount again.

After setting all this up (and testing it manually) to the best of my knowledge and according to the documentation, I realized that when run automatically from systemd it was unbearably slow, though it did kind of work.

Looking Back: Unfortunately, my existing backup script captured only stdout and not stderr, so

  • when run from systemd, the error messages about the missing cache ended up in journalctl (where I did not look until writing this bug report) and not in my own logs (which I stared at for hours, looking for a meaningful difference).
  • when run manually, the error did not happen, so in the one setting where I would have seen stderr on the console, nothing was printed.
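
A minimal sketch of how a wrapper script could send stderr to the same log as stdout, so that a message like the missing-cache warning is not lost. The helper name run_logged and the log path in the usage comment are hypothetical; they are not taken from the do_backups script.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Appends both stdout and stderr of COMMAND to LOGFILE
# and returns the exit status of COMMAND.
run_logged() {
    logfile=$1
    shift
    "$@" >>"$logfile" 2>&1
}

# Usage inside a backup script (log path is an assumption):
#   run_logged /var/log/do_backups.log restic backup -v /home/backup/milou/bindlatest
```

With a redirection like this, the "unable to open cache" line would have shown up next to the regular restic output instead of only in journalctl.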

Having no clue why restic was so fast in manual mode and so slow in systemd mode made me so desperate that I came up with the wildest (and wrong) ideas about the root cause (e.g. ticket #4755). Explicitly setting RESTIC_CACHE_DIR in my backup script solved all problems, and restic is now incredibly fast 🎉
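
For illustration, setting the cache directory explicitly could look roughly like the sketch below. The function name setup_restic_cache and the default path /var/cache/restic are assumptions made for this example; only the RESTIC_CACHE_DIR variable itself comes from the report.

```shell
# setup_restic_cache [DIR]
# Chooses a cache directory that does not depend on $HOME or
# $XDG_CACHE_HOME, creates it if needed, and exports it for restic.
# An already-set RESTIC_CACHE_DIR takes precedence over DIR.
# /var/cache/restic is an assumed default, not a restic convention.
setup_restic_cache() {
    RESTIC_CACHE_DIR="${RESTIC_CACHE_DIR:-${1:-/var/cache/restic}}"
    mkdir -p "$RESTIC_CACHE_DIR" || return 1
    export RESTIC_CACHE_DIR
}

# Usage inside the backup script, before calling restic:
#   setup_restic_cache /var/cache/restic || exit 1
#   restic backup -v --ignore-ctime /home/backup/milou/bindlatest
```

Because the variable is exported before restic starts, the result no longer depends on whether systemd or an interactive shell launched the script.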

Some additional Info

  • journalctl:
Mar 09 21:23:27 kronos do_backups[4027]: unable to open cache: unable to locate cache directory: neither $XDG_CACHE_HOME nor $HOME are defined
  • systemd job:
# /etc/systemd/system/backup.service
[Unit]
Requires=multi-user.target
After=multi-user.target

[Service]
Type=oneshot  
ExecStart=/usr/local/bin/do_backups

[Install]
WantedBy=default.target
  • restic backup -v WITHOUT CACHE: 2h15m
#INF: Repo chosen for milou: s3:https://s3.swiss-backup03.infomaniak.com/restic-pot202403
#INF creating bind mount for /home/backup/milou/20240330-2004 on /home/backup/milou/bindlatest
#INF#Wed Apr  3 05:16:38 AM CEST 2024# restic backup -v --ignore-ctime --ignore-inode --no-extra-verify /home/backup/milou/bindlatest
open repository
lock repository
using parent snapshot 3af862d5
load index files
start scan on [/home/backup/milou/bindlatest]
start backup on [/home/backup/milou/bindlatest]
scan finished in 55.488s: 516965 files, 35.138 GiB

Files:           0 new,     0 changed, 516965 unmodified
Dirs:            0 new,     0 changed, 82420 unmodified
Data Blobs:      0 new
Tree Blobs:      0 new
Added to the repository: 0 B   (0 B   stored)

processed 516965 files, 35.138 GiB in 2:15:50
snapshot e7ffe15c saved
#INF umounting /home/backup/milou/bindlatest
#INF#Wed Apr  3 07:32:31 AM CEST 2024# done
  • restic backup -v WITH CACHE: 1m30s
#INF: Repo chosen for milou: s3:https://s3.swiss-backup03.infomaniak.com/restic-pot202403
#INF creating bind mount for /home/backup/milou/20240330-2004 on /home/backup/milou/bindlatest
#INF#Wed Apr  3 11:27:08 PM CEST 2024# restic backup -v --ignore-ctime --ignore-inode --no-extra-verify /home/backup/milou/bindlatest
open repository
lock repository
using parent snapshot e7ffe15c
load index files
start scan on [/home/backup/milou/bindlatest]
start backup on [/home/backup/milou/bindlatest]
scan finished in 69.366s: 516965 files, 35.138 GiB

Files:           0 new,     0 changed, 516965 unmodified
Dirs:            0 new,     0 changed, 82420 unmodified
Data Blobs:      0 new
Tree Blobs:      0 new
Added to the repository: 0 B   (0 B   stored)

processed 516965 files, 35.138 GiB in 1:30
snapshot 87c73a40 saved
#INF umounting /home/backup/milou/bindlatest
#INF#Wed Apr  3 11:28:40 PM CEST 2024# done

Hint: The options in the examples above (--ignore-ctime --ignore-inode --no-extra-verify) were also a result of me having no clue what caused the slowness. As the bind mount does not change the inode numbers, I don't need --ignore-inode, and I now use only --ignore-ctime. This is another lesson learned: As the rsync feature --link-dest updates the hard link count (among other things), all files and directories in my backup will always have the date of the latest backup as their ctime, because

  • the ctime cannot be "faked" by touch (or even a syscall).
  • the ctime gets updated when a file is created
  • the ctime gets updated whenever a file is linked to its cousin from the previous day.
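
The effect described in the list above can be reproduced with a small experiment. This is a sketch that assumes GNU coreutils stat on Linux; the file names gen1 and gen2 are made up and stand in for the same file in two backup generations.

```shell
# Hard-linking a file changes its ctime but leaves its mtime alone.
dir=$(mktemp -d)
echo data > "$dir/gen1"
mtime_before=$(stat -c %y "$dir/gen1")
ctime_before=$(stat -c %z "$dir/gen1")

sleep 1                       # make the timestamp difference unambiguous
ln "$dir/gen1" "$dir/gen2"    # roughly what --link-dest does for an unchanged file

mtime_after=$(stat -c %y "$dir/gen1")
ctime_after=$(stat -c %z "$dir/gen1")
links=$(stat -c %h "$dir/gen1")

echo "hard link count: $links"
[ "$mtime_before" = "$mtime_after" ] && echo "mtime unchanged"
[ "$ctime_before" != "$ctime_after" ] && echo "ctime changed"
rm -rf "$dir"
```

Since the inode is shared by both names, the new ctime is visible through every generation directory, which is why a ctime-based change check treats every file as modified after each dirvish run.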

Expected behavior

  • restic backup should refuse to run when the cache directory cannot be found (i.e. when it would work without a cache); maybe an option to override this (e.g. "--do-even-without-cache") needs to be implemented too.
  • The messages about the missing cache should go both to stderr and stdout.

Actual behavior

  • restic backup reports the missing cache on stderr only and then happily continues (but becomes very slow).

Do you have any idea what may have caused this?

See above, the error message is clear. restic needs HOME or XDG_CACHE_HOME to be set to have a cache.
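
Until restic itself behaves this way, a calling script could approximate the requested "refuse to run" behavior with a pre-flight check. The sketch below is only an assumption about how such a guard might look; check_cache_env is a hypothetical helper, and the list of variables is taken from the error message plus the RESTIC_CACHE_DIR variable mentioned earlier.

```shell
# check_cache_env
# Succeeds only if at least one variable is set from which a cache
# directory can be derived; otherwise complains on stderr and fails.
check_cache_env() {
    if [ -n "${RESTIC_CACHE_DIR:-}" ] || [ -n "${XDG_CACHE_HOME:-}" ] || [ -n "${HOME:-}" ]; then
        return 0
    fi
    echo "refusing to run: none of RESTIC_CACHE_DIR, XDG_CACHE_HOME, HOME is set" >&2
    return 1
}

# Usage inside the backup script:
#   check_cache_env || exit 1
#   restic backup -v /home/backup/milou/bindlatest
```

The check only looks at the environment; it does not verify that the resulting directory is writable.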

Did restic help you today? Did it make you happy in any way?

I am still thrilled by the implementation of restic forget: the easiest way to manage staggered backup history I've ever seen :-)


nis65 commented Apr 22, 2024

Just found out: this is strongly related to, if not a duplicate of, #4591.
