Possibility to run borg check for specific segment(s) #8070

Open
jensb opened this issue Jan 31, 2024 · 3 comments

jensb commented Jan 31, 2024

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Question

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

v1.2.0 (Ubuntu package)
and
v1.2.7 (fat binary for ARM on Synology)

Operating system (distribution) and version.

Ubuntu 22.04 on x64, and Synology OS on ARM.

Hardware / network configuration, and filesystems used.

ext4

How much data is handled by borg?

~400 GB

Full borg command line that led to the problem (leave out excludes and passwords)

borg check, borg delete, borg create - doesn't matter

Describe the problem you're observing.

I canceled a running backup operation (using Ctrl-C) which was backing up to an SMB mount on a Synology NAS.
After this I was left with a checkpoint archive that apparently references a missing segment file:

  • "data/17/17896" is missing.

borg check on this repository would take over 8 hours (extrapolated) and was canceled twice because of network issues, so I'd like to avoid running it if possible. Deleting the broken checkpoint archive would be fine; however, borg delete ::linuxkiste-2024-01-29T20:13:27.checkpoint throws the same exception, so I cannot even delete the archive that has this problem (not even with --force).

Would borg check --repair fix this error?
If so, can I run borg check just for this single segment, and/or somehow force-delete the last checkpoint snapshot?
I know I can use --max-duration, but that too starts at the beginning and (probably needlessly) scans more than 17,000 segment files of 512 MB each.
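
Roughly, these are the commands I have been trying or considering (repository path shortened to a placeholder; flags as documented for borg 1.2):

  # time-limited partial check of the repository (low-level checks only):
  borg check -v --max-duration 3600 /path/to/repo

  # attempt to delete the broken checkpoint archive
  # (fails with the exception below, with or without --force):
  borg delete --force ::linuxkiste-2024-01-29T20:13:27.checkpoint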

Fetching and building archive index for linuxkiste-2024-01-29T20:13:27.checkpoint ...
Local Exception
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1432, in get_fd
    ts, fd = self.fds[segment]
  File "/usr/lib/python3/dist-packages/borg/lrucache.py", line 21, in __getitem__
    value = self._cache[key]  # raise KeyError if not found
KeyError: 17896

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 5089, in main
    exit_code = archiver.run(args)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 5020, in run
    return set_ec(func(args))
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 183, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 1165, in do_delete
    return self._delete_archives(args, repository)
  File "/usr/lib/python3/dist-packages/borg/archiver.py", line 1213, in _delete_archives
    with Cache(repository, key, manifest, progress=args.progress, lock_wait=self.lock_wait) as cache:
  File "/usr/lib/python3/dist-packages/borg/cache.py", line 383, in __new__
    return local()
  File "/usr/lib/python3/dist-packages/borg/cache.py", line 374, in local
    return LocalCache(repository=repository, key=key, manifest=manifest, path=path, sync=sync,
  File "/usr/lib/python3/dist-packages/borg/cache.py", line 493, in __init__
    self.sync()
  File "/usr/lib/python3/dist-packages/borg/cache.py", line 899, in sync
    self.chunks = create_master_idx(self.chunks)
  File "/usr/lib/python3/dist-packages/borg/cache.py", line 853, in create_master_idx
    fetch_and_build_idx(archive_id, decrypted_repository, archive_chunk_idx)
  File "/usr/lib/python3/dist-packages/borg/cache.py", line 752, in fetch_and_build_idx
    csize, data = decrypted_repository.get(archive_id)
  File "/usr/lib/python3/dist-packages/borg/remote.py", line 1087, in get
    return next(self.get_many([key], cache=False))
  File "/usr/lib/python3/dist-packages/borg/remote.py", line 1090, in get_many
    for key, data in zip(keys, self.repository.get_many(keys)):
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1188, in get_many
    yield self.get(id_)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1182, in get
    return self.io.read(segment, offset, id)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1548, in read
    fd = self.get_fd(segment)
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1434, in get_fd
    fd = open_fd()
  File "/usr/lib/python3/dist-packages/borg/repository.py", line 1415, in open_fd
    fd = open(self.segment_filename(segment), 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/....../Backups/Linuxkiste_18.04_Borg/data/17/17896'

Platform: Linux linuxkiste 6.5.0-14-generic #14~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 20 18:15:30 UTC 2 x86_64
Linux: Unknown Linux  
Borg: 1.2.0  Python: CPython 3.10.12 msgpack: 1.0.3 fuse: pyfuse3 3.2.0 [pyfuse3,llfuse]
PID: 54340  CWD: /home/jens
sys.argv: ['/usr/bin/borg', 'delete', '--debug', '::linuxkiste-2024-01-29T20:13:27.checkpoint']
SSH_ORIGINAL_COMMAND: None

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes, every time.

@ThomasWaldmann (Member) commented:

Can you post the output of borg check -v --repair ..., please?

@ThomasWaldmann (Member) commented:

BTW, I can't explain how Ctrl-C would ever lead to that state (a missing, but referenced segment file) in a repository.

So there might be some unidentified additional problem: "why does the repo lose files?"

Maybe consider an fsck on the repo filesystem, smartctl -t long on the repo disk, and some sort of memtest on the server and client machines.
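
Hypothetically, on the repo server that could look something like this (device names are placeholders; the repo filesystem must be unmounted for fsck):

  # filesystem check of the (ext4) repo filesystem, while unmounted:
  fsck.ext4 -f /dev/sdX1

  # long SMART self-test on the repo disk, then inspect the results:
  smartctl -t long /dev/sdX
  smartctl -a /dev/sdX

  # userland RAM test (a bootable memtest86+ run is more thorough):
  memtester 1024M 3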

@ThomasWaldmann (Member) commented:

Besides the options you already noted, there is only borg check --repair --repository-only, which takes less time to check/repair a repo but only does the low-level checks.

As there is known damage in your repo, and we do not really know whether it is limited to that single segment file, it would be better to run a full check --repair (including the high-level archives part).
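
For reference, the two variants (repository path is a placeholder):

  # low-level, repository/segments-only check and repair (faster, less thorough):
  borg check -v --repair --repository-only /path/to/repo

  # full check and repair, including the high-level archives part (recommended here):
  borg check -v --repair /path/to/repo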
