Delete Files from Existing Snapshot #14
Comments
That'd be really nice. |
It would also allow removing sensitive data that got included unwittingly. |
This would be a great feature! |
Any feedback from the devs on this idea? It would be very nice. For example, I just discovered that a program I build from git checkouts has been creating enormous binaries (almost 100 MB), and these have been getting backed up in my Restic backups unnecessarily. I haven't been using Restic for very long, as I'm still in a testing phase, so it's not a problem to delete the old snapshots in question. But this issue can happen quite easily, and it would be good to have long-term solutions for it, other than forgetting every snapshot. I suppose it would be possible to write a script to restore every snapshot, delete undesired files, and re-backup the snapshot by setting the date manually, but obviously that would take a very long time. It would be great if Restic could do this natively. Thanks. |
I think there are multiple valid use cases for this. Seems like a really good feature to have. I would probably use it myself at some point. |
It probably doesn't really change the implementation effort, but from a UX viewpoint, this might be done with a rather low profile by extending the
So instead of offering a command that modifies snapshots, this would allow making a new backup based on an existing snapshot ID. Deleting a file would be achieved with exclude rules. |
@dnnr See #1550 (comment) However, I don't follow you here. Removing data from old snapshots is definitely a distinct operation and should have its own command. Something like:
And It would also be good for it to have a
|
Well, I left out the step where you'd delete the source snapshot afterwards (using In my opinion, doing it like this would keep the command set more orthogonal compared to adding a new command that overlaps with the functionality of existing commands. Right now, there is |
Since we are proposing operations on individual files, it would be nice to be able to rename files as well. |
I agree with @alphapapa that there should be a distinct command for this type of operation. It might be For that reason I think perhaps adding a That said I'm not entirely sure renaming is something that's reasonable to implement - it goes quite a long way from what a backup program is about. But sure, why not. I don't think it's wise to have the purge stuff be part of the backup command. In one perspective, you could argue that it's fine - you are doing an operation on your backup. But with that rationale the prune and unlock and forget actions should also be part of the backup command, as they too are about maintaining stuff in your backup. I don't think that makes sense, so I think it should indeed be a separate operation/command, e.g. |
It's definitely not obvious. It's also better if Restic handles that for the user, rather than the user having to keep track of which snapshot IDs have changed and need to be forgotten--which would be quite a burden if the user were rewriting all snapshots in the repo.
I don't understand what you mean. The opposite is the case. This proposed purge/delete/rewrite command does not overlap with
Again, no idea what you're thinking here.
You are proposing making the @rawtaz Yes,
I recommend against using commas as separators, because it makes constructing command lines in scripts much more complicated. |
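To make the scripting concern concrete, here is a minimal Python sketch. The `--path` flag name is hypothetical and used only for illustration; it is not an existing restic option. The sketch compares packing several paths into one comma-separated argument with repeating the flag once per path, and shows what a naive comma-splitting parser would recover when a path itself contains a comma.

```python
def comma_joined(paths):
    """Build one hypothetical --path argument with commas as separators."""
    return ["--path", ",".join(paths)]


def repeated_flag(paths):
    """Build one hypothetical --path argument per path."""
    args = []
    for p in paths:
        args += ["--path", p]
    return args


def parse_comma_joined(args):
    """What a naive parser that splits on commas would recover."""
    return args[1].split(",")


# The first path legitimately contains a comma.
paths = ["/home/user/notes, old", "/home/user/build"]

# Splitting on commas yields three entries instead of two.
recovered = parse_comma_joined(comma_joined(paths))

# With a repeated flag every path stays intact as its own argument.
intact = repeated_flag(paths)[1::2]
```

With a repeated flag, a script can append one argument per path and never has to escape or quote separators itself.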
Well, in a sense, modifying the contents of a snapshot is creating a new snapshot (because it's not the same snapshot as before). Think
I didn't say that. Why would it? There is |
You're right. But at the same time, Restic is not git, and it's not designed to require knowledge of content-based addressing to work. Regardless of how it works under the hood, I think that, to users, the command we are proposing should be considered to modify an existing snapshot, not create a new one, therefore it should be a distinct command.
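As an aside on the content-addressing point, here is a toy Python sketch of why a modified snapshot necessarily gets a new ID under the hood. It is an illustrative model only, not restic's actual on-disk format: the snapshot fields and the `snapshot_id` helper are assumptions made for the example.

```python
import hashlib
import json


def snapshot_id(snapshot: dict) -> str:
    """Toy content address: SHA-256 over a canonical JSON encoding."""
    encoded = json.dumps(snapshot, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Hypothetical snapshot metadata pointing at a tree of files.
original = {
    "time": "2022-08-03T03:04:10+02:00",
    "paths": ["/home"],
    "tree": "tree-with-big-binary",
}

# "Modifying" the snapshot means pointing it at a different tree.
rewritten = dict(original, tree="tree-without-big-binary")

original_id = snapshot_id(original)
rewritten_id = snapshot_id(rewritten)
```

In this model the two IDs always differ, so a command that "edits" a snapshot has to write a new object and decide what happens to the old one, whatever the command is called.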
Well, you said:
Maybe you should explain in more detail.
Let's be specific. |
I'd like to add my opinion: I think having a way to modify snapshots in the repo is valuable, based on the feedback how many people would like to have something like this. The command should be independent of the I don't like the name For the supported operations of the command, I've seen requests for:
The former is exactly what this issue (originally) is about, but are there really use cases for renaming files? |
I think Imagine the repo/backup/snapshot is a bucket. Change is more like swapping the bucket itself for something else, or taking something out of it and putting another thing in, rather than picking something in the bucket up, modifying it a bit, and putting it back. Perhaps some native English/American speaker knows which is more proper :) It boils down to linguistics, I think. |
Hm, |
|
If this is only about deleting files, would it make sense to enhance the If this new feature is about deleting and renaming (or something else) I'd vote for |
Thanks for your input @dimejo 👍 I think that when you're renaming and/or deleting, you are not |
IMHO "rewrite" conveys the meaning the best. |
The |
If it's gonna be separate command, calling it By the way, I feel that restic would benefit from command categories, similar to what Git has with its plumbing commands. Right now, |
You might also consider |
For everyone who is monitoring this for updates or arriving here from Google: there's no need to wait for this issue, which may never come to fruition. Just use duplicati in the meantime; it has first-class support for removing files from snapshots after the fact. |
I've been using restic for about a year now and I've stopped waiting for features to be implemented. I don't mean that everything should be added to restic, but there are basic things that should be there. I'm considering moving away from restic: the repository is very fragile and can get broken very easily. Yesterday I deleted a snapshot because it included files that should not have been in the backup (I forgot to add an exclude). Since then I have had errors in my repository and I haven't been able to repair it yet. I should not have to delete a whole snapshot because some files were included by mistake. |
@MorgothSauron I usually just removed snapshots that contained it too, which is the only solution it seems in restic, but again, duplicati can do it via a single command for a while now, so I've changed since and had no issues. |
I wish to thank everyone for their input on this matter. As we've seen, many people have wanted in particular the ability to remove files from a snapshot. I guess we all make mistakes once in a while when backing up ;) At this point in time the available maintainer and developer time is needed on other parts of restic, so I do not foresee this issue being implemented in the foreseeable future. I'm also going to release a new rest-server as soon as I can, and will then start to look into some other issues. That said, if someone makes a solid PR that is nicely and clearly written, well tested and bug free, and produced in coordination with maintainers, it will definitely be considered for inclusion. This specific issue is one where @fd0 has already given his blessing on the direction, so focus can be mainly on producing a solid implementation (that we know won't corrupt repos) rather than "should we add this feature", which is good. Such a PR should be basic and act as a starting point which if needed can be built upon. An example of what I mean by that is it should for starters:
The rationale here is to get a minimal start as a proof of concept and minimum viable product. Once being tested we can adjust it as needed, e.g. by adding the other
On a related note, perhaps the work done by @middelink in #323 could be used as inspiration or a basis for the implementation, as it does some processing of existing snapshots. I'm going to see if we can get moving with this one too soon. |
Thanks for the thoughtful feedback! |
Hi there. I've added draft It works here with test repo, passes I've tried to get syntax very close to Also I don't like idea to replace snapshots by default, so currently default behavior is to just create new snapshot with Any feedback would be greatly appreciated. |
Hey Dmitry, Thanks for this implementation, great work! So far it works perfectly on Linux with a small test repo of 600 files + several test snapshots. Restore works and diff correctly shows the excluded folders. I will be doing more intensive tests on a "clone" of a real repo with many GB of data and hundreds of snapshots. I will also try Windows-sourced repos. One proposition: have the option to specify a tag for the snapshots that contained the exclusions on a rewrite pass. (keeping the "
This would help identify those snapshots that still contain "thisfileshouldberemoved.txt". On the other hand, the more direct Again very good work. |
@NovacomExperts Yes, my initial motivation was to keep history editing as safe as possible. It's very easy to exclude something important with I fully agree that currently this is not fully achieved. It's easy to 'observe' new snapshots, but too difficult to delete the old ones. Plus I don't like hardcoded In any case I'll wait for feedback from the maintainers. I don't want to spend much time on this if it's moving in the wrong direction. PS. My primary restic repo is around ~2TB now. I will try it on that later, after making an LVM snapshot. |
@dionorgua Your initial motivation is fully correct. I'll cast my vote to keep it like that, with the "dangerous" option But I agree, let's wait for feedback on this. Yesterday, I forgot the cloned test repo (65 GB) inside a folder that was backed up by restic overnight. I could have I test more intensively with data that spans across multiple snapshots. Cheers |
I've replaced the mistaken pull request #2720 with a new one, because the old one was created from the master branch. I also added one missing error check. Sorry for the extra noise. |
Very late for this, but rectify is my suggestion for the delete-specific-file-from-backup command. |
#2731 is exciting, thanks a bunch! |
I have to say that's not a great name for it. Rectify implies there's something wrong that needs correcting/rectifying. While this may be true in one of the use cases, it's not always the case. A user may want to just remove some data from existing snapshots to free up space for all we know, while keeping the rest of the snapshot. The wording has to be more neutral than rectify, I think. |
Hi, if it was possible to add, remove folders or files to an existing snapshot, restic could be like a dedupe filesystem, as OpenDedup. An interesting use case could be to save multiple versions of vhd files. |
The thing should be simple, e.g.
It will delete the directory from all snapshots where it exists. |
Such a destructive action should not be so trivial. FWIW, I think the approach currently taken is the right approach (editing snapshots to remove references to paths then using |
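The general idea of editing snapshots so they no longer reference certain paths, and only afterwards cleaning up data nothing points to, can be sketched with a toy Python model. This is an illustration under simplifying assumptions (a snapshot is a flat path-to-blob-ID map, and `rewrite_snapshot` and `unreferenced_blobs` are hypothetical helper names), not how restic or the PR implements it.

```python
def rewrite_snapshot(files, excluded_prefixes):
    """Return a copy of a path -> blob-id map without the excluded paths."""
    def is_excluded(path):
        return any(
            path == p or path.startswith(p.rstrip("/") + "/")
            for p in excluded_prefixes
        )
    return {path: blob for path, blob in files.items() if not is_excluded(path)}


def unreferenced_blobs(all_blobs, snapshots):
    """Blobs that no remaining snapshot refers to: candidates for cleanup."""
    referenced = {blob for snap in snapshots for blob in snap.values()}
    return set(all_blobs) - referenced


# Toy repository with one snapshot and three stored blobs.
old = {
    "/home/a.txt": "b1",
    "/home/build/app.bin": "b2",
    "/home/build/app.o": "b3",
}
new = rewrite_snapshot(old, ["/home/build"])

# While the old snapshot still exists, nothing can be cleaned up.
orphaned_before = unreferenced_blobs({"b1", "b2", "b3"}, [old, new])
# Once only the rewritten snapshot remains, b2 and b3 become unreferenced.
orphaned_after = unreferenced_blobs({"b1", "b2", "b3"}, [new])
```

Keeping the two steps separate means the destructive part only happens once the old snapshot has been deliberately dropped.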
What do you mean by "the approach currently taken"? Is it planned for a future release, or has it already been implemented? |
There's a PR for this which incorporates outcomes of discussion here. See #2731. |
It would be great for Restic to have functionality analogous to borg recreate. https://borgbackup.readthedocs.io/en/stable/usage/recreate.html |
Any updates on this? Pull request #2731 no longer seems to be maintained. |
Because it seems there is some interest in this issue for a long time now, I'll post my crude workaround using Python which I used a couple of days ago and worked perfectly. The basic idea is to rewrite all snapshots but with an 'exclude' filter to exclude the files you want to scrub/purge. Depending on the amount and size of the snapshots, this might take some time because it will rescan the metadata of every file in all of your snapshots. It uses the restic mount function for this so You will lose some information from your snapshots but this can be changed by adapting the script to your needs. As far as I know (might be more) with the current script you will lose:
Requirements:
If you now run this Python script it will change into the directory of each snapshot and perform a backup again using the current snapshot as parent and tagging it to make it distinguishable. Might be wise to test it with just one snapshot if it works to your liking. When restic is making a backup it will show you the 'current_file' it's processing and the total amount of files increasing. These numbers should increase roughly at the same rate as this script is not really writing new files to your repository but only metadata (which is quite fast).

#!/usr/bin/env python3
import datetime
import os

# Placeholders to fill in before running:
# SNAPSHOTS_JSON_MAPPING, RESTIC_COMMAND, PRUNE_TAG, RESTIC_MOUNT_DIR
mapping = SNAPSHOTS_JSON_MAPPING

for i in mapping:
    # Whatever replaces RESTIC_COMMAND must end with a trailing backslash,
    # otherwise the options below are run as a separate shell command.
    command = """ RESTIC_COMMAND
    --parent {parent_id} \
    --ignore-inode \
    --time "{time}" \
    --tag "PRUNE_TAG" \
    . """
    repo_dir = "RESTIC_MOUNT_DIR"
    # Back up from inside the mounted snapshot directory
    os.chdir(f'{repo_dir}/ids/{i["short_id"]}')
    # We are forced to lose the timezone and some seconds precision
    command = command.format(parent_id=i["id"],
                             time=datetime.datetime.fromisoformat(i["time"][0:19]))
    print(f'---- Processing snapshot {i["short_id"]} ----')
    os.system(command)

Example: I normally use this command

The script will become:

#!/usr/bin/env python3
import datetime
import os

mapping = [
    {
        "time": "2022-08-03T03:04:10.22434835+02:00",
        "parent": "6b0a4ca9cbc8bce824588c6343e347405aac3d2bf196ca29b0d59234fc5e4da2",
        "tree": "8d33292e2d616d855e1dfba601abaf0e02f61404ae09075462eb6496e5a7eeba",
        "paths": [
            "/mnt/resticmnt"
        ],
        "hostname": "big-server",
        "username": "root",
        "id": "f4c4093a743a1ce3eb7f6e7a1914f9b13fca7bab87de6fe1bed0c3d0a2cd314c",
        "short_id": "f4c4093a"
    }
]

for i in mapping:
    command = """ restic --no-cache -p /etc/resticpasswd -r "/mnt/vg2-backup_lvol1/" backup \
    --exclude "home/user/directory_i_also_dont_want" \
    --parent {parent_id} \
    --ignore-inode \
    --time "{time}" \
    --tag "prune_downloads" \
    . """
    repo_dir = "/mnt/resticmnt"
    # Back up from inside the mounted snapshot directory
    os.chdir(f'{repo_dir}/ids/{i["short_id"]}')
    # We are forced to lose the timezone and some seconds precision
    command = command.format(parent_id=i["id"],
                             time=datetime.datetime.fromisoformat(i["time"][0:19]))
    print(f'---- Processing snapshot {i["short_id"]} ----')
    os.system(command)

Afterwards you can use

Afterwards your first new backup should contain both the

I'm writing this after the fact, so I might have forgotten some steps or requirements. Let me know and I will update this post. Also feel free to make the script more robust if you so desire. |
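In the spirit of "feel free to make the script more robust", here is a hedged sketch of one possible variant. It builds the same `restic backup` invocation as an argument list and runs it with `subprocess.run` instead of formatting a shell string for `os.system`, so exclude patterns containing spaces or quotes need no shell escaping. The function names `build_backup_command` and `rewrite_snapshot` are my own; the flags are the ones used in the script above, and nothing here has been tested against a real repository.

```python
import datetime
import subprocess


def build_backup_command(snapshot, repo, password_file, excludes, tag):
    """Build the argument list for re-backing up one mounted snapshot."""
    # As in the original script, the timezone and sub-second precision are
    # dropped; the result matches the "YYYY-MM-DD HH:MM:SS" string it passed.
    when = datetime.datetime.fromisoformat(snapshot["time"][:19])
    cmd = ["restic", "--no-cache", "-p", password_file, "-r", repo, "backup"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    cmd += [
        "--parent", snapshot["id"],
        "--ignore-inode",
        "--time", when.strftime("%Y-%m-%d %H:%M:%S"),
        "--tag", tag,
        ".",
    ]
    return cmd


def rewrite_snapshot(snapshot, mount_dir, **kwargs):
    """Run the backup from inside the mounted snapshot directory."""
    cwd = f'{mount_dir}/ids/{snapshot["short_id"]}'
    # check=True stops at the first failing backup instead of carrying on.
    subprocess.run(build_backup_command(snapshot, **kwargs), cwd=cwd, check=True)


# Building (not running) the command for the example snapshot above.
example = {
    "time": "2022-08-03T03:04:10.22434835+02:00",
    "id": "f4c4093a743a1ce3eb7f6e7a1914f9b13fca7bab87de6fe1bed0c3d0a2cd314c",
    "short_id": "f4c4093a",
}
example_cmd = build_backup_command(
    example,
    repo="/mnt/vg2-backup_lvol1/",
    password_file="/etc/resticpasswd",
    excludes=["home/user/directory_i_also_dont_want"],
    tag="prune_downloads",
)
```

Calling `rewrite_snapshot(example, "/mnt/resticmnt", repo=..., password_file=..., excludes=[...], tag=...)` for each entry of the mapping would then replace the loop body. The same information loss noted above still applies.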
Is there anything we can do to get this rolling? Would a bounty work? |
Hi all, just checking in to see if this has had any movement? I have several TB of data I need to remove. They're video editing movie data (massive files) that was put into the wrong folder by someone who simply messed up. Human error. Our online backup is now MASSIVE and costing us some coin. We need to remove these files as they're costing us each month to have it there. Just wanted to see if there's an implementation yet? Thanks so much. |
@therealrobster there is a pull request in #2731 that is still being
worked on, but until it is finished, you may want to try the workaround
documented above (note the data loss mentioned in the post).
--
bye,
pabs
https://bonedaddy.net/pabs3/
|
I'd say stay tuned because that PR is shaping up pretty well. |
In cases of accidental backup of e.g. too large files, I would like to be able to delete specific files or directories (incl. recursion) from existing snapshots.