Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement 'rewrite' command to exclude files from existing snapshots #2731

Merged
merged 25 commits into from Nov 12, 2022

Conversation

dionorgua
Copy link
Contributor

@dionorgua dionorgua commented May 12, 2020

What is the purpose of this change? What does it change?

This adds 'rewrite' command that provides a way to remove data from existing snapshots.

This is some sort of preliminary pull requests. It's actually works here on test repo, but lacks tests, etc. Mostly to discuss things

Was the change discussed in an issue or in the forum before?

[ replaces #2720 because it was created from master branch ]
closes #14

Checklist

  • I have read the Contribution Guidelines
  • I have enabled maintainer edits for this PR
  • I have added tests for all changes in this PR
  • I have added documentation for the changes (in the manual)
  • There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
  • I have run gofmt on the code in all commits
  • All commit messages are formatted in the same style as the other commits in the repo
  • I'm done, this Pull Request is ready for review

Copy link
Contributor

@aawsome aawsome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me 👍
I propose to call the existing code from cmd_forget to actually remove snapshots. That way also a pruning could be directly started from this command which IMO improves the usage for end users.

changelog/unreleased/issue-14 Outdated Show resolved Hide resolved
changelog/unreleased/issue-14 Outdated Show resolved Hide resolved
cmd/restic/cmd_rewrite.go Outdated Show resolved Hide resolved
cmd/restic/cmd_rewrite.go Outdated Show resolved Hide resolved
cmd/restic/cmd_rewrite.go Outdated Show resolved Hide resolved
cmd/restic/cmd_rewrite.go Show resolved Hide resolved
cmd/restic/cmd_rewrite.go Outdated Show resolved Hide resolved
return true, nil
}

err = repo.Flush(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only need to flush the repo when the snapshot was modified.

I also wouldn't flush after finishing a single snapshot, but once after all snapshots are rewritten, i.e. in line 289.
The reason is that flush may finish small packs. If you modify a lot of snapshots generating only few new tree blobs, you may end up in having lots of new packs which each only contain few tree blobs.

However, the snapshot files should only be written after flush finishes with nil error. So also writing the snapshots should be postponed. I suggest returning the new snapshot object here and write all snapshots after Flush in runRewrite

cmd/restic/cmd_rewrite.go Outdated Show resolved Hide resolved
cmd/restic/cmd_rewrite.go Outdated Show resolved Hide resolved
@MichaelEischer MichaelEischer mentioned this pull request Aug 8, 2020
8 tasks
@ghost ghost mentioned this pull request Aug 21, 2020
@dionorgua
Copy link
Contributor Author

Thanks a lot for review. I'll try to address everything soon

@edmwagner
Copy link

I Just mirrored my minio and used this branch to rewrite a few different scenarios. looks good so far. restoring files worked as expected and matched versions restored from original repo. :) The repo is several GB smaller though.

I would like to help move this feature forward.

@dionorgua
Copy link
Contributor Author

Hi,
I was busy with personal things. Will try to catch up this weekend and at least update/rebase it.

@ghost
Copy link

ghost commented Apr 7, 2021

Is supporting such a feature for a whole repository planned?

@aawsome
Copy link
Contributor

aawsome commented Apr 7, 2021

Is supporting such a feature for a whole repository planned?

This PR already allows to give multiple snapshots or even multiple snapshot criteria. E.g. you can run it with all snapshots or give multiply -H options to cover all hosts.

We could add a possibility to specify all which resolves to all snapshots - but I would leave that to a future PR as it is pretty independent from this change.

@ghost
Copy link

ghost commented Apr 7, 2021

Good stuff, thank you. 👍

@bertbesser
Copy link

bertbesser commented Aug 16, 2021

What's the status of this PR? Any chance that it gets merge soon? I'd very much like to use the feature :-D

Good work 👍🏼

@rawtaz
Copy link
Contributor

rawtaz commented Aug 16, 2021

@bertbesser This PR should be considered work in progress. No decision has been made about whether this should be merged or not, the PR is not yet marked as done by the author, and there's a bunch of unaddressed comments. Clearly it's not ready for merging, and the status is simply what you see here in the GitHub GUI where you posted a comment. That said, there's a lot of other stuff going on if you look in this repository, so consider this WIP. Feel free to try it yourself though, just don't consider it production ready (at least not officially).

@dionorgua
Copy link
Contributor Author

Hi,

@bertbesser

It was working pretty well on my machine in time it was created (but yes, it's very experimental). Good news is that it doesn't touch any data pack files at all. Basically It just adds new snapshot as copy of existing with a few files removed (plus optionally removes old snapshot). I don't think that it can corrupt repository (except new snapshots, that can be just deleted manually)

I think that 'rewrite + check + prune' scenario should be safe enough. I've already used it on my own 2.5TB repository without any problems. Removed most of garbage from old snapshots.

But I'm not sure that it'll work as is right now, I'll rebase it on top of latest git and try to address most of comments next week.

@Gaibhne
Copy link

Gaibhne commented Oct 21, 2021

This looks like a fantastic feature I sorely miss in Restic. Is there anything I, as a user, can do to help with this PR ?

@edmwagner
Copy link

edmwagner commented Oct 21, 2021 via email

@rawtaz
Copy link
Contributor

rawtaz commented Oct 21, 2021

It's good news that it seems to work fine for those who tried it, and the author's description of how it works suggests it might be not very intrusive and have limited ability to cause any harm if not working as intended. But it's still WIP, needs rebasing, need addressing comments and other polishing, not to mention tests and documentation. Let's conclude that this is the case and unless someone does those parts it's not even close to further review, sorry.

@ghost
Copy link

ghost commented Dec 3, 2021

Hi @dionorgua, is there any chance you'd have an update regarding your work on this PR? I may or may have not oopsied a B2 backup again. 🤠

@MichaelEischer
Copy link
Member

MichaelEischer commented Sep 10, 2022

I've rebased the PR and made quite a few cleanups (see the commit messages for more details). The main tasks left are:

  • Writing some documentation
  • More tests. The current test coverage is just the absolute minimum
  • Decide what to do with the --inplace flag, I'll probably rename it to --forget
  • Make filter option handling and descriptions work as in Cleanup snapshot filter options #3912
  • [ ] Performance optimizations. -> Can be done later on.
  • Finalize the CLI options

@NovacomExperts
Copy link

I'm so thrilled that this PR is getting some progress ! Many thanks to the team members for all the latest commits. I will do some test on my mostly "local" repos.

I support @MichaelEischer on the naming for --forget instead of --inplace. Of course, "are you sure" could follow before execution, just like prune. As discussed with @dionorgua before, a clear --tag-old-snapshots-with <tag> and --tag-new-snapshots-with <tag> would work great for managing 2 step forget process.

Keep on the great work !

@bertbesser
Copy link

bertbesser commented Sep 14, 2022

It was working pretty well on my machine in time it was created (but yes, it's very experimental). Good news is that it doesn't touch any data pack files at all. Basically It just adds new snapshot as copy of existing with a few files removed (plus optionally removes old snapshot). I don't think that it can corrupt repository (except new snapshots, that can be just deleted manually)

I think that 'rewrite + check + prune' scenario should be safe enough. I've already used it on my own 2.5TB repository without any problems. Removed most of garbage from old snapshots.

@NovacomExperts The --tag-xxx-snapshots-with would just be a welcome addition to this, correct? Not a fundamentally other approch if I get you right. Cheers

@MichaelEischer
Copy link
Member

I've added enough tests to achieve over 80% test coverage for the new code. There's now also a new documentation section. I'll probably split some of the preparatory refactoring commits into separate PRs to reduce the amount of code a bit.

As discussed with @dionorgua before, a clear --tag-old-snapshots-with <tag> and

--tag-old-snapshots is impossible to implement. We cannot change the tag of an existing snapshot without creating a new one and removing the old snapshot.

Instead of using tags, using the printed snapshot id should also work.

@MichaelEischer
Copy link
Member

I've added a commit to let the rewrite command reject rewriting anything which would result in loosing data in a tree. This ensures that we can add new fields to the Tree struct without having to increase the repository version every time. Without the check, the rewrite command could just loose data.

The check is not completely optimal, but it should prevent rewriting for all problematic situations. If the need arises to handle this in a more targeted way, we can still adapt the implementation later on.

@MichaelEischer
Copy link
Member

LGTM. I've made one final change before merging this PR: when just passing--iexclude-file the rewrite command would complain about missing excludes.

Everything went very well and all checks are ok (including read-data).

Thanks for testing! Good to hear that the PR indeed worked as expected (it's been in a good shape for some time.

Copy link
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM2

@MichaelEischer MichaelEischer merged commit 726a196 into restic:master Nov 12, 2022
@12nick12
Copy link

12nick12 commented Jan 7, 2023

Any idea when this will be in the prebuilt binaries?

@rawtaz
Copy link
Contributor

rawtaz commented Jan 7, 2023

It will be in the next release, which is released when it's done.

@aawsome
Copy link
Contributor

aawsome commented Jan 8, 2023

@12nick12 You might want to have a look at https://beta.restic.net/

@rawtaz
Copy link
Contributor

rawtaz commented Jan 8, 2023

FWIW, we're very close to a new release so stay tuned and you'll have it in your hands soon :) But indeed, nothing wrong with using the builds @aawsome pointed to (thanks for pointing that out)! I use them all the time.

@rawtaz
Copy link
Contributor

rawtaz commented Jan 12, 2023

@12nick12 It's out now! See: https://restic.net/blog/2023-01-12/restic-0.15.0-released/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Delete Files from Existing Snapshot
9 participants