Implement 'rewrite' command to exclude files from existing snapshots #2731

Merged on Nov 12, 2022 (25 commits)
Commits
dc29709
Implement 'rewrite' command to exclude files from existing snapshots
dionorgua May 5, 2020
b922774
rewrite: fix compilation
MichaelEischer Sep 6, 2022
82592b8
rewrite: address most review comments
MichaelEischer Sep 6, 2022
4d6ab83
rewrite: use treejsonbuilder
MichaelEischer Sep 6, 2022
c0f7ba2
rewrite: simplify dryrun
MichaelEischer Sep 6, 2022
f6339b8
rewrite: extract tree filtering
MichaelEischer Sep 6, 2022
2b69a1c
rewrite: filter all snapshots if none are specified
MichaelEischer Sep 6, 2022
4cace1f
unify exclude patterns with backup command
MichaelEischer Sep 6, 2022
559acea
unify exclude pattern options
MichaelEischer Sep 7, 2022
7ebaf6e
rewrite: start repository uploader goroutines
MichaelEischer Sep 9, 2022
ad14d6e
rewrite: use SelectByName like in the backup command
MichaelEischer Sep 9, 2022
327f418
rewrite: cleanup err handling and output
MichaelEischer Sep 9, 2022
375a3db
rewrite: non-exclusive lock if snapshots are only added
MichaelEischer Sep 9, 2022
b044649
rewrite: add minimal test
MichaelEischer Sep 9, 2022
a47d9a1
rewrite: use unified snapshot filter options
MichaelEischer Sep 27, 2022
73f54cc
rewrite: rename --inplace to --forget
MichaelEischer Sep 27, 2022
0224e27
walker: Add tests for FilterTree
MichaelEischer Oct 14, 2022
ec0c91e
rewrite: Add tests for further ways to use the command
MichaelEischer Oct 14, 2022
11b8c3a
rewrite: add documentation
MichaelEischer Oct 14, 2022
f88acd4
rewrite: Fail if a tree contains an unknown field
MichaelEischer Oct 15, 2022
c15bedc
rewrite: Revert unrelated documentation change
rawtaz Oct 24, 2022
f86ef4d
rewrite: Polish code and add missing messages
rawtaz Oct 24, 2022
f175da2
rewrite: Polish documentation
rawtaz Oct 24, 2022
537cfe2
rewrite: Fix check that an exclude pattern was passed
MichaelEischer Nov 12, 2022
bb0fa76
Cleanup exclude pattern collection
MichaelEischer Nov 12, 2022
7 changes: 7 additions & 0 deletions changelog/unreleased/issue-14
@@ -0,0 +1,7 @@
Enhancement: Implement rewrite command

We've added a new command which allows rewriting existing snapshots to remove
unwanted files.

https://github.com/restic/restic/issues/14
https://github.com/restic/restic/pull/2731
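
The exclusion idea behind the new command can be sketched as follows. This is a minimal, self-contained illustration rather than restic's implementation: `RejectFunc` and `rejectByPattern` are hypothetical stand-ins for `RejectByNameFunc` and restic's own pattern matching, and the standard library's `path.Match` is used here instead of restic's filter logic, so pattern semantics will differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// RejectFunc reports whether a path should be excluded.
// Hypothetical stand-in for restic's RejectByNameFunc.
type RejectFunc func(nodepath string) bool

// rejectByPattern returns a RejectFunc that matches the pattern against each
// path component using path.Match (an assumption made for this sketch; restic
// uses its own pattern matcher).
func rejectByPattern(pattern string) RejectFunc {
	return func(nodepath string) bool {
		for _, part := range strings.Split(nodepath, "/") {
			if part == "" {
				continue
			}
			if ok, err := path.Match(pattern, part); err == nil && ok {
				return true
			}
		}
		return false
	}
}

// selectByName keeps a path only if none of the reject functions match it.
func selectByName(rejects []RejectFunc) func(string) bool {
	return func(nodepath string) bool {
		for _, reject := range rejects {
			if reject(nodepath) {
				return false
			}
		}
		return true
	}
}

func main() {
	sel := selectByName([]RejectFunc{
		rejectByPattern("*.tmp"),
		rejectByPattern("cache"),
	})
	for _, p := range []string{
		"/home/user/notes.txt",
		"/home/user/cache/index",
		"/home/user/build.tmp",
	} {
		fmt.Printf("%-26s keep=%v\n", p, sel(p))
	}
}
```

Inverting a list of reject functions into one select predicate keeps the exclude options reusable between commands: each pattern source only has to produce reject functions.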
2 changes: 1 addition & 1 deletion cmd/restic/cmd_backup.go
@@ -306,7 +306,7 @@ func collectRejectByNameFuncs(opts BackupOptions, repo *repository.Repository, t
 		fs = append(fs, f)
 	}

-	fsPatterns, err := collectExcludePatterns(opts.excludePatternOptions)
+	fsPatterns, err := opts.excludePatternOptions.CollectPatterns()
 	if err != nil {
 		return nil, err
 	}
218 changes: 218 additions & 0 deletions cmd/restic/cmd_rewrite.go
@@ -0,0 +1,218 @@
package main

import (
	"context"
	"fmt"

	"github.com/spf13/cobra"
	"golang.org/x/sync/errgroup"

	"github.com/restic/restic/internal/backend"
	"github.com/restic/restic/internal/debug"
	"github.com/restic/restic/internal/errors"
	"github.com/restic/restic/internal/repository"
	"github.com/restic/restic/internal/restic"
	"github.com/restic/restic/internal/walker"
)

var cmdRewrite = &cobra.Command{
	Use:   "rewrite [flags] [snapshotID ...]",
	Short: "Rewrite snapshots to exclude unwanted files",
	Long: `
The "rewrite" command excludes files from existing snapshots. It creates new
snapshots containing the same data as the original ones, but without the files
you specify to exclude. All metadata (time, host, tags) will be preserved.

The snapshots to rewrite are specified using the --host, --tag and --path options,
or by providing a list of snapshot IDs. Please note that specifying neither any of
these options nor a snapshot ID will cause the command to rewrite all snapshots.

The special tag 'rewrite' will be added to the new snapshots to distinguish
them from the original ones, unless --forget is used. If the --forget option is
used, the original snapshots will instead be directly removed from the repository.

Please note that the --forget option only removes the snapshots and not the actual
data stored in the repository. In order to delete the no longer referenced data,
use the "prune" command.

EXIT STATUS
===========

Exit status is 0 if the command was successful, and non-zero if there was any error.
`,
	DisableAutoGenTag: true,
	RunE: func(cmd *cobra.Command, args []string) error {
		return runRewrite(cmd.Context(), rewriteOptions, globalOptions, args)
	},
}

// RewriteOptions collects all options for the rewrite command.
type RewriteOptions struct {
	Forget bool
	DryRun bool

	snapshotFilterOptions
	excludePatternOptions
}

var rewriteOptions RewriteOptions

func init() {
	cmdRoot.AddCommand(cmdRewrite)

	f := cmdRewrite.Flags()
	f.BoolVarP(&rewriteOptions.Forget, "forget", "", false, "remove original snapshots after creating new ones")
	f.BoolVarP(&rewriteOptions.DryRun, "dry-run", "n", false, "do not do anything, just print what would be done")

	initMultiSnapshotFilterOptions(f, &rewriteOptions.snapshotFilterOptions, true)
	initExcludePatternOptions(f, &rewriteOptions.excludePatternOptions)
}

// rewriteSnapshot filters the tree of sn using the exclude patterns in opts.
// It returns true if the snapshot was (or, with --dry-run, would be) modified.
func rewriteSnapshot(ctx context.Context, repo *repository.Repository, sn *restic.Snapshot, opts RewriteOptions) (bool, error) {
	if sn.Tree == nil {
		return false, errors.Errorf("snapshot %v has nil tree", sn.ID().Str())
	}

	rejectByNameFuncs, err := opts.excludePatternOptions.CollectPatterns()
	if err != nil {
		return false, err
	}

	// A path is selected (kept) only if no exclude pattern rejects it.
	selectByName := func(nodepath string) bool {
		for _, reject := range rejectByNameFuncs {
			if reject(nodepath) {
				return false
			}
		}
		return true
	}

	wg, wgCtx := errgroup.WithContext(ctx)
	repo.StartPackUploader(wgCtx, wg)

	var filteredTree restic.ID
	wg.Go(func() error {
		filteredTree, err = walker.FilterTree(wgCtx, repo, "/", *sn.Tree, &walker.TreeFilterVisitor{
			SelectByName: selectByName,
			// Pass the message as an argument so that a '%' in a path is
			// not interpreted as a format verb.
			PrintExclude: func(path string) { Verbosef("%s", fmt.Sprintf("excluding %s\n", path)) },
		})
		if err != nil {
			return err
		}

		return repo.Flush(wgCtx)
	})
	err = wg.Wait()
	if err != nil {
		return false, err
	}

	// An identical tree ID means that no exclude pattern matched anything.
	if filteredTree == *sn.Tree {
		debug.Log("Snapshot %v not modified", sn)
		return false, nil
	}

	debug.Log("Snapshot %v modified", sn)
	if opts.DryRun {
		Verbosef("would save new snapshot\n")

		if opts.Forget {
			Verbosef("would remove old snapshot\n")
		}

		return true, nil
	}

	// Retain the original snapshot id over all tag changes.
	if sn.Original == nil {
		sn.Original = sn.ID()
	}
	*sn.Tree = filteredTree

	if !opts.Forget {
		sn.AddTags([]string{"rewrite"})
	}

	// Save the new snapshot.
	id, err := restic.SaveSnapshot(ctx, repo, sn)
	if err != nil {
		return false, err
	}

	if opts.Forget {
		h := restic.Handle{Type: restic.SnapshotFile, Name: sn.ID().String()}
		if err = repo.Backend().Remove(ctx, h); err != nil {
			return false, err
		}
		debug.Log("removed old snapshot %v", sn.ID())
		Verbosef("removed old snapshot %v\n", sn.ID().Str())
	}
	Verbosef("saved new snapshot %v\n", id.Str())
	return true, nil
}

func runRewrite(ctx context.Context, opts RewriteOptions, gopts GlobalOptions, args []string) error {
	if opts.excludePatternOptions.Empty() {
		return errors.Fatal("Nothing to do: no excludes provided")
	}

	repo, err := OpenRepository(ctx, gopts)
	if err != nil {
		return err
	}

	if !opts.DryRun {
		var lock *restic.Lock
		var err error
		// Removing the original snapshots requires an exclusive lock;
		// only adding new snapshots does not.
		if opts.Forget {
			Verbosef("create exclusive lock for repository\n")
			lock, ctx, err = lockRepoExclusive(ctx, repo)
		} else {
			lock, ctx, err = lockRepo(ctx, repo)
		}
		defer unlockRepo(lock)
		if err != nil {
			return err
		}
	} else {
		repo.SetDryRun()
	}

	snapshotLister, err := backend.MemorizeList(ctx, repo.Backend(), restic.SnapshotFile)
	if err != nil {
		return err
	}

	if err = repo.LoadIndex(ctx); err != nil {
		return err
	}

	changedCount := 0
	for sn := range FindFilteredSnapshots(ctx, snapshotLister, repo, opts.Hosts, opts.Tags, opts.Paths, args) {
		Verbosef("\nsnapshot %s of %v at %s\n", sn.ID().Str(), sn.Paths, sn.Time)
		changed, err := rewriteSnapshot(ctx, repo, sn, opts)
		if err != nil {
			return errors.Fatalf("unable to rewrite snapshot ID %q: %v", sn.ID().Str(), err)
		}
		if changed {
			changedCount++
		}
	}

	Verbosef("\n")
	if changedCount == 0 {
		if !opts.DryRun {
			Verbosef("no snapshots were modified\n")
		} else {
			Verbosef("no snapshots would be modified\n")
		}
	} else {
		if !opts.DryRun {
			Verbosef("modified %v snapshots\n", changedCount)
		} else {
			Verbosef("would modify %v snapshots\n", changedCount)
		}
	}

	return nil
}
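
How the "filter the tree, then compare with the original" step works can be sketched on an in-memory tree. This is only an illustration under stated assumptions: `Node`, `filterTree`, `sampleTree`, `dropSecret` and `keepAll` are hypothetical names, and the real `walker.FilterTree` operates on tree blobs stored in the repository and returns a tree ID rather than a pointer.

```go
package main

import (
	"fmt"
	"path"
)

// Node is a hypothetical in-memory stand-in for an entry in a restic tree.
type Node struct {
	Name     string
	IsDir    bool
	Children []*Node // only used for directories
}

// filterTree returns n without the entries rejected by selectFn and reports
// whether anything was removed. Unchanged subtrees are returned as-is, which
// loosely mirrors an unchanged tree keeping its ID.
func filterTree(n *Node, nodepath string, selectFn func(string) bool) (*Node, bool) {
	if !n.IsDir {
		return n, false
	}
	changed := false
	kept := make([]*Node, 0, len(n.Children))
	for _, child := range n.Children {
		childPath := path.Join(nodepath, child.Name)
		if !selectFn(childPath) {
			changed = true // the entry is dropped from the new tree
			continue
		}
		filtered, subChanged := filterTree(child, childPath, selectFn)
		if subChanged {
			changed = true
		}
		kept = append(kept, filtered)
	}
	if !changed {
		return n, false
	}
	// Build a new node instead of mutating the original tree.
	return &Node{Name: n.Name, IsDir: true, Children: kept}, true
}

// dropSecret and keepAll are example select functions.
func dropSecret(nodepath string) bool { return path.Base(nodepath) != "secret.key" }
func keepAll(string) bool             { return true }

// sampleTree builds /home/user/{notes.txt,secret.key}.
func sampleTree() *Node {
	user := &Node{Name: "user", IsDir: true, Children: []*Node{
		{Name: "notes.txt"},
		{Name: "secret.key"},
	}}
	home := &Node{Name: "home", IsDir: true, Children: []*Node{user}}
	return &Node{Name: "", IsDir: true, Children: []*Node{home}}
}

func main() {
	root := sampleTree()
	_, changed := filterTree(root, "/", dropSecret)
	fmt.Println("modified with exclude:", changed)
	_, changed = filterTree(root, "/", keepAll)
	fmt.Println("modified without match:", changed)
}
```

Returning the original node when nothing below it changed is what makes the "snapshot not modified" check cheap: the caller only has to compare the result with what it passed in.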
6 changes: 5 additions & 1 deletion cmd/restic/exclude.go
@@ -475,7 +475,11 @@ func initExcludePatternOptions(f *pflag.FlagSet, opts *excludePatternOptions) {
 	f.StringArrayVar(&opts.InsensitiveExcludeFiles, "iexclude-file", nil, "same as --exclude-file but ignores casing of `file`names in patterns")
 }

-func collectExcludePatterns(opts excludePatternOptions) ([]RejectByNameFunc, error) {
+func (opts *excludePatternOptions) Empty() bool {
+	return len(opts.Excludes) == 0 && len(opts.InsensitiveExcludes) == 0 && len(opts.ExcludeFiles) == 0 && len(opts.InsensitiveExcludeFiles) == 0
+}
+
+func (opts excludePatternOptions) CollectPatterns() ([]RejectByNameFunc, error) {
 	var fs []RejectByNameFunc
 	// add patterns from file
 	if len(opts.ExcludeFiles) > 0 {
73 changes: 73 additions & 0 deletions cmd/restic/integration_rewrite_test.go
@@ -0,0 +1,73 @@
package main

import (
	"context"
	"path/filepath"
	"testing"

	"github.com/restic/restic/internal/restic"
	rtest "github.com/restic/restic/internal/test"
)

func testRunRewriteExclude(t testing.TB, gopts GlobalOptions, excludes []string, forget bool) {
	opts := RewriteOptions{
		excludePatternOptions: excludePatternOptions{
			Excludes: excludes,
		},
		Forget: forget,
	}

	rtest.OK(t, runRewrite(context.TODO(), opts, gopts, nil))
}

func createBasicRewriteRepo(t testing.TB, env *testEnvironment) restic.ID {
	testSetupBackupData(t, env)

	// create backup
	testRunBackup(t, filepath.Dir(env.testdata), []string{"testdata"}, BackupOptions{}, env.gopts)
	snapshotIDs := testRunList(t, "snapshots", env.gopts)
	rtest.Assert(t, len(snapshotIDs) == 1, "expected one snapshot, got %v", snapshotIDs)
	testRunCheck(t, env.gopts)

	return snapshotIDs[0]
}

func TestRewrite(t *testing.T) {
	env, cleanup := withTestEnvironment(t)
	defer cleanup()
	createBasicRewriteRepo(t, env)

	// exclude some data
	testRunRewriteExclude(t, env.gopts, []string{"3"}, false)
	snapshotIDs := testRunList(t, "snapshots", env.gopts)
	rtest.Assert(t, len(snapshotIDs) == 2, "expected two snapshots, got %v", snapshotIDs)
	testRunCheck(t, env.gopts)
}

func TestRewriteUnchanged(t *testing.T) {
	env, cleanup := withTestEnvironment(t)
	defer cleanup()
	snapshotID := createBasicRewriteRepo(t, env)

	// use an exclude that will not exclude anything
	testRunRewriteExclude(t, env.gopts, []string{"3dflkhjgdflhkjetrlkhjgfdlhkj"}, false)
	newSnapshotIDs := testRunList(t, "snapshots", env.gopts)
	rtest.Assert(t, len(newSnapshotIDs) == 1, "expected one snapshot, got %v", newSnapshotIDs)
	rtest.Assert(t, snapshotID == newSnapshotIDs[0], "snapshot id changed unexpectedly")
	testRunCheck(t, env.gopts)
}

func TestRewriteReplace(t *testing.T) {
	env, cleanup := withTestEnvironment(t)
	defer cleanup()
	snapshotID := createBasicRewriteRepo(t, env)

	// exclude some data
	testRunRewriteExclude(t, env.gopts, []string{"3"}, true)
	newSnapshotIDs := testRunList(t, "snapshots", env.gopts)
	rtest.Assert(t, len(newSnapshotIDs) == 1, "expected one snapshot, got %v", newSnapshotIDs)
	rtest.Assert(t, snapshotID != newSnapshotIDs[0], "snapshot id should have changed")
	// check forbids unused blobs, thus remove them first
	testRunPrune(t, env.gopts, PruneOptions{MaxUnused: "0"})
	testRunCheck(t, env.gopts)
}
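
The three behaviours these integration tests check (a new snapshot is added, an unchanged snapshot is left alone, and with `--forget` the original is replaced) can be modelled without a repository. The sketch below is a toy model only: `Snapshot`, `newID` and `rewrite` are hypothetical, the ID is derived from the file list alone, and substring matching stands in for exclude patterns, none of which reflects how restic stores or identifies snapshots.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Snapshot is a simplified, hypothetical model of a snapshot.
type Snapshot struct {
	ID       string
	Original string // ID of the snapshot this one was rewritten from
	Tags     []string
	Files    []string
}

// newID derives a toy ID from the file list (an assumption of this sketch).
func newID(files []string) string {
	sum := sha256.Sum256([]byte(strings.Join(files, "\n")))
	return hex.EncodeToString(sum[:])
}

// rewrite drops every file whose name contains exclude. Unchanged snapshots
// are kept as they are. Changed snapshots are either added next to the
// original with the tag "rewrite", or replace it when forget is set.
func rewrite(snaps []Snapshot, exclude string, forget bool) []Snapshot {
	var out []Snapshot
	for _, sn := range snaps {
		var kept []string
		for _, f := range sn.Files {
			if !strings.Contains(f, exclude) {
				kept = append(kept, f)
			}
		}
		if len(kept) == len(sn.Files) {
			out = append(out, sn) // nothing excluded: snapshot stays untouched
			continue
		}
		rewritten := Snapshot{
			ID:       newID(kept),
			Original: sn.Original,
			Tags:     append([]string(nil), sn.Tags...),
			Files:    kept,
		}
		if rewritten.Original == "" {
			rewritten.Original = sn.ID // remember where the snapshot came from
		}
		if !forget {
			rewritten.Tags = append(rewritten.Tags, "rewrite")
			out = append(out, sn) // the original snapshot is kept as well
		}
		out = append(out, rewritten)
	}
	return out
}

func main() {
	files := []string{"testdata/1", "testdata/2", "testdata/3"}
	base := []Snapshot{{ID: newID(files), Files: files}}

	fmt.Println("rewrite:  ", len(rewrite(base, "3", false)), "snapshots")
	fmt.Println("unchanged:", len(rewrite(base, "no-such-file", false)), "snapshots")
	fmt.Println("forget:   ", len(rewrite(base, "3", true)), "snapshots")
}
```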
1 change: 1 addition & 0 deletions doc/040_backup.rst
@@ -204,6 +204,7 @@ Combined with ``--verbose``, you can see a list of changes:
     modified /archive.tar.gz, saved in 0.140s (25.542 MiB added)
     Would be added to the repository: 25.551 MiB

+.. _backup-excluding-files:
 Excluding Files
 ***************
