
Add command to copy all data to another repository #323

Closed
fd0 opened this issue Oct 25, 2015 · 22 comments · Fixed by #2606
Labels
state: work in progress (someone's working on it) · type: feature suggestion (suggesting a new feature)

Comments

@fd0
Member

fd0 commented Oct 25, 2015

During the discussion in #320 we discovered that it may be helpful to have functionality to copy all data (data blobs, tree blobs, snapshots) from one repository to a new one, recreating pack files and indexes on the fly. This would allow creating a new repository in a different location (e.g. moving from a local repository to an sftp server) and using that from then on without losing any history or old snapshots.

This issue tracks the implementation of this feature and can be closed when it is implemented.

@Intensity

Is this intended to handle a one-time copy from one repository (A) to a new one (B)? Or is this meant to be more general by performing a "sync" or update of changed content between (A) and (B) since the last sync?

@fd0
Member Author

fd0 commented Nov 3, 2015

At the moment this is intended to handle a one-time copy only, so that users can migrate to a different repository in a different location, or with a new master key.

@witeshadow

Given a slow internet connection, I would like to be able to back up to s3 and to another location as efficiently as possible.

@middelink
Member

middelink commented Jun 30, 2017

@witeshadow I'm not sure how that can be done efficiently, as the data is encrypted in repo A with master key A and needs to go to repo B, which has a different master key B. We need to read in all the data, decrypt it with A's key, encrypt it with B's key, and write it out. There is no way to optimize this for low bandwidth. It's gonna hurt...

The only optimization I can think of is a selection criterion on the source repo A, using the host, path and tag filters, so you don't have to copy everything. However, that depends on your use case.

@mholt
Contributor

mholt commented Apr 9, 2018

@fd0 I just wanted to add my vote for this feature request. Anything I can do to make it happen?

@fd0
Member Author

fd0 commented Apr 9, 2018

You could implement it... The functionality itself is not hard to do; configuring the two backends is the hard part. We don't support accessing more than one backend (e.g. there's only one $B2_ACCOUNT_ID), so I think this feature depends on a proper config file (see #16).

Let's say we have two repos, A and B, and you'd like to sync A->B so that after the process is finished, the set of blobs (and snapshots) in B is a superset of the set of blobs in A.

So, you open both repos and load the index files for each one. Then you iterate over the index of A, checking for each blob whether it is also contained in B. If it is, move on to the next one. If it's not, download it, decrypt it, encrypt it for B and upload it there.

The last step is copying the snapshot files over: for each snapshot file in A, decrypt the file, encrypt it again for B, store it there, and it's done.

As I said, the technical implementation is rather easy :)
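The loop described above can be sketched in plain bash, with throwaway directories standing in for the two repositories, file names standing in for blob IDs, and the decrypt/re-encrypt step reduced to a comment. All names here are illustrative; none of this is restic's actual code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Two temp directories stand in for repo A (source) and repo B (target).
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'tree-blob' > "$src/aaa111"
printf 'data-blob' > "$src/bbb222"
cp "$src/aaa111" "$dst/aaa111"     # this blob is already present in B

copied=0
for blob in "$src"/*; do
  id=$(basename "$blob")
  if [ ! -e "$dst/$id" ]; then     # index lookup: does B already have it?
    # a real implementation would decrypt with A's key here
    # and re-encrypt with B's key before uploading
    cp "$blob" "$dst/$id"
    copied=$((copied + 1))
  fi
done
echo "copied $copied new blob(s)"
```

Running this prints `copied 1 new blob(s)`, since only one of the two blobs was missing from B; blobs already present are skipped, which is what makes repeated runs cheap.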

@mholt
Contributor

mholt commented Apr 9, 2018

Great! Thanks for the tips. I have this itch, so I will see if I can make time to scratch it -- but for the short-term I will have to go without this restic merge feature. If someone gets to it before I do, that's fine -- or I'll circle back around to this eventually!

@middelink
Member

I think I have this implemented already... /me scratches head and looks for it...
... https://github.com/middelink/restic/tree/fix-323
I need to check if it still compiles, though; that branch is 228 commits behind...

@matthijskooijman

It might be useful to allow copying not only the full repository but also a subset of snapshots. This would support a use case suggested by #1910 (back up to a primary repo often, and from there back up to offsite/slower/more expensive storage less often) and, I think, would not be much harder to implement than a full copy. Might be a future addition, though :-)

@sergeevabc

Err… Any news for mere users without dev skills to compile and try out @middelink’s suggestion?

@klmitch

klmitch commented Jan 23, 2019

This is mostly a "me too" comment, but I'd like to have the ability to copy only specific snapshots from one repo to another, rather than a "copy-all" or "sync" semantic; e.g., make daily backups to local storage, then once a week copy only the most recent daily to an s3 bucket, etc.

@middelink
Member

Well, then you are in luck: my copy cmd takes one or more snapshot ids. In fact, copy-all is not something it does. You would have to list your snapshot ids first and then concatenate them on the "restic copy" cmdline. As I see this as a degenerate use case, I'm good with it.
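As a usage sketch, using the syntax of the `copy` command as eventually merged (paths and snapshot ids below are placeholders):

```shell
# Find the snapshot ids in the source repo...
restic -r /srv/src snapshots

# ...then hand the ones you want to `restic copy`. --from-repo names the
# source repository in current restic; older releases used --repo2 instead.
restic -r /srv/dst copy --from-repo /srv/src 79766175 bdbd3439
```

Since only the listed snapshots (and the blobs they reference) are transferred, this covers the "copy just the most recent daily" case as well.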

@Fjodor42

Without delving too deep into this, perhaps some discussions with ncw/rclone could be of use...

@keesse

keesse commented Aug 13, 2019

I'm also interested in the merge/copy functionality; I have a repository on a USB stick that I would like to merge/copy into my central repository (same passwords).
Any news on this?

@theoretical2019

Looks like the fork branch was brought up to date with master, but there's no PR for it yet.

@middelink Is your code finished / mergeable? If not, what still needs to be done? This is a feature I really want :)

@middelink
Member

middelink commented Sep 24, 2019

@theoretical2019 The code itself is finished, but each time I sit down to create an official PR, I keep finding things I need to do before it's ready. Like documentation, like an unreleased changelog entry...
Oh, and tests! Did I mention tests? It needs tests :P

@seqizz

seqizz commented Jan 29, 2020

@middelink FYI, I have tested your branch by rebasing it onto upstream master, and it works pretty well. It created a new snapshot with the same host, tags and date 👍
Waiting for PR 🎉

With such a feature, I can create a secondary repository which the clients use only while the first repository is locked for maintenance (e.g. prune). The prune task can then trigger a copy from the secondary after it finishes, so there are no missing backups and hence zero downtime on the backup service.
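The maintenance window in that scheme could look roughly like this (paths are placeholders, and the flag names follow the `copy` command as later merged; clients would be pointed at the secondary repo while the primary is locked):

```shell
# Primary repo is locked for maintenance; clients back up to the secondary.
restic -r /srv/primary prune

# Afterwards, pull snapshots made in the meantime back into the primary.
# Without snapshot ids, copy considers all snapshots in the source repo
# (filterable via --host/--path/--tag) and skips data already present.
restic -r /srv/primary copy --from-repo /srv/secondary
```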

@rawtaz
Contributor

rawtaz commented Feb 26, 2020

@middelink Would you be so kind as to create a PR of your code? When doing so, please also allow edits from maintainers - this way, we can help you with the changelog, documentation and so on.

The important thing is that we get a base PR to work on. I'd love to get your great work moving, and so would others I think :) Let me know if you need any help creating the PR!

@middelink
Member

@rawtaz Sure. Let me sync up and all that stuff. For some reason I have not found the time to do so earlier, but it looks like I have some time now.

@lfrancke

Thank you everyone for your work on this!

I've got one question left that's not answered by the docs (at least for me): do I need to prune both repositories, or is it enough to prune the source, with snapshot deletions being propagated to the copy?

@rawtaz
Contributor

rawtaz commented Oct 26, 2020

@lfrancke When using the copy command you explicitly list the snapshots that you want to copy. Other snapshots, whether existing, non-existing, or previously-existing-but-now-pruned-and-no-longer-existing, are not involved.

If you copy snapshots from repo A to repo B and then forget and prune them in repo A, they will not be forgotten and pruned in repo B automatically, you'll have to do that in repo B yourself.
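In practice that means running your retention policy against each repository separately, along these lines (paths and the `--keep-daily 7` policy are placeholders):

```shell
# Apply the same retention policy to both repositories; forgetting and
# pruning in one repo never touches the other.
restic -r /srv/repo-a forget --keep-daily 7 --prune
restic -r s3:s3.amazonaws.com/bucket/repo-b forget --keep-daily 7 --prune
```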

@lfrancke

Excellent, thank you very much @rawtaz for the quick and helpful response.
