New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add copy functionality. #2606
Add copy functionality. #2606
Conversation
I'm sorry, but may I ask what this is blocked on right now? |
@FallenWarrior2k no idea, very good question. @fd0 care to intervene? This PR has been sitting idle for almost 2 months now. |
@FallenWarrior2k @middelink I can only apologize. I am the one who asked @middelink to make a PR out of her work, which she so swiftly did. My intention was to have something that we could work off in terms of adjustments and reviews. We started having more of a flow in doing restic maintenance at that point in time, but then Corona etc came into the picture and since then it's been almost impossible to get anything done due to maintainers' life situations with isolation and such things requiring all available time. This PR is still on our radar though, FWIW. |
That's completely fair, I was just wondering. No blame on anybody during these trying times.
…-------- Original Message --------
On 17 Apr 2020, 20:47, rawtaz wrote:
***@***.***(https://github.com/FallenWarrior2k) ***@***.***(https://github.com/middelink) I can only apologize. I am the one who asked ***@***.***(https://github.com/middelink) to make a PR out of her work, which she so swiftly did. My intention was to have something that we could work off in terms of adjustments and reviews. We started having more of a flow in doing restic maintenance at that point in time, but then Corona etc came into the picture and since then it's been almost impossible to get anything done due to maintainers' life situations with isolation and such things requiring all available time. This PR is still on our radar though, FWIW.
—
You are receiving this because you were mentioned.
Reply to this email directly, [view it on GitHub](#2606 (comment)), or [unsubscribe](https://github.com/notifications/unsubscribe-auth/AE3A7FIVX23MS67P2I34QWDRNCP3LANCNFSM4K42KIVQ).
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've found a few more typos (see review comments) and a few more fundamental points which could also be deferred to a follow-up PR: The Repository
object currently does not upload intermediate indexes on its own. That could cause the copy command to create a single gigantic index file when completing the snapshot. It would probably be a good idea to use something like the IndexUploader
from the backup
command.
The next point is more of a usability question: The initial idea in #323 was to support a one-time copy from one repository to another. But it would probably also be nice to ensure that when copying a snapshot multiple times, that that only creates one snapshot in the destination repository.
The current implementation won't scale for large numbers of snapshots. These usually share large parts of their filesystem trees, thus iterating the full tree for every snapshot will be rather redundant. A rather simple solution for that would be to use FindUsedBlobs
to collect all blobs of all snapshots and then copy everything in one go. This has the slight downside that when copying a single snapshot it will load trees twice, but should be much faster when copying multiple snapshots. And it would allow adding a progress bar. As well as avoiding yet another tree walking implementation.
From what I can tell right now the steps to finish this PR are: The command needs documentation and a changelog:
[edit] Tests:
Code fixes:
|
For most remotes, I think it should be possible to accomplish this by going through rclone. Although it might not be possible to use e.g. |
Small note: It's no longer possible to build it. I think that due to #2640 merged. I've fixed it locally with this and tested with one-snapshot copy:
|
This is a must-have feature for me, thanks @dionorgua 👍 Also tested & working. |
/middelink/restic/pull/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the two major issues that @MichaelEischer also found.
I think the Index.Has
problem should be fixed in another PR. I'll work on this.
The saving of intermediate indexes should IMO be added in this PR.
Moreover I have some enhancement suggestions and IMO found a possibility to speed copy
up.
EDIT: I started working on the propsed PR and realized that I might easily fix both issues.
The new PR is ready for review: #2773 |
Just noticed that I should have taken a look at #2773 first, which solves my index related concerns. |
I just realized that there might be another optimization: However, I would propose to not solve this in this PR and leave it open for further optimization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that #2773 was merged, there are only few open issues with this PR.
I think @middelink has not much time to check this PR :) I wanted to rebase this over @aawsome 's new prune branch and got error on
|
middelink/restic#2 |
Hi, sorry for the delay indeed. I seem to not get mail notifications anymore in github on this matter (guess I pressed a wrong button somewhere), just the emails themselves which go off into some folder I hardly look at. Anyways, excuses. I've merged the PR, and (at least on my computer) it compiles again. Thanks! |
I've rebased the PR to the current master branch and adapted it to the repository interface changes. There's also a small optimization to only copy tree once, such that for later snapshots only the changed parts have to be checked. I'll take care of adding some tests and a bit of documentation. |
I've added some documentation and two basic integration tests for the copy command. The more interesting change is probably "copy: Mark and skip previously copied snapshots", which uses the |
Add a copy command to copy snapshots between repositories. It allows the user to specify a destination repository, password, password-file, password-command or key-hint to supply the necessary details to open the destination repository. You need to supply a list of snapshots to copy, snapshots which already exist in the destination repository will be skipped. Note, when using the network this becomes rather slow, as it needs to read the blocks, decrypt them using the source key, then encrypt them again using the destination key before finally writing them out to the destination repository.
Flush does this now.
This speeds up copying multiple overlapping snapshots from one repository to another, as we only have to copy the changed parts of later snapshots.
The re-encryption part shouldn't be too expensive on somewhat modern processors. After all a single CPU core with AES-NI is able to encrypt/decrypt over 1GB/s of data. That is I don't think it's worth to bypass the current Load/StoreBlob abstractions. Re-encrypting blobs also has the benefit, that an attacker can't learn blob boundaries within a pack file after rewriting. (Rewrite would basically shuffle lots of encrypted strings. By checking for common substrings before and after rewriting you'll learn the boundaries between most blobs) However, I can't tell in what regard that information would be useful. For matching chunking parameters we'd probably need some way to initialize a repository with the same chunking parameters as another one. Modifying the config file of an existing repository doesn't sound right to me. (Take these remarks with a grain of salt, I haven't thought much about whether that a good idea or not)
My plan was to go for an all-out parallelization in a few follow-up PRs. The first part would be to extract the parallel tree loading used by the check command into a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I think the most important optimization would be to not reload multiple blobs from identical packs (like done in restorer) and parallelize the saving of blobs (like done in archiver), but this should be done in a future PR.
You are right this is a good point! In fact with this argument I would also opt for always re-encrypting blobs (even when using the same key, but always with a new random nonce) |
Use the `Original` field of the copied snapshot to store a persistent snapshot ID. This can either be the ID of the source snapshot if `Original` was not yet set or the previous value stored in the `Original` field. In order to still copy snapshots modified using the tags command the source snapshot is compared to all snapshots in the destination repository which have the same persistent ID. Snapshots are only considered equal if all fields except `Original` and `Parent` match. That way modified snapshots are still copied while avoiding duplicate copies at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks to everyone helping with implementing this PR and for the extensive discussions!
What is the purpose of this change? What does it change?
Add a copy command to copy snapshots between repositories. It allows the user
to specify a destination repository, password, password-file, password-command
or key-hint to supply the nessecary details to open the destination repository.
You need to supply a list of snapshots to copy, snapshots which alreadys exist
in the destination repository will be skipped.
Note, when using the network this becomes rather slow, as it needs to read the
blocks, decrypt them using the source key, then encrypt them again using the
destination key before finally writing them out to the destination repository.
Was the change discussed in an issue or in the forum before?
closes #323
Checklist
changelog/unreleased/
that describes the changes for our users (template here)gofmt
on the code in all commits