Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for bucket to bucket sync of the latest versions of files #165

Merged
merged 23 commits into from
Nov 3, 2020

Conversation

mlech-reef
Copy link
Contributor

An example of usage:

$ b2 ls --recursive ML-TestBucket
$ b2 ls --recursive ML-TestBucket2
$ b2 sync ./test b2://ML-TestBucket/test
upload 1.txt
upload 2.txt
upload 3.txt
$ b2 sync b2://ML-TestBucket/test b2://ML-TestBucket2/test
copy 1.txt
copy 2.txt
copy 3.txt
$ b2 ls --recursive ML-TestBucket2
test/1.txt
test/2.txt
test/3.txt

Copy link
Contributor

@bwbeach bwbeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a review of just the first commit. (It's time for the next meeting, and I want the comments to be visible.)

Overall question: What are the semantics of copying from a hidden file? It looks like the destination file gets deleted. Is that what's expected?

@@ -687,10 +687,10 @@ def copy(
"""

copy_source = CopySource(file_id, offset=offset, length=length)
if length is None:
if not length:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is not the same when length is 0. I think this first branch should still happen in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first branch is using a simple copy manager and the second emerger created by @mzukowski-reef
The problem is that emerger doesn't work when copying 0-length files and actually is not needed for such files

"""

def _get_hide_delete_actions(self):
for action in super()._get_hide_delete_actions():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What actions might the parent class have that we want to use here? (It looks like the actual base class just returns the empty list.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the parent class will start returning something, we'd like this class (and all descendants) to honor that without modification of the inheriting classes

if length is None and offset > 0:
raise ValueError('Cannot copy with non zero offset and unknown length')
if not length and offset > 0:
raise ValueError('Cannot copy with non zero offset and unknown or zero length')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Can't we copy a zero-length file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but we'd rather not. Uploads are A-class transactions (free), but b2_copy_file are C-class transactions (2500/day for free, then $0.004 per 1,000 calls), so since we are not going to save much bandwidth on transfer, it's better to re-upload the "whole" file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an emerger code. See my previous comment.

@@ -38,13 +38,15 @@ def is_copy(self):
return True

def get_bytes_range(self):
if self.length is None:
if not self.length:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@bwbeach
Copy link
Contributor

bwbeach commented Oct 15, 2020

I went through the rest of the commits, and they look fine.

I do think there needs to be documentation on what happens when you sync from bucket to bucket. I think it does not copy all the versions of files, just the most recent one. And when in "delete" mode, when the source file is hidden, all versions of the destination file will be deleted.

@mlech-reef
Copy link
Contributor Author

Overall question: What are the semantics of copying from a hidden file? It looks like the destination file gets deleted. Is that what's expected?

  1. By default the destination file is untouched.
  2. If you pass --delete and there is no file on the source, the destination is deleted.
  3. If you pass --delete and the file on the source is hidden, the destination file is untouched.

For me 1 and 2 are ok, but 3 is weird behavior but is not something that I introduced as it works exactly the same way when the destination is a local folder.

@ppolewicz
Copy link
Collaborator

Please file a separate issue for that so that we may decide whether this is a bug that needs to be fixed for local folders too

@mlech-reef
Copy link
Contributor Author

mlech-reef commented Oct 15, 2020

I do think there needs to be documentation on what happens when you sync from bucket to bucket.

Where would you like to see such documentation? In CLI? I think that overall sync documentation in b2sdk is less informative comparing to CLI sync command description.

@ppolewicz
Copy link
Collaborator

both, actually.

Also, we'll add sync of all versions in a separare PR

@mlech-reef mlech-reef changed the title Add support for bucket to bucket sync Add support for bucket to bucket sync of the latest versions of files Oct 28, 2020
@mlech-reef
Copy link
Contributor Author

I've added a better docstring, updated the sphinx docs slightly, and created tickets for dealing with hidden files and sync of all versions (linked).

@mlech-reef
Copy link
Contributor Author

An update for the CLI: Backblaze/B2_Command_Line_Tool#667

b2sdk/sync/sync.py Outdated Show resolved Hide resolved
b2sdk/sync/sync.py Outdated Show resolved Hide resolved
b2sdk/sync/sync.py Show resolved Hide resolved
doc/source/api/sync.rst Outdated Show resolved Hide resolved
mlech-reef and others added 2 commits October 28, 2020 14:57
Co-authored-by: Paweł Polewicz <p.polewicz@gmail.com>
@ppolewicz
Copy link
Collaborator

Codacy says:
test/unit/v1/test_sync.py
unnecessary from enum import Enum

apart from that I think we are ready to merge

@mlech-reef
Copy link
Contributor Author

Codacy says:
test/unit/v1/test_sync.py
unnecessary from enum import Enum

apart from that I think we are ready to merge

Fixed.

@mlech-reef mlech-reef merged commit e30256e into master Nov 3, 2020
@mlech-reef mlech-reef deleted the b2b-sync branch November 3, 2020 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants