Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable --progress-total option using pv. #547

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

kpoeppel
Copy link

@kpoeppel kpoeppel commented Aug 2, 2023

Add a --progress-total option that internally runs btrbk diff first to get a total size to be transferred (per target). This option is an alternative to --progress and shows a ETA time estimate as well as progress bar via pv.

Aims to resolve #543 .

Copy link
Owner

@digint digint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the MR, looks promising. I've had a brief look at it, and did not have time to test it yet (I'm quite busy at the moment, this might take a while sorry).

Do you have any numbers on how good are the estimates are and how long it takes to calculate the diffs? I'm especially curious as the btrfs send-stream adds quite some metadata.

btrbk Outdated
Comment on lines 723 to 725
my @cmd = ( "pv" );
push @cmd, ("-s", $total_len);
push @cmd, "-e -r -p -i 2";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check if mbuffer supports similar functionality? I'd hate to re-introduce pv as a new dependency, even if optional.

see 9dc717c

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mbuffer does not support this unfortunately.

btrbk Outdated
Comment on lines 5272 to 5299
sub sub_diff($$) {
#
# calc snapshot diff (btrfs find-new)
#
my $src_vol = $_[0];
my $target_vol = $_[1];

# NOTE: in some cases "cgen" differs from "gen", even for read-only snapshots (observed: gen=cgen+1)
my $lastgen = $src_vol->{node}{gen} + 1;

# dump files, sorted and unique
my $ret = btrfs_subvolume_find_new($target_vol, $lastgen);

INFO "Listing changed files for subvolume: $target_vol->{PRINT} (gen=$target_vol->{node}{gen})";
INFO "Starting at generation after subvolume: $src_vol->{PRINT} (gen=$src_vol->{node}{gen})";
INFO "Listing files modified within generation range: [$lastgen..$target_vol->{node}{gen}]";
DEBUG "Newest file generation (transid marker) was: $ret->{transid_marker}";
my $files = $ret->{files};
my $total_len = 0;
foreach my $name (sort keys %$files) {
my $finfo = $files->{$name};
$total_len += $finfo->{len};
}

INFO "Total size: " . print_size($total_len) ;

return $total_len;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could get similar (same? better?) results by using the results from btrbk extents, which might have some advantages:

  • faster (?) (note: installing module "IO::AIO" gives quite some speedup!)
  • has caching capabilities (!)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It don't know, I haven't used btrbk extents so far. This is just a copy of the part run by btrbk diff

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a must or still debatable?

@kpoeppel
Copy link
Author

kpoeppel commented Aug 3, 2023

Thanks for the MR, looks promising. I've had a brief look at it, and did not have time to test it yet (I'm quite busy at the moment, this might take a while sorry).

Do you have any numbers on how good are the estimates are and how long it takes to calculate the diffs? I'm especially curious as the btrfs send-stream adds quite some metadata.

Speed-wise it was pretty fast / negligible compared to the transfer time for all my cases.

@kpoeppel
Copy link
Author

kpoeppel commented Aug 3, 2023

Another general comment:
Handling the stderr output is not optimal in the current code, and I used a workaround to feed stderr of the subprocess directly to stderr of the main script. This way errors are not captured.
Maybe the chomp + readline combination should be replaced by something involving select. Also handling the stderr updates of pv or mbuffer (removing parts of their outputs over time) makes it difficult probably.

@digint digint force-pushed the master branch 6 times, most recently from 594fc6d to dac6350 Compare August 5, 2023 18:28
@kpoeppel
Copy link
Author

Any update / further requests for the PR?

@tbertels
Copy link

I'd suggest updating the --progress entry:
--progress show progress bar on send-receive operation
There's no progress bar with --progress since the switch to mbuffer, just transferred size and speed.

A man entry should be added too.

Note that there wasn't any commit here since August 2023, so kpoeppel is probably busy.

It would be nice to show total progress even for the initial snapshot, at least if quotas are enabled (otherwise it can be pretty slow to compute for HDDs with lots of data).
sudo btrfs subvolume show [snapshot]
then displays the snapshot size.

Still, it's pretty useful already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable some time estimate for --progress option (or a new --progress-total) option.
3 participants