
Super slow backup to B2 #1383

Closed
the-destro opened this issue Oct 24, 2017 · 84 comments
Labels
state: need feedback waiting for feedback, e.g. from the submitter

Comments

@the-destro

This is a fresh bucket and fresh upload to B2

Output of restic version

restic 0.7.3
compiled with go1.9 on freebsd/amd64

How did you run restic exactly?

enter password for repository:
scan [/share_data]
scanned 108338 directories, 270163 files in 0:12
[6:07] 0.01%  138.515 KiB/s  49.643 MiB / 467.293 GiB  55 / 378501 items  0 errors  ETA 982:31:51
[26:27] 0.01%  32.031 KiB/s  49.643 MiB / 467.293 GiB  55 / 378501 items  0 errors  ETA 4248:49:03

What backend/server/service did you use?
B2
@fd0
Member

fd0 commented Oct 25, 2017

Hey, what upstream bandwidth do you have available? What's your location (network-wise)? B2 is not the fastest service, the latency to the API (at least from Europe) is pretty high.

How did you run restic exactly?

Did you check the bandwidth used with external tools?

What throughput do you get using other programs like rclone?

@fd0 fd0 added state: need feedback waiting for feedback, e.g. from the submitter not enough information labels Oct 25, 2017
@haraldkoch

anecdote: I back up two identical servers, one in New Jersey and one in Germany. The former takes about 20 minutes and the latter 90 minutes backing up to B2.

@GreenBlood

GreenBlood commented Oct 30, 2017

Hi,
Adding my feedback here. My server is in Amsterdam, upload speed is about 100Mb/s.
Restic version 0.7.3

I'm backing up my nextcloud data folder, about 65-70GB of various size unencrypted files
I created two repositories, one local and one on B2. The local one processes the data at ~5MB/s:

restic backup -r ./resticbackup /srv/nextcloud
scanned 6320 directories, 47379 files in 0:00
[2:53] 1.12%  4.326 MiB/s  748.373 MiB / 65.470 GiB  619 / 53699 items  0 errors  ETA 4:15:47

When I run the same initial backup to B2, after a short burst, the speed quickly goes down

I waited a few minutes for the speed to settle after the burst, and it sat at around 1MB/s. This is confirmed by monitoring per-process bandwidth with the tool "nethogs": restic oscillates between 500KB/s and 2000KB/s.

[nethogs screenshot: restic bandwidth]

I then used rclone with no special options to upload a folder structure similar to the backup repository and got around the same speeds.

[nethogs screenshot: rclone bandwidth]

It oscillated mostly between 500 and 1500 KB/s during my tests

So imho the issue is not with restic itself but with B2.

Edit:

I've re-tried the backup with a much higher B2 connection count (-o b2.connections=50) and the speed increased drastically.
I'm currently backing up at nearly 10MB/s (between 6.5MB and 10MB per second), so the limit seems to be per-connection to the API.

@fd0
Member

fd0 commented Oct 30, 2017

That's interesting, thanks for reporting back!

@the-destro
Author

I think I found my issue: I tried a different ISP and got very different results. I wonder, is there logic in restic to detect "dead" connections?

@oullah

oullah commented Nov 26, 2017

B2 is known to have limited bandwidth per connection, I've seen this same issue with other backup software as well. The solution is always to use multiple connections to upload. I'm glad I found this thread because I didn't know about the b2.connections option. Where are all the options documented?

@rienafairefr

I second the solution with b2.connections, from 0.5 to 15 MB/s is a nice relief :-)

@curtwarfield

Restic definitely has potential but not sure if it is ready for prime time yet especially with Backblaze.
When I back up to Backblaze B2 with duplicity it is blazingly fast.
--------------[ Backup Statistics ]--------------
StartTime 1511881648.02 (Tue Nov 28 10:07:28 2017)
EndTime 1511881650.63 (Tue Nov 28 10:07:30 2017)
ElapsedTime 2.60 (2.60 seconds)
SourceFiles 10915
SourceFileSize 7991047699 (7.44 GB)

Backing up to Azure using restic is fast but not quite as good as duplicity:
[0:00] 96 directories, 10819 files, 7.441 GiB
scanned 96 directories, 10819 files in 0:00
[0:08] 100.00% 0B/s 7.441 GiB / 7.441 GiB 10915 / 10915 items 0 errors ETA 0:00
duration: 0:08, 861.53MiB/s

And now for the slowest of the group, is backing up to Backblaze B2 using restic:
scanned 4 directories, 2781 files in 0:00
[0:23] 100.00% 24.855 MiB/s 571.663 MiB / 571.663 MiB 2785 / 2785 items 0 errors ETA 0:00
duration: 0:23, 24.60MiB/s

I don't have enough storage on my account to back up the 7.4GB directory in the last test as in the first two examples, but either way you can see that duplicity is the fastest at backing up to Backblaze.

And even to get restic down to 23 seconds, I had to use the connections option and specify 500 B2 connections.

@fd0
Member

fd0 commented Nov 28, 2017

@curtwarfield can you please retry with rclone and report back?

@kurin
Contributor

kurin commented Nov 28, 2017

@curtwarfield 2.6GiB/s to B2 seems kind of high. Can you confirm this is for a dataset that is not already partially backed up? It takes me longer than 2.5s just to stat-walk a directory tree of about that size.

If duplicity really is getting that kind of performance I would like to run some tests.

@curtwarfield

No kurin, that speed with duplicity is on a re-sync with no new data.

I just ran a new 570MB directory backup to B2 using rclone as requested. The new backup took 10 minutes total which seems quicker than with restic.

@kurin
Contributor

kurin commented Nov 28, 2017

Are the restic backups listed also resyncs?

@curtwarfield

Yes kurin. A re-sync takes duplicity only 2.6 seconds, while restic takes 8 seconds on Azure and 23 seconds on B2.
I just checked the B2 re-sync with rclone. rclone is more in line with the Azure performance.

2017/11/28 16:58:20 INFO :
Transferred: 0 Bytes (0 Bytes/s)
Errors: 0
Checks: 2781
Transferred: 0
Elapsed time: 8.7s

@kurin
Contributor

kurin commented Nov 28, 2017

Okay, gotcha. I'm not an expert on restic internals (I'm here for B2 muckery). I think there's performance to squeeze out of the actual data transfer but that wouldn't help for your examples.

However, your examples probably do highlight the most common use case, which is backing up a dataset that is 95%+ unchanged, with lots of small files. It would be nice if restic could quickly identify and skip these files.

@kurin
Contributor

kurin commented Nov 28, 2017

We should probably split this into two bugs, one being specific to B2 bandwidth, the other being restic handling unchanged data (if that doesn't already exist).

@curtwarfield

It does skip the files, it just takes restic 12 times longer on B2 than duplicity does.
Unfortunately the initial backup is slow too, even when using the b2.connections option.

@curtwarfield

The B2 bandwidth is still too slow using restic, not just for handling unchanged data but for the initial backup too.

@kurin
Contributor

kurin commented Nov 28, 2017

Do we have any data on how long it takes duplicity to talk to B2? 570MB in 10min is ~8 Mbps, which is already in line with the speeds I see from restic to B2 today.

@curtwarfield

I'm going to run the same test for a new back up with duplicity and post results here.

@kurin
Contributor

kurin commented Nov 28, 2017

Thanks!

@curtwarfield

The proof is in the pudding !!!!
So I backed up the same directory from my server to the SAME B2 bucket using restic and duplicity.
Restic actually took about 12 minutes to back up

Duplicity only took 4 minutes to back up the same directory to the same B2 bucket
Last full backup date: none
Reuse configured PASSPHRASE as SIGN_PASSPHRASE
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1511908655.75 (Tue Nov 28 17:37:35 2017)
EndTime 1511908907.31 (Tue Nov 28 17:41:47 2017)
ElapsedTime 251.57 (4 minutes 11.57 seconds)
SourceFiles 2785
SourceFileSize 599784317 (572 MB)
NewFiles 2785
NewFileSize 599784317 (572 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 2785
RawDeltaSize 599432061 (572 MB)
TotalDestinationSizeChange 297380160 (284 MB)
Errors 0

@curtwarfield

Running my backup script a second time against the B2 bucket was again much faster than restic, so it's not Backblaze that is slow. Backblaze is the quickest of all the cloud services I use, including AWS and Google Cloud.

Last full backup date: Tue Nov 28 17:37:34 2017
Reuse configured PASSPHRASE as SIGN_PASSPHRASE
--------------[ Backup Statistics ]--------------
StartTime 1511909417.28 (Tue Nov 28 17:50:17 2017)
EndTime 1511909417.76 (Tue Nov 28 17:50:17 2017)
ElapsedTime 0.48 (0.48 seconds)
SourceFiles 2785
SourceFileSize 599784317 (572 MB)
NewFiles 0
NewFileSize 0 (0 bytes)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 0
RawDeltaSize 0 (0 bytes)
TotalDestinationSizeChange 716 (716 bytes)
Errors 0

@kurin
Contributor

kurin commented Nov 28, 2017

That's ~20Mbps, which is totally achievable today in restic with B2 (I just got 750Mbps from a GCE instance on a single large file).

I believe most of this is in restic's handling of directory trees. But that doesn't account for the difference between Azure and B2.

@fd0
Member

fd0 commented Nov 29, 2017

@curtwarfield sorry for not having asked this earlier: Which version of restic did you use in your tests? What's the output of restic version?

The background is that restic 0.8.0, released just a few days ago, added a local cache for metadata. Before that (e.g. with restic 0.7.3), restic would by default query the B2 API for each directory it encounters, at least for non-initial backups. That's slow simply because the B2 API has high latency for small requests.

Would you mind re-running your tests with restic 0.8.0 or the master branch?
You can find compiled binaries of the restic master branch here: https://beta.restic.net/?sort=time&order=desc

@addos

addos commented Nov 29, 2017

@fd0 pretty sure he is using 0.8.0. It will be interesting to see what it does when he uses the build from the master branch.

@curtwarfield

Hello,
Yes I am using 0.8.0 :-)

@fd0
Member

fd0 commented Nov 29, 2017

Hm, I've had at least one integration test on Travis fail because the B2 backend test was too slow, so I suppose something at B2 is/was wrong.

@curtwarfield

I would like to switch all my backups from duplicity to restic but B2 is too slow right now :-(

@richard-scott

After all that, today's fresh backup to B2 is going nicely. I deleted the old bucket and started from scratch. Even with debug logging going to a tmp file, things are moving nicely. I update my restic binary from master via a Jenkins build, so whenever you update the code, my next backup uses that version. Could the "Relax backend timeout test" change have helped?

@richard-scott

Over the last 18hrs, I've uploaded 65GB, that's about 8Mbps. So far I've spent $0.24 on Class B Transactions and $0.10 on storage.

In my debug logs I saw a lot of "404 Not Found" errors, each followed by an upload of that exact same object. The same pattern appears in the log posted earlier in this thread.

Surely that can't be right?

@fd0
Member

fd0 commented Feb 16, 2018

That's the way we're using the B2 client library right now:

_, err := obj.Attrs(ctx)
if err == nil {
	debug.Log(" %v already exists", h)
	return errors.New("key already exists")
}
w := obj.NewWriter(ctx)

I'm not sure if this is the most efficient way. We also make sure that the file does not already exist before writing to it.

Maybe @kurin can have a look where the class B transactions come from...

@kurin
Contributor

kurin commented Feb 16, 2018

24c on class B transactions is ~600k transactions, which seems high. I know that restic chunks data, but if you assume ~5MB/chunk there should only be ~13k chunks in a 65GB data set. The only class B transactions are "stat"-like functions (either get_file_info or download_file, which we use instead of get_file_info for good reasons that escape me atm). So issuing 600k of them during an upload session does seem like a lot. @fd0, does restic do any background maintenance while uploading chunks? Juggling locks etc?

@fd0
Member

fd0 commented Feb 16, 2018

During upload, there's not much going on. A bit of lock juggling, but that's like one file every five minutes or so, should be negligible. For each file that's to be uploaded, Attrs() is called, followed by writing the file itself. Hm. Shouldn't be that many transactions...

@kurin
Contributor

kurin commented Feb 16, 2018

I may have found a redundant call to one of the class B APIs. I'll try to confirm and get a fix out tonight.

@kurin
Contributor

kurin commented Feb 21, 2018

The restic nightlies have some fixes in them that may help with this issue: https://beta.restic.net/restic-v0.8.2-19-g9f060576/

I'd be interested to see what difference this makes, if any.

@fd0
Member

fd0 commented Feb 21, 2018

For the record, the PRs are: #1634 #1623

@fd0
Member

fd0 commented Feb 24, 2018

I think this issue is resolved with 0.8.2 or later, so I'm going to close it for now. Please leave a comment should you still experience very slow uploads with B2 with 0.8.2 or later. Thanks!

@fd0 fd0 closed this as completed Feb 24, 2018
@fd0 fd0 added state: need feedback waiting for feedback, e.g. from the submitter and removed help: wanted labels Feb 24, 2018
@richard-scott

I can confirm that 0.8.2 is the quickest for me so far. The upload speed to B2 varies, just as it does with S3.

I'm currently uploading 0.01% every 18 seconds, out of a total of 224GB.

It's taken 21hrs to upload 62%, which is about 138GB.

@curtwarfield

Hey now!!! I just tested 0.8.3 and it's finally working as expected! Great job to the developers.
I'm still not sure why people didn't have issues with 0.8.2, because that version was definitely slow for B2 for me, but I am pleasantly pleased with 0.8.3. I just backed up 600MB consisting of 2800 files in only 3 minutes 20 seconds! Again, kudos and GREAT JOB to all those responsible. You got a winner!

@fd0
Member

fd0 commented Feb 27, 2018

Awesome, thanks for the feedback :)

@cweilguny

I've been using restic for about two weeks, backing up to B2 (restic v0.8.3). I have about 300GB of data in about 41,000 files, stored on a Raspberry Pi 3 B+ with a 2TB Western Digital USB disk attached. My internet connection is LTE, with upload speeds of about 20-40 Mbit/s. The first 240GB went through fine in under a day.

As I'm currently rebuilding my whole network and servers, I had to stop the backup at about 240GB. Since then I've tried to restart it a couple of times. Scanning the first 240GB takes about 3-4 hours, during which no uploads happen; I think it only checks whether the files already exist in the B2 bucket. That's OK for a Raspberry Pi, I think, as I guess it's mostly calculating hashes, which is probably pretty slow on a Pi.

As soon as it gets to fresh files where uploads are done, it's really slow: about 15 hours for 1GB (nearly the same time the first 240GB took). I checked which files restic has open using lsof. There were a couple of small 3-5MB files and one large 2.5GB file. restic worked through those small files over the last three days, and the 2.5GB file is also done. Now it's working on only large files, each about 3-6GB in size, at the same "speed".

This sounds like what @richard-scott commented: #1383 (comment) - his backup was also slow once it reached uploads. I guess I could solve it by deleting all snapshots of that server and starting over like richard-scott did. But as long as my backup is behaving this slowly, maybe I can provide some debugging information?

Could a dstat 20 be helpful?

[dstat screenshot]

@fd0
Member

fd0 commented May 8, 2018

Scanning the first 240GB takes about 3-4 hours, during which no uploads happen; I think it only checks whether the files already exist in the B2 bucket.

Roughly, yes.

There were a couple of small 3-5MB files and one large 2.5GB file. restic worked through those small files over the last three days, and the 2.5GB file is also done. Now it's working on only large files, each about 3-6GB in size, at the same "speed".

This may be caused by several different things:

  • The large file restic reads may contain a lot of duplication (e.g. it contains only null bytes): restic reads a block of data, detects that the block is already in the repo, and moves on to the next block without uploading anything
  • Sometimes B2 is slow, we don't know why, so maybe it's a limit on the receiving end at B2...

You can test the following:

  • Observe the IO bandwidth to the hard disc, e.g. using iostat. Does it change over time, when restic reaches the point where it reads new files?
  • You can create a second repo on B2 and backup some data to see if restic uploads faster then
  • Maybe try the latest master branch, we've reworked the archive code completely since 0.8.3. You can find compiled binaries of the restic master branch here: https://beta.restic.net/?sort=time&order=desc

Richard's observations may be related, but that was with a much older version of restic (0.8.1); we've improved B2 handling a lot since then.

If this still is an issue, please open a new issue so we can discuss it.

@richard-scott

@Netdesk Have you tried setting this:

--option b2.connections=50

I think the default is something like 5 parallel connections.

@cweilguny

I tried the latest dev version as suggested by @fd0. It was a bit faster (and I like the new output details), and the rest of the 300GB is done now after one day, so I can't really reproduce it. But before (with regular 0.8.3), lsof showed ten open files, while the dev version opened only 2-3 files at a time. I don't know how many connections to B2 it opened. I logged iostat and vnstat every minute during the backup with the dev version. I'm going to check the files and report in a new issue if I find something.

Just now I tried a restic check, which failed because a lock was in place that had been created five days ago - about the day and time when the first try of the 300GB backup was interrupted. I can't say whether that lock could slow down the backup process; it's just something I found. As it's probably hard to reproduce now, digging in the dark isn't worth the time, I guess. Like @fd0 said, I'll open another issue if it occurs again.

@blastrock

blastrock commented Apr 16, 2020

Hi,

I'm using restic to back up to B2, and it's very painfully slow. A 300GB backup I started about 24h ago still isn't finished. I am using -o b2.connections=200, but I can see at most 4 open connections, and they are idle most of the time.

I tried hacking a bit in the code and found this. So I have done a few benchmarks.

I did these benchmarks under the following conditions:

  • fast.com measured my upload speed at 300Mbps
  • I am backing up a hard drive spinning at 7200RPM. hdparm reports 175MB/s for non-cached reads. I don't know how to measure the seek latency, but I can check if you want.
  • I am backing up 2.8GB, 2413 directories and 16155 files. When uploaded to the bucket, they only take 1.2GB, I guess this is due to restic's compression.
  • I live in France and upload to b2 in their US region (I didn't run thorough benchmarks on the EU region, but the numbers seemed to be the same)
  • I used -o b2.connections=200 because even with that, restic uses very few connections anyway
  • Each test is run on a bare repository
  • I ran restic with --no-cache, but I've been told caches aren't shared between repositories, so this shouldn't change anything.
  • I compiled commit 5a7c27d from master
  • I am using a debian unstable with linux 5.4.0

I tweaked two of the variables in the code linked above and got these results:

| FileRead | SaveBlob | SaveTree | Total time |
| --- | --- | --- | --- |
| 2 | 16 | 16*20 | 7min06s (default parameters) |
| 16 | 16 | 16*20 | 2min27s |
| 16 | 64 | 64*20 | 2min33s |

It seems that 2 isn't a very good value for the FileRead parameter. If the optimal value really depends on the setup, maybe it should be made configurable from the command line. Or maybe it already is and I missed it - please tell me.

I ran some tests on an SSD too, under different conditions, and saw a similar change: going from 2 to 8 changed the backup time from 52 minutes to 19 minutes.

Finally, I'm posting this here because I happen to be using B2, and everybody complaining about speed is using B2 as well, so I thought it was related. If someone can run a similar benchmark on another storage service, we would know whether this setting is really suboptimal for everyone and not just for B2 customers.

Please tell me if my reasoning is wrong, or if you have any suggestions to make my backups faster.

EDIT: added actual bucket size after the backup
EDIT2: added note about --no-cache
EDIT3: clarified that each run is done on a new repository

@MichaelEischer
Member

Does backing up to a local backup repository change anything regarding the time required for backup?

@er1z

er1z commented Apr 16, 2020

@blastrock I'm in Poland, and changing the DC to EU increased speed dramatically. US speeds to my systems were ~30 KiB/s (on a 20 Mbps uplink).

So the target DC matters a great deal.

@blastrock

Does backing up to a local backup repository change anything regarding the time required for backup?

A local backup of the same folder from the HDD to that same HDD, with default settings, takes 27s. With FileReadConcurrency set to 16, it takes 19s. The difference is smaller here, and this test also proves that the time to read the files is negligible compared to the time of the whole backup to b2.

@blastrock PL here and changing DC to EU increases speed dramatically. US speeds to my systems were ~30 KiB/s (on 20 Mbps uplink).

Actually, the backup I talk about in my first paragraph was being done in the EU region. I got confused with the accounts and did the benchmark on US though. I can try to run that same benchmark on the EU region, just to have comparable numbers.

@MichaelEischer
Member

MichaelEischer commented Apr 16, 2020

The difference is smaller here, and this test also proves that the time to read the files is negligible compared to the time of the whole backup to b2.

What I don't understand in that regard is that for SaveBlob=16 the number of FileReaders makes a large difference. The actual upload to B2 happens in the SaveBlob threads, so if the B2 upload is the bottleneck, I would expect the number of FileReaders not to make such a large difference.

In what regard does --no-cache help with ensuring comparable runs? If you start with a new repository, there's no cache for that repository; and if the repository already contains the backup data, you'd just be measuring how fast restic can scan the backup data set, in which case the upload to B2 would no longer be the bottleneck. Did you flush the filesystem cache between the backup variants, or did you do a warmup backup run to prime the filesystem cache?

@blastrock
Copy link

What I don't understand in that regard is that for SaveBlob=16 the number of FileReaders makes a large difference. The actual upload to B2 is happening in the SaveBlob threads, so if the B2 upload is the bottleneck then I would expect that the number of FileReaders does not make a larger difference.

I agree, but I don't really know how restic works internally. With SaveBlob=16 and b2.connections=200, I can only see 4-5 connections from restic, and they are mostly idle. So my guess was that the "saving" part was not receiving enough data, and that somehow the "reading" part was waiting for the "saving" part to finish its work instead of just sending more.

If you start with a new repository then there's no cache for that repository

Oh, my bad, I thought some part of the cache was shared between repositories, like file modification timestamps. I did start with a new repository for each test. I'll edit my post to clarify that.

Did you flush the filesystem cache between the backup variants or did you run a warmup backup run to prime the filesystem cache?

I haven't thought about it at all actually ^^
I have just re-run the same test with FileRead=2 twice (to make sure the cache is warmed up) and got 6min12s and 6min00s this time. Not quite the same timings as before, but still bad. I have rerun the test just after that with FileRead=16 and got 2min00s.

@MichaelEischer
Member

I think this is in fact a performance problem in restic. Each file_saver splits a file into chunks and passes them to the blob_saver. A file_saver only continues with the next file after all chunks have been uploaded. Storing a blob can either be fast, if the pack is not yet complete, or slow, if the pack gets finalized and uploaded. In other words, the file_saver and the blob_saver are not completely decoupled. For small files, the effective upload parallelism is limited to FileRead.

In my opinion this warrants its own issue - could you please open a new issue and add your observations to it? Could you also try how long it takes to upload a 1GB file containing random data (e.g. dd if=/dev/urandom of=test bs=1M count=1024)? I'd expect that to end up in the 2min range.
