Super slow backup to B2 #1383
Hey, what upstream bandwidth do you have available? What's your location (network-wise)? B2 is not the fastest service; the latency to the API (at least from Europe) is pretty high. How did you run restic exactly? Did you check the bandwidth used with external tools? What throughput do you get using other programs like …
Anecdote: I back up two identical servers, one in New Jersey and one in Germany. The former takes about 20 minutes and the latter about 90 minutes to back up to B2.
Hi, I'm backing up my Nextcloud data folder, about 65-70 GB of unencrypted files of various sizes.

When I run the same initial backup to B2, the speed quickly drops after a short burst. I waited a few minutes after the burst and it sat around 1 MB/s. This is confirmed by using "nethogs", a tool that monitors bandwidth per process: restic seems to oscillate between 500 KB/s and 2000 KB/s. I then used rclone with no special options to upload a similar folder structure as the backup repository and got about the same speeds; it oscillated mostly between 500 and 1500 KB/s during my tests. So IMHO the issue is not with restic itself but with B2.

Edit: I've re-tried the backup with a much higher B2 connection count (-o b2.connections=50) and the speed drastically increased.
That's interesting, thanks for reporting back!
I think I found my issue, as I tried a different ISP and received very different results. I wonder: is there logic in restic to detect "dead" connections?
B2 is known to have limited bandwidth per connection; I've seen this same issue with other backup software as well. The solution is always to use multiple connections to upload. I'm glad I found this thread, because I didn't know about the …
I second the solution with b2.connections; going from 0.5 to 15 MB/s is a nice relief :-)
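For reference, a sketch of the invocation several commenters describe. Bucket name, repository path, credentials, and the backed-up directory are all placeholders, not from this thread:

```shell
# Sketch only: bucket name, path, and credentials are placeholders.
export B2_ACCOUNT_ID="000xxxxxxxxxxxx0000000000"
export B2_ACCOUNT_KEY="K000xxxxxxxxxxxxxxxxxxxxxxxxxxx"

# -o b2.connections raises the number of parallel connections to B2
# (the default is reportedly around 5; see a later comment in this thread).
restic -r b2:my-bucket:my-path -o b2.connections=50 backup /srv/nextcloud/data
```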
Restic definitely has potential, but I'm not sure it's ready for prime time yet, especially with Backblaze. Backing up to Azure using restic is fast, but not quite as good as duplicity. And now for the slowest of the group: backing up to Backblaze B2 using restic. I don't have enough storage on my account to back up the 7.4 GB directory in the last test as in the first two examples, but either way you can see that using duplicity to back up to Backblaze is the fastest. And even to get restic down to 23 seconds, I had to use the connections option and specify 500 B2 connections.
@curtwarfield can you please retry with rclone and report back?
@curtwarfield 2.6 GiB/s to B2 seems kind of high. Can you confirm this is for a dataset that is not already partially backed up? It takes me longer than 2.5 s just to walk and stat a directory tree of about that size. If duplicity really is getting that kind of performance, I would like to run some tests.
No kurin, that speed with duplicity is on a re-sync with no new data. I just ran a new 570 MB directory backup to B2 using rclone, as requested. The new backup took 10 minutes total, which seems quicker than with restic.
Are the restic backups listed also re-syncs?
Yes, kurin. A re-sync only takes duplicity 2.6 seconds, while restic on Azure takes 8 seconds and restic on B2 takes 23 seconds. 2017/11/28 16:58:20 INFO :
Okay, gotcha. I'm not an expert on restic internals (I'm here for B2 muckery). I think there's performance to squeeze out of the actual data transfer, but that wouldn't help for your examples. However, your examples probably do highlight the most common use case: backing up a dataset that is 95%+ unchanged, with lots of small files. It would be nice if restic could quickly identify and skip these files.
We should probably split this into two bugs, one being specific to B2 bandwidth, the other being restic's handling of unchanged data (if that doesn't already exist).
It does skip the files; it just takes restic 12 times longer on B2 than duplicity does.
The B2 bandwidth is still too slow using restic, not just for handling unchanged data but for the initial backup too.
Do we have any data on how long it takes duplicity to talk to B2? 570 MB in 10 min is ~8 Mbps, which is already in line with the speeds I see from restic to B2 today.
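As a rough sanity check of that figure, the arithmetic works out (using 1 MB = 1024 * 1024 bytes; with decimal megabytes it comes to about 7.6 Mbps, hence the "~8 Mbps" above):

```shell
# 570 MB uploaded in 10 minutes, converted to megabits per second
bytes=$((570 * 1024 * 1024))
seconds=$((10 * 60))
bps=$(( bytes * 8 / seconds ))    # bits per second, integer division
echo "$(( bps / 1000000 )) Mbps"  # prints "7 Mbps"
```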
I'm going to run the same test for a new backup with duplicity and post the results here.
Thanks!
The proof is in the pudding! Duplicity only took 4 minutes to back up the same directory to the same B2 bucket.
Running my backup script a second time on the B2 bucket was again much faster than restic, so it's not Backblaze that is slow. Out of all the cloud services I use, Backblaze is quicker than all of them, including AWS and Google Cloud. Last full backup date: Tue Nov 28 17:37:34 2017
That's ~20 Mbps, which is totally achievable today in restic with B2 (I just got 750 Mbps from a GCE instance on a single large file). I believe most of the overhead is in restic's handling of directory trees. But that doesn't account for the difference between Azure and B2.
@curtwarfield sorry for not having asked this earlier: which version of restic did you use in your tests? What's the output of … ? The background is that restic 0.8.0, which was released just a few days ago, added a local cache for metadata. Before that (e.g. with restic 0.7.3), restic would by default query the B2 API for each directory it encounters, at least for non-initial backups. That's slow simply because the B2 API has high latency when fetching small chunks of data. Would you mind re-running your tests with restic 0.8.0 or the master branch?
@fd0 pretty sure he is using 0.8.0. It will be interesting to see what it does when he uses the build from the master branch.
Hello,
Hm, I've had at least one integration test on Travis fail due to a too-slow B2 backend test, so I suppose something at B2 is/was wrong.
I would like to switch all my backups from duplicity to restic, but B2 is too slow right now :-(
After all was said, today's fresh backup to B2 is going nicely. I deleted the old bucket and started fresh. Even with debug logging going to a tmp file, things are moving nicely. I update my restic binary from master via a Jenkins build, so whenever you update the code, my next backup will use that version. Could the "Relax backend timeout test" change have helped?
Over the last 18 hrs I've uploaded 65 GB; that's about 8 Mbps. So far I've spent $0.24 on class B transactions and $0.10 on storage. In my debug logs I saw a lot of "404 Not Found" errors, each followed by an upload of that exact same object. It's the same in the log posted earlier in this thread. Surely that can't be right?
That's the way we're using the B2 client library right now: restic/internal/backend/b2/b2.go, lines 199 to 205 at 4eb9df6
I'm not sure this is the most efficient way. We're also making sure that the file does not already exist before writing to it. Maybe @kurin can have a look at where the class B transactions come from...
24c of class B transactions is ~600k transactions, which seems high. I know that restic chunks data, but if you assume ~5 MB/chunk there should only be ~13k chunks in a 65 GB data set. The only class B transactions are "stat"-like functions (either get_file_info or download_file, which we use instead of get_file_info for good reasons that escape me at the moment). So issuing 600k of them during an upload session does seem like a lot. @fd0, does restic do any background maintenance while uploading chunks? Juggling locks, etc.?
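Both estimates can be reproduced with quick arithmetic. The per-transaction price assumed here ($0.004 per 10,000 class B transactions, i.e. 0.4 cents per 10k) is not stated in the thread but is consistent with the 24c-to-600k conversion above:

```shell
# $0.24 at an assumed 0.4 cents per 10,000 class B transactions
cents=24
transactions=$(( cents * 10000 * 10 / 4 ))

# expected chunk count for 65 GB at ~5 MB per chunk
chunks=$(( 65 * 1024 / 5 ))

echo "$transactions transactions vs ~$chunks chunks"
# prints "600000 transactions vs ~13312 chunks"
```

The roughly 45x gap between the two numbers is what makes the transaction count look suspicious.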
During upload, there's not much going on. A bit of lock juggling, but that's like one file every five minutes or so, which should be negligible. For each file that's to be uploaded, …
I may have found a redundant call to one of the class B APIs. I'll try to confirm and get a fix out tonight.
The restic nightlies have some fixes in them that may help with this issue: https://beta.restic.net/restic-v0.8.2-19-g9f060576/ I'd be interested to see what difference this makes, if any.
I think this issue is resolved with 0.8.2 or later, so I'm going to close it for now. Please leave a comment should you still experience very slow uploads with B2 with 0.8.2 or later. Thanks!
I can confirm that 0.8.2 is quickest for me. The upload speed varies with B2, just the same now as it does with S3. I'm currently uploading 0.01% every 18 seconds, out of 224 GB total. It's taken 21 hrs to upload 62%, which is about 138 GB.
Hey now!!! I just tested 0.8.3 and it's finally working as expected!!! Great job to the developers.
Awesome, thanks for the feedback :)
I've been using restic for about two weeks, backing up to B2 (restic v0.8.3). I have about 300 GB of data in about 41,000 files, stored on a Raspberry Pi 3 B+ with a 2 TB Western Digital USB disk attached. My internet connection is LTE, with upload speeds of about 20-40 Mbit/s.

The first 240 GB went fine in under a day. As I'm currently rebuilding my whole network and servers, I had to stop the backup at about 240 GB. Since then I've tried to restart it a couple of times. Scanning the first 240 GB takes about 3-4 hours, during which no uploads are done; I think it only checks the files and whether they exist in the B2 bucket. That's OK for a Raspberry Pi, I think, as I guess this is all about calculating hashes, which is probably pretty slow on a Pi.

As soon as it gets to fresh files where uploads are done, it's really slow: about 15 hours for 1 GB (the first 240 GB took nearly the same time). I checked which files restic has open using lsof. There were a couple of small 3-5 MB files, and one large 2.5 GB file. restic worked through those small files over the last three days, and the 2.5 GB file is also done. Now it's working on only large files, each about 3-6 GB in size, at the same "speed".

This sounds like what @richard-scott commented: #1383 (comment) - his backup also became slow when it reached uploads. I guess I could solve it by deleting all snapshots of that server and starting over like richard-scott did. But as long as my backup is in this slow state, maybe I can provide some debugging information? Could a …
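For scale, 1 GB in 15 hours works out to a tiny fraction of the stated 20-40 Mbit/s uplink:

```shell
# effective throughput for 1 GiB uploaded in 15 hours
bits=$(( 1024 * 1024 * 1024 * 8 ))  # 1 GiB in bits
seconds=$(( 15 * 3600 ))
kbit_per_s=$(( bits / seconds / 1000 ))
echo "${kbit_per_s} kbit/s"         # prints "159 kbit/s", i.e. ~0.16 Mbit/s
```

That is well under 1% of even the low end of the reported LTE upload speed, so the link itself is unlikely to be the bottleneck.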
Roughly, yes.
This may be caused by several different things:
You can test the following:
Richard's observations may be related, but that was with a much older version of restic (0.8.1); we've improved B2 handling a lot since then. If this is still an issue, please open a new issue so we can discuss it.
@Netdesk Have you tried setting this:
I think the default is something like 5 parallel connections. |
I tried the latest dev version as suggested by @fd0. It was a bit faster (and I like the new output details), and the rest of the 300 GB is done now after one day, so I can't really reproduce it. But before (regular 0.8.3) lsof showed ten open files, while the dev version opened only 2-3 files at the same time. I don't know how many connections to B2 it opened. I logged iostat and vnstat every minute during the backup with the dev version; I'm going to check the files and report in a new issue if I find something. Just now I tried a …
Hi, I'm using restic to back up to B2, and it's painfully slow. A backup of 300 GB that I started about 24 h ago isn't finished yet. I am using -o b2.connections=200, but I can see at most 4 open connections, and they are idle most of the time. I tried hacking a bit in the code and found this. So I have done a few benchmarks, under the following conditions:
I tweaked two of the variables in the code linked above and got these results:
It seems that 2 isn't that good a value for this FileRead parameter. If it really is workload-dependent, maybe it should be made configurable from the command line. Or maybe it is and I missed it; please tell me. I did some tests on an SSD too, with different conditions, and saw a similar change: going from 2 to 8 reduced the backup time from 52 minutes to 19 minutes. Finally, I'm posting this here because I happen to be using B2, and everybody complaining about speed is using B2 as well, so I thought this was related. If someone can do a similar benchmark on another storage service, we could find out whether this setting is suboptimal for everyone and not just for B2 customers. Please tell me if my reasoning is wrong, or if you have any suggestions to make my backups faster. EDIT: added the actual bucket size after the backup.
Does backing up to a local backup repository change anything regarding the time required for backup? |
@blastrock I'm in PL, and changing the DC to EU increases speed dramatically. US speeds from my systems were ~30 KiB/s (on a 20 Mbps uplink). So the target DC is the most important factor.
A local backup of the same folder from the HDD to that same HDD, with default settings, takes 27 s. With FileReadConcurrency set to 16, it takes 19 s. The difference is smaller here, and this test also shows that the time needed to read the files is negligible compared to the time of the whole backup to B2.
Actually, the backup I talk about in my first paragraph was being done in the EU region. I got confused with the accounts and did the benchmark on US though. I can try to run that same benchmark on the EU region, just to have comparable numbers.
What I don't understand in that regard is that for … In what regard does …
I agree, but I don't really know how restic works internally. With SaveBlob=16 and b2.connections=200, I can only see 4-5 connections from restic, and they are mostly idle. So my guess was that the "saving" part was not receiving enough data, and that somehow the "reading" part was waiting for the "saving" part to finish its work instead of just sending more.
Oh, my bad, I thought some part of the cache was shared between repositories, like file modification timestamps. I did start with a new repository for each test. I'll edit my post to clarify that.
I haven't thought about it at all, actually ^^
I think this is in fact a performance problem in restic. Each file_saver splits a file into chunks and passes them to the blob_saver. A file_saver only continues with the next file after all chunks have been uploaded. Storing a blob can either be fast, if the pack is not yet complete, or slow, if the pack gets finalized and uploaded. In other words, the file_saver and the blob_saver are not completely decoupled. For small files the effective upload parallelism is limited to … In my opinion this warrants its own issue; could you please open a new issue and add your observations to it? Could you also try how long it takes to upload a 1 GB file containing random data (e.g. …
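One way to run the suggested test, assuming an already-initialized B2 repository (the repository URL here is a placeholder, not from this thread):

```shell
# Generate 1 GB of random (hence incompressible, non-deduplicatable) data
dd if=/dev/urandom of=/tmp/random-1g.bin bs=1M count=1024

# Time a backup of just that file, so only raw upload speed is measured
time restic -r b2:my-bucket:my-path backup /tmp/random-1g.bin
```

Random data forces restic to actually upload every chunk, which isolates transfer speed from deduplication and directory-tree handling.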
This is a fresh bucket and fresh upload to B2.

Output of restic version:

restic 0.7.3
compiled with go1.9 on freebsd/amd64

How did you run restic exactly?