Restores could be even faster #2074
160mbit/sec does indeed look slow. I was getting close to 200Mbps on a puny MacBook Pro downloading from OneDrive over a (granted, fast) 1Gbps FTTH connection. I can't really say anything specific without access to your systems and repository, but here are a few things I'd do to narrow down the problem, in no particular order:
Hi there! Thanks for the reply. This was interesting to dig into. tl;dr: bumping up S3 connections is the key, but the performance boosts it provides are temporary, oddly.
The bandwidth numbers are in kbps and were taken at 3-second intervals over 10 minutes. For the restic runs, we started collecting data after restic had already begun to actually restore files (i.e., it excludes startup time and the time that restic spends at the beginning of the run recreating the directory hierarchy). During all runs, CPU utilization on the instance did not exceed 25%.

We get a slight bump from increasing the workerCount, but the S3 concurrency looks like where the real win is. But while it starts out strong (approaching rclone speed, at times!), rates drop abruptly and stay down for the rest of the run. restic also throws errors that look like "ignoring error for [path redacted]: not enough cache capacity: requested 2148380, available 872640" that it does not throw with lower S3 concurrency. As you can see, the rclone performance starts high and stays high, so it's not a situation where writes are going to the instance's buffer cache and then stalling when they get flushed to disk. The nvme array is faster than the network pipe, throughput-wise.

Given the above, it looks like bumping up the S3 concurrency is what it will take to get reasonable rates here, but we need to figure out why the performance is dropping (and whether it's related to the cache errors or not). If making a fast i3 instance and some S3 space and bandwidth available for your testing would be helpful, let me know; I'm happy to sponsor that.
As for …
Hi there! Whoops, I'm sorry: the scale in the graph is KB/sec, not kbit/sec. So 1,012,995 in the first row is 8.1gbit/sec. I understood that files were restored sequentially, but didn't understand that there is no parallelism in pack retrieval for a single file. That's definitely a stumbling block for backups with large files, since it becomes the limiting factor; it would be fantastic to have some parallelism here for that reason. Sampling 10 random buckets, our maximum pack size is just below 12MB and our average pack size is 4.3MB. Perhaps with more workers and S3 connections, we are exceeding the packCacheCapacity of (workerCount + 5) * averagePackSize. I'll try upping that and see if the errors go away.
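For anyone following along, here is a quick worked example of that sizing rule. This is just a sketch: the constant names follow the thread and are assumed to live in internal/restorer/filerestorer.go.

```go
package main

import "fmt"

// Worked example of the cache-sizing rule quoted above:
// packCacheCapacity = (workerCount + 5) * averagePackSize.
// Values match the bumped settings used in these tests.
func main() {
	const (
		workerCount     = 128
		averagePackSize = 12 * 1024 * 1024 // 12MB, raised from the ~5MB default
	)
	packCacheCapacity := (workerCount + 5) * averagePackSize
	fmt.Printf("pack cache capacity: %d bytes (~%.2f GiB)\n",
		packCacheCapacity, float64(packCacheCapacity)/(1<<30))
}
```

With the stock 5MB average pack size and 8 workers, the cache tops out at (8 + 5) * 5MB = 65MB, so it's plausible that raising concurrency without raising averagePackSize exhausts it.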
Cache size is calculated based on 5MB pack file size and …
No cache errors with averagePackSize set to 12MB. If there's any other info we can provide that would be helpful, let me know! Thanks again.
It does look like we are losing parallelism as the restore progresses, and it doesn't look to be related to large files. I'll put together a test case and report back.
Hi there! I attempted to reproduce the performance drop-offs that I was seeing with my production file mix using three different artificial file mixes. For all tests, I used c0572ca with S3 connections increased to 128, workerCount increased to 128, filesWriterCount increased to 128, and averagePackSize increased to 12 * 1024 * 1024. All tests used files with random data, to avoid any impact from deduplication.

For the first test, I created and backed up 4,000 100MB files, split evenly across 100 directories (~400GB total). Backups and restores were run from striped nvme volumes on an i3.8xlarge instance. The backup bucket was located in the same region as the instance (us-west-1). With this file mix, I saw average speeds of 9.7gbit/sec (!) with no loss of parallelism or speed across the full restore. These numbers are on par with or above the rclone numbers and are essentially line speed, which is fantastic.

I then created and backed up 400,000 1MB files, split evenly across 100 directories (again, ~400GB total). Same (excellent) results as above.

Finally, I created 40 directories with one 10GB file per directory. Here, things got interesting. I expected this restore to be slightly slower, since restic would only be able to do 40 simultaneous restores with 40 connections to S3. Instead, while restic opens all 40 files for writes and writes to all 40 files simultaneously, it only keeps a single TCP connection to S3 open at a time, not 40. Let me know what stats or instrumentation you'd like to see.
Can you confirm there were 128 S3 connections during the "fast" tests?
Yes, there were.
Curious... Large file support isn't high on my priority list, to be honest, but I may find some time to look at this in the next few weeks. If anyone wants to dig into this before I do, please let me know.
Did some more testing, and it actually looks like file size is a red herring: it's file count that is driving the AWS connection count. With 128 10MB files in 4 directories, restic only opens 6 connections to AWS, even though it's writing to all 128 files. With 512 10MB files in 4 directories, restic opens 18 connections over its lifetime, even though it has 128 files open at a time. With 5,120 10MB files in 4 directories, restic opens only 75 connections to AWS over its lifetime, again holding 128 files open at a time. Odd!
I'd be really surprised if the Go S3 client didn't pool and reuse HTTP connections. Most likely there is no one-to-one correlation between the number of concurrent workers and open TCP sockets. So, for example, if the restore is slow at processing downloaded packs for whatever reason, the same S3 connection will be shared by multiple workers.
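To illustrate the standard-library behaviour being described, here is a minimal sketch using plain net/http. (Restic's S3 backend actually goes through the minio-go client, so the real configuration lives elsewhere; treat the values as illustrative.)

```go
package main

import (
	"net/http"
	"time"
)

func main() {
	// Go's http.Transport keeps a pool of idle TCP connections and
	// transparently reuses them across requests. Many concurrent
	// workers sharing one client can therefore ride on a handful of
	// sockets if they spend most of their time processing data
	// rather than downloading it.
	transport := &http.Transport{
		MaxIdleConns:        128,
		MaxIdleConnsPerHost: 128, // the default of 2 throttles per-host reuse
		IdleConnTimeout:     90 * time.Second,
	}
	client := &http.Client{Transport: transport}
	_ = client // 128 goroutines sharing this client != 128 TCP connections
}
```

This matches the observation above: connection count tracks how often workers are simultaneously blocked on the network, not how many workers exist.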
There are two properties of the current concurrent restorer that are responsible for the bulk of the implementation complexity and very likely cause the slowdown reported here: each file is written start-to-finish, with blobs restored in order within the file, and the number of in-progress files is capped.
The implementation will be much simpler, use less memory, and very likely be faster in many cases if we agree to write file blobs in any order and to allow any number of in-progress files. The downside: it won't be possible to tell how much data has already been restored in any given file until the end of the restore, which may be confusing, especially if the restore crashes or gets killed. You may see a 10GB file on the filesystem which in reality has only a few bytes written at the end. @fd0 do you think it's worth improving the current start-to-finish restore? Personally, I am ready to admit it was over-engineering on my part and can provide a simpler out-of-order implementation if you agree.
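To make the proposal concrete, here is a minimal sketch of out-of-order blob writes using os.File.WriteAt (not restic's actual code). It also shows how the sparse-file downside arises.

```go
package main

import (
	"log"
	"os"
)

// Any worker can place a blob at its final offset, so blob arrival
// order stops mattering. Until every blob lands, the file is sparse,
// which is why a crash can leave a "10GB" file that holds only the
// few bytes written near its end.
func writeBlobAt(path string, blob []byte, offset int64) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteAt(blob, offset) // extends the file as needed
	return err
}

func main() {
	// Writing the last blob first leaves a hole before offset 1<<30.
	if err := writeBlobAt("restored.bin", []byte("tail blob"), 1<<30); err != nil {
		log.Fatal(err)
	}
}
```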
I don't remember if Restic supports restoring to standard output. If it does, you'll obviously need to keep start-to-finish restore for it (possibly as a special case).
@ifedorenko I personally am in favor of simplification, generally, especially if it comes with performance boosts. I'm trying to understand the tradeoffs though:
If we're not talking thousands of concurrently-written files here, maybe a map of file name to bytes written could be used? Obviously that won't make the info available to the file system, but for progress reporting it could still be done, no? (And to resume a restore that was interrupted, perhaps just checking the blobs and offsets of the file for non-null bytes? I dunno.)
I don't think it does -- not sure how that would work anyway, since restores involve multiple files; you'd need to encode and separate them somehow, I think.
The simplest implementation is to open and close files to write individual blobs. If this proves to be too slow, then we'll have to find a way to keep files open for multiple blob writes, for example by caching open file handles and ordering pack downloads to favour already open files. Restore already tracks what blobs were written to what files and what are still pending. I don't expect this part to change much. Progress is tracked in a separate data structure and I don't expect that to change either.
Resume needs to verify the checksums of files on disk to decide which blobs still need to be restored. I believe this is true regardless of whether the restore is sequential or out of order. For example, if resume recovers from a power failure, I don't think it can assume all file blocks were flushed to disk before the power cut, i.e. files can have gaps or partially written blocks.
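A sketch of what that resume verification could look like, assuming (per restic's design) that blob IDs are the SHA-256 of the blob plaintext and that the restorer knows each blob's offset and length in the target file:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// blobNeedsRestore reports whether the bytes at [offset, offset+length)
// fail to match the expected blob ID. A short read (a gap or truncated
// file after a crash) also counts as "needs restore".
func blobNeedsRestore(f *os.File, offset, length int64, id [32]byte) bool {
	buf := make([]byte, length)
	if _, err := f.ReadAt(buf, offset); err != nil {
		return true
	}
	return sha256.Sum256(buf) != id
}

func main() {
	f, err := os.Open("restored.bin")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	var id [32]byte // expected blob ID, known from the repository index
	fmt.Println(blobNeedsRestore(f, 0, 4096, id))
}
```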
Absolutely. I am traveling this week but will test as soon as I can and report back.
@ifedorenko I tried this PR and it successfully creates the directory structure, but then all file restores fail with errors like:

Load(<data/9349b00f78>, 3172070, 0) returned error, retrying after 12.182749645s: EOF

master restores fine. My restore command is:

/usr/local/bin/restic.outoforder -r s3:s3.amazonaws.com/[redacted] -p /root/.restic_pass restore [snapshotid] -t .

Let me know what additional info I can provide to help debug.
How many concurrent S3 download requests do you allow? If it's 128, can you limit it to 32 (which we know works)? Semi-related: do you know how many index files your real repository has? I'm trying to estimate how much memory the restorer needs.
I didn't specify a connection limit, so I assume it was defaulting to 5. I get the same errors with -o s3.connections=2 and -o s3.connections=1. I currently have 85 index blobs in the index/ folder. They are 745MB in total size.
Hmm. I'll have another look when I get to my computer later today. Btw, "returned error, retrying" is a warning, not an error, so it may be related to the restore failure or it may be just a red herring.
Not sure how I missed it... the restorer didn't fully read all pack files from the backend in most cases. Should be fixed now. @pmkane can you try your test again?
I’ll give it a whirl!
Confirmed that files successfully restore with this fix. Testing performance now.
Unfortunately, it looks like it's slower than master. All tests were run on the i3.8xlarge instance described above. With a workerCount of 8 (the default), the out-of-order branch restored at 86mbit/sec. With workerCount bumped to 32, it did a bit better, averaging 160mbit/sec. CPU utilization with this branch is significantly higher than on master, but it is not maxing out the instance CPU in either case. Anecdotally, the restore UI makes it seem like something is "sticking", almost like it's competing against itself for a lock somewhere. Happy to provide more details, profiling, or a test instance to replicate.
Just to confirm, you had both the restorer and S3 backend worker counts set to 32, right? A couple of thoughts that may explain the behaviour you observed:
Hey there! That's right. workerCount was set to 8 or 32 in internal/restorer/filerestorer.go and -o s3.connections=32 was passed via the restic CLI in both cases.
(used 32 workers and 32 s3 connections across all runs w/ ead78b3)
Do you know if I need to do anything special to access that bucket? I've never used public buckets before, so I'm not sure if I am doing something wrong or my user does not have access. (I can access my team's buckets just fine, so I know my system can access S3 in general.)
Hey @ifedorenko, hang on, let me get someplace unprivileged and try.
Sorry about that, I had applied a public bucket policy, but had not updated the objects in the bucket itself. You should be able to access it now. Note that you'll need to use --no-lock, as I've only granted read permissions.
Yup. I can access the repo now. Will play with it later tonight.
And in case it's helpful/easier to test, we see similar performance characteristics when doing restores of the same files to/from a repo on fast SSD, taking S3 out of the equation.
@pmkane I can reproduce the problem locally and don't need access to that bucket any more. This was very useful, thank you.
@ifedorenko, fantastic. I will delete the bucket.
@pmkane please give the latest out-order-restore branch another try when you have time. I didn't have time to test this in EC2, but on my MacBook the restore now seems to be limited by disk write speed and matches rclone.
@ifedorenko, that sounds very promising. I'm firing up a test now.
@ifedorenko, bingo. Restored 133GB worth of blobs representing a mixture of file sizes, the largest being 78GB, in just under 16 minutes. Previously, this restore would have taken the better part of a day. I suspect we can get this faster still by playing with the number of restoreWorkers, but it's plenty fast as it stands. Thanks for your hard work on this!
And for posterity: our restore performance doubles from 8->16 and again from 16->32 restore workers. 32->64 is only good for a ~50% bump on top of 32, at which point we are restoring at around 3gbit/sec. Nearly on par with rclone. I know that there is a desire to minimize the amount of configuration required to extract the best performance from restic, but this one is a big enough jump, especially for users with large file sets, that it would be nice to be able to specify the number of workers at runtime.
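A minimal sketch of such a runtime knob, using the standard flag package for brevity. Restic's CLI is actually built on cobra, so the real wiring would look different; the flag name here is hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	// Hypothetical flag: stock restic has no such option at this point,
	// and the restore worker count is a compiled-in constant instead.
	workers := flag.Int("restore-workers", 8, "number of concurrent restore workers")
	flag.Parse()
	fmt.Println("restoring with", *workers, "workers")
}
```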
Not sure why restore still can't reach full wire speed. There is a redundant blob hash check you can comment out to see if it's responsible.
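For a quick sense of whether that hash check could matter, here is a back-of-the-envelope sketch that measures single-threaded SHA-256 throughput on blob-sized chunks. The numbers are machine-dependent; the 4MB chunk size is an assumption based on the pack sizes reported earlier in this thread.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

func main() {
	blob := make([]byte, 4<<20) // ~4MB, roughly blob-sized
	start := time.Now()
	for i := 0; i < 256; i++ { // 256 * 4MiB = 1GiB hashed in total
		_ = sha256.Sum256(blob)
	}
	gibPerSec := 1.0 / time.Since(start).Seconds()
	fmt.Printf("single-threaded SHA-256: %.2f GiB/s\n", gibPerSec)
}
```

If the measured rate is below the wire speed you are chasing, a per-blob hash on the restore hot path is a plausible bottleneck.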
I'll give it a try.
Hi all, I ran some tests using the latest pull request #2195 and the performance continues to improve! (20k small files, 2x70G files, 170G total)

8w_32c
32w_32c
64w_32c
64w_64c
Not sure about the performance drop from 32w to 64w (tested several times, and it seems legit). I attach some graphs taken during the process; it seems that there is some degradation or limit. Should it be like this? For example, with 64 workers the process starts at 6gbit/sec but then drops to less than 1gbit/sec until the end of the process (which I think corresponds to the time spent processing those big files). The first screenshot is with 32w, 32c and the second with 64w, 32c.

I also agree with @pmkane that it would be useful to change the worker number from the command line; that would be very useful for disaster recovery scenarios when you want to restore your data as fast as possible. There is a pull request about this by my coworker, #2178. Anyway, I'm really impressed with the improvements. Thanks a lot @ifedorenko!
++. Thank you @ifedorenko, this is game changing stuff for restic.
Thank you for the detailed report @robvalca. Any chance you can provide a test repository I can use, either in AWS (or GCP or Azure) or locally? I did not see the large-file restore speed drop in my tests and would like to understand what's going on there.
@pmkane I too would like to configure these things at run time. My suggestion was to make them …
hi @ifedorenko, I've created a public repo with junk data at …
@robvalca I am unable to reproduce the problem using your test repository. In AWS (us-east-2, s3 repo, …).

@pmkane interestingly, I can't confirm your observations either. As I mentioned above, I see 0.68 GB/s restore speed using the latest …
@ifedorenko I agree wholeheartedly re: diminishing returns. It's plenty fast enough for us as it is.
@ifedorenko interesting, I will investigate this on our side. Anyway, it's fast enough for us too; thanks a lot for your effort.
Curious about the status of getting this merged in. This branch, combined with @cbane's work on speeding up prune, makes restic usable for extremely large backups.
Soon... 👀
Closing this now that #2195 has been merged. Feel free to reopen it if the specifics this issue is about have not been resolved. If there are still improvements to be made that are not of the type discussed in this issue, please open a new issue. Thanks!
Thanks to the merge of PR1719, restores in restic are way faster.
They could be faster still, however.
For test purposes, I restored from a restic AWS S3 bucket in the same region (us-west-1) as a dedicated i3.8xlarge EC2 instance. The restore went to the instance's 4x1.9TB nvme drives, striped RAID-0 via LVM with an xfs filesystem on top. The instance has a theoretical 10gbit/sec of bandwidth to S3.
With workerCount in filerestorer.go bumped to 32 (from the compiled-in limit of 8), restic restores a mix of 228k files with a median file size of 8KB and a maximum file size of 364GB at an average of 160mbit/sec.
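For reference, the tweak amounts to editing a compiled-in constant and rebuilding restic. A sketch, with the constant name taken from this thread and assumed to live in internal/restorer/filerestorer.go:

```go
package restorer

// The restore worker pool size is a compiled-in constant, so raising
// it requires rebuilding restic from source.
const workerCount = 32 // stock value is 8
```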
By comparison, rclone with --transfers=32 moves data from the same bucket at 5636 mbit/sec, more than 30x faster.
It's not an apples-to-apples comparison: restic data blobs are 4096KB-ish in size, not 8KB, and opening/closing files certainly has some overhead. But it's still a big enough difference that it likely points to a bottleneck in restic.
I'm happy to test things, provide instrumentation or help in any other way!