restic prune: out of memory #1723
Comments
Looking at the backtrace, I guess the core issue lies in goroutine 1: restic/internal/repository/repack.go, line 60 (at commit 6a34e0d).
This raises the question: can we repack more efficiently? For example, by loading blobs in sequential bulk operations, or by maintaining a blob's data with copy-on-write… This requires someone with more architectural knowledge than me.
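To make the "sequential bulk" idea concrete, here is a minimal Go sketch of streaming a blob from one pack file to another. Everything here is hypothetical: the file names, offsets, and helper are invented, and real repacking also has to decrypt and re-encrypt blob data, so this is only the shape of the idea, not restic's actual code:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// copyBlobStreaming copies length bytes starting at offset from src to dst.
// io.Copy moves the data through a small fixed buffer, so peak memory stays
// roughly constant no matter how large the blob (or pack) is.
func copyBlobStreaming(dst io.Writer, src io.ReaderAt, offset, length int64) error {
	_, err := io.Copy(dst, io.NewSectionReader(src, offset, length))
	return err
}

func main() {
	src, err := os.Open("old-pack.bin") // hypothetical source pack file
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer src.Close()

	dst, err := os.Create("new-pack.bin") // hypothetical target pack file
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer dst.Close()

	// A real repack would loop over every blob worth keeping; the offset
	// and length would come from the repository index.
	if err := copyBlobStreaming(dst, src, 0, 4096); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The contrast is with buffering: reading each blob fully into memory before writing it out makes peak usage scale with blob size (or with pack size, if whole packs are buffered), while streaming keeps it at one small buffer.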
Yeah, prune needs way too much memory and it can be much more efficient. I just need to find some time to optimize it. Thanks for the report!
Not sure the best place to share this, but one more data point to add here. I'm disabling restic again after it caused some OOM errors last night. It got to where it was using about 25GB of RAM during a backup on our server.
On a server that has a database using about 20GB of RAM, we've got overhead that we keep around for things like restic, but we just can't keep 30GB of RAM (or more?) free just to do backups. We're still using restic for streaming backups of the DB, but dang, this is just too much memory.
Uhm, this was during backup? Or prune?
During backup. We've got prune disabled. Not having prune just means that Backblaze charges us more, so disabling it until the performance/memory problems are fixed is OK. But now we've had to disable the backups too.
Sigh, sorry about that.
Well, it is what it is! We know as well as anybody that scaling ain't easy and that we push things sometimes. I'm still loving restic though in general. The streaming backups? Mmmm. kisses fingers. Delicious.
@mlissner what is your repository size?
I wasn't in a great mood yesterday, but it's much better today :) |
Hey Alexander, I don't have anything technical to add here, but after your last comment I feel I need to say this: In any and all of the exchanges that I've had with you, you have been nothing but professional, kind and incredibly quick in your responses. I've reported bogus bugs here before (that turned out to be rooted in HW issues, for example) and you have been super supportive always. Any open source project can count themselves lucky to have you as their patron and it clearly shows that you're going above and beyond for restic. It's an amazing tool and we all know it. Rest assured that even though I believe you when you say you had a bad mood yesterday, it certainly didn't show in your remarks. Keep up the awesome work and thanks again for continuing to improve restic tirelessly. Cheers, Johannes
I have the same issue as in #1830 with restic check. It prints `using temporary cache in /tmp/restic-check-cache-961467159` and then …
Sorry, I'm not sure what "repository" means in this context, but I'm guessing you mean the amount of stuff we have on Backblaze? If that's right, the size is about 5.2TB in the bucket. If you mean how much are we backing up, then it's somewhere around 2TB with maybe 15M files. Most of this never changes, but we get more every day. I run an archive of legal documents, CourtListener.com.
@johndoe31415 thanks for the kind words! Sometimes the amount of work still to do can be overwhelming...
I have the same problem (8GB RAM, repo is 267GB).
Command: …
Not a permanent solution, but what I did (thanks to restic being so "portable") was to simply spin up a VM in a public cloud (I used DigitalOcean, but AWS or whatever should do just fine), give it a bunch of vCPUs and RAM (I think I used 16 cores and 64 GB RAM or whatever), and run restic prune there. It ran for some hours and cost me just a few dollars, but at least it finished. After it had cleaned up old stuff, I am now able to run forget --prune on my regular server. This requires that your backup server is reachable from the public cloud, of course...
(Un)fortunately, it isn't... Thanks anyway!
Hi, I am having the same problem even though the server has plenty of memory. I am backing up about 150T and trying to remove a few snapshots. The error and memory utilization are shown in the attached screenshots. Running restic 0.9.3.
I'm running prune on a ReadyNAS Duo v1 that only has 512MB RAM. I have a total of 130190 files @ 689.646 GiB (according to the restic output) stored in B2. A prune takes a while (~32hrs), but completes without issue.
As my repo is in the cloud, maybe the issue doesn't affect all setups?
Something very odd is going on here... I can't point to it though. It feels to me …
If anyone would like to spend some time profiling restic and identifying the cause of high memory use with their repositories, it would be appreciated and would help speed up a fix.
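For anyone picking this up, here is a minimal sketch of generic Go heap profiling (standard library only; whether restic exposes its own profiling hooks in debug builds is a separate question), showing the kind of profile that would help here:

```go
// Generic Go heap profiling: run the suspect workload, then write a
// profile that `go tool pprof` can analyze.
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// ... run the memory-hungry workload here ...

	f, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	runtime.GC() // get up-to-date statistics before writing the profile
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
	// Inspect with: go tool pprof -top heap.pprof
}
```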
I am willing to do some profiling and testing, if anyone can tell me exactly what they need me to do. |
@Olen thanks for your offer to help. You'll need experience in Go to get any useful results, and the process is iterative and rather complex and hard to describe over text. Unfortunately we're not at a stage where you as a user can be of much help, despite the great intentions :) |
What would be interesting to know is: At which stage in the pruning process does the memory usage get out of hand? |
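A generic way to answer that question, sketched below with nothing but the Go standard library (this is not instrumentation restic ships, and the stage names are placeholders), is to log heap statistics between stages and watch which step jumps:

```go
// Generic sketch: report heap usage between stages of a long-running
// operation. The stage names below are placeholders, not restic's phases.
package main

import (
	"log"
	"runtime"
)

func logMem(stage string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	log.Printf("%-26s heap in use: %d MiB, total allocated: %d MiB",
		stage, m.HeapInuse>>20, m.TotalAlloc>>20)
}

func main() {
	logMem("start")
	// ... load index ...
	logMem("after loading index")
	// ... find blobs still in use ...
	logMem("after finding used blobs")
	// ... repack ...
	logMem("after repack")
}
```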
I'm not sure I understand correctly: which command are you running? These are two different operations, although …
For what it's worth, I'm seeing this on a few different servers as well, just running a check: …
This is on a server with a 200GB repository and 4GB of RAM, currently sitting with 2.3GB free.
I've just also run into memory problems, but with …
It then prints backtraces for each goroutine (which seem suspiciously numerous; grep counts 90 of them, though I guess there's typically a handful of goroutines associated with each actual worker thread). This is on a somewhat older Atom system (4GB RAM, about 2.5G free, swap mostly full, it seems). I was running restic version 0.9.4+ds-1 from Debian testing. Not sure if this is actually the same problem as the OP has, but they seem related enough to add some info.
These are actually valuable data points: it means we have an easier situation to debug this issue (since …)
General memory usage has already been reduced in current master, and some more optimizations are on the way. About the main topic: @johndoe31415, can you please confirm that the issue still occurs with current master? If yes, can you try #2718 to see if it improves your situation?
Hey @aawsome -- unfortunately I cannot. The machine which I originally used to reproduce this error was decommissioned a while back and migrated to a different one with a differently structured repository that never exhibited the described issue. I'm sorry about that.
Output of `restic version`:
restic 0.8.3 (v0.8.3-0-g272ccec7)
compiled with go1.10 on linux/amd64
How did you run restic exactly?
`RESTIC_PASSWORD=foobar RESTIC_REPOSITORY=/data/joe/restic restic prune`
What backend/server/service did you use to store the repository?
Direct file access.
Expected behavior
Restic prunes the repo.
Actual behavior
restic prune runs out of memory and aborts with a backtrace.
Steps to reproduce the behavior
Get a Debian system (Linux backup 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 GNU/Linux) with 4 GiB of memory and an Intel(R) Core(TM) i3-3225 CPU @ 3.30GHz. Then create a repo ~1.5 TB in size and run restic prune on the server side.
Do you have any idea what may have caused this?
The system on which I run restic has 4 GiB of memory. Sure, more would be better. But it would be even better if restic wouldn't fail with this type of error.
Do you have an idea how to solve the issue?
Sure. Adding more memory to the system would be one solution. Or making restic more efficient (and ensuring there are no resource leaks).
Did restic help you or make you happy in any way?
Fishing for compliments, eh? Sure, why not. Restic is pretty cool stuff, no doubt about it. I like the client-side encryption, proper use of KDFs, support for ACLs, mounting backups via FUSE. It beats my previous rsync approach dead out of the water. But could it pleeease be a little bit less resource hungry? :-)