Using rclone as a backend #1561
What's been said about the redundancy that will arise with the existing backends, e.g. S3, Swift, etc? |
There hasn't been a decision yet. I don't see the point in duplicating all the work and supporting all the thousands of cloud services, but for the existing backends I also don't see any point in removing them. The existing backends have the nice property of not requiring the setup of a third-party tool: you can get away with restic alone, which is also desirable. For now this is just an experiment; we'll see how it goes. |
I made a little experiment in the serve-restic branch for rclone. I took the restic/rest-server repo and mashed it into rclone with great force and little subtlety ;-) Run I've tested it and it seems to work - I tried a few operations. It doesn't have any tests. Note I didn't vendor the dependencies either. It isn't ideal as we have the whole listening socket thing to contend with, i.e. it isn't secure from other applications. It would be relatively easy to pass a password in the environment when it is started and require that. Choosing a random port would probably be necessary too. Thoughts? |
I've just tried it: Needs some polishing ;) I've just pushed a few commits to the
There seems to be an issue with listing files:
And with byte ranges:
From our conversation via email:
|
Hehe! Thanks for giving instructions on how to run the tests.
Not sure what that is about. I'm sure it will become clear though!
I didn't implement byte ranges as I didn't think they were needed. A mistake I now see!
I think that is probably a good idea. Most of the backends rclone supports need to know the length of the file in advance of the upload, so that will make rclone's life easier. It will also change the format of the POST from chunked transfer encoding to a straightforward POST (which probably makes no difference). It will allow rclone to check that the right amount of data arrived in the POST, which is good.
I was going to ask you about that - whether restic did retries or not. rclone will still do low-level retries when it tries to open the objects for read, doing listings etc, but I'll make it so it doesn't buffer or retry the data. There is one other thing that I've been thinking about... I'd quite like to cache the listings that restic does. Rclone has a VFS layer which will do this - it will make listing a directory and fetching objects much more efficient, as rclone won't have to look up each object again. The cost is a small amount of memory. The VFS layer will make implementing Range requests trivial too. If in the future restic uses the rclone backends directly, then I'd expect restic to cache the listings itself and we could stop using the VFS layer. I'll have a go and create v2 in the next day or two and I'll post an update here. |
Have you considered embedding rclone as a library into restic? As a home user I am unlikely to benefit from an rclone server, but it will be more difficult for me to install, configure and maintain two pieces of software. I do understand how a separate rclone server can be attractive in an enterprise environment, so I guess it really depends on who your target audience is. |
@ifedorenko wrote:
I'm sure we'll get to that point eventually - we are just experimenting at the moment :-) |
Yes, but at the moment, restic doesn't support all the services rclone does, which even has a nice dialogue system for configuration. From my perspective it's also about using the resources (mainly developer time) wisely, using rclone for accessing services we don't support and providing a nice way of configuration is great, even if it comes with the additional work of having to install rclone. :) |
@fd0 wrote
I couldn't find that branch on the restic repo? Am I looking in the wrong place? |
Oh, sorry, I've already merged the patches into master in #1569, so you can just use the master branch. |
I see what was happening with my previous test... I was running the command you gave which was attempting to start I've settled on this as a test routine. Run this in one terminal window:
And run this to test:
Does that seem sensible? I'll attempt to make that pass!
I've realised I'm going to need restic to supply |
I've updated the branch with some code which now passes all the tests :-) I've also sent a PR which would have saved me loads of time had it been in place! |
Awesome, I just had a quick look and it works very well! Thank you! So, what's the next step? From my point of view it'd be best to try out the |
Or maybe |
I've played around with gRPC over yamux today, works quite well. The pros are:
Cons:
We could ditch yamux and use HTTP2 (without TLS) instead, we could then use either HTTP directly or use gRPC again. I wonder if there's a simpler solution. Hm. Not very satisfying. |
From what I understand, Yamux is inspired by SPDY (the predecessor of HTTP2, IIRC) but incompatible with it. What would be the advantage of using a custom protocol over a standard one? It seems to me HTTP2 would have a lot of advantages in terms of interoperability, design, portability and security (why no TLS?). Could you expand on why you are hesitant to use it? Regarding latency, Google is working on QUIC to resolve those problems (LWN has a good series about it, although APNIC has a caveat as well), so there's a standard way out there as well. |
I've played around with gRPC over HTTP2 via stdin/stdout (without TLS) today, and it also works quite well. I had to fiddle with it a bit to get HTTP2 over stdin/stdout working, but that's done now. It requires Also, saturating a high-latency link is better when using HTTP2 (compared to yamux). |
@ncw do you have any preferences or experiences in regards to an RPC or stream muxing framework? |
I've been working on integrating the REST API server into rclone and I've now got something I'm happy with! I had to make a common HTTP serving layer for the two existing http servers in rclone (!) before I added another one! I ended up re-writing the restic http server almost from scratch. You can find the code in the If you have the time I'd appreciate some feedback on:
I re-wrote the mux from first principles and managed to make it almost policy-free. The only bit of policy is here, where config objects can be overwritten but nothing else can. That could maybe be replaced with a parameter. The REST API for restic seems to do the job well. The only thing that would make rclone's life easier would be to have the Content-Length on the POST call that creates a new object. I looked at patching restic to add that, but my conclusion was that it was more difficult than I first thought, as the backend Save() call doesn't know the length of the object it is POSTing. A hint here would be helpful :-) The consequence of rclone not knowing the length of the thing it is saving is that on some remotes (any which don't support streaming upload), rclone will have to spool it to memory/disk before uploading it. I'd quite like to merge the @fd0 wrote
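The overwrite policy described above could be isolated in a single check, with the suggested parameter as an opt-out. A minimal sketch; `checkWrite`, `allowOverwrite`, and `errExists` are hypothetical names, not rclone's code.

```go
package main

import (
	"errors"
	"path"
	"strings"
)

var errExists = errors.New("object already exists and may not be overwritten")

// checkWrite decides whether a POST to name may proceed.
// exists reports whether the object is already stored; allowOverwrite is
// a hypothetical server option that lifts the write-once policy.
func checkWrite(name string, exists, allowOverwrite bool) error {
	if !exists || allowOverwrite {
		return nil
	}
	// Normalize "config", "/config", "./config" etc. to "config".
	clean := strings.TrimPrefix(path.Clean("/"+name), "/")
	// The repository config is the only object that may be replaced.
	if clean == "config" {
		return nil
	}
	return errExists
}
```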
Serving HTTP2 over stdin/stdout sounds very interesting... Do you have some code I could look at so I can implement that in rclone? The next evolution for I don't have a strong opinion on gRPC, having not used it in earnest. It certainly looks industrial strength though :-) That said, do we need it if HTTP2 is doing the job for us? As I understand it, HTTP2 will multiplex connections over a single socket, so I would have thought it would work quite well without gRPC and we could get the existing REST backend to use it? |
@ncw I'm not very familiar with gRPC either, but from what I gather, it's basically HTTP2 + protobufs + magic sauce (wikipedia specifically says "authentication, bidirectional streaming and flow control, blocking or nonblocking bindings, and cancellation and timeouts")... "REST" doesn't say much: what's the actual serialization format? JSON? XML-RPC? how are actual RPC calls made? gRPC has the advantage of standardizing all of this with protobufs and a predefined way of calling functions and so on. good job! |
Hey Nick, that's great news! I'll have a look now. Could you maybe open a Pull Request in the rclone repo so we can attach comments to the code and iterate on it (if necessary)? Or what's your preferred way of communication here? Shall I send you patches? ;) Random issues I see while scrolling through the code:
During restic development, we had several different repository layouts, which are described here: https://restic.readthedocs.io/en/latest/100_references.html#repository-layout We've now settled on the default layout, which means the files in At the moment, rclone will put those files into This is one of the reasons I'm not really a fan of implementing the REST backend in rclone, and would rather build something else, e.g. based on gRPC (see below). But now that we have it, we should fix it and get it working. This is the only critical issue I can see. I'd like to move to using the default repo layout everywhere; that's the only sensible way IMHO. What needs to happen from my point of view is that rclone needs to implement (or copy) the so-called When we add a backend which uses e.g. gRPC via HTTP2 via stdin/stdout, there are user interface design issues to consider. How would the new backend be called? How would users specify which command is to be run (like So much for now :) |
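For illustration, the default layout stores data files in a subdirectory named after the first two characters of the file name (a data file whose name starts with `21` is stored under `data/21/`), while keys, locks, snapshots, and index files sit directly in their type's directory and `config` lives at the repository root. A minimal sketch of the path mapping a server would need when it receives a flat request such as `/data/<name>`; the function name is hypothetical.

```go
package main

import "path"

// defaultLayoutPath maps a type directory ("data", "keys", "locks",
// "snapshots", "index", or "config") and a file name to its path under
// the repository root in the default layout.
func defaultLayoutPath(repo, dir, name string) string {
	switch {
	case dir == "config":
		// The config file sits directly in the repository root.
		return path.Join(repo, "config")
	case dir == "data" && len(name) >= 2:
		// Data files are sharded into subdirectories by their two-character
		// (hex) name prefix, i.e. at most 256 subdirectories.
		return path.Join(repo, "data", name[:2], name)
	default:
		return path.Join(repo, dir, name)
	}
}
```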
@fd0 sorry for another long delay - swapping badly at the moment ;-)
Patches - how quaint! I'll open a PR with the next iteration and we can see how that works! You can send PRs against a branch too (though if you want to do that I'll need to stop rebasing the branch!)
I've done that :-)
Yes, you can set lots of exciting stuff for the server! I'm not keen on forcing the user to set a password, but suggesting it strongly in the docs is a good idea! I've put a note in the docs about that.
Err, I used the v2 REST API 7e6bfda which was only released in v0.8.2 I think.
It is more of a limitation of the rclone backend. It tells files and directories apart as to whether they end in a / or not.
Done!
Sure! Happy to adjust them when they break!
:-)
Ah. I missed that bit... Easy to fix though
rclone will create intermediate directories as it goes along. Ideally I'd like to merge this in time for the next rclone release, which should be in a couple of weeks... I put the next version in the serve-restic branch and I made a pull request this time for easier commenting: rclone/rclone#2116 |
@ifedorenko wrote:
Hmm, possibly, though draining a non-HTTP reader (e.g. a disk file) doesn't make sense. I don't think this will happen in normal operation though, and if it does, rclone prints an error and we have to remake the persistent HTTP connection, which shouldn't be a big deal. I think this is more of a cosmetic issue for the tests. |
Yeah, I was thinking about this too. Individual HTTP-based backends should return readers that drain streams on close. Also, I just realized the rclone backend is using HTTP2, which supports cancellation of in-progress streams via RST_STREAM frames. I wonder if the Go http2 client/server implementations actually take advantage of that. |
Ah, I used to think that this is the right way, but now I think we should allow this and not drain the reader completely. For comparison, this is what Go's `net/http` client does before closing a response body:

```go
// Close the previous response's body. But
// read at least some of the body so if it's
// small the underlying TCP connection will be
// re-used. No need to check for errors: if it
// fails, the Transport won't reuse it anyway.
const maxBodySlurpSize = 2 << 10
if resp.ContentLength == -1 || resp.ContentLength <= maxBodySlurpSize {
	io.CopyN(ioutil.Discard, resp.Body, maxBodySlurpSize)
}
```

In HTTP 1.1, closing a connection is the only reliable way to tell the server to stop sending data, so depending on the amount of data it's often more efficient to close the connection and create a new one instead of loading megabytes of data just to be able to reuse the connection.

So, I've fixed the test to first drain the reader and then return the custom error; that was really easy. In retrospect it's also easy to see what's happening ;)

```diff
--- a/internal/backend/test/tests.go
+++ b/internal/backend/test/tests.go
@@ -147,6 +147,10 @@ func (s *Suite) TestLoad(t *testing.T) {
 	}
 	err = b.Load(context.TODO(), handle, 0, 0, func(rd io.Reader) error {
+		_, err := io.Copy(ioutil.Discard, rd)
+		if err != nil {
+			t.Fatal(err)
+		}
 		return errors.Errorf("deliberate error")
 	})
 	if err == nil {
```
|
I found the cause for the strange message printed on server close, I left a call to |
Thanks for the fix :-) I had a go with your draining fix and that works fine too :-) I'm just going to have a quick whizz through the rclone-backend branch and comment on anything I see there. |
So, the code is basically done, it just needs some docs. I could use some help testing the backend (in the branch Also, I've amended the REST protocol and the REST backend, so that the base URL always ends in a slash. |
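Why the trailing slash matters: relative references resolve against the last `/` of the base path, so `data/<name>` resolved against `http://host/repo` becomes `http://host/data/<name>`, dropping `repo`. A minimal sketch using `net/url`; the function names are hypothetical.

```go
package main

import (
	"net/url"
	"strings"
)

// normalizeBaseURL parses raw and ensures its path ends in "/".
func normalizeBaseURL(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if !strings.HasSuffix(u.Path, "/") {
		u.Path += "/"
	}
	return u, nil
}

// objectURL resolves a repository-relative path such as "data/<name>"
// against base.
func objectURL(base *url.URL, rel string) string {
	return base.ResolveReference(&url.URL{Path: rel}).String()
}
```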
Docs are done, please give it a try! Ping @mholt |
Hi @fd0! thank you very much for putting in the work to integrate I have an issue with a restic repository on Google Drive. Creating a new repository works fine, but when I try to access an existing repository (with around 7 TiB of data), restic fails with an error:
This is a local repository that has been synced to Google Drive. Using a custom restic build I can access the repository just fine. |
@fd0 Thanks for the ping! I've been getting the emails about this and am super excited to try it out! Just came at a bad time for me as I'm really busy wrapping up ACMEv2 + wildcard support in Caddy right now. But I'll get back to this ASAP as I am anxious to see how to use it, especially the proxying to avoid giving cloud credentials to backup clients, as we talked about in the other issue. Keep up the great work! |
You can get rclone to log much more stuff for debugging
before the restic run. It might be that the timeouts in restic for the rest backend are too short - what do you think @fd0? Also note that drive can be really slow! Have you got your own credentials or are you using rclone's? If the latter then I recommend making your own. |
Ah, that's a special timeout: restic starts rclone in the foreground process group (so that things like password prompts work), and in the background it tries to establish the HTTP2 connection. Once that is done, restic moves rclone into the background. At the moment, the timeout for the first HTTP request to complete is quite low (5s), I'll update this to 60s. After 60s, the process should have booted.
Probably, but not this one: This one is restic's internal timeout ;) I'll add hints for debugging rclone to the docs. Being able to configure rclone indirectly via inherited environment variables is awesome :) @mathiasnagler can you retry please? |
Thank you for your feedback! @ncw I am using my own credentials. Using the build from the commit I linked above, restic can fully saturate my upload bandwidth with those credentials (about 40Mbit/s). @fd0 Will do asap. Update: Unfortunately, even with 60 seconds the same issue occurs. I think this is caused by the repository size / amount of files in the repository. I even increased the timeout to 180 seconds, but still no luck. What actually happens during the bootup period? rclone debug output suggests that an ls is executed:
Can I manually check how long this takes by running Update: I increased the timeout to 1800 seconds (30 minutes) to see how long it will take. After 7 minutes, restic prompted for the repository password. From there on, listing the snapshots worked as expected. |
Yes that should do it. Note that listing in drive is relatively expensive :-( |
I'm interested in testing (using Google Drive as a backend), but I'll need precompiled win64 editions of both restic and rclone to do so. Currently I'm getting random timeouts on restic check --read-data runs with a few repositories using the Drive FS client (as a paying user of G Suite for Business) |
|
Oh, that's my fault. I thought it'd be a good idea (as a test when rclone is ready to accept HTTP requests) to issue an HTTP I've changed the code so it'll try to request a random file name, which does not exist. We just want to make sure rclone is ready to respond to HTTP requests. |
Startup is quick now. Thanks for the update! I started a new backup to test some more. @fd0 Upon further inspection, I have another issue. To test the new feature I created a fresh repository on gdrive (Google Drive). Creating the repository worked as expected and I can see the files and folders using the gdrive web interface.
restic reports that the backup is running and seems to have backed up some amount of data:
The issue is that the backup will never actually continue. I can observe that no data leaves the machine. There is no outgoing traffic at all. Using
I think those are fine because the folders will be created when the first backup happens and not during repo creation, but I am not entirely sure.
Is anybody else able to reproduce this? |
@fd0 Any chance for a restic beta with rclone support, so I can do some testing? |
Another comment - not directly related to restic (but it might be interesting), and one that could possibly lead to issues/support requests: As I am using G Suite for Business, I am using a service account with rclone, and in that regard it is extremely important to remember to use rclone's --drive-impersonate option if you want to be able to see the files that restic/rclone uploads through the normal web UI in a browser. It's described in detail here: https://rclone.org/drive/#use-case-google-apps-g-suite-account-and-individual-drive Without using the --drive-impersonate option with rclone, all files are invisible to my regular user. However, this might be considered a security enhancement, as it makes it impossible to damage or delete restic repos through other means. |
@mathiasnagler I'll have a look. @naffit I can upload a build later today, but it's really easy for you to build restic yourself once you've installed Go >= 1.8:
Then you have a working restic binary in the current directory. |
@fd0 Sorry if I'm blind, but where are the docs? 😅 I'm ready to try this out. Got it cloned down and everything. |
There's documentation in the manual in This helps you get started with rclone as run by restic. You can also just call Or did you expect something else? |
That's perfect, thanks! |
This issue is there to discuss prototype implementations which enable restic to use rclone as a backend. This makes a lot of sense since rclone already has all the cloud backends implemented (including user configuration and so on). I've had a very productive discussion with @ncw, who is the author of rclone, about how to make restic use rclone to access data, and we've decided to continue our discussion in a public issue here on GitHub.
My idea was to add a new backend to restic which talks to a second process via stdin/stdout, using a protocol that is still to be defined. I read nice things about https://github.com/hashicorp/yamux, which allows mixing several streams in parallel over a single "connection" (very similar to what HTTP2 does). We'd need to define a protocol for accessing files on top of that, with just the basic operations that restic needs.
This backend can then be implemented in rclone, so that restic just starts `rclone serve restic-backend` (or whatever command line we need), talks to rclone, and rclone takes care of saving/loading/removing the files somewhere.

Taking this a step further, we could also implement a server side for this backend in restic itself, which can then be started e.g. via SSH on a remote server, so restic runs `ssh user@host restic serve backend` and uses that connection, which will be way more efficient than sftp. If we make the protocol extendable, we could also add features such as remote repacking (if supported by the "server"), so that data repacked during prune does not need to be downloaded and re-uploaded.

The protocol could then also be implemented in other programs, so we can use them as "plugins" to access data stored anywhere.
A slightly different approach would be to speak HTTP2 over stdin/stdout with a process like rclone, or use something like https://github.com/hashicorp/go-plugin and implement the backend in Go as a plugin.
So, this issue is to advance the discussion and play with a few sample implementations, so we can get a feel for what works best.