Storage Backend: Amazon Cloud Drive #212
Comments
Just thought the same thing. I may look at this once #21 is in place - uploading without compression seems like a waste of bandwidth.
That depends on your use case ;)
There’s some proof of concept code I found: http://sprunge.us/fdQF — it requires that an oauth token is in /tmp/token.json, but seems to work for me. Motivated people could turn that into a clean backend for ACD :).
+1 from me :)
+1 too
I'm currently reworking the interface to the backends; this includes a radical simplification. It is basically done, but not yet merged. For the plan, see #383; the PR is #395. Afterwards it will be much easier to implement new backends. Before implementing many new backends, I'd like to have a list of rules that services we write backends for must fulfill; this may include that a test instance of the service must be available that we can run the integration tests against. Do you by chance know whether there is a test service for ACD we can use for tests?
As far as I know, there is no test instance for ACD. There is no mention of such a thing here: https://developer.amazon.com/public/apis/experience/cloud-drive/content/restful-api But the https://github.com/ncw/rclone project already did an ACD backend in Go. It seems to be fairly reusable, as demonstrated by the proof of concept shared by @stapelberg.
Actually, looking at the revised interface, it would be reasonably easy to do a full wrapper for rclone filesystems. Maybe that way separate implementations aren't needed?
I don't know what @fd0's vision for restic's future is, but it would seem logical to focus the project on the backup intelligence instead of re-implementing a ton of remote filesystems one by one. Besides, both projects' licenses are compatible. @klauspost, was your idea to create a wrapper around rclone/fs/fs.go? Is it doable without being tightly coupled to the internal logic of rclone?
Hm, interesting idea, I have to think about it. Not having to implement all the backends ourselves looks like a good idea; on the other side (at least at the moment) I must admit that I don't like the thought of a tight coupling between restic and rclone, as this introduces a dependency that we can't control... I envision that restic should be easy to configure and use with a variety of suitable backends. This includes (in my opinion) only one place for configuration, e.g. of the backends. Maybe that's possible with rclone or at least part of their code. The interface looks suitable to be used with restic.
I pledged a $5 bounty for this feature. Some thoughts:
In case the priority of this FR depends on the popular vote: +1
👍 |
👍 |
Yes please 👍 |
How about adding some more bounties to this feature? See: https://www.bountysource.com/issues/23684796-storage-backend-amazon-cloud-drive |
The bounty is now 35 USD: https://www.bountysource.com/issues/23684796-storage-backend-amazon-cloud-drive |
Hey, thanks for your interest in restic in general and this backend in particular. Just to give you a heads-up on what the blocker here is: I'm not sure how to handle third-party web services. How can we run CI tests for backend implementations that require a third-party service? Is there e.g. a test service for ACD we could use? Or maybe we should just take well-tested code from other projects such as rclone?
One solution might be to register an account with Amazon, whitelist it for the Cloud Drive API and then use that for the CI tests? The downside is that such a test depends on Cloud Drive being available, but I guess we can wait for an hour or so occasionally before merging a PR? :) |
That's the only solution I can imagine right now that allows us to run the tests against a live service (and that's desirable in my opinion). When we add more backends for other services, the following will happen:
Did I forget anything?
Your list looks good. There are of course more effects, but I’m not sure whether they are in scope for the question you’re trying to answer:
I think a simple way to take care of this requirement is to use different directories for each test invocation. Sending requests in parallel is usually not an issue with these services, and the different directories make sure the tests don’t clash.
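The per-invocation directory idea above can be sketched in a few lines of Go. The naming scheme below (timestamp plus run number) is an assumption for illustration, not restic's actual test code.

```go
package main

import (
	"fmt"
	"time"
)

// testDir derives a working-directory name from the current time and a
// run number, so that parallel CI runs against the same cloud account
// write to different directories and do not clash. (Hypothetical
// helper; the naming scheme is an assumption.)
func testDir(run int) string {
	return fmt.Sprintf("restic-test-%d-%d", time.Now().UnixNano(), run)
}

func main() {
	// Two concurrent test invocations get distinct directories.
	fmt.Println(testDir(1))
	fmt.Println(testDir(2))
}
```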
What I meant was more of a question of how many parallel connections a service accepts. For most web-based services this won't be limited (at least concerning the number of connections we require), but this may not be the case for other, more obscure services.
When a service limits the number of connections so aggressively that our testing is impacted, we could ask the service owner for an exception or rate-limit on our end as well. As a last resort, we could disable the tests for the backend in question or remove that backend altogether. But I suggest we cross that bridge when we get there :).
Thanks for describing the process in such great detail, that is already very similar to what I had in mind. I'm wondering: why is the webserver needed at all? This process works for a "workstation" type of machine, but not on a server (where there is no browser). The workflow used by rclone is described here: http://rclone.org/remote_setup/ I don't know why we need a webserver for this, but I haven't implemented an OAuth-based login workflow yet. We'll also need a config file to store the token configured for the remote in; that's also not done yet.
From my understanding, which is limited, the OAuth data is provided to the user via the GET parameters in a redirect. Have a look at the URL in my screenshot of the browser; that was put there by Amazon. After I hit sign-in on my Amazon Cloud Drive, it immediately redirected me to that 127.0.0.1 URL. Perhaps that is the only way to get this data. This is likely the case, because rclone implemented a webserver instead of picking another, simpler solution; when I implemented OAuth before, this seemed to be the implication. If I am correct, then it follows that you must run your own webserver to provide a page to redirect to Amazon and a page to handle the redirect from Amazon, and this must be accessed through a web browser.

As for the config file, I think all we need is a file in a default location (~/.restic.conf) that can be overridden via a flag or environment variable. I think this is a bit dirty, but it's the only viable solution that is transparent to the average user yet powerful for those who wish to do it "their way".
That sounds plausible. Let me think about a strategy here, this may take some time. We'll need to:
Anything else I'm missing here?
I think you got the big stuff outlined there. Would you want to move all current backends into a single abstraction that supports this, or would this whole system become a "cloud" backend in the current sense of a backend (which itself is configured through special restic commands)? Each instance of a cloud backend (google drive, onedrive, amazon cloud drive, S3?) has the following components:
and maybe some other stuff I'm missing. The current abstraction, from my quick read, only relates to the last item. I think this is a pretty smart way of handling backends, if you're looking to revamp it a bit. The other option is to simply, as I said, implement a "cloud" backend which does all of these things and rolls all the different providers together under its umbrella.
Here's some background with regard to embedding a client secret in open-source applications: http://stackoverflow.com/a/28109307 As far as I understand the problem: you're not allowed to embed a client secret in an open-source application. rclone employs some obfuscation to hide what it's embedding. I doubt that embedding a static client ID/secret in restic's source code is a good idea. On the other hand, having the user register an application themselves is complicated. This article describes how to do OAuth2 with Go: https://jacobmartins.com/2016/02/29/getting-started-with-oauth2-in-go/
There is no real solution; it is a broken concept to assume that any client can keep a secret. However, if you consider what the client secret contains, it is not that important. The only real thing it allows is for Amazon (and others) to identify a specific client, nothing more. It does not grant any special access - your tokens are used for that. Sure, a publicly available "client secret" can let other applications identify themselves as restic, but other than the risk that "restic" will be banned (or, more likely, rate limited) as a client, there is not much risk in exposing the client "secret". It will never put any user data in jeopardy.
The problem here is that somebody needs to register the clientID, for example me. If I'm using my normal Amazon account (or even worse, my Google account), and "violate" the TOS for the service by publishing the client secret, they can terminate my account. That's not something I'm going to risk. Another problem is that once the client secret changes (or is revoked), we're stuck with older versions of restic e.g. in Debian stable which are unable to communicate with the service because of a hardcoded (and now invalid) client secret. This is the case even if access to the service is restored shortly after, but the client secret has changed. I've thought about possible solutions and found only two:
Currently I'm in favor of the second option; we need a UI for the OAuth token thing anyway. What do you think?
I know that Nick has had some correspondence with Amazon, since rclone was being rate limited due to many users. It is however my impression (from memory) that they were quite forthcoming and encouraged open-source development, and have made exemptions for his client. So I guess my advice would be to contact them and see how things go from there. In the overall picture, I don't think they would mind the business coming from restic users.
Interesting idea, do you have any hint on who to contact at Amazon? For Microsoft OneDrive he said that he did not contact anyone: rclone/rclone#372
I know that @breunigs had bad luck with his Amazon Cloud Drive duplicity backend — they wouldn’t give him any rate limit exemptions AFAIK.
I have only read the last few comments, so please forgive me if this info is not needed:
Also, a final word of advice: read through rclone's workarounds for Amazon Drive. The API contains a lot of undocumented "eventual consistency" gotchas. It even goes out of its way to cache an outdated response it gave you, so that you need to wait even longer if you were too hasty to begin with. This is on top of it reporting errors when there are none; one just needs to wait. HTH,
Thanks for the information!
Just throwing something out there: what if we remove all backends (but local and REST) from restic and stick them into restic/rest-server? This would allow restic to focus on doing backups properly, while the filesystem implementations are done in the rest-server. It doesn't solve the testing problem, but it would certainly help keep the restic source clean/focused, and it would be easier to make API changes inside restic.
Thanks for the suggestion. Unfortunately I don't like it at all; in my opinion this approach (adding an intermediate layer including a new transport via HTTP) will lead to even more problems. The backend API interface was stable for a long time, then changed recently, and will be stable again. The interface is already rather small. We should try to get backends into restic (including proper CI tests) as soon as possible; that's IMHO the only way to make sure they work. In case of the Amazon ACD backend, we need to answer the outstanding questions first.
The Amazon Developer Guide for Amazon Drive (as it's called these days) states that:
I feel that Amazon Drive is not the right platform for securely storing encrypted backups.
Interesting. This must be a new addition, as it definitely was not the case when ACD support was added to Arq. It seems ACD is not a real storage option after all.
Indeed an addition within the last year; it wasn't listed one year ago: http://web.archive.org/web/20160322034250/https://developer.amazon.com/public/apis/experience/cloud-drive/content/developer-guide
Amazon has since clarified this in https://forums.developer.amazon.com/questions/54909/impact-of-dont-encrypt-customer-data-part-of-drive.html:
So, restic and other apps should be good.
I think their intention is to protect users from having their data encrypted without a way to recover it.
Steffen
One other motivation which I find plausible is to increase interoperability — if each application encrypts its files, the user’s ability to switch between applications is severely hampered.
I asked Arq Backup support. They encrypt everything, and said that their app had been approved by Amazon and not to worry. I'm not sure what Amazon is trying to say, but it seems they are now evaluating each case as it comes in.
Not sure if anybody is aware of the recent ACD drama with acd_cli and rclone, but a TL;DR of the situation is that they have had their ACD API access revoked due to TOS violations. Their efforts to regain API access are apparently being hampered by the fact that Amazon has stopped accepting new third-party apps for ACD. I assume this latter revelation stops any restic ACD support in its tracks, unless the project had already obtained ACD API access.
acd_cli's API access was revoked due to a security issue with their OAuth app, not a TOS violation. The problem has been fixed and Amazon reinstated their key, although this is off topic for this project. New ACD API access is currently closed.
Thanks for posting this here, I wasn't aware of it. I had reservations about implementing ACD, and it seems that Amazon indeed did not like secrets in the code of an open-source program: rclone was banned for it: https://forum.rclone.org/t/rclone-has-been-banned-from-amazon-drive/2314 On the other hand, acd_cli implemented an OAuth authorization service (not sure what the correct nomenclature here is). This handles authorization for all users, and there apparently was a bug that allowed people to access/modify other people's files. Since Amazon isn't accepting new clients anyway, I'm closing this issue for now. Thanks!
The "Unlimited Everything" plan of Amazon Cloud Drive is quite an affordable backup storage option. Amazon Cloud Drive has its own RESTful API.