
Added support for Google Cloud Storage repositories using the existing s3 support #449

Closed
ckemper67 wants to merge 4 commits into master from fix-211

Conversation

ckemper67 (Contributor)

This adds support for Google Cloud Storage repositories using the existing s3 support, to address #211.

Added a new package to handle GCS configuration. GCS repositories are specified using gs://bucketname/prefix syntax, similar to gsutil.

Added tests for the various cases to config_test and location_test.
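
For orientation, a minimal sketch of what such a gs:// parser could look like; the Config type, field names, and error strings here are illustrative assumptions, not the PR's actual code:

    package gcs

    import (
    	"errors"
    	"strings"
    )

    // Config describes a GCS repository reached through Google's
    // s3-compatible endpoint.
    type Config struct {
    	Endpoint string // always storage.googleapis.com for GCS
    	Bucket   string
    	Prefix   string
    }

    // ParseConfig turns "gs://bucketname/prefix" into a Config,
    // mirroring gsutil's URL syntax.
    func ParseConfig(s string) (Config, error) {
    	if !strings.HasPrefix(s, "gs://") {
    		return Config{}, errors.New("gcs: invalid format, expected gs://bucket/prefix")
    	}
    	bucket, prefix, _ := strings.Cut(s[len("gs://"):], "/")
    	if bucket == "" {
    		return Config{}, errors.New("gcs: bucket name is empty")
    	}
    	return Config{
    		Endpoint: "storage.googleapis.com",
    		Bucket:   bucket,
    		Prefix:   strings.TrimSuffix(prefix, "/"),
    	}, nil
    }

Translating the scheme this way keeps the heavy lifting in the existing s3 backend.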

@fd0 (Member) commented Feb 15, 2016

Thanks for your contribution!

Can you describe why this needs a separate prefix? As far as I understood the code, it is sufficient to call restic with --repo s3:https://storage.googleapis.com/bucket/prefix. Why the extra gs:?

@ckemper67 (Contributor, Author)

It is equivalent, but I think it is easier to use that way.

This is similar to the s3:region/bucket/prefix shorthand for s3:https://s3-region.amazonaws.com/bucket/prefix.
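
Concretely, the two spellings discussed in this thread address the same repository (bucket and prefix names are placeholders):

    # the shorthand proposed in this PR
    restic --repo gs://bucketname/prefix snapshots

    # the equivalent call against the existing s3 backend
    restic --repo s3:https://storage.googleapis.com/bucketname/prefix snapshots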

@fd0 (Member) commented Feb 16, 2016

> It is equivalent, but I think it is easier to use that way.

I appreciate your work, but to be honest, I think this may not be enough to justify 200 lines of extra code: code that we have to maintain, and a URL scheme we'll never get rid of once users start using it, so we'll have to support it for a long time.

What about the other services the minio s3 library supports (OpenStack Swift, Ceph, Riak)? Should we also add custom backend scheme names and extra code for them?

What do you think about documenting how to use the existing s3 backend with GCS, instead of adding extra code?
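
That documentation could be as short as the following, assuming GCS interoperability (HMAC) credentials exported through the environment variables restic's s3 backend reads:

    # GCS issues s3-compatible "interoperability" (HMAC) keys
    export AWS_ACCESS_KEY_ID=<GCS interoperability access key>
    export AWS_SECRET_ACCESS_KEY=<GCS interoperability secret>

    restic --repo s3:https://storage.googleapis.com/bucketname/prefix init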

@ckemper67 (Contributor, Author)

Well, the 200 lines are mostly tests. In theory, since this is a simple translation to the s3 scheme for now, we could reduce them and rely on the existing s3 tests. I could even call s3.createConfig() to save an additional dozen lines.

I do understand your concern about supporting the URL scheme, though. Long term I would like to implement proper support for GCS using the native security features of the platform instead of the legacy S3 secrets. Is that something you are still interested in supporting in Restic? Or do you consider the existing support through minio good enough?

I assumed from the existing discussion on #211 (and #304, which you closed three days ago) that you would like to see support for GCS but wanted it to pull in fewer dependencies. So I decided to support it using the existing minio library as a first step. It certainly wasn't obvious to me that Restic supported GCS before I started tinkering with it.

I will definitely add something to the documentation to make this easier to discover.
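
For a sense of how compact those tests could stay, a table-driven sketch against the hypothetical ParseConfig from the sketch above (the cases and expected fields are assumptions, not the PR's actual tests):

    package gcs

    import "testing"

    var parseTests = []struct {
    	input   string
    	bucket  string
    	prefix  string
    	wantErr bool
    }{
    	{"gs://bucketname/prefix", "bucketname", "prefix", false},
    	{"gs://bucketname", "bucketname", "", false},
    	{"gs://", "", "", true},
    	{"s3://bucketname/prefix", "", "", true},
    }

    func TestParseConfig(t *testing.T) {
    	for _, test := range parseTests {
    		cfg, err := ParseConfig(test.input)
    		if (err != nil) != test.wantErr {
    			t.Errorf("ParseConfig(%q): unexpected error state: %v", test.input, err)
    			continue
    		}
    		if test.wantErr {
    			continue
    		}
    		if cfg.Bucket != test.bucket || cfg.Prefix != test.prefix {
    			t.Errorf("ParseConfig(%q) = %+v, want bucket %q, prefix %q",
    				test.input, cfg, test.bucket, test.prefix)
    		}
    	}
    }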

@ckemper67 force-pushed the fix-211 branch 3 times, most recently from 02ceddb to 3b6784e on February 21, 2016
@fd0 added the "type: discussion" label (undecided topics needing supplementary input) on Feb 21, 2016
@ckemper67 force-pushed the fix-211 branch 5 times, most recently from 477b557 to be3ce4b on February 28, 2016
A review comment on a diff excerpt:

    }

    func createConfig(endpoint string, p []string, useHTTP bool) (interface{}, error) {
    // Creates a Config at the specified endpoint. The Bucket is the first

golint: comment on exported function NewConfig should be of the form "NewConfig ..."
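
The rule golint enforces here is that a doc comment on an exported identifier must begin with that identifier's name; applied to the signature in the excerpt above, that would be:

    // NewConfig creates a Config at the specified endpoint. The bucket is
    // taken from the first path segment.
    func NewConfig(endpoint string, p []string, useHTTP bool) (interface{}, error) {
    	// ...
    }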

@ckemper67 (Contributor, Author)

I extended the s3 implementation to allow a different repository layout (similar to what I asked for in #478) and used that new layout in the GCS implementation. I was able to copy a local restic repository into GCS and successfully run restic check on it.

Let me know what you think.

@fd0 (Member) commented Feb 28, 2016

Hey, thanks for your work. To ease the review of this PR, may I ask you to:

  • Rebase the commits on the current master branch. This will be a bit of work since we switched to gb and gb-vendor for development, and the file system structure was changed.
  • Drop the commits that aren't relevant any more, and remove the .a files that went in by accident.

If I understand your code correctly, you implemented a new backend scheme gs which uses the minio backend with a structure compatible with the local and sftp backends, i.e. with the packs below a second level of subdirectories (data/aa/bb[...]). Is that correct?

I'm still not convinced (as discussed in #478) that we should force/allow technical implementation details of one backend to leak into the other backends. But I can see your point that it may be valuable to be able to use rsync/cp/gsutil/mc to manually copy data to and from different repos.

Would a copy command (see #323) which automatically handles the specifics of the backends and only copies data (without decrypting/reencrypting, so that you end up with the same pack files in the repo) suffice for you?
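
To make the difference concrete, a sketch of how a pack file's object name could be derived under each layout (the function name and layout labels are made up for illustration, not restic's actual code):

    package main

    import (
    	"fmt"
    	"path"
    )

    // itemPath builds the object name of a pack file under each layout.
    // "flat" matches the existing s3 backend (packs directly below data/);
    // "nested" matches the local/sftp-compatible layout from this PR.
    func itemPath(layout, prefix, id string) string {
    	if layout == "nested" {
    		return path.Join(prefix, "data", id[:2], id)
    	}
    	return path.Join(prefix, "data", id)
    }

    func main() {
    	id := "aab4c3e1"
    	fmt.Println(itemPath("flat", "repo", id))   // repo/data/aab4c3e1
    	fmt.Println(itemPath("nested", "repo", id)) // repo/data/aa/aab4c3e1
    }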

@ckemper67 (Contributor, Author)

I had already rebased my pull request on the master branch after you made the switch to gb. I guess the Godeps were a remnant of the switchover. I removed those files from the pull request.

All the changes in this branch are necessary: I first introduced the gs: scheme to write to GCS, simplified it a bit, and then changed the layout of the GCS repository by adding handling of different layouts to S3.

So in the end this whole change introduces a way to store restic backups in GCS that can be easily synced on a filesystem basis.

Whether this is better than copy is up for debate, I guess. On one hand I can see the desire not to leak backend implementation details, but on the other hand, once you start writing backups it will be difficult to change the backend storage model without rewriting the complete repository. So if two repository implementations happen to have the same layout on disk, they will remain syncable for a while.

If we store the backups in a compatible way on the filesystem, we give people the flexibility to use different ways to synchronize their backups between the different storage models. E.g. you can run restic backups to local storage, rsync that to your sftp server periodically, and then simply restore directly from the sftp repository. That already works in the current implementation, and I added support to make it work with cloud storage as well: sync your repository to S3/GCS/OneDrive etc. periodically in case you lose your local backup, and then restore from the cloud in case of disaster.

As an aside: another interesting side effect of my change is that you can experiment with the efficiency of local caching. Running a backup to a local repository and syncing it to the cloud is much faster than running the backup against the cloud directly. Ideally restic can approach the same speed in the long run.
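
Spelled out, the workflow described above could look like this (paths and bucket names are placeholders; the gs:// form assumes this PR's scheme):

    # back up to fast local storage
    restic --repo /srv/restic-repo backup ~/data

    # periodically mirror the repository to the cloud on a filesystem basis
    gsutil -m rsync -r /srv/restic-repo gs://bucketname/restic-repo

    # after losing the local copy, restore straight from the cloud
    restic --repo gs://bucketname/restic-repo restore latest --target /mnt/restore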

@ckemper67 (Contributor, Author)

One more thing: I noticed that test coverage went down because there are no tests for the new repository layout. What is the best way for me to test this? That is, where is the entry point for the existing S3 tests, so I can inject a different Config? I can see the Config in s3_test.go, but I don't know how it gets used during testing or how I can reuse it for GCS testing.

@fd0 (Member) commented Mar 2, 2016

Just a quick heads-up: I haven't forgotten, I just don't have any time at the moment.

Commits in this PR:

  • "path.Join already automatically skips empty path segments when joining, so this simplifies the s3Path code."

  • "Added support for Google Cloud Storage repositories using the existing s3 support" — to address restic#211. Added a new package to handle GCS configuration. GCS repositories are specified using gs://bucketname/prefix syntax, similar to gsutil. Added tests for the various cases to config_test and location_test.

  • "…gcs.ParseConfig" (title truncated) — Updated gcs.ParseConfig to use s3.NewConfig.

  • "…repository layout" (title truncated) — Updated the s3 config to use the s3 repository layout to keep backward compatibility with existing s3 repositories. Updated the gcs config to use the new local-compatible repository layout, allowing the use of "gsutil rsync" to sync content into Google Cloud Storage and to access the synced content transparently using restic. Added support for using gs:http:// to enable the new repository layout on other s3-compatible cloud providers as well.
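
The path.Join behavior the first commit relies on is easy to verify in isolation:

    package main

    import (
    	"fmt"
    	"path"
    )

    func main() {
    	// Empty segments are skipped and the result is cleaned, so callers
    	// don't need to special-case an empty prefix.
    	fmt.Println(path.Join("bucket", "", "prefix", "blob")) // bucket/prefix/blob
    	fmt.Println(path.Join("", "data", "aa"))               // data/aa
    }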
@fd0 (Member) commented Jan 23, 2017

I'm really sorry that I didn't give any further feedback on this PR. I thought about it, though, and I'd like to move forward in supporting GCS with a native backend. I hope you're still interested! :)

I'm going to close this PR for now, let's move the discussion to the issue (#211).

As a side note, I've just merged #741 which slightly changes the backend API.
