
Pathnames depend on backend #654

Closed
jayme-github opened this issue Nov 1, 2016 · 9 comments
Labels: type: question/problem (usage questions or problem reports)

Comments


jayme-github commented Nov 1, 2016

I was trying to move a file-based repository to S3 by copying the files, but I'm unable to access it ("unable to open repo: wrong password or no key found").

Looking closer reveals that the file backend uses plural path names ("snapshots", "keys", "locks", probably from src/restic/backend/paths.go), while (at least) the S3 and Swift backends use the singular FileType (src/restic/file.go) directly ("snapshot", "key", "lock").

It would be cool to have the same path names in every backend so we can move the repos around.
restic check --read-data finishes successfully when I rename the directories to their singular names and remove the sub-directory structure below data:

Original

./data
./data/e3
./data/e3/e37d9fc8074d7032010a5876df3aaa4008c404227fea426ee4f46b02762ff396
./snapshots
./snapshots/0a8fd93c7b58c254d255e1b796b2b4a005f2e6b2a4d954cad0c465e56733a1ae
./index
./index/975c562709214460143b85fdf3f335761dcb08a8ea9944416a22e86ce20140ac
./locks
./keys
./keys/e8a7991a36bc46d51a5a59a3fd218396dc679f5406ddedc9ae535fa83b6792c0
./tmp
./config

Modified

./data
./data/e37d9fc8074d7032010a5876df3aaa4008c404227fea426ee4f46b02762ff396
./index
./index/975c562709214460143b85fdf3f335761dcb08a8ea9944416a22e86ce20140ac
./config
./snapshot
./snapshot/0a8fd93c7b58c254d255e1b796b2b4a005f2e6b2a4d954cad0c465e56733a1ae
./key
./key/e8a7991a36bc46d51a5a59a3fd218396dc679f5406ddedc9ae535fa83b6792c0
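
For illustration, here is a minimal Go sketch of how the two naming schemes diverge. The localPath/s3Path helpers and the directory map are assumptions reconstructed from the listings above, not restic's actual code:

package main

import (
	"fmt"
	"path"
)

// Plural directory names used by the file backend (reconstructed from the
// listing above); "index" and "data" are not pluralized.
var localDirs = map[string]string{
	"data":     "data",
	"snapshot": "snapshots",
	"index":    "index",
	"lock":     "locks",
	"key":      "keys",
}

// localPath mimics the local/sftp layout: plural directory names plus a
// two-character sub-directory under data/ derived from the hex ID.
func localPath(fileType, id string) string {
	if fileType == "data" {
		return path.Join("data", id[:2], id)
	}
	return path.Join(localDirs[fileType], id)
}

// s3Path mimics the old S3/Swift layout: the singular FileType is used
// directly as the key prefix, with no sub-directories.
func s3Path(fileType, id string) string {
	return path.Join(fileType, id)
}

func main() {
	id := "0a8fd93c7b58c254d255e1b796b2b4a005f2e6b2a4d954cad0c465e56733a1ae"
	fmt.Println(localPath("snapshot", id)) // snapshots/0a8fd93c...
	fmt.Println(s3Path("snapshot", id))    // snapshot/0a8fd93c...
}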

Output of restic version

restic 0.3.0 (v0.3.0-16-g011aee1)
compiled at 2016-10-31 08:38:12 with go1.6.2 on linux/amd64

fd0 (Member) commented Nov 5, 2016

Thanks for bringing this up. When we first added the s3 backend, this just slipped in. Probably we'll always have slightly backend-specific filenames. I've thought about adding an option to synchronize (copy) data within backends, #323 mentions this. What do you think about that?

fd0 added the "type: question/problem" label Nov 5, 2016
zcalusic (Member) commented Nov 5, 2016

I agree that unifying the backend file structure and keeping it in sync would be great, because from what I saw while playing with the minio S3 backend, behind all the bells and whistles it's just a bunch of files.

At some point, not being happy with the speed of my restic/minio setup, I wanted to benchmark the difference between the local and S3 backends. That's where I was bitten by these subtle differences, because trying to use the S3 files directly as a local repo soon turned out to be mission impossible. I gave up after half a dozen careful renames, not seeing any progress and with errors just piling up.

fd0 (Member) commented Nov 6, 2016

Is it possible that the minio server adds data to the stored files? I'm pretty sure that if this is not the case, you can just use the files and access them locally once they're in the right directory structure.

jayme-github (Author) commented:

> Thanks for bringing this up. When we first added the s3 backend, this just slipped in. Probably we'll always have slightly backend-specific filenames. I've thought about adding an option to synchronize (copy) data within backends, #323 mentions this. What do you think about that?

A "sync/copy" mode would be a good thing to have at least when there is an actual need for backend-specific file/path names. As far as I understand the existing difference was introduced by accident. I think it would be a good idea to try to keep the structure compatible wherever possible to make handling of files easy.
There even might be situations where the sync/copy mode is of no help. Think of, for example, backblaze b2 storage where you can order a flash-/harddrive with your data. How would one "convert/migrate" that data to file backend format?

I would suggest aligning all existing and new backends with the file backend's structure and adding a compatibility layer, maybe even a "migrator" for existing S3 repos (new ones could be created with the correct/fixed structure).

zcalusic (Member) commented Nov 6, 2016

I brought the REST backend (restic-server) up to date with the local backend in this commit. Now it's possible to access the same repo via restic-server or locally.

I'm already seeing interesting things: a local restic-server is 26% faster than the local backend. Time will tell how that is even possible :), but for now it looks like this (backup of ~6GB / 224k files):

  • duration: 1:52, 44.16MiB/s (using local backend)
  • duration: 1:29, 55.74MiB/s (using restic-server)

There are still lots of improvements I intend to implement in restic-server; you can follow development in this repo: https://github.com/zcalusic/restic-server

fd0 (Member) commented Nov 6, 2016

In my opinion it won't be possible to have exactly the same structure/filenames for all backends (think of more obscure ones such as a MySQL database or a DHT). Different backends will always have different requirements; for example, the local backend creates sub-directories for data files so that the number of files in a single directory is reduced.

I think we should always try to create similar structures where possible. And discuss how we can correct the structure the s3 backend uses without breaking anything.

jayme-github (Author) commented Nov 6, 2016

Agreed, that's what I meant.

> I think we should always try to create similar structures where possible. And discuss how we can correct the structure the s3 backend uses without breaking anything.

By "correct" you mean "the structure of the file backend", right?
There are probably two ways to go:

  1. A compatibility layer in the S3 backend that tries the "file backend structure" first and falls back to the "legacy S3 structure".
  2. A one-time "migration" that renames objects in S3 to the "file backend structure". (Not sure, but I guess "rename" in S3 is more like "copy and delete".)

Option two could be costly for big repos (just a guess), but it would not require permanently maintaining the compatibility layer from option one.
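
For illustration, here is a rough sketch of what option one could look like, reusing the hypothetical localPath/s3Path helpers from the sketch above; loadObject and isNotExist are also hypothetical, and this is not restic's actual backend interface:

// loadWithFallback tries the unified ("file backend") path first and, if the
// object is missing, falls back to the legacy singular S3 key.
func loadWithFallback(fileType, id string) ([]byte, error) {
	buf, err := loadObject(localPath(fileType, id)) // new, unified layout
	if err == nil {
		return buf, nil
	}
	if isNotExist(err) {
		return loadObject(s3Path(fileType, id)) // legacy singular layout
	}
	return nil, err
}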

zcalusic (Member) commented Nov 6, 2016

@fd0, of course this ticket applies only to filesystem-based backends, currently local, sftp, s3 & rest. Which, by coincidence, is all we have right now (ignoring the mem backend, which is only for testing purposes). :)

But having a DB backend is a neat idea, I must admit. I've had this idea for a long time: keep backups in PostgreSQL and then have the database replicated for additional security. Someday I might even attempt it, if only to see what happens. :)

@jayme-github, your solution 2) is not costly at all. I just did it with my 67GB S3 repo, as preparation for putting it under restic-server. Basically, it's a few mv/mkdir operations, so I decided to do it from the shell. It took less than 5 minutes (on a low-end rotational hard drive), and most of that time was spent execing the mv command from the shell on a slow CPU. A simple Go program would be an order of magnitude or two faster, so I guess you could convert even multi-TB repositories in a matter of minutes.

I'm attaching the shell script below; the repo passed a local restic check afterwards, so it should be pretty safe. The last few mv/mkdir/rmdir commands are there only to bring the data folder back down to a manageable size (it was quite large before that; fanning the data folder out into sub-directories is definitely important for any repo larger than a few GB). Currently only the S3 backend is missing that feature.

cd "$PREVIOUSLY_MINIO_S3_REPO"

# Rename the singular S3 directories to the plural names the local backend
# expects, and create the missing ones.
mv key keys
mv snapshot snapshots
mkdir locks
mkdir tmp

# Fan the flat data/ directory out into 256 two-character sub-directories,
# as the local backend does.
cd data
for i in {0..255}
do
        mkdir "$(printf '%02x' "$i")"
done
for file in $(find . -type f)
do
        # file names are hex IDs, so characters 3-4 of "./<id>" give the
        # sub-directory the file belongs in
        mv "$file" "$(echo "$file" | cut -c3-4)"
done
cd ..

# Recreate data/ so its (once huge) directory entry shrinks back to a
# manageable size.
mkdir data2
mv data/* data2
rmdir data
mv data2 data
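
And as a rough illustration of the "simple Go program" mentioned above, here is a minimal sketch that performs the same renames on a local copy of the repository. The directory names are taken from the listings earlier in this thread; everything else is an assumption, not restic code:

package main

import (
	"log"
	"os"
	"path/filepath"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatal("usage: migrate <repo-dir>")
	}
	repo := os.Args[1]

	// Rename the singular S3 directories to the plural names the local
	// backend expects, and create the missing ones.
	for singular, plural := range map[string]string{"key": "keys", "snapshot": "snapshots"} {
		if err := os.Rename(filepath.Join(repo, singular), filepath.Join(repo, plural)); err != nil {
			log.Fatal(err)
		}
	}
	for _, d := range []string{"locks", "tmp"} {
		if err := os.MkdirAll(filepath.Join(repo, d), 0700); err != nil {
			log.Fatal(err)
		}
	}

	// Fan data/ out into two-character sub-directories derived from the hex ID.
	dataDir := filepath.Join(repo, "data")
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		if e.IsDir() || len(e.Name()) < 2 {
			continue
		}
		sub := filepath.Join(dataDir, e.Name()[:2])
		if err := os.MkdirAll(sub, 0700); err != nil {
			log.Fatal(err)
		}
		if err := os.Rename(filepath.Join(dataDir, e.Name()), filepath.Join(sub, e.Name())); err != nil {
			log.Fatal(err)
		}
	}
}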

jayme-github added a commit to jayme-github/restic that referenced this issue Dec 12, 2016
This aligns the path names generated for the S3 backend with the ones used by
the file backend, allowing S3 objects to be used as a file backend and
vice versa.

Dirname and Filename generation logic has moved from the file backend to
the backend package.

Added an environment variable (AWS_LEGACY_PATHS) to the S3 backend which
can be set to true to switch to the legacy pathnames (to be used with
existing repositories).

Fixes restic#654
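
For context, a sketch of how such a switch could be honoured, again reusing the hypothetical helpers from the earlier sketch; the AWS_LEGACY_PATHS name comes from the commit message above, and the code itself is not restic's:

// objectKey picks the layout for a repository object based on the
// AWS_LEGACY_PATHS environment variable described in the commit above.
func objectKey(fileType, id string) string {
	if os.Getenv("AWS_LEGACY_PATHS") == "true" {
		return s3Path(fileType, id) // legacy singular layout, existing repos
	}
	return localPath(fileType, id) // unified file-backend layout
}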
fd0 (Member) commented May 25, 2017

We're moving towards unifying the repo layout; this is tracked as #965. I'm closing this issue.

fd0 closed this as completed May 25, 2017