[RFC] Design Proposal: Cloud Service backups using Restic #3434

vrusinov · 2021-06-17T22:06:34Z

Proposed: 2021-06-17

Last update: 2021-06-17

Status: proposed

View this document in: Google docs | my website | Github

Problem statement

Restic is a modern backup program that can back up your files. Individuals and SMBs can solve their file backup problem using restic.

However, a lot of data lives in cloud services, and it is often more important than local files: e-mail, social network profile data, online documents and spreadsheets frequently matter more than the data on one’s HDD or SSD. And while the majority of online/cloud providers do a good job of keeping the data safe and taking care of durability, mistakes still happen.

It is possible to be locked out of a cloud account (especially if it's a free one) or to remove data accidentally. Sometimes services get it wrong and lose your data, or even just shut down.

The cloud data needs to be backed up too.

Some services make it possible (e.g. Google lets you get a copy of your data via Google Takeout), but almost none make it easy and convenient. Wouldn’t it be great if cloud data could be backed up just as easily as the local files using restic?

Design proposal

Summary

Restic already has an ‘FS’ interface which abstracts away filesystem access. There are implementations for local Windows and Unix filesystems. We can add ‘cloud’ filesystem implementations which represent various objects as files. Depending on the backup source, the corresponding ‘FS’ implementation will be chosen, and the rest of the restic code will be unaware of whether it is working with a local filesystem or a virtual one representing a cloud service.
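To make the idea concrete, here is a minimal sketch of how backup code can stay unaware of where files come from. The `SourceFS` interface, the `memFS` type and the `backupSize` function are hypothetical and heavily simplified; they are not restic's actual ‘FS’ interface, which has more methods.

```go
package main

import (
	"fmt"
	"io"
	"sort"
	"strings"
)

// SourceFS is a hypothetical, simplified stand-in for a filesystem
// abstraction. It is NOT restic's real FS interface.
type SourceFS interface {
	// List returns the names of the entries directly under dir.
	List(dir string) ([]string, error)
	// Open returns the content of the file at path.
	Open(path string) (io.ReadCloser, error)
}

// memFS exposes an in-memory map (path -> content) as a read-only virtual
// filesystem. A cloud implementation would call a service API instead.
type memFS map[string]string

func (m memFS) List(dir string) ([]string, error) {
	prefix := strings.TrimSuffix(dir, "/") + "/"
	seen := map[string]bool{}
	for p := range m {
		if !strings.HasPrefix(p, prefix) {
			continue
		}
		// Keep only the first path segment below dir.
		seen[strings.SplitN(strings.TrimPrefix(p, prefix), "/", 2)[0]] = true
	}
	names := make([]string, 0, len(seen))
	for n := range seen {
		names = append(names, n)
	}
	sort.Strings(names)
	return names, nil
}

func (m memFS) Open(path string) (io.ReadCloser, error) {
	data, ok := m[path]
	if !ok {
		return nil, fmt.Errorf("open %s: not found", path)
	}
	return io.NopCloser(strings.NewReader(data)), nil
}

// backupSize reads every file in dir through the interface only, so it works
// the same for any implementation. For brevity it assumes dir holds only files.
func backupSize(fs SourceFS, dir string) (int64, error) {
	names, err := fs.List(dir)
	if err != nil {
		return 0, err
	}
	var total int64
	for _, name := range names {
		rc, err := fs.Open(strings.TrimSuffix(dir, "/") + "/" + name)
		if err != nil {
			return total, err
		}
		n, err := io.Copy(io.Discard, rc)
		rc.Close()
		if err != nil {
			return total, err
		}
		total += n
	}
	return total, nil
}

func main() {
	mail := memFS{"/inbox/1.eml": "hello", "/inbox/2.eml": "world!"}
	n, err := backupSize(mail, "/inbox")
	fmt.Println(n, err)
}
```

The point of the sketch is only that `backupSize` never learns whether the bytes came from a disk or from a service.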

This idea is partially implemented by YoshieraHuang in pull request #2995 (for sftp) and by KrustyHack in pull request #2223 (for Google Cloud Storage).

UX

Pull requests #2995 and #2223 referenced above introduce a large number of additional flags to handle authentication. If we were to implement a dozen different backup sources, we’d have to add even more flags (or environment variables), and it may quickly become messy.

It is also not clear how to choose the correct ‘FS’ implementation.

I propose to solve this by making the backup source argument URL-like and implementing authentication via configuration files. Examples:

restic -r <repo> backup /home/user/ - will do a backup of local /home/user/ files.
restic -r <repo> backup file:/home/user/ - same as above
restic -r <repo> backup sftp:user@host/home/user/ - will log in as user@host via sftp and do a backup of /home/user/.
restic -r <repo> backup gmail:/home/user/.config/restic/gmail-auth.conf - will do a backup using ‘gmail’ ‘FS’ implementation and will use authentication from /home/user/.config/restic/gmail-auth.conf file.

And so on, with the general structure being <fs_implementation>:.

It will be the responsibility of each ‘FS’ implementation to interpret the path. For the file ‘FS’ implementation it will be a local directory; gmail may open and parse settings from a local file, and so on.

Where possible, different ‘FS’ implementations will share a similar config format and behaviour.
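As an illustration of the source argument handling described above, the following sketch splits an argument into an implementation name and the remainder that the implementation interprets itself. The `parseSource` function and the `knownSchemes` table are hypothetical; the only names taken from the proposal are file, sftp and gmail. Falling back to file for unknown prefixes is an assumption made here so that plain paths (including Windows paths with a drive letter) keep working.

```go
package main

import (
	"fmt"
	"strings"
)

// knownSchemes lists hypothetical FS implementation names.
var knownSchemes = map[string]bool{"file": true, "sftp": true, "gmail": true}

// parseSource splits a backup source into an FS implementation name and the
// rest of the argument. Anything without a known "<name>:" prefix is treated
// as a local path.
func parseSource(arg string) (scheme, rest string) {
	if i := strings.Index(arg, ":"); i > 0 && knownSchemes[arg[:i]] {
		return arg[:i], arg[i+1:]
	}
	return "file", arg
}

func main() {
	for _, arg := range []string{
		"/home/user/",
		"file:/home/user/",
		"sftp:user@host/home/user/",
		"gmail:/home/user/.config/restic/gmail-auth.conf",
	} {
		scheme, rest := parseSource(arg)
		fmt.Printf("%-6s %s\n", scheme, rest)
	}
}
```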

Restore

Restoring cloud backups may not be straightforward. It is easy to restore filesystem-like ‘sftp’ or ‘gcs’ data by copying/uploading files to the corresponding service. However, ‘facebook’ or ‘strava’ may not provide the ability to restore data in an automated way, if at all.

restic will provide tools to convert a mounted (e.g. via fuse) backup to something usable. Having social network post history in some human- and machine-readable format may still be worthwhile even if it’s not possible to re-import it.

Hostname and path handling

By default restic uses local hostname and path to identify snapshots.

This may not work well for cloud services, especially for the hostname. Using the local hostname and path can easily lead to a mess, e.g. if backups of the same cloud service account are taken from different hosts.

Different ‘FS’ implementations may override the hostname (unless one is explicitly provided via the --host flag). The recommendation is to use @ format as the default hostname and to avoid using the local hostname for non-local ‘FS’ implementations. Examples could be:

vladimir.rusinov@gmail.com@google_mail

vladimir.rusinov@gmail.com@google_calendar

zuck@facebook

bill@msn_mail

etc.
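The precedence described above could look roughly like the sketch below. The `snapshotHost` function and its parameters are hypothetical, and the account-then-service layout is an assumption read off the examples rather than a stated format.

```go
package main

import "fmt"

// snapshotHost picks the hostname recorded for a snapshot.
// Precedence: an explicit --host value, then an identity derived from the
// cloud account (assumed layout: account@service), then the local hostname.
func snapshotHost(hostFlag, account, service, localHost string) string {
	switch {
	case hostFlag != "":
		return hostFlag
	case service != "":
		return account + "@" + service
	default:
		return localHost
	}
}

func main() {
	fmt.Println(snapshotHost("", "zuck", "facebook", "laptop"))
	fmt.Println(snapshotHost("", "", "", "laptop"))
	fmt.Println(snapshotHost("custom", "zuck", "facebook", "laptop"))
}
```

With this rule, backups of the same account taken from two different machines end up under the same snapshot host.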

rdiff-backup "frontend"

Similarly to the rdiff-backup backend, an rdiff-backup "frontend" may be integrated to provide support for a bunch of storage/cloud services. One integration may unlock backup support for a large number of filesystem-like services, but will not allow backups of less file-like services. E.g. it may help back up Dropbox but may not help with Google Calendar backups. Also, it will likely be more awkward to use than "native" service support.

More research and more specific design may be needed.

Advantages of this design

One restic repository can be used for all backups - local and cloud
All benefits of restic snapshot management
Some deduplication possible, e.g. for when some subset of data is synced to local filesystems

Downsides

One restic repository can be used for all backups - local and cloud - which can be dangerous if the backup repository is compromised
Increased restic binary size. Since it’s in Go and statically-linked, adding more ‘FS’ implementations may pull more dependencies and increase ‘restic’ binary size for everyone.
UX is not perfect - we mix paths and config files.

Next steps / Milestones

1. Write design proposal - done
2. Send proposal to review
3. In parallel to (2), start implementing cloud backup for one provider as a proof of concept. Having actual code will help refine design and may help discussion.
4. Iterate on design comments, adjust the code from (3) accordingly.
5. Finalize design and code of the first cloud backup source, send PR, merge it into the upcoming version of restic.
6. Implement support for popular cloud service providers: SFTP and GCS as there are already pull requests which may need a small number of changes, Gmail, Facebook, Github, Hotmail, Dropbox, Google Drive, etc.

Alternatives

Do nothing

Too late, I already wrote this design.

Also, I still need my backups.

Keep restic for local backups only

One can simply have a service-specific backup/dump program and save backups as local files, to be picked up by restic backups. This is the approach currently used by the author of this document, and it has several downsides:

Requires managing different tools and different backup schedules
Makes it difficult to see which services were backed up when
Requires enough local storage to store a copy of all cloud data

Use stdin source + 3rd party binaries

A backup source can be implemented as a separate binary that simply dumps the backup to stdout (e.g. in tar format if the source is file-based). Restic will then consume the backup from stdin.

Such an approach is possible today, and no code changes are required. Restic may provide better documentation with specific examples of how to do this at least for popular services.

Advantages:

No code changes are required
No restic binary size bloat and no additional code to maintain

Disadvantages:

Worse UX
Worse deduplication (tar may add its own headers or realign blocks in a way that makes deduplication impossible).
Impossible to recover from partial failures - the whole backup/export will have to be started from scratch
No advantage from restic cache.


underdpt · 2022-01-08T09:52:25Z

Hello,

I would like to add another alternative, though I'm not sure whether it's doable or how complex it would be to add: use rclone as a source.

Today we can use rclone as a backend, which adds tons of backends to restic. How about using it as a source? You can then provide cloud-to-local backups and even cloud-to-cloud deduplicated and encrypted backups and both projects would benefit from it.

vrusinov · 2022-01-26T10:53:21Z

Yes, that's certainly an option. I don't want to limit this to just rclone. E.g. as a prototype I was trying to modify restic to back up github repositories along with metadata such as issues and issue comments.

I almost implemented backing up issues, and have them as "virtual" json files (e.g. this issue and comments would be backed up as /github/restic/restic/issues/3434.json). I've mixed up absolute and relative paths somewhere, so the prototype isn't functional yet, and I haven't had time to work on it further.
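For illustration only, the mapping from an issue to a virtual JSON file could be sketched as follows. The `Issue` type and the `virtualPath` function are hypothetical and are not taken from the prototype; the sketch always builds absolute paths, which is one simple way to avoid mixing absolute and relative forms.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
	"strconv"
)

// Issue is a hypothetical, minimal model of an issue and its comments.
type Issue struct {
	Number   int      `json:"number"`
	Title    string   `json:"title"`
	Comments []string `json:"comments"`
}

// virtualPath maps an issue to the absolute path of its virtual JSON file.
func virtualPath(owner, repo string, number int) string {
	return path.Join("/github", owner, repo, "issues", strconv.Itoa(number)+".json")
}

func main() {
	issue := Issue{Number: 3434, Title: "example title", Comments: []string{"first comment"}}
	data, err := json.Marshal(issue)
	if err != nil {
		panic(err)
	}
	fmt.Println(virtualPath("restic", "restic", issue.Number))
	fmt.Println(string(data))
}
```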

dimejo · 2022-01-26T15:17:29Z

Sounds like your proposal would solve #299.

vrusinov · 2022-02-20T11:01:11Z

Partially. I read #299 more as a request to transition to a client-server architecture, although I agree the goals of #299 may be achieved by having a "chain" of backups.

rmanibus · 2024-01-07T23:53:12Z

Very interested in this. I did an experiment here to implement an FS for Google Drive:
rmanibus@cd6eecd

The main issue I am seeing for now is that files are uniquely identified by their id and not by their name. It is not possible to get a file by its path from the API in a single request.

I partially solved it by:

making name return the Id in file info
making Join just return the last segment on the path
making LStat accept an id instead of a path

But this would need to be worked on a bit more, mainly because for now:

it is not retaining the file name
it is backing up the entire drive using the 'root' alias

I am also thinking of another issue: if we retain the id in the backup, it might work fine until we try to restore it on a blank drive. At that point we will recreate each file under a new id and won't be able to trivially match the file in the next backup.

It is also worth mentioning that in Google Drive two files in the same directory can have the same name.
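The two problems (no lookup by path, and duplicate names in one directory) can be shown with a small in-memory model. The `entry` type and `resolve` function are hypothetical and do not call the Google Drive API; each loop iteration stands in for one listing request per path segment.

```go
package main

import (
	"fmt"
	"strings"
)

// entry models a Drive-like object: identified by ID, with a non-unique name.
type entry struct {
	ID, Name, ParentID string
}

// resolve walks path one segment at a time, starting at rootID, and returns
// the IDs of all entries that match the full path. More than one result means
// the path is ambiguous because siblings share a name.
func resolve(entries []entry, rootID, path string) []string {
	current := []string{rootID}
	for _, seg := range strings.Split(strings.Trim(path, "/"), "/") {
		if seg == "" {
			continue
		}
		var next []string
		for _, parent := range current {
			for _, e := range entries {
				if e.ParentID == parent && e.Name == seg {
					next = append(next, e.ID)
				}
			}
		}
		current = next
	}
	return current
}

func main() {
	entries := []entry{
		{ID: "d1", Name: "docs", ParentID: "root"},
		{ID: "f1", Name: "notes.txt", ParentID: "d1"},
		{ID: "f2", Name: "notes.txt", ParentID: "d1"},
	}
	fmt.Println(resolve(entries, "root", "/docs/notes.txt"))
}
```

A path-based ‘FS’ implementation would need some rule for the ambiguous case, for example adding the id to the name, but that choice is outside this sketch.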

rmanibus mentioned this issue Feb 25, 2024

Backup from Cloud Service #4711
