Make restic work without permission to delete files on GCS #1544

Open · gebi opened this issue Jan 11, 2018 · 47 comments
Comments

gebi commented Jan 11, 2018

Output of restic version

restic 0.8.1
compiled with go1.9.2 on linux/amd64

How did you run restic exactly?

export GOOGLE_PROJECT_ID=xxx
export GOOGLE_APPLICATION_CREDENTIALS=
restic --no-lock -p -r gs::/ ~/doc
password is correct
scan [/home/gebi/doc]
scanned 1852 directories, 5764 files in 0:00
Remove(<lock/991ceef8fd>) returned error, retrying after 424.215262ms: client.RemoveObject: googleapi: Error 403: SERVICE-ACCOUNT does not have storage.objects.delete access to /locks/991ceef8fd55609bb08303bf43c637d12de122800627e71492d10be6474334f0., forbidden

What backend/server/service did you use to store the repository?

GCS (Google Cloud Storage bucket)

Expected behavior

Restic should not create lock files when called with --no-lock, because the service account deliberately has no permission to delete objects in the GCS bucket.

Actual behavior

restic creates a lock file which it is then unable to delete.
The backup is created successfully, but restic hangs while trying to delete the lock file and has to be killed (Ctrl+C does not work).

Steps to reproduce the behavior

Create a GCS bucket and a service account that has only the following roles on the bucket:

  • Storage Object Creator
  • Storage Object Viewer

Do you have any idea what may have caused this?

Yes: the missing delete permission, combined with --no-lock still creating lock files.

Do you have an idea how to solve the issue?

Not creating lock files :)?

Did restic help you or make you happy in any way?

AWESOME tool, I use it daily, especially since GCS support was added (thanks again for the effort!).
We are currently testing the new permissions in GCS and trying to get a setup where the local machine is no longer able to delete its own backups.
(GC not working is a non-issue in this case for me.)

fd0 (Member) commented Jan 11, 2018

Hey, thanks for raising this issue. The --no-lock option only applies to operations that support it, like check. Operations that may add data (such as backup) do not support it; that's not how the repo was designed.

Is it maybe an option to grant the service account deletion on the locks/ subdir?

Or back up to a local directory, and then use e.g. rclone to sync new files to the cloud?

fd0 added the "state: need feedback (waiting for feedback, e.g. from the submitter)" label on Jan 11, 2018
tamalsaha commented:

Is it maybe an option to grant the service account deletion on the locks/ subdir?

That is not possible in GCS.

gebi (Author) commented Jan 11, 2018

@fd0 restic backup with --no-lock works here though, and I'm automatically deleting everything under /locks after a few hours.

% myrestic check
password is correct
load indexes
check all packs
check snapshots, trees and blobs
no errors were found

No, that's not possible because of size constraints.

fd0 (Member) commented Jan 14, 2018

@gebi "it works" means restic does not return an error when you specify --no-lock, but for the backup operation it will still create a lock file. The --no-lock switch is checked for each operation individually, and only some (like check) respect it.

I understand your use case, but I must say that I'm very reluctant to add support for --no-lock to backup, because of the high potential that people use it without having understood what it's for. For example, let's say a backup takes much longer than anticipated, and while the backup (without a lock) is still running, the prune operation is started on a different machine. It won't see any lock, and since the other process isn't finished yet, it won't see the snapshot it created. So the prune process won't know which data is referenced by the new snapshot, and it will even remove newly uploaded files, since these aren't referenced by any existing snapshot. Then the backup process is done, uploads a new index (referencing removed files) and a new snapshot that cannot be restored any more.

How do you make sure that there's no restic backup process running when you run prune?

gebi (Author) commented Jan 17, 2018

@fd0 The service account used by restic is simply not allowed to delete files on the GCS bucket. So a collision between those two commands is not possible.

IMHO it would be a very important property and worthwhile goal to support GCS as tamper-proof storage in restic, because intruders more often than not go on to delete everything they can find, including backups; with a service account that has full delete/rewrite permissions, the backup is effectively worthless.

@fd0 but as you said, there should be a warning printed that there is a possibility of data corruption if some privileged account executes prune in parallel (maybe possible to suppress with some I-know-what-I'm-doing switch).

Ah, and restic prune is never called on these data pools; they are sharded by year and only whole years are deleted (because deleting individual snapshots out of a repository is unnecessary and too slow).

gebi (Author) commented Feb 1, 2018

@fd0 Any thoughts on my suggestion?
If I understand the code correctly, it would just mean not creating any lock files, and thus restic would exit cleanly on its own (currently it needs to be killed with -9).

Supporting tamper-proof backups in restic would be awesome, as very few backup systems support such a mode and it is a property you nearly always want.

As for parallel prunes, why not use a two-phase "commit" for prunes?
restic prune would only write the objects it would delete out to a file; on the next run it would first run normally, i.e. create a list of objects to prune from the whole repository, and then only delete files that also appear in the list written on the previous run. All remaining candidates not in the old list are written out into a new list. This would remove the need for any locking, with the safety guarantee of not deleting objects from running backups that were shorter than the prune interval (which, if prune only runs every month, is quite a good safety margin).
This idea would have the additional advantage of not requiring a new repository format either.
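For illustration, here is a minimal Go sketch of that two-phase idea; computeUnreferenced, deleteObject and the candidate-file name are made up for this sketch and are not part of restic:

package main

import (
	"bufio"
	"fmt"
	"os"
)

const candidateFile = "prune-candidates.txt"

// computeUnreferenced stands in for the repository scan: it would return the
// names of all pack files not referenced by any snapshot. Not implemented here.
func computeUnreferenced() (map[string]bool, error) {
	return map[string]bool{}, nil
}

// deleteObject stands in for the backend delete call.
func deleteObject(name string) error {
	fmt.Println("deleting", name)
	return nil
}

// twoPhasePrune deletes only objects that were already candidates on the
// previous run and are still unreferenced now, then records the remaining
// candidates for the next run.
func twoPhasePrune() error {
	unreferenced, err := computeUnreferenced()
	if err != nil {
		return err
	}

	deleted := map[string]bool{}
	if f, err := os.Open(candidateFile); err == nil {
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			name := sc.Text()
			if unreferenced[name] {
				if err := deleteObject(name); err != nil {
					f.Close()
					return err
				}
				deleted[name] = true
			}
		}
		f.Close()
		if err := sc.Err(); err != nil {
			return err
		}
	}

	// Write the new candidate list: current candidates that were not deleted.
	out, err := os.Create(candidateFile)
	if err != nil {
		return err
	}
	defer out.Close()
	for name := range unreferenced {
		if !deleted[name] {
			fmt.Fprintln(out, name)
		}
	}
	return nil
}

func main() {
	if err := twoPhasePrune(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}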

ifedorenko (Contributor) commented:

See #1141 for lock-less prune discussion.

gebi (Author) commented Feb 5, 2018

@ifedorenko yes, I've read the discussion, but as already mentioned there, the Duplicacy approach needs an update to the repository format; this simple, albeit not as fancy, approach does not, and it also does not have extensive requirements regarding atomic operations in the backend store.

ifedorenko (Contributor) commented:

I was merely suggesting that we discuss lock-free prune in #1141, so we have all ideas in one place.

fd0 (Member) commented Feb 25, 2018

I've decided to not add support for --no-lock for backup, at least not for now. If you want this behavior (and it feels to me you know what you're doing), one way is to patch it into restic manually and build it yourself. Here's a patch: https://gist.github.com/fcaf7a0cbc35b4e0bebc901fbacd3860

fd0 added the "type: feature suggestion (suggesting a new feature)" label and removed the "state: need feedback" label on Feb 25, 2018
gebi (Author) commented May 4, 2018

@fd0 that's unfortunate, as we really require this functionality for WORM backups (write once, read many), and most backup users do too; they just don't realize it, or only realize it after their first compromise, when most of their backups are deleted as well.

I've forked restic and made the first release of restic-worm, which we have to use and will keep up to date with upstream restic as far as it makes sense for our use case.
It's currently backing up about 10PB of data and running fine so far. I really wish we had found a way to work together and add this functionality upstream, even if it meant we would have to test it and keep it in shape; a fork is of no help to either side.

https://github.com/mgit-at/restic/tree/v0.8.3-worm
https://github.com/mgit-at/restic/tree/backup-nolock

fd0 (Member) commented May 4, 2018

@gebi I can understand your use case and what you're trying to do. In my opinion, just adding the small patch to allow --no-lock during backup has the potential to be (ab)used by way too many users in the wrong way, which may lead to data loss. That's the reason I don't like just adding it.

In general, the pruning process is not optimal, and even using lock files is unfortunate, especially for your use case. It was what I came up with during the initial design phase, and it's the simplest to implement. We will change it and move to something better in the long run, for sure.

I agree that a fork is unfortunate and won't help either of us, even if it is only the added --no-lock to support your use case. I could live with a patch that enables this only with a special worm build flag that disables the prune command altogether and runs backup and check without lock files. Would that maybe work for you?

I'm very interested in your results of working with 10PB within a restic repo, that's awesome!

gebi (Author) commented May 9, 2018

@fd0 Hmm, the more I think about it, the more I come to the conclusion that maybe we should just refuse to allow restic prune on such backends receiving snapshots with no-lock.

So why not let restic snapshot --no-lock create a lock file if one is not already there, but just not delete it?
This would prevent restic from running any prune operations on the data in parallel, but would not restrict future snapshots (as far as I can see they just work, regardless of how many lock files are present).

If the user wants to prune such storage backends, he has to ensure no one is writing to them concurrently, which is totally fine for the intended use case.

If we get #1141 into usable shape, that might lift this restriction later on, but for now I would be totally fine with restricting it; it would just be awesome to have the functionality for writing snapshots to WORM GCS buckets in upstream.

Would this approach work for you?
It would create a safe way to do WORM backups, have the functionality included upstream, and concurrent prune would be the responsibility of the user if he wants to use/implement it.

fd0 (Member) commented Jun 8, 2018

I assume you mean restic backup --no-lock? This would work for backup, but you'll end up with many lock files over time...

For other operations (such as snapshots) changing the behavior won't work: originally we added the --no-lock switch in order to support accessing a repository on read-only media (like DVDs).

gebi (Author) commented Jun 9, 2018

@fd0 Yes, I meant restic backup --no-lock. Yes, many lock files would be the outcome, but they could either be deleted from the machine pruning the data, or restic could first check whether at least one lock file exists and only create a new one if none exists.

And for operations such as snapshots, it would either be the same behaviour as now or ignore the error?

fd0 (Member) commented Jun 9, 2018

Hm. Maybe we can really do that: Not exit with an error if the lock file could not be removed. That'd work in most cases, especially in the ones you're interested in.

gebi (Author) commented Jun 9, 2018

Yeah, that would be awesome!

BTW, the current behaviour (in this case) is a retry loop with error output where restic can't be killed normally, only with kill -9.

fd0 (Member) commented Jun 9, 2018

Uh, that's not good, thanks for pointing it out again.

lukastribus commented:

I think you should be able to add full write and delete permissions only for the locks folder by using ACLs:
https://cloud.google.com/storage/docs/access-control/lists

I did not test it myself though and could be mistaken.

gebi (Author) commented Jul 30, 2018

@lukastribus it would be awesome if that works, but it doesn't.

There is no locks/ "folder" to put ACLs on; Google Cloud Storage buckets don't work like that, sorry.
One would have to put ACLs on each individual object within the locks/ prefix, but that would defeat the purpose.

gebi (Author) commented Jul 30, 2018

@fd0 any news on the feature "Hm. Maybe we can really do that: Not exit with an error if the lock file could not be removed. That'd work in most cases, especially in the ones you're interested in."

It would be really awesome if you could add this to restic; it would make our life a whole lot easier and it would IMHO be a worthwhile addition.

mholt (Contributor) commented Jul 30, 2018

What if it was a specific exit code? It could be useful to know that there's a stale lock in the repo...

fd0 (Member) commented Jul 31, 2018

@fd0 any news on the feature "Hm. Maybe we can really do that: Not exit with an error if the lock file could not be removed. That'd work in most cases, especially in the ones you're interested in."

Nope, no news unfortunately. I don't have much time at the moment, so somebody needs to actually do the work here and build a prototype ;)

One small issue which isn't mentioned here (as far as I can see) is that when long-running operations such as backup run, restic replaces its own lock file every few minutes with a new one with a new name. So there's no "single" lock file, but a bunch of them.

What if it was a specific exit code? It could be useful to know that there's a stale lock in the repo...

Good point, hm.

gebi (Author) commented Aug 1, 2018

The basis of this discussion is making the --no-lock flag available for backup, for which code already exists (from you). It's also what we are currently using to back up everything.

@fd0 what would be your preferred way? I've submitted the patch we are using on top of restic (which was thankfully provided by you; I've just rebased it to master): #1917

matejdro commented:

Maybe another way could be the option of having lock files stored separately (for example in a bucket that you DO have write access to)?

I wonder what everyone else does to resolve this security situation? If the machine doing the backups gets compromised, how do you make sure the attacker does not just delete all the backups?

parkerp1 commented:

I'm running into this same issue as well. I have no need to (and because of data requirements can't) prune backups. Running backup --no-lock would be the perfect solution for me. Is there a viable workaround out there?

sdudley commented Jun 24, 2019

@parkerp1 We're using effectively the method suggested here, which is to have restic talking to two copies of rclone fronted by tinyproxy, and having one read/write bucket for the lock files and one write-only bucket for the data. The article is about Wasabi, but the back end is irrelevant and we're using it with GCS. While not particularly clean, Dockerizing this solution helps to hide the complexity.

gebi (Author) commented Jun 25, 2019

@parkerp1 we are still using a small patch on top of restic https://github.com/mgit-at/restic/tree/backup-nolock to back up a few hundred TB of data. Works like a charm.
Sadly it was rejected upstream...

parkerp1 commented:

Thanks @gebi and @sdudley. Both look like good options.

onionjake commented Nov 19, 2019

I agree with @fd0 that adding a --no-lock flag would be dangerous. If someone was getting a lock error they might try adding that flag to get past it and end up corrupting their data.

Instead, perhaps on init there could be an alternate lock file location specified (with the fact that an alternate lock location is being used stored in the original repo). Then any command that is invoked without specifying the alternate lock location would fail (e.g. "error: repo uses alternate lock location and no alternate location given"). The commands would also fail if an alternate lock location was provided but the original repo wasn't set up to use one.

This would make the advanced behavior possible and also keep the regular commands pretty foolproof for those not using the advanced behavior.
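A rough sketch of that consistency check, with invented config and flag names (restic has no such setting today):

package main

import (
	"errors"
	"fmt"
)

// repoConfig stands in for a hypothetical per-repository setting recording
// that lock files live somewhere else; restic's real config has no such field.
type repoConfig struct {
	AltLockLocation string // empty means locks are stored in the repository itself
}

// validateLockLocation fails whenever the repository and the command-line
// flag disagree, so forgetting the flag cannot silently bypass locking.
func validateLockLocation(cfg repoConfig, flagValue string) error {
	switch {
	case cfg.AltLockLocation != "" && flagValue == "":
		return errors.New("repo uses alternate lock location and no alternate location given")
	case cfg.AltLockLocation == "" && flagValue != "":
		return errors.New("alternate lock location given but repo was not set up to use one")
	case cfg.AltLockLocation != "" && flagValue != cfg.AltLockLocation:
		return errors.New("given alternate lock location does not match the one recorded in the repo")
	}
	return nil
}

func main() {
	cfg := repoConfig{AltLockLocation: "gs:my-locks-bucket:/"}
	fmt.Println(validateLockLocation(cfg, ""))                     // flag missing -> error
	fmt.Println(validateLockLocation(cfg, "gs:my-locks-bucket:/")) // ok -> <nil>
}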

rawtaz (Contributor) commented Apr 8, 2020

I'm running into this same issue as well. I have no need to (and because of data requirements can't) prune backups. Running backup --no-lock would be the perfect solution for me. Is there a viable workaround out there?

Two options off the top of my head:

  • Use rest-server with the --append-only flag - this lets your users only back up to their repositories; they won't be able to delete data.

  • Use a filesystem on the repository server that allows you to snapshot the relevant parts of the storage. I would recommend ZFS because its snapshots are extremely cheap, and easily accessible if need be. This will not be the same thing as not allowing deletions, but you will always still have a copy of the latest snapshots, so any deletions are pointless.

m37r commented Apr 17, 2020

Another approach would be to use object versioning. With that enabled, deletions may be allowed for the service account used by restic, as they only add deletion markers to the version history. Only the DeleteObjectVersion permission must be withheld in order to prevent an attacker from doing permanent damage.
Frankly, I am not sure about GCS, but I'm currently implementing this on S3/Wasabi and it looks promising.

apollo13 commented:

@fd0 Would an environment variable à la RESTIC_DANGEROUSLY_DO_NOT_LOCK_BACKUP be an option? :)

fd0 (Member) commented Nov 5, 2020

@fd0 Would an environment variable à la RESTIC_DANGEROUSLY_DO_NOT_LOCK_BACKUP be an option? :)

No, I don't think so. As I've outlined in #1544 (comment):

I could live with a patch that enables this only with a special worm build flag that disables the prune command altogether and runs backup and check without lock files.

rptaylor commented Dec 31, 2020

The backup is created successfully, but restic hangs while trying to delete the lock file and has to be killed (Ctrl+C does not work).

Sounds like this 95% works, just runs into a small bump at the end. If there was a way to get around cleaning up this lock file after completing the backup, this approach to immutable backups based on lack of deletion privilege would work, and restic would not even need to know/do anything to support immutability.

#1141 and https://gist.github.com/fd0/d24abf8301724988919c80e65aba9b6d describe some ideas for operation without locks, but in the context of this issue (the cloud storage credential stored on the system does not have delete privilege, effectively achieving immutable backups if an attacker compromises the server), couldn't locking be implemented in an alternative way that works on immutable storage?

Instead of creating a lock file and then removing it when done, restic could create a lock file (named with some ID, e.g. lock-0001) and then create a lock release file matching that ID (e.g. lock-0001-release) when done, signifying the lock is no longer in use. Maybe the IDs always go up incrementally, so you can always tell which is the latest. (Yes, you would end up with tons of old lock files littering the directory, but maybe that's just the price of admission if using immutable cloud storage. Besides, this approach to realizing immutable backups does not prohibit the user from using a different credential with delete privileges to do cleanup - if they know what they're doing.) (I admit I don't know much about how the locking works in restic, just an idea.) Restic could even test creating and deleting a file to see if it needs to use an alternative locking approach that works on immutable storage. And this should only be needed for doing backups, since with immutable storage you would not be able to prune anyway.
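A minimal sketch of that pairing scheme; the lock-0001 / lock-0001-release names and the listing input are made up here, and restic's real lock handling works differently:

package main

import (
	"fmt"
	"strings"
)

// activeLocks returns the lock names that have no matching "-release" marker.
// The input would come from listing the locks/ prefix in the bucket; writing
// both the lock and its release marker needs only create permission, never delete.
func activeLocks(names []string) []string {
	released := map[string]bool{}
	for _, n := range names {
		if strings.HasSuffix(n, "-release") {
			released[strings.TrimSuffix(n, "-release")] = true
		}
	}
	var active []string
	for _, n := range names {
		if !strings.HasSuffix(n, "-release") && !released[n] {
			active = append(active, n)
		}
	}
	return active
}

func main() {
	// lock-0001 was released, lock-0002 is still held.
	names := []string{"lock-0001", "lock-0001-release", "lock-0002"}
	fmt.Println(activeLocks(names)) // [lock-0002]
}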

Anyway this should be considered in the larger context of immutable backups and potential lock-free operation.

rptaylor commented Dec 31, 2020

Another workaround for this is: https://www.howtoforge.com/economical-append-only-offsite-backups-with-restic-and-wasabi-debian/

It relies on having 2 buckets (an immutable one for data and a regular one for locks), running a local rclone server as a gateway, and a local tinyproxy server to alternate between the regular and immutable buckets depending on whether a data file or lock file needs to be written, which is a lot of extra hoops to jump through.

Another alternative is: https://blog.heimsbakk.net/posts/20190911-restic/
operating a remote rclone proxy server which enforces append-only mode to the outgoing backups (not really a complete solution IMHO because you have to run another server and assume it won't be compromised - if that were true you wouldn't have to worry about your backups being compromised in the first place.)

So it would be nice if restic could natively support an approach to locking that would work on immutable storage, if possible.

aawsome (Contributor) commented Dec 31, 2020

When backing up to a storage backend that does not permit deletion of files, there is no risk that a prune removes files in parallel to a backup run, which would make it safe to run restic without locks.

So the best solution would be to enable the --no-lock option for backup. Implementing this is very easy; just change cmd/restic/cmd_backup.go, lines 560ff, from

	if !gopts.JSON {
		p.V("lock repository")
	}
	lock, err := lockRepo(gopts.ctx, repo)
	defer unlockRepo(lock)
	if err != nil {
		return err
	}

to

	if !gopts.NoLock {
		if !gopts.JSON {
			p.V("lock repository")
		}
		lock, err := lockRepo(gopts.ctx, repo)
		defer unlockRepo(lock)
		if err != nil {
			return err
		}
	}

The only question here is: if we allow this, how many bug reports will we get from users who end up using this option in unsafe environments (for whatever reason...)?

And the locks also ensure that check is not run in parallel to a backup, which again could give misleading results (but of course does no harm, except again maybe strange "bug reports").

dimejo (Contributor) commented Dec 31, 2020

Another workaround for this is: https://www.howtoforge.com/economical-append-only-offsite-backups-with-restic-and-wasabi-debian/

It relies on having 2 buckets (an immutable one for data and a regular one for locks), running a local rclone server as a gateway, and a local tinyproxy server to alternate between the regular and immutable buckets depending on whether a data file or lock file needs to be written, which is a lot of extra hoops to jump through.

Kudos for creativity but this has to be the most complicated solution one can think of. 😉

A single bucket and this policy should be enough for append-only backups with restic and Wasabi (or other compatible backends). It allows adding (and reading) all objects but disallows removing any objects except within the locks/ directory.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::BUCKETNAME",
        "arn:aws:s3:::BUCKETNAME/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "s3:DeleteObject",
      "Resource": "arn:aws:s3:::BUCKETNAME/locks/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::BUCKETNAME"
    }
  ]
}

rptaylor commented Jan 4, 2021

A single bucket and this policy should be enough for append-only backups with restic and Wasabi (or other compatible backends).

Nice, good to know there is a way to do that with some S3-compatible APIs.
With Backblaze S3 IIUC it would not work: https://www.backblaze.com/b2/docs/s3_compatible_api.html
"object-level ACLs are not supported"
On Backblaze it is possible to make app keys with limited capabilities, including delete privilege on a specific file prefix, but then restic would need to use different keys for different operations.
And in the context of this issue, as noted earlier this is also not possible on GCS. So if restic could operate in a way that does not rely on deleting files such as locks at all, it would enable immutable backups on a wide variety of cloud backends.

rptaylor commented Jan 4, 2021

If we allow this, how many bug reports will we get from users who end up using this option in unsafe environments

Would it help if the option were instead enabled at the repository level, when you do restic init?
There could be some sort of immutability setting, and a prompt or message that warns users of the implications.

aawsome (Contributor) commented Jan 4, 2021

Would it help if the option were instead enabled at the repository level, when you do restic init?
There could be some sort of immutability setting, and a prompt or message that warns users of the implications.

There is almost no config saved on the repository level, so this would mean changing the repo format...

I would rather allow the --no-lock flag as described above. Maybe change the description to "do not lock the repository; this allows some operations on read/append-only repositories". Also, we could change the sensitive commands like prune, forget and rebuild-index to not simply ignore the flag but to abort if the flag is set. Then users running all commands with this flag would just lose the ability to prune, which they wouldn't want for append-only repositories anyway.

An interesting fact is that restic tag respects the --no-lock flag but does delete files (and maybe should not be run in parallel to some of the other commands...).

But in fact it's not a question of implementation, which is easy, but of direction from the maintainers: should restic be able to solve this problem, and if so, in which way?

HiddenRambler commented Oct 7, 2021

Could we use the approach of manually creating a file in the locks folder with a special name such as WRITEONLY.LOCK? Restic could check for the existence of such a file prior to creating any locks, and if a file with that name exists, treat the repository as write-only, allowing backup and restore but not prune or any other operation that would require deletions.

This would in practice remove the ability to prune such a repository until the special lock file is manually removed, at which point restic would use the normal locking mechanism for all operations.
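A rough sketch of how such a check might look; the WRITEONLY.LOCK name comes from the suggestion above, while the Backend interface here is a made-up stand-in rather than restic's real backend API:

package main

import (
	"context"
	"errors"
	"fmt"
)

// Backend is a minimal stand-in for a storage backend; only the lookup of a
// single well-known marker object is needed for this sketch.
type Backend interface {
	Exists(ctx context.Context, name string) (bool, error)
}

var errAppendOnly = errors.New("locks/WRITEONLY.LOCK exists: repository is write-only, deleting operations are disabled")

// checkAppendOnly refuses operations that need to delete data while the
// marker file is present; backup and restore would simply skip locking.
func checkAppendOnly(ctx context.Context, be Backend, opDeletes bool) error {
	found, err := be.Exists(ctx, "locks/WRITEONLY.LOCK")
	if err != nil {
		return err
	}
	if found && opDeletes {
		return errAppendOnly
	}
	return nil
}

// memBackend is an in-memory Backend used only to exercise the sketch.
type memBackend map[string]bool

func (m memBackend) Exists(_ context.Context, name string) (bool, error) { return m[name], nil }

func main() {
	be := memBackend{"locks/WRITEONLY.LOCK": true}
	fmt.Println(checkAppendOnly(context.Background(), be, true))  // prune -> error
	fmt.Println(checkAppendOnly(context.Background(), be, false)) // backup -> <nil>
}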

MichaelEischer (Member) commented:

Wouldn't it be possible on GCS to let restic just hide files and clean up old ones with a lifecycle policy, as on B2 (#2134)?

aawsome (Contributor) commented Oct 31, 2022

FWIW, rustic implements completely lock-free pruning. As a consequence, no lock files are needed, and thus none are created or deleted by rustic. This makes the problem vanish: if you don't have permission to delete files, you can still back up to a repository (this only adds new pack/index/snapshot files).
Removing snapshots or pruning, however, would simply not work in this environment.

MichaelEischer (Member) commented:

@aawsome What happens if two prune tasks run in parallel? Are they going to delete unindexed files created by the other prune instance? And without a session concept (i.e. something similar to lock files), pruning will have to rely on guesswork as to whether there are long-running backup tasks etc. still in progress.

aawsome (Contributor) commented Oct 31, 2022

@aawsome What happens if two prune tasks run in parallel? Are they going to delete unindexed files created by the other prune instance?

No. Pruning is done in two phases, i.e. packs are marked for removal but still exist (even in the index, where there is an extra section for those marked pack files). So unindexed files are marked for removal but only removed (by a future prune run) if they have been marked long enough and are still not needed after that time.

And without a session concept (i.e. something similar to lock files), pruning will have to rely on guesswork as to whether there are long-running backup tasks etc. still in progress.

This is true. The "long enough" I wrote above must be long enough that all parallel runs creating or relying on existing files have finished. rustic lets the user define this time interval and defaults to 23h.
It's a pity that restic's lock files contain the time when the lock was created but not the time when the running process started. Otherwise this time could be used to allow safe parallel pruning, of course only for repos with delete permission.
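To make the timing rule concrete, here is a tiny sketch (field names and the grace-period handling are illustrative, not rustic's actual code; the 23h figure is the default mentioned above):

package main

import (
	"fmt"
	"time"
)

// markedPack represents a pack file that a previous prune run flagged for
// removal but intentionally left in place.
type markedPack struct {
	Name     string
	MarkedAt time.Time
}

// removable reports whether the pack has been marked long enough that any
// run which was creating or relying on files when it was marked must have
// finished by now.
func removable(p markedPack, grace time.Duration, now time.Time) bool {
	return now.Sub(p.MarkedAt) >= grace
}

func main() {
	grace := 23 * time.Hour // the default interval mentioned above
	p := markedPack{Name: "abc123", MarkedAt: time.Now().Add(-24 * time.Hour)}
	fmt.Println(removable(p, grace, time.Now())) // true: safe to delete
}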

dfyz011 commented Aug 9, 2023

Any plans for this feature?
