Omit snapshot creation if there was no change #662

Open
jayme-github opened this issue Nov 7, 2016 · 57 comments · May be fixed by #4816

Comments

@jayme-github

Feature request/discussion about implementing a switch that omits snapshot creation if there was no change in metadata and data.

From IRC:

is there a way to omit snapshot creation if there was no change at all?
(I have a large dataset that does not change very often, like once a month, but I would like restic to run at least once a day)
jayme: no, that's currently not possible with restic alone. every run of 'restic backup' will create a snapshot
jayme: but you can easily script that: use 'find' to find files that have been modified since the last backup, if there are any run 'restic backup', otherwise do nothing
fd0: thanks for your response. Do you think that feature is worth an issue? Or do you want that to stay out of restic?
jayme: it wouldn't take much code to add this... I'm not sure if it's worth it though
jayme: if you create an issue in the GitHub issue tracker, we can discuss it (and people can find it)
jayme: we'd need to talk about what a "change" is for you
jayme: only content? or metadata+content?
jayme: what about a file that has the same content as before, but was moved and has a new inode?
fd0: with "change" I meant "anything worth mentioning" e.g. move, metadata, content
just want to avoid creating "empty" snapshots as that's probably a waste of space/time
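
The scripted workaround suggested in the chat (look for files modified since the last backup, and only then run 'restic backup') could look roughly like the sketch below. It is not part of restic: `changed_since` is a hypothetical helper, and the timestamp of the last backup is assumed to be recorded by the calling script. Directories are checked too, because on POSIX systems deleting or renaming an entry updates the mtime of its parent directory.

```python
import os


def changed_since(root: str, last_backup: float) -> bool:
    """Return True if root or anything below it was modified after last_backup.

    last_backup is a Unix timestamp that the calling script has to store
    itself (an assumption of this sketch; restic is not consulted here).
    """
    entries = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        entries.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in entries:
        st = os.lstat(path)  # lstat: do not follow symlinks
        # mtime catches content changes; ctime also catches chmod/chown on POSIX.
        if max(st.st_mtime, st.st_ctime) > last_backup:
            return True
    return False
```

A wrapper would call 'restic backup' only when `changed_since` returns True. Note that timestamps can be set by hand, so this check can miss changes that a full comparison would find.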

@Crest

Crest commented Nov 7, 2016

Maybe it would be a good idea to still create a kind of alias name.

@fd0
Member

fd0 commented Nov 7, 2016

We'll need to define what "no change" means: "No files were added/removed and no files have different content" and/or "no metadata and no content has changed".

@fd0
Member

fd0 commented Nov 7, 2016

As discussed on IRC: Not making a new snapshot may interfere with the forget policy...

@fd0
Member

fd0 commented Nov 7, 2016

I'm curious though: Why do you need this functionality? What's your use case?

@jayme-github
Author

I'm curious though: Why do you need this functionality? What's your use case?

I just felt that it is unnecessary to clutter up the repo with snapshots that aren't of any use to me.
My use case is a set of files that I want to back up about twice a day, but they don't change often (once a month, maybe even less than that). That would leave me with ~59 "empty" snapshots a month in my repo, probably slowing down operations (as it is a remote repository with high round-trip times). I could run forget & prune regularly, but that would cost round trips as well as API calls etc.

All in all this is of course a "nice to have" as there are plenty of ways to work around this (or better: to correctly use restic 😃). I just wanted to bring that up as I thought about it and so might others.

@fd0
Member

fd0 commented Nov 8, 2016

Thanks for the explanation. What I had in mind when building restic and the repository structure was that a "snapshot" captures the state of the data at one point in time. If the data hasn't changed at all compared to any previous snapshot, an additional snapshot is very cheap and only uses a few hundred bytes and one additional file in the repository. You are right that more snapshots may slow operations down a bit, especially for high-latency remote backends, but I'm convinced this effect is negligible. If it isn't we can certainly optimize it (compare #523), but then I'd like to measure/benchmark first to get hard data :)

I'll close this issue for now, you can still add comments (and we can easily reopen it later).

@fd0 fd0 closed this as completed Nov 8, 2016
@ignus2

ignus2 commented Oct 10, 2017

Hi, first time restic user here, trying it out. It looks great so far.
However, I was quite surprised restic creates empty snapshots if nothing changed and moreover that there is no flag to skip creation if there was no change.
As a first time user, I expected this (i.e. skipping empty snapshots) to be the default behaviour, or at the very least to have an option for it. Creating empty snapshots is counterintuitive to me, I don't really see the purpose of them (again: first time user, this is my first instinctive reaction).

Reading the IRC chat log it seems it wouldn't be much effort to add this. Could this be added as a flag to backup, so users could at least have a choice?

@fawick
Member

fawick commented Oct 10, 2017

Although I'd never use such an option myself, I'd like to chime in on the question of what 'no change' should mean.

@fd0 stated

"No files were added/removed and no files have different content" and/or "no metadata and no content has changed".

IMHO the only valid choice here would be the "AND" choice:

  • No nodes (dirs, files, symlinks, devices, special devices etc.) were added or removed, AND
  • No files have different content, AND
  • No metadata (permissions, owner, group, ctime, mtime etc.) changed

At first glance I thought there was some redundancy to this, as usually any content change would also induce a change of mtime. But on second thought there are always tools to set a ctime/mtime explicitly, so neither checking the contents alone nor checking the metadata alone is enough.
I am not 100% sure about atime semantics, but by extrapolation I'd say the same thing should apply, so care must be taken to restore the atime after restic has read a file for checking its contents.
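
The "AND" definition above can be illustrated with a small fingerprint over a directory tree. This is only a sketch and says nothing about how restic detects changes internally; `tree_fingerprint` is a hypothetical helper. It hashes every node's relative path, mode, owner, size, mtime and, for regular files, the content. atime and ctime are deliberately left out, since reading a file may update the atime and ctime cannot be reset by ordinary tools.

```python
import hashlib
import os
import stat


def tree_fingerprint(root: str) -> str:
    """Digest over node names, selected metadata and file contents under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so the digest is reproducible
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # describe symlinks themselves, not their targets
            rel = os.path.relpath(path, root)
            record = "\0".join(
                str(v) for v in (rel, st.st_mode, st.st_uid, st.st_gid, st.st_size, st.st_mtime_ns)
            )
            digest.update(record.encode("utf-8", "surrogateescape") + b"\0")
            if stat.S_ISREG(st.st_mode):
                with open(path, "rb") as fh:
                    for chunk in iter(lambda: fh.read(1 << 20), b""):
                        digest.update(chunk)
            elif stat.S_ISLNK(st.st_mode):
                digest.update(os.readlink(path).encode("utf-8", "surrogateescape"))
    return digest.hexdigest()
```

Two runs yield the same fingerprint only if no node was added or removed, no content differs and none of the listed metadata changed. A file rewritten with the same size and a restored mtime still changes the fingerprint because its content is hashed.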

I believe there are some issues that ask for stats collection during a backup (e.g. #693, #874). I'd guess that the stats collection code needed for them would be useful here, too.

@fd0
Member

fd0 commented Oct 11, 2017

@ignus2 thank you for describing your expectations and your reaction, that's very valuable for us as a project!

Restic snapshots can be compared more to "virtual machine snapshots" or "lvm/zfs file system snapshots" than e.g. a tar file of what has changed. If nothing has changed, a snapshot is still created to record "this was the current state" at a particular point in time. Maybe we should add that to the manual.

@ignus2

ignus2 commented Oct 11, 2017

So would it be possible to add the flag to skip creating a snapshot if nothing changed?

@fd0
Member

fd0 commented Oct 11, 2017

It would be possible to add this, but I don't think we'll add it: It's just not the way restic works, and it will cause problems when you use the forget command.

@ignus2

ignus2 commented Oct 11, 2017

It wouldn't change the way restic works by default, as it would be an optional flag. What kind of problems would it cause with the forget command btw?

@mlbarrow

mlbarrow commented Oct 11, 2017 via email

@fd0
Member

fd0 commented Oct 11, 2017

The idea behind the forget command (as explained e.g. in this blog entry) is that you specify a policy for snapshots that you'd like to retain. If you only have snapshots for when data has changed, specifying e.g. --keep-daily does not make sense any more.

There's really no such thing as an "empty" snapshot in restic. Each snapshot captures the data and metadata at a given point in time and is independent (concerning the data structures) from all the other snapshots.

Btw, if you really like to do that, you could use restic snapshots --json, then take the snapshot IDs, use restic cat snapshot <id> for each and drop the ones where the tree IDs haven't changed. That'd amount to roughly removing "empty" snapshots.
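
The workaround described above might be scripted along the following lines. This is a sketch under assumptions: it expects a list of snapshot objects, oldest first, each carrying "id" and "tree" (and optionally "hostname" and "paths"), which is the shape `restic snapshots --json` is assumed to produce here; if the tree ID is missing, it would have to be fetched per snapshot with `restic cat snapshot <id>` as described. `redundant_snapshot_ids` and the sample data are made up for illustration.

```python
import json
from typing import Dict, Iterable, List, Tuple


def redundant_snapshot_ids(snapshots: Iterable[dict]) -> List[str]:
    """Return IDs of snapshots whose tree ID equals the previous snapshot's.

    Snapshots must be ordered oldest first. They are compared only within
    the same (hostname, paths) group so unrelated backups are not mixed.
    """
    last_tree: Dict[Tuple[str, Tuple[str, ...]], str] = {}
    redundant: List[str] = []
    for snap in snapshots:
        group = (snap.get("hostname", ""), tuple(snap.get("paths", [])))
        if last_tree.get(group) == snap["tree"]:
            redundant.append(snap["id"])
        last_tree[group] = snap["tree"]
    return redundant


# Hand-written sample in the assumed shape; not real restic output.
SAMPLE = json.loads("""
[
  {"id": "aaa1", "tree": "t1", "hostname": "box", "paths": ["/data"]},
  {"id": "bbb2", "tree": "t1", "hostname": "box", "paths": ["/data"]},
  {"id": "ccc3", "tree": "t2", "hostname": "box", "paths": ["/data"]},
  {"id": "ddd4", "tree": "t2", "hostname": "box", "paths": ["/data"]}
]
""")

if __name__ == "__main__":
    print(redundant_snapshot_ids(SAMPLE))
```

The returned IDs are the candidates that could then be passed to restic forget. The first snapshot of each run of identical trees is kept, so the point in time at which a change first appeared stays visible.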

@mlbarrow

mlbarrow commented Oct 11, 2017 via email

@fd0 fd0 added type: feature suggestion suggesting a new feature and removed feature labels Oct 11, 2017
@fd0
Member

fd0 commented Oct 11, 2017

I'm going to reopen this issue.

@fd0 fd0 reopened this Oct 11, 2017
@ignus2

ignus2 commented Oct 11, 2017

What is "unusual" is a matter of opinion I believe, for me having "empty" (by the definition of "having little to no value based on the criteria of no files and their metadata changing") snapshots is unusual.

Regarding the forget policy, I don't see how it would interfere. For example running restic backup occasionally (possibly depending on other means to determine whether something changed and a backup needs to be made or not) would have the same effect as skipping creating a snapshot, in which case forget also wouldn't make sense as you write.

I'd like to emphasize again, that it would be an optional feature for those who would like to use restic in a slightly different manner, who perhaps would never use the keep-daily etc forget features at all.

Thanks for mentioning a workaround btw.

@fd0
Member

fd0 commented Oct 11, 2017

I'm curious: Does other backup software typically have such an option?

@ignus2

ignus2 commented Oct 11, 2017

I'm curious: Does other backup software typically have such an option?

I don't know, but if so, then restic should also, if not, restic could be unique in this regard ;)

BTW, I seem to understand the resistance to this feature, as restic puts emphasis on the "when" or "time" of the backup (also indicated by the way forget works, centered around time), hence a snapshot in time. While the use case I have in mind (and maybe the OP too) is more emphasis on the "changes" with the additional side-information of "when".

EDIT: Something like git. Or is thinking about restic the way git works (from the end-user perspective) a totally bad idea?

@zcalusic
Member

IMHO, this kind of "cleverness" has no place in backup software. If I ask backup software to do the backup, I'd like it to not play games behind my back, have opinions of its own, decide that nothing's changed etc... So, later tomorrow I scratch my head looking for last night's backup, which isn't there?!

Let's keep things simple: if there's nothing new to back up, well... then don't back up! Decide outside the backup software, then dispatch the backup or not. Too clever backup software would be unreliable backup software. I'd like it to be reliable, not too clever, if at all possible.

@mlbarrow

mlbarrow commented Oct 11, 2017 via email

@ignus2

ignus2 commented Oct 11, 2017

@zcalusic This "cleverness" would be hidden behind an optional command line switch, so only those users would be affected by it who specifically ask for it.

@AndrewSav

AndrewSav commented Oct 10, 2020

Actually I have an alternate suggestion for the original post, which would satisfy me, and could be more appealing for the devs voicing their concerns: instead of adding the option to not create a snapshot, add a forget policy to find and remove all identical snapshots except the latest, in each set of identical snapshots (if this can be done in a performant manner). I do not mind scheduling an additional forget command after backup to remove that last snapshot if it was duplicate. This could be the best of both worlds.

@darkdragon-001
Contributor

Since space requirements should be pretty low because of deduplication, it would help me a lot already if identical snapshots are detected and an additional property/tag is saved along. This way one could filter the list of snapshots to hide duplicates while keeping them for reference (providing information that the backup was still performed correctly and there is no issue with the scheduling/scripts etc.).

@jradxl

jradxl commented Apr 11, 2021

I am a new user to Restic and this "Omit snapshot creation if there was no change" was one of my first questions as I start to learn how restic works, so glad google found me this issue.

What is the current situation?

Is there an easy way I can code this feature in any script I write?

  • detect no new, no changed, etc... and immediately delete/prune the snapshot?
  • detect no new, no changed, etc... and abort the creation of the snapshot?

@dimejo
Contributor

dimejo commented Apr 11, 2021

This discussion in the forum might have an answer for you:
https://forum.restic.net/t/efficient-check-for-unchanged-source/3729

@Gaibhne

Gaibhne commented Dec 1, 2021

It's a shame that this basic functionality has seen no movement, especially considering that some forget options seem to barely make sense without it: what good is --keep-last 60 if it's sixty identical empty snapshots instead of the last 60 changes to a rarely changed file? Because that is my use case, and what I would think a lot of people (especially coming from version control software) would expect: if you keep the last X snapshots of a file, one would think that means keeping the last X versions/changes.

The discussion forum solution involves installing additional third party software and learning a new DSL (jq), which seems to be a pretty steep hill for a functionality that I would think would be relatively trivial to implement in backup itself, and is in fact the default behaviour of many if not all other versioning tools (and I understand Restic is not exactly a versioning tool, but it's got a considerable overlap).

Years ago @fd0 asked for no work to be done on this feature due to some reworking of archiver code, and surely that work is either done by now or postponed? It seems from @ignus2's commit that only an extremely minimal change would be required to support this (optional!) feature that plenty of people seem to be interested in, so I would ask for a renewed assessment of whether we can please have it.

@MichaelEischer
Member

The usage problems with forget can probably be avoided by now by using --keep-within-hourly etc.

@Folling

Folling commented Jan 28, 2023

This feature would help reduce clutter in a system where many backups are required for various different reasons. Append-only backups are probably the biggest player here.

To address some of the arguments against the proposed patch, and my rebuttals for them:

  1. "no other software does this, it's uncommon"
    This one seems fallacious to me. Innovation requires someone going out of their way to be the first to provide something, after all. But let's assume what was meant here is "it could be off-putting to users to have an unexpected behaviour". All solutions proposed in this thread (either providing a new policy to forget or a new optional flag to backup) would be opt-in and for users to discover in the documentation.

  2. "it would break the forget semantics in some specific usecases"
    Half the wiki pages are already plastered with this works *but*. Caveats are expected for modern software and users that opt-in to a specific and niche behaviour can be expected to understand the consequences of their actions, assuming a notice is placed in the documentation.

  3. "restic should be a dull backup tool without any cleverness"
    The cleverness described here is the diffing of files and directories to find out what changes occurred. And I get that, this is quite a lot of effort that doesn't need to be part of a backup system. But it already is for restic.
    The changes are already computed for various other features, and this one should/could boil down to

if flags.NoEmptySnapshots && len(changes) == 0 {
    // Nothing changed compared to the previous state: report success without writing a snapshot.
    return status.OkButNothingChanged
}

which really doesn't add any cleverness.

  4. "it's a maintenance effort"
    I'm not involved in this project so I can't know if it really would have so many implications I'm missing. At least from my standpoint, the above code should be pretty much it, which seems to be supported by the 8-line commit that was provided by @ignus2. For you of course the matter boils down to "do we see enough people having this requirement and enough benefit being provided that we want to take up this effort", which I respect given that it's your free time you're spending on this project.
    However, let me explain why I believe it is paramount that features deciding whether or not a backup is taken be part of the underlying backup system, if they are required by any significant number of users:

I could just go ahead and use restic-diff, a checksum, restic cat or other such features to build this functionality myself. But that would always depend on some layout of a response that could change with every release. Restic isn't an enterprise project that guarantees the layout of the response of every command for years to come, so I as a user would run into the risk of breakage if the backwards compatibility is ever not upheld.

In the end, it's your decision, I'm not maintaining this project and sadly do not have time to delve into yet another one, but I and others would definitely appreciate if this was added.

@kcandrews

kcandrews commented Feb 8, 2023

In this thread I read a proposal for restic gaining a new forgetting policy which keeps just the latest snapshots from series of unchanged snapshots. The feedback was that snapshots are cheap to store and provide additional helpful information: that a snapshot was made. Instead of forgetting, could restic instead provide a convenient way to query a list of all snapshots which included changes from their parents? Then, users could use e.g. the standard shell tool comm to "set-difference" this list from all of their existing snapshots to decide which snapshots to forget by explicitly specifying snapshot ids. Then, there would be no need to make lots of custom forgetting policies which clutter up the feature set and confuse casual users. This seems like something which could be provided as arguments to existing commands like restic snapshots. Maybe restic snapshots changed? This would have the advantage of looking a lot like restic snapshots latest.

@AndrewSav

AndrewSav commented Feb 9, 2023

the standard shell tool comm

Are we singling out a particular operating system here? Remember, restic is supported on multiple operating systems. Unless this is something that can work across the board, I do not think it's the solution.

@patrickdavey

Just to chime in here, I'm looking to migrate from https://www.arqbackup.com/ , and I have the same use case as previous commenters (files which don't change often, I only want to see backups when there is a change).

Obviously this is open source and we're free to use work-arounds, but, I thought I'd add a 👍 for the use case.

@bfcns

bfcns commented Nov 19, 2023

I am very interested in seeing such an option. I don't know if it has been mentioned, but it would be nice to see when a change really happened, for some specific cases, just like a git commit.

@sergeevabc

Because the long-anticipated option not to create a snapshot 'if nothing has changed' never appeared in Restic despite 7 years of discussion, this dismal state of affairs has now migrated to the more advanced Rustic. Argh!

Is there at least some magic command that could delete snapshots that are not different from the previous ones? forget --prune does not help as it is related to the dates of snapshots creation (keep-daily/weekly/etc), not to the presence of changes.

@mpr1255

mpr1255 commented Jan 10, 2024

Heh, am setting up another restic, wanted this functionality, came looking, found this thread. Amusing. My use case is that I want to basically 'time machine' a particular folder with a cron running every minute. Sometimes I screw up and accidentally delete files. I want insurance against that kind of footgun. Time Machine is not fast enough for my purposes, Arq is a resource hog, and I already use restic for stuff so.... I'll just be storing 20 identical snapshots for the last 20 minutes!

It's free software so who am I to complain?

@sergeevabc

sergeevabc commented Jan 10, 2024

@mpr1255, free software is paid for by other means. For example, a catchy line in the resume, a sense of belonging to the development of civilization, a sense of superiority over greedy capitalists and clumsy corporations, the attention of the opposite sex, etc. The lack of direct financial reward also does not override the courtesy: imagine that a neighbor announces himself as a musician playing for free at children's parties, you invite him to your house, he plays great, but at the same time he spits on the floor and stubbornly ignores requests to adapt. @fd0 has been ignoring us, this issue…

@rawtaz
Contributor

rawtaz commented Jan 10, 2024

@sergeevabc That is extremely out of line. You literally have no idea about what goes on in @fd0's life or how he spends his time. I do, and I can tell you with 100% certainty that he has not been ignoring you or anyone else. You may not be happy with the state of this issue, and that's entirely in your right to be, but let's keep communication civil and not write things which are totally uncalled for. Let future communication here be on topic and not personal reflection or attacks. Thanks.

@sergeevabc

sergeevabc commented Jan 10, 2024

@rawtaz, it's called sugar-free feedback — an inevitable consequence of participation in public life. The feelings of the people who put their trust in your project matter, too. The list of feelings is not limited to gratitude. Talking about that is not beyond the bounds of decency. If you keep people without certainty for many months and even years, then the frustration begins to grow. This casts a shadow on the entire enterprise and undermines trust. If you are such a caring friend, then remind the author that he is not in a coma, like his compatriot, the famous racing driver Michael Schumacher, and is able to squeeze out a couple of meaningful lines about this issue.

@mpr1255

mpr1255 commented Jan 10, 2024 via email

@AndrewSav

I do not support any kind of personal attacks

learn the codebase, and make your own PR!

To make a PR one needs to put in a considerable effort, which is completely unjustified, when there is a good chance of your PR being summarily closed, or simply left unattended. If you do not agree in principle on the content of your PR beforehand there is a good chance for that.

The worst case is that you are asked to do a bunch of changes, you do them, and then the reviewer disappears for two years. (Note: I think it's fine to disappear for two years, there are always personal circumstances. What I'm trying to say is that since this is a real possibility, it should be accounted for by someone putting forward a PR.)

It is much better to discuss a proposed PR here first, get an agreement in principle, and then proceed with the PR. But this unfortunately is not happening either.

@mpr1255

mpr1255 commented Jan 11, 2024

Anyway -- I find that it doesn't matter for my use case. I didn't want all the annoying blank entries... but just now I used it in EXACTLY the circumstance I thought I would (thought I had accidentally rm -rf'ed a folder I shouldn't have -- whoops) and I opened up this other amazing piece of (FREE) software https://github.com/emuell/restic-browser, looked at the state from a minute ago and voilà, solved my problem. So yeah, it's kind of annoying but I think that is just OCD kicking in. I'm sure there are more important pieces of functionality the devs could be working on.

@flipbit03

flipbit03 commented Jan 29, 2024

I would be very interested in having a flag on restic backup like --only-if-changed that only creates the snapshot if there were any actual modifications to the folder being backed up.

The above would be useful for me as the snapshot list would give me a clear signal of 'when' changes to my folder happened, as opposed to an "every day is a new snapshot regardless of changes or not" situation.

@MichaelEischer
Member

Before the discussion here spirals again out of control, let's do the following:

Add an option --skip-if-unchanged that does not create a new snapshot if the root tree is identical to that of the parent snapshot. This will have to be combined with forget --keep-within-* options, but that is a problem that can be solved by a bit of documentation. An additional caveat is that a backup using absolute paths will also include the directories from the filesystem root down to the backed up folders. Changes to these folders can also cause the snapshots to differ. To avoid that, one has to use cd /path/to/backup && restic backup --skip-if-unchanged . (with the trailing dot as the backup target).
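
For scheduled jobs, the cd-then-back-up-"." pattern described above could be wrapped as in the following sketch. It assumes the option is available under the proposed name --skip-if-unchanged and that the repository and password are supplied through restic's usual environment variables; both helper functions are hypothetical and not part of restic.

```python
import subprocess
from typing import List, Tuple


def skip_if_unchanged_command(backup_dir: str) -> Tuple[List[str], str]:
    """Return (argv, cwd) that back up backup_dir as the relative path '.'.

    Running from inside the directory keeps the parent directories (from the
    filesystem root downwards) out of the snapshot, so changes to them cannot
    make an otherwise unchanged backup look different.
    """
    return ["restic", "backup", "--skip-if-unchanged", "."], backup_dir


def run_backup(backup_dir: str) -> int:
    """Run the backup and return restic's exit code (requires restic on PATH)."""
    argv, cwd = skip_if_unchanged_command(backup_dir)
    return subprocess.run(argv, cwd=cwd, check=False).returncode
```

Passing `cwd` to `subprocess.run` plays the role of the `cd` in the shell one-liner and avoids changing the working directory of the calling process.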

The proof-of-concept code at ignus2@7ebee9d is only usable to show the idea, but has to be reimplemented from scratch (e.g. there is no longer a need for the archiver to load a snapshot).

@MichaelEischer MichaelEischer added help: wanted and removed state: need direction need key decisions or input from core developers labels Jan 31, 2024
@Folling

Folling commented Jan 31, 2024

FWIW and for anyone interested, I've since written a small ruby script that suits my needs: https://gist.github.com/Folling/7c9c35588becd69c7c6fa8d9c880e837

I've adjusted this without testing as the original is a bit more complex with some error-handling and notification logic, use at your own risk or as a base for customization.

@Skillabstinenz

Skillabstinenz commented May 8, 2024

I wanted to switch to restic because of the possibilities that deduplication and --keep... give. But --keep-last x doesn't keep the last x changes, which could go back years if the folder isn't changed a lot and, thanks to dedup, wouldn't cost much backup space. It becomes keep-last x * backup interval... how does that differ from --keep-daily x if I back up every day? So the daily "empty" backups give a false sense of security: Let's say I keep the last 60 backups and realise that I made an error a long time ago. Going back 60 CHANGES should have me covered easily. But the first 30 "changes" are just getting my hopes up to shatter them directly again, since they are "empty" no-change backups. Then finally the 31st backup has a real change, but it's too recent. Before I can waste my time checking another 30 no-change backups, I run out of backups... The erroneous change is in the 61st backup, which was already purged because of all the "empties".

This is supposed to be the intended behaviour? Since everybody is using cron or something similar, why have --keep-last x if it's just --keep-daily x in disguise? Users want to revert back to a specific day instead of a specific change?

Maybe the solution is not 'not creating a backup', but having something along the lines of forget --forget-empty-first or forget --ignore-duplicates. Where --forget-empty-first forgets the "empty" snapshots first and then applies the limits from --keep... and forgets them too. Whereas --ignore-duplicates ignores the "empty" snapshots in the --keep... calculation and just keeps them, since they don't take up much space.

This could even be built upon with a couple of flags to specify what exactly is meant by not changed, e.g. --forget-empty-first=abdef, replacing some sensible default.
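
To make the difference concrete, the sketch below contrasts keeping the newest n snapshots with keeping the newest n changes, roughly what the suggested --forget-empty-first would amount to. Neither flag from this comment exists in restic; the functions and the sample history are hypothetical, and "unchanged" is taken to mean "same tree ID as the previous snapshot".

```python
from typing import List


def keep_last(snapshots: List[dict], n: int) -> List[str]:
    """IDs of the newest n snapshots (input ordered oldest first)."""
    return [snap["id"] for snap in snapshots][-n:] if n > 0 else []


def keep_last_changes(snapshots: List[dict], n: int) -> List[str]:
    """IDs of the newest n snapshots that differ from their predecessor."""
    first_of_each_run: List[str] = []
    previous_tree = None
    for snap in snapshots:
        if snap["tree"] != previous_tree:
            first_of_each_run.append(snap["id"])
        previous_tree = snap["tree"]
    return first_of_each_run[-n:] if n > 0 else []


# Made-up history: one real change (A -> B) followed by unchanged runs.
HISTORY = [
    {"id": "s1", "tree": "A"},
    {"id": "s2", "tree": "B"},
    {"id": "s3", "tree": "B"},
    {"id": "s4", "tree": "B"},
]

if __name__ == "__main__":
    print(keep_last(HISTORY, 2))          # newest two snapshots, both with tree B
    print(keep_last_changes(HISTORY, 2))  # the two distinct states A and B
```

With this history, keeping the last two snapshots retains only state B, while keeping the last two changes retains both A and B, which is the behaviour the comment asks for.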

@MichaelEischer MichaelEischer linked a pull request May 22, 2024 that will close this issue
8 tasks
@MichaelEischer
Member

I've implemented the option in #4816.

So the daily "empty" backups give a false sense of security: Lets say I keep the last 60 backups and realise that I made an error a long time ago.

The usual approach is to have a limited amount of daily snapshots combined with a few dozen weekly / monthly snapshots and a few yearly ones. But I guess it depends a lot on the use case. Backups of a folder that rarely changes might benefit from only creating snapshots if something actually changed.

Maybe the solution is not 'not creating a backup', but having something along the lines of forget --forget-empty-first or forget --ignore-duplicates.

A backup --skip-if-unchanged option seems like it could be easier to understand. In particular, there's immediate feedback when using that option. A forget-based approach would add an indirection. All current forget policies also specify what to keep, not what to remove. That makes it harder to understand how those options would interact with each other.

@flipbit03

A backup --skip-if-unchanged option seems like it could be easier to understand. In particular, there's immediate feedback when using that option.

Fully agree. Simply not creating a new backup if nothing changed would render all those discussions on semantics and change filtering moot.
