More stats in snapshots list #874

mnesrine · 2017-03-10T19:56:16Z

It would be useful to have more stats that are not currently in the snapshots list, such as:

Backup size
Backup duration or end date

Thanks :)

middelink · 2017-03-10T20:32:30Z

Would that be the size read from the local system or the size written to the repository? Due to deduplication those two numbers will not be remotely the same. And what does the latter number mean over time anyway? Say you make 2 backups in succession: the first backup backed up 100GB and stores 102GB in the repository. The second backup reads, say, 1GB locally, but backs up only ~50MB. Great numbers now show in your snapshots. But then you "forget" the first backup, and the numbers in the 2nd snapshot still show 50MB... You will file a bug complaining restic is lying to you ^^

And in case you want to suggest that prune has to update the stats in the snapshot data, I need to point out that snapshot IDs are based on the sha256 of their data, so any change in the snapshot data will produce a new snapshot ID, adding to the confusion.
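To illustrate the point about IDs, here is a minimal sketch. It assumes a snapshot is serialized to JSON and hashed with SHA-256; the `snapshot_id` helper, the canonical encoding and the `added_bytes` field are made up for the example and are not restic's actual format.

```python
import hashlib
import json


def snapshot_id(snapshot: dict) -> str:
    """Hypothetical helper: SHA-256 over a canonical JSON encoding of the snapshot."""
    encoded = json.dumps(snapshot, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# `added_bytes` is an invented statistic, used only for illustration.
original = {"time": "2017-03-10T20:32:30Z", "paths": ["/home/user"], "added_bytes": 50_000_000}

# Simulate prune rewriting the stored statistic after another snapshot was forgotten.
updated = dict(original, added_bytes=0)

id_before = snapshot_id(original)
id_after = snapshot_id(updated)
print(id_before[:8], id_after[:8])  # two different short IDs for "the same" backup
```

Any in-place update of a stored statistic therefore yields what looks like a different snapshot.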

Nonetheless, if we do add more stats, I would like to see files/dirs added and maybe aggregated CPU cycles used, peak memory used, blobs "created" and blobs "dropped". The latter two are important to see how good a "fit" the parent snapshot was. With a bad fit, restic will go through all the motions to create a blob, AES encrypt it, only to drop it in the end because its ID is already in the index.

We could hide all this extra information behind a -l flag ^^

mnesrine · 2017-03-10T21:30:08Z

Indeed, that's a good point. By size I mean the size of the files/dirs on the local system, not the size in the remote repository.
To get stats about a backup in the remote repository, we would use another way, not the snapshots list.

yhafri · 2017-03-11T08:30:22Z

@middelink interesting observations +1

@mnesrine's idea to add more metadata to the snapshots command can be very useful.
No need to have a complicated solution here, just the bare minimum for now.

We can think of it as something complementary to the --tag option.
Imagine being able to retrieve:

the original size of the backup (before deduplication): this info can easily be calculated when restic's comparing local blocks with the server's ones
the size of the backup after deduplication: can help compute how much disk saving we've got. Still optional for me, IMHO
the local date when the backup was successfully stored: see no challenge here
the type of the backup: was it a file or a directory?

And now that we have the --json option in place when listing the snapshots, this metadata can be retrieved and parsed in a nice way (thanks to jq).
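As a rough sketch of how such metadata could be consumed from the JSON output (in Python here rather than jq): the sample document below is entirely made up, and the `stats` object with `size_before_dedup` and `finished_at` is a hypothetical layout, not something restic emits today.

```python
import json

# Made-up sample of what the snapshots JSON could look like if stats were added.
SAMPLE = """
[
  {"short_id": "aaaa1111", "time": "2017-03-10T19:00:00Z", "paths": ["/srv/data"],
   "stats": {"size_before_dedup": 107374182400, "finished_at": "2017-03-10T20:40:15Z"}},
  {"short_id": "bbbb2222", "time": "2017-03-11T19:00:00Z", "paths": ["/srv/data"],
   "stats": {"size_before_dedup": 1073741824, "finished_at": "2017-03-11T19:05:00Z"}},
  {"short_id": "cccc3333", "time": "2017-03-09T19:00:00Z", "paths": ["/srv/data"]}
]
"""


def summarize(raw):
    """Return (short_id, size in GiB) pairs for every snapshot that carries stats."""
    rows = []
    for snap in json.loads(raw):
        stats = snap.get("stats")
        if stats is None:
            continue  # snapshots written before the change would simply lack the field
        rows.append((snap["short_id"], stats["size_before_dedup"] / 2**30))
    return rows


rows = summarize(SAMPLE)
for short_id, gib in rows:
    print(f"{short_id}  {gib:.1f} GiB")
```

Treating the field as optional keeps older snapshots readable without any migration.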

@middelink I see no conflict when a prune is performed, as nothing has to be updated. This is the archive's metadata. It always has to reflect what was true for a backup when it was first stored.

> We could hide all this extra information behind a -l flag ^^

+1 for the -l flag. I just want to get the minimum list of metadata for now (only 3, see above).

fd0 · 2017-03-11T09:47:19Z

I like the idea of storing some statistics with the snapshot. Easy things:

Size of the data before deduplication
Backup duration
Number of new files/directories
Number of changed files

A bit more computationally expensive, but not too hard to do:

Size of the data after deduplication

@yhafri I have some questions:

I don't understand what you mean with "the local date when the backup was successfully stored". Each snapshot already has a time stamp, what's the difference here? Do you mean the finish time for the backup?
What's "the type of the backup"? We already store the list of things to be saved (what's passed to restic on the command line or via --files-from), so if I have two dirs and one file in there, what type of backup should that be? What's the type of backup for reading from stdin? Why is this information relevant, what do you plan to do with it? Restic can just look at the nodes in the repo to determine if it is a file or directory for each of the backup targets (just run restic ls on the snapshot), so what's the point in having the information again in the snapshot?

FWIW I think all this information shouldn't be part of the plain text snapshots output. It's okay to have it in the JSON output (as users can filter that with jq or whatever), and I've already thought of adding a new command that displays details for a particular snapshot, similar to what git show <commit> does for a commit.

yhafri · 2017-03-11T10:11:21Z

> I don't understand what you mean with "the local date when the backup was successfully stored". Each snapshot already has a time stamp, what's the difference here? Do you mean the finish time for the backup?

Yes, the finish time.

> What's "the type of the backup"? We already store the list of things to be saved (what's passed to restic on the command line or via --files-from), so if I have two dirs and one file in there, what type of backup should that be? What's the type of backup for reading from stdin? Why is this information relevant, what do you plan to do with it? Restic can just look at the nodes in the repo to determine if it is a file or directory for each of the backup targets (just run restic ls on the snapshot), so what's the point in having the information again in the snapshot?

I didn't think about this use case.
In general, we don't mix things up when using restic. We only back up one file or dir at a time.
Thus, it's easy to know if it's a file or a directory.

> FWIW I think all this information shouldn't be part of the plain text snapshots output. It's okay to have it in the JSON output (as users can filter that with jq or whatever), and I've already thought of adding a new command that displays details for a particular snapshot, similar to what git show does for a commit.

Agreed. It's fine to only have them when using the --json option with snapshots.

middelink · 2017-03-11T14:20:15Z

@fd0 size after dedup?
Did you read my remarks on that? I don't think it is wise to add that to the snapshot information as it needs to be maintained.

Also, what about new blobs vs new-blobs-but-already-there? Or cpu/peak memory impact? I want to be able to observe how much resources taking a backup consumes. (Rationale: currently restic ooms on all my 512MB VMs and about 1/3 on 1GB VMs. Which is kinda ridiculous. So when we start driving memory usage down, I want to be able to tell... The more statistics we have for diagnosing issues, the easier it becomes.)

yhafri · 2017-03-11T16:02:26Z

So fine to not add size after dedup then.

fd0 · 2017-03-11T20:58:47Z

@middelink After re-reading your post, I think we're talking about different things: What I meant with "size before deduplication" and "size after deduplication" is the intra-snapshot deduplication. And this neither depends on what is already stored in the repo nor does it change over time.

Suppose you're saving two files which contain exactly the same 1 MiB of data, which is saved in one blob in the repo. So "size before deduplication" is 2MiB (sum of all file sizes) and "size after deduplication" is 1MiB. Both numbers do not depend on whether or not the blob is already stored in the repo and will be valid as long as the snapshot is there.
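The two-file example can be written down as a small sketch. The `files` mapping and the blob IDs are invented for illustration; real snapshots reference blobs through tree objects, which is skipped here.

```python
MIB = 1024 * 1024

# Hypothetical snapshot contents: file path -> list of (blob_id, blob_size).
files = {
    "/data/a.bin": [("blob-1", MIB)],
    "/data/b.bin": [("blob-1", MIB)],  # identical content, so it maps to the same blob
}


def intra_snapshot_sizes(files):
    """Return (size before dedup, size after dedup) for a single snapshot."""
    # Before: every blob counts once per file that references it.
    before = sum(size for blobs in files.values() for _, size in blobs)
    # After: every distinct blob counts exactly once.
    unique = {}
    for blobs in files.values():
        for blob_id, size in blobs:
            unique[blob_id] = size
    after = sum(unique.values())
    return before, after


before, after = intra_snapshot_sizes(files)
print(before // MIB, "MiB before,", after // MIB, "MiB after")
```

Neither number looks at the rest of the repository, which is why both stay valid for the lifetime of the snapshot.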

I think what you were writing about is inter-snapshot deduplication: How many new blobs have been added to the repo which were not there before. This number is valid only at the time the snapshot is made, and changes over time (e.g. when an older snapshot that shares some blobs with a newer snapshot is removed). I agree that it is not reasonable to store this number.

We could rather compute the "added size" of a snapshot on the fly (ok, maybe we should wait for the metadata cache, otherwise it gets really time consuming): Make a list of all blobs that are only referenced by a particular snapshot, and sum the sizes.
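A minimal sketch of that on-the-fly computation, assuming we already have the set of blob IDs referenced by each snapshot and a size per blob (both data structures and the `added_size` name are hypothetical):

```python
from collections import Counter


def added_size(snapshots, blob_sizes, snapshot_id):
    """Sum the sizes of blobs referenced by `snapshot_id` and by no other snapshot."""
    refs = Counter()
    for blobs in snapshots.values():
        refs.update(set(blobs))  # count each snapshot at most once per blob
    return sum(blob_sizes[b] for b in set(snapshots[snapshot_id]) if refs[b] == 1)


# Invented example data: two snapshots sharing blobs "b" and "c".
snapshots = {"snap1": {"a", "b", "c"}, "snap2": {"b", "c", "d"}}
blob_sizes = {"a": 100, "b": 200, "c": 300, "d": 50}

print(added_size(snapshots, blob_sizes, "snap2"))  # only "d" is unique to snap2

# After "forgetting" snap1, everything snap2 references is unique to it.
remaining = {"snap2": snapshots["snap2"]}
print(added_size(remaining, blob_sizes, "snap2"))
```

The result changes as soon as another snapshot is removed, which is the reason for computing it on demand instead of storing it.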

tamalsaha · 2017-03-12T01:48:31Z

@fd0, we would like to see these stats also available with --json

yhafri · 2017-03-12T08:26:43Z

Adding more stats is a very good idea indeed +1
But let's start with the minimum/basic stats as discussed above, guys.

the original size of the backup (before dedup) @mnesrine
finish time of the backup @mnesrine @fd0
the type of the backup: was it a file or a directory?
number of files/dirs in the backup (before dedup) @middelink

From there, we can decide which advanced stats we would like to add.

tamalsaha · 2017-03-14T20:27:17Z

By "finish time of the backup", do you mean the duration of the backup?

yhafri · 2017-03-14T20:38:50Z

Yes, the backup duration (e.g. 1h 40min 15s) or the finish time (e.g. 2017-03-14 07:34:09).
Together with the snapshot's existing time stamp, one can be deduced from the other.
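The relation between the two can be sketched as follows, assuming the snapshot's existing time stamp marks the start of the backup. The start value is made up so that the numbers match the two examples above.

```python
from datetime import datetime, timedelta

# Assumed start time, i.e. the time stamp already stored in the snapshot.
start = datetime(2017, 3, 14, 5, 53, 54)

# Storing the duration lets us derive the finish time...
duration = timedelta(hours=1, minutes=40, seconds=15)
finish = start + duration
print(finish)

# ...and storing the finish time lets us derive the duration.
derived_duration = finish - start
print(derived_duration)
```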

middelink · 2017-03-14T21:05:44Z

I very much doubt that backup "type" is a useful item to add for the general population. It seems like a highly specific use-case.

Can you give a clearer definition of what you mean by "file"? Just a single file in a backup? Can there be multiple files?
Also, given restic backup -x /home/user/file1, this backup actually has directories, namely /, home and user. So how would you classify it?

(Not trying to be pedantic, but this use-case is so specific that I would like some clarification...)

yhafri · 2017-03-15T04:30:14Z

@middelink as I've explained before, we never use restic with multiple target dirs at a time. Only one.

But you're right, this is a very specific use case and we can live without it.

lathspell · 2017-12-05T21:37:23Z

Adding the dedup'ed size to the snapshot metadata would

help "checking" if the backup worked as intended
help estimate how much longer e.g. an external USB drive lasts if the daily changes are roughly constant (and no old snapshots are removed)
be easy to calculate (if the hash is already present, just add "0", else the file size)

Of course, if the user removes an old snapshot, that size information loses its validity, but I'm sure your users will understand that.
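The "add 0 if the hash is already present" rule might look roughly like this. It is a sketch under simplifying assumptions: whole byte strings stand in for restic's chunks, a plain set stands in for the repository index, and encryption and pack overhead are ignored. The function name is made up.

```python
import hashlib


def backup_added_size(known_hashes, chunks):
    """Return the number of bytes this backup adds to the repository.

    `known_hashes` is updated in place, mimicking an index shared across backups.
    """
    added = 0
    for data in chunks:
        digest = hashlib.sha256(data).hexdigest()
        if digest in known_hashes:
            continue  # already stored: contributes 0
        known_hashes.add(digest)
        added += len(data)
    return added


repo_index = set()
first = backup_added_size(repo_index, [b"a" * 1000, b"b" * 500])
second = backup_added_size(repo_index, [b"a" * 1000, b"c" * 200])
print(first, second)
```

The second run only counts the chunk that was not seen before, which is the number that becomes stale once an older snapshot is removed.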

mholt · 2018-04-20T02:36:29Z

I've got an initial implementation of a restic stats command up in #1729. I need people to test it and see if the counts are accurate.

It could probably be expanded to count more things, but I'm starting simple.

darkdragon-001 · 2020-05-30T16:06:33Z

A lot of information is available by running restic diff against parent in (restic snapshots ID --json). I suggested in #2757 improving restic stats introduced by @mholt. There should be an extensive list, but some sort of overview for restic snapshots like the number of changes would be awesome in order to easily find snapshots where a lot of changes were introduced (and maybe something went wrong).

aawsome · 2022-11-04T06:30:15Z

see also #693
Note that the solution posted there makes storing the summary information optional. This allows adding the feature in a backwards-compatible way (as I did in rustic).

fd0 added the type: feature enhancement improving existing features label Mar 11, 2017

middelink mentioned this issue Jun 27, 2017

Information command #1047

Closed

flamingm0e mentioned this issue Aug 30, 2017

[feature-request] Check size of repo in snapshots command #1197

Closed

fawick mentioned this issue Oct 10, 2017

Omit snapshot creation if there was no change #662

Open

mholt mentioned this issue Apr 20, 2018

Implement restic stats command to get more info about a repository #1729

Merged

7 tasks

MichaelEischer added the category: stats label Oct 6, 2020

MichaelEischer mentioned this issue Nov 4, 2022

backup: add flag --summary-filename #3586

Closed

8 tasks
