More stats in snapshots list #874
Would that be the size read from the local system or the size written to the repository? Due to dedup, those two numbers will not be remotely the same. And what does the latter number mean over time anyway? Say you make 2 backups in succession: the first backup backed up 100GB and stored 102GB in the repository. The second backup reads, say, 1GB locally, but backs up only ~50MB. Great numbers now show in your snapshot. But then you "forget" the first backup, and the numbers in the 2nd backup snapshot still show 50MB... You will file a bug complaining restic is lying to you ^^ And in case you want to suggest that... Nonetheless, if we do add more stats, I would like to see files/dirs added and maybe aggregated CPU cycles used, peak memory used, blobs "created" and blobs "dropped". The latter two are important to see how good a "fit" the parent snapshot was. With a bad fit, restic will go through all the motions to create a blob, AES-encrypt it, only to drop it in the end because its ID is already in the index. We could hide all this extra information behind a -l flag ^^
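The "blobs created" vs "blobs dropped" counters, and the gap between bytes read locally and bytes stored, can be illustrated with a minimal Go sketch. It assumes a heavily simplified model: chunking has already happened, a blob's ID is the SHA-256 of its content, and the index is an in-memory set. `BackupStats`, `Index` and `SaveChunk` are hypothetical names, not restic's actual API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// BlobID is the SHA-256 hash of a blob's plaintext content (assumed model).
type BlobID [sha256.Size]byte

// BackupStats holds hypothetical counters for the statistics suggested above.
type BackupStats struct {
	BytesRead    uint64 // bytes read from the local file system
	BytesStored  uint64 // plaintext bytes of new blobs (ignores encryption/packing overhead)
	BlobsCreated int    // chunks that turned out to be new
	BlobsDropped int    // chunks whose ID was already in the index
}

// Index is a toy stand-in for the repository index: the set of known blob IDs.
type Index map[BlobID]struct{}

// SaveChunk hashes a chunk and either records it as a new blob or drops it
// because the index already knows its ID. This sketch checks the index right
// after hashing; the thread notes that restic may already have encrypted the
// chunk by then, which is the wasted work BlobsDropped would expose.
func SaveChunk(idx Index, stats *BackupStats, chunk []byte) {
	stats.BytesRead += uint64(len(chunk))
	id := BlobID(sha256.Sum256(chunk))
	if _, ok := idx[id]; ok {
		stats.BlobsDropped++
		return
	}
	idx[id] = struct{}{}
	stats.BlobsCreated++
	stats.BytesStored += uint64(len(chunk))
}

func main() {
	idx := Index{}
	var stats BackupStats
	for _, c := range []string{"alpha", "beta", "alpha"} {
		SaveChunk(idx, &stats, []byte(c))
	}
	fmt.Printf("%+v\n", stats) // {BytesRead:14 BytesStored:9 BlobsCreated:2 BlobsDropped:1}
}
```

Note that `BytesStored` here is only meaningful at the moment of the backup: as the comment above points out, forgetting an earlier snapshot would not update it.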
Indeed, that's a good point. By size I mean the size of the file/dir on the local system, not in the remote repository.
@middelink, interesting observations. +1 to @mnesrine's idea to add more metadata to … We can think of it as something complementary to the …
And now that we have the … @middelink, I see no conflict when a …
+1 for the …
I like the idea of storing some statistics with the snapshot. Easy things:
A bit more computationally expensive, but not too hard to do:
@yhafri I have some questions:
FWIW, I think all this information shouldn't be part of the plain text …
Yes, the finish time.
I didn't think about this use case.
Agreed. It's fine to only have them when using …
@fd0, size after dedup? Also, what about new blobs vs. new-blobs-but-already-there? Or CPU/peak memory impact? I want to be able to observe how many resources taking a backup consumes. (Rationale: currently restic OOMs on all my 512MB VMs and about 1/3 on 1GB VMs, which is kinda ridiculous. So when we start driving memory usage down, I want to be able to tell... The more statistics we have for diagnosing issues, the easier it becomes.)
So it's fine not to add size after dedup, then.
@middelink After re-reading your post, I think we're talking about different things. What I meant by "size before deduplication" and "size after deduplication" is intra-snapshot deduplication, which neither depends on what is already stored in the repo nor changes over time. Suppose you're saving two files which contain exactly the same 1 MiB of data, which is saved as one blob in the repo. Then "size before deduplication" is 2 MiB (the sum of all file sizes) and "size after deduplication" is 1 MiB. Neither number depends on whether or not the blob is already stored in the repo, and both stay valid as long as the snapshot exists. I think what you were writing about is inter-snapshot deduplication: how many new blobs have been added to the repo which were not there before. This number is valid only at the time the snapshot is made, and changes over time (e.g. when an older snapshot that shares some blobs with a newer snapshot is removed). I agree that it is not reasonable to store this number. We could rather compute the "added size" of a snapshot on the fly (OK, maybe we should wait for the metadata cache, otherwise it gets really time-consuming): make a list of all blobs that are only referenced by a particular snapshot, and sum their sizes.
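The difference between the two kinds of deduplication can be sketched in Go. This assumes a simplified snapshot model where each file is a flat list of blob IDs and blob sizes come from a lookup map; `Snapshot`, `DedupSizes` and `AddedSize` are hypothetical names, and real restic snapshots reference trees rather than flat lists. `DedupSizes` computes the stable intra-snapshot numbers, while `AddedSize` follows the on-the-fly "added size" idea: sum the sizes of blobs referenced by exactly one snapshot.

```go
package main

import "fmt"

// Snapshot is a simplified model: each file is the list of blob IDs its
// content was split into.
type Snapshot struct {
	Name  string
	Files [][]string
}

// DedupSizes returns the snapshot's size before intra-snapshot deduplication
// (sum of all file sizes) and after it (each distinct blob counted once).
// Neither value depends on other snapshots in the repo.
func DedupSizes(s Snapshot, blobSize map[string]uint64) (before, after uint64) {
	seen := make(map[string]bool)
	for _, file := range s.Files {
		for _, id := range file {
			before += blobSize[id]
			if !seen[id] {
				seen[id] = true
				after += blobSize[id]
			}
		}
	}
	return before, after
}

// AddedSize sums the sizes of blobs referenced by target and by no other
// snapshot. It has to be recomputed, since forgetting another snapshot can
// change the result.
func AddedSize(target string, all []Snapshot, blobSize map[string]uint64) uint64 {
	// owners maps blob ID -> set of snapshot names referencing it.
	owners := make(map[string]map[string]bool)
	for _, s := range all {
		for _, file := range s.Files {
			for _, id := range file {
				if owners[id] == nil {
					owners[id] = make(map[string]bool)
				}
				owners[id][s.Name] = true
			}
		}
	}
	var total uint64
	for id, snaps := range owners {
		if len(snaps) == 1 && snaps[target] {
			total += blobSize[id]
		}
	}
	return total
}

func main() {
	const MiB = 1 << 20
	sizes := map[string]uint64{"a": MiB, "b": 512 * 1024}
	// Two files with exactly the same 1 MiB of content share blob "a".
	s1 := Snapshot{Name: "s1", Files: [][]string{{"a"}, {"a"}}}
	s2 := Snapshot{Name: "s2", Files: [][]string{{"a", "b"}}}

	before, after := DedupSizes(s1, sizes)
	fmt.Println(before/MiB, after/MiB)                      // 2 1
	fmt.Println(AddedSize("s2", []Snapshot{s1, s2}, sizes)) // 524288: only "b" is exclusive to s2
}
```

If `s1` is forgotten, `AddedSize("s2", []Snapshot{s2}, sizes)` returns 1572864 (1.5 MiB) instead of 524288, which illustrates why that number is better computed than stored.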
@fd0, we would like to see these stats also available with --json.
Adding more stats is a very good idea indeed. +1
From there, we can decide which advanced stats we would like to add.
By "finish time of the backup", do you mean the duration of the backup?
Yes, the backup duration (e.g. 1h 40min 15s) or the finish time (e.g. 2017-03-14 07:34:09).
I very much doubt that backup "type" is a useful item to add for the general population. It seems like a highly specific use case. Can you give a clearer definition of what you mean by "file"? Just a single file in a backup? Can there be multiple files? (Not trying to be pedantic, but this use case is so specific that I would like some clarification...)
@middelink, as I've explained before, we never use restic with multiple target dirs at a time. Only one. But you're right, this is a very specific use case and we can live without it.
Adding the dedup'ed size to the snapshot metadata would …
Of course, if the user removes an old snapshot, that size information loses its validity, but I'm sure your users will understand that.
I've got an initial implementation of a … It could probably be expanded to count more things, but I'm starting simple.
A lot of information is available by running …
It would be useful to have more stats that are not currently in the snapshots list, like:
Thanks :)