Store snapshot statistics & print snapshot size #4705

Merged
9 commits merged into restic:master on Mar 28, 2024

Member

@MichaelEischer MichaelEischer commented Feb 22, 2024

What does this PR change? What problem does it solve?

Store the backup statistics in a snapshot. Sample output:

{
  "time": "2024-02-22T22:39:19.991058932+01:00",
  "parent": "2c29113d80ef2f4072bae3095963a9c70f9d7014e893afd74887c8b61d30f143",
  "tree": "c163608e60d5bf1921b296528cd28b9844b508d96d9f4ad2564ced61c6dd0a24",
  "paths": [
    "/home/michael/Projekte/restic/restic"
  ],
  "hostname": "host",
  "username": "michael",
  "uid": 1000,
  "gid": 1000,
  "program_version": "restic 0.16.4-dev (compiled manually)",
  "summary": {
    "backup_start": "2024-02-22T22:39:19.991058932+01:00",
    "backup_end": "2024-02-22T22:39:26.37292521+01:00",
    "files_new": 85,
    "files_changed": 155,
    "files_unmodified": 2493,
    "dirs_new": 3,
    "dirs_changed": 110,
    "dirs_unmodified": 328,
    "data_blobs": 234,
    "tree_blobs": 114,
    "data_added": 39375231,
    "data_added_in_repo": 19236421,
    "total_files_processed": 2733,
    "total_bytes_processed": 152165603
  }
}
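
As a rough illustration of how these fields relate to each other (a sketch only, not code from this PR), the snippet below parses the `summary` object from the sample, derives the backup duration from `backup_start` and `backup_end`, and compares `data_added` with `data_added_in_repo`. The helper names `parse_timestamp` and `describe` are made up for this example, and the field names are the ones shown in the sample above.

```python
import json
import re
from datetime import datetime

# Trimmed copy of the "summary" object from the sample output above.
SUMMARY_JSON = """
{
  "backup_start": "2024-02-22T22:39:19.991058932+01:00",
  "backup_end": "2024-02-22T22:39:26.37292521+01:00",
  "files_new": 85,
  "files_changed": 155,
  "files_unmodified": 2493,
  "data_added": 39375231,
  "data_added_in_repo": 19236421,
  "total_files_processed": 2733,
  "total_bytes_processed": 152165603
}
"""


def parse_timestamp(ts):
    """Parse an RFC 3339 timestamp, truncating the fraction to microseconds."""
    match = re.fullmatch(r"(.+?)(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})", ts)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {ts!r}")
    head, fraction, zone = match.groups()
    micros = (fraction or "")[:6].ljust(6, "0")
    zone = "+00:00" if zone == "Z" else zone
    return datetime.fromisoformat(f"{head}.{micros}{zone}")


def describe(summary):
    """Derive a few convenience values from a snapshot summary (hypothetical helper)."""
    start = parse_timestamp(summary["backup_start"])
    end = parse_timestamp(summary["backup_end"])
    files_seen = (
        summary["files_new"] + summary["files_changed"] + summary["files_unmodified"]
    )
    added = summary["data_added"]
    return {
        "duration_seconds": (end - start).total_seconds(),
        "files_seen": files_seen,
        # Share of the newly added data that ended up stored in the repository.
        "stored_fraction": summary["data_added_in_repo"] / added if added else None,
    }


summary = json.loads(SUMMARY_JSON)
report = describe(summary)
print(report)
```

For the sample values, the three file counters add up to `total_files_processed`, and the stored size is a bit under half of `data_added`.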

In addition, the snapshots command now also prints the snapshot size (based on total_bytes_processed).

ID        Time                 Host        Tags        Paths                                    Size
-----------------------------------------------------------------------------------------------------------
4e0ff0ef  2024-02-22 22:39:19  host                    /home/michael/Projekte/restic/restic     145.116 MiB
-----------------------------------------------------------------------------------------------------------
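
The Size column is just `total_bytes_processed` rendered in binary units. A minimal sketch of such a conversion follows; `format_bytes` is a hypothetical helper written for this example and is not restic's actual formatting code.

```python
def format_bytes(size):
    """Format a byte count using binary units and three decimals."""
    if size < 1024:
        return f"{size} B"
    value = float(size)
    unit = "B"
    for unit in ["KiB", "MiB", "GiB", "TiB"]:
        value /= 1024
        if value < 1024:
            break
    return f"{value:.3f} {unit}"


# total_bytes_processed from the sample snapshot above
print(format_bytes(152165603))
```

With the sample value of 152165603 bytes this yields the `145.116 MiB` shown in the table.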

Was the change previously discussed in an issue or on the forum?

Fixes #693

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

@darkdragon-001
Contributor

Does the --json output also contain the size?

@aawsome
Contributor

aawsome commented Feb 23, 2024

@MichaelEischer Is there a reason why your implementation deviates from #693 (comment)?

@MichaelEischer
Member Author

Differences to #693 (comment):

  • "data_added_in_repo" instead of "data_added_packed": the corresponding fields in ItemStats are called DataSizeInRepo and TreeSizeInRepo. Thus that name is more consistent.
  • "data_added_files", "data_added_files_packed", "data_added_trees", "data_added_trees_packed": Not available in the JSON output
  • "total_dirs_processed", "total_dirsize_processed": these statistics are not tracked so far
  • "total_duration", "backup_duration": both values feel pretty redundant to me
  • "command": I'm somewhat undecided whether it's a good idea or might store unwanted details

A more precise description, though, is that the summary contains the fields from json.summaryOutput minus SnapshotID, DryRun, TotalDuration and MessageType, plus the new fields BackupStart and BackupEnd.
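
To make that relationship concrete, here is a small sketch (not the code from this PR) that turns a `backup --json` summary message into the stored snapshot summary. The JSON key names `message_type`, `snapshot_id`, `dry_run` and `total_duration` are assumed spellings of the Go fields named above, the function `snapshot_summary` is hypothetical, and the values in the example message are illustrative.

```python
# Keys of the backup summary message that are not stored in the snapshot.
# Assumption: these are the JSON spellings of MessageType, SnapshotID,
# DryRun and TotalDuration.
EXCLUDED_KEYS = {"message_type", "snapshot_id", "dry_run", "total_duration"}


def snapshot_summary(backup_summary, backup_start, backup_end):
    """Build the snapshot summary from a backup summary message (hypothetical helper)."""
    summary = {"backup_start": backup_start, "backup_end": backup_end}
    summary.update(
        {key: value for key, value in backup_summary.items() if key not in EXCLUDED_KEYS}
    )
    return summary


# Illustrative summary message; the values are for demonstration only.
message = {
    "message_type": "summary",
    "files_new": 85,
    "files_changed": 155,
    "files_unmodified": 2493,
    "total_files_processed": 2733,
    "total_bytes_processed": 152165603,
    "total_duration": 6.38,
    "snapshot_id": "4e0ff0ef",
}

stored = snapshot_summary(
    message,
    backup_start="2024-02-22T22:39:19.991058932+01:00",
    backup_end="2024-02-22T22:39:26.37292521+01:00",
)
print(sorted(stored))
```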

I thought about omitting BackupStart and BackupEnd altogether and just adding TotalDuration. But as it's possible to pass --time to the backup command, the time of a snapshot and BackupStart can differ.

> Does the --json output also contain the size?

Yes, it even includes the full statistics information stored in the snapshot.
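
A script consuming that output could look roughly like the sketch below. It assumes that `snapshots --json` prints a JSON array of snapshot objects, that each object carries a `short_id`, and that snapshots created by older versions simply lack the `summary` field. The IDs and the function name `snapshot_sizes` are made up for the example.

```python
import json

# Illustrative stand-in for the output of `restic snapshots --json`.
# Assumption: a JSON array of snapshot objects; IDs are invented.
SNAPSHOTS_JSON = """
[
  {"short_id": "aaaaaaaa", "paths": ["/data"]},
  {"short_id": "bbbbbbbb", "paths": ["/data"],
   "summary": {"total_bytes_processed": 152165603, "data_added": 39375231}}
]
"""


def snapshot_sizes(json_text):
    """Return (short_id, size) pairs; size is None if no summary was stored."""
    result = []
    for snapshot in json.loads(json_text):
        summary = snapshot.get("summary")
        size = summary.get("total_bytes_processed") if summary else None
        result.append((snapshot.get("short_id"), size))
    return result


sizes = snapshot_sizes(SNAPSHOTS_JSON)
for short_id, size in sizes:
    print(short_id, "unknown" if size is None else size)
```

Handling the missing `summary` explicitly matters because only snapshots created after this change contain the statistics.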

@aawsome
Contributor

aawsome commented Feb 23, 2024

> Differences to #693 (comment):
>
> * "data_added_in_repo" instead of "data_added_packed": the corresponding fields in `ItemStats` are called `DataSizeInRepo` and `TreeSizeInRepo`. Thus that name is more consistent.

If you name them like this, you make the snapshot format incompatible with what rustic implemented some 2 years ago. I don't care too much if you omit the other fields, but the same things should be named the same, IMO.

@hraban

hraban commented Feb 24, 2024

Just for the record this is only meaningful until you start deleting other snapshots which share data, right?

@aawsome
Contributor

aawsome commented Feb 24, 2024

> Just for the record this is only meaningful until you start deleting other snapshots which share data, right?

No. It is as meaningful as saving the output of backup --json in some database. If you are trying to derive "how much space does this snapshot uniquely occupy in the repo" from this information, you are making a mistake, as that is not what it states. That information cannot be derived even after a follow-up backup run (which may share data).
The information is only a "snapshot" of what happened during the backup run, including information about the source (which could also change directly after the backup run). But for analysis purposes the information might be interesting.
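
A toy model may help to see the difference. The sketch below is not how restic stores data; it treats a repository as a dictionary of blob sizes and a snapshot as a set of blob IDs, with all names and sizes invented. It shows that the "data added" value recorded at backup time stays fixed, while the space a snapshot uniquely occupies depends on which other snapshots currently exist.

```python
def backup(repo, files):
    """Store blobs that are not yet in the repo; return (snapshot, data_added)."""
    added = sum(size for blob, size in files.items() if blob not in repo)
    repo.update(files)
    return set(files), added


def unique_size(snapshot, others, sizes):
    """Bytes referenced by `snapshot` and by none of the `others`."""
    shared = set().union(*others)
    return sum(sizes[blob] for blob in snapshot - shared)


repo = {}  # blob id -> size in bytes

# First backup: blobs "a" and "b" are both new.
snap_a, added_a = backup(repo, {"a": 100, "b": 200})
# Second backup: blob "b" is deduplicated, only "c" is new.
snap_b, added_b = backup(repo, {"b": 200, "c": 50})

print(added_a, added_b)                     # recorded once, at backup time
print(unique_size(snap_a, [snap_b], repo))  # unique size of A while B exists
print(unique_size(snap_b, [snap_a], repo))  # unique size of B while A exists
print(unique_size(snap_b, [], repo))        # unique size of B after deleting A
```

In this model snapshot A recorded 300 added bytes but uniquely holds only 100 once B exists, and B recorded 50 added bytes but uniquely holds 250 after A is deleted.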

@hraban

hraban commented Feb 24, 2024

> No. It is as meaningful as saving the output of backup --json in some database. If you are trying to derive "how much space does this snapshot uniquely occupy in the repo" from this information, you are making a mistake, as that is not what it states. That information cannot be derived even after a follow-up backup run (which may share data). The information is only a "snapshot" of what happened during the backup run, including information about the source (which could also change directly after the backup run). But for analysis purposes the information might be interesting.

Right, that's what it seemed like. It's useful in the moment, but if you have e.g. auto cleanup of older backups it quickly becomes stale. Can still be useful of course, but not the same as e.g. TimeMachine's "calculatedrift" option (which would presumably be a lot more expensive to run).

@MichaelEischer
Member Author

I've renamed the attribute to data_added_packed. The text output prints that value as "%s stored", so "stored" could also have served as a suffix. But that's just bikeshedding, I'm fine with either variant. So let's use the one that maintains compatibility.

In general, we're interested in a friendly coexistence with rustic. But it should also be clear that we won't always agree on how things should be implemented. Nevertheless, there's still value in trying to maintain compatibility where it makes sense. That judgement will still be up to us, and it doesn't mean that we do X just because rustic implemented Y.

> Is there a reason why your implementation deviates from #693 (comment)?

Let me add just one thing here: the PR was intentionally still marked as draft as the naming issue was not yet resolved.

@MichaelEischer MichaelEischer marked this pull request as ready for review February 25, 2024 19:53

@aawsome
Contributor

aawsome commented Feb 27, 2024

> I've renamed the attribute to data_added_packed.

Thanks for changing this. I think this helps users who use both restic and rustic. And what we are doing is about our users, right?
I will change rustic so that it can handle the fields that restic does not fill in being omitted.

> In general, we're interested in a friendly coexistence with rustic.

Good to hear. I am also very interested in a friendly coexistence. This is all free open source software!

Member Author

@MichaelEischer MichaelEischer left a comment

LGTM.

@MichaelEischer MichaelEischer merged commit 7f9ad1c into restic:master Mar 28, 2024
13 checks passed
@MichaelEischer MichaelEischer deleted the snapshot-statistics branch March 28, 2024 21:41
created using this or a future restic version. For this, the `backup` command
now stores the backup summary statistics in the snapshot.

The text output of the `snapshots` command only shows the snapshot size. The

Which of all the discussed options exactly is the “snapshot size” here?

Bytes added to the repo when this backup was created?

If so, fine with me, but I would still love to also see the “amount of data backed up” (what others call restore size) for a quick double-check.

Contributor

It is total_bytes_processed which IMO should be the restore size.

Successfully merging this pull request may close these issues.

Print out the backup size when listing snapshots (enhancement)