Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backup: add flag --summary-filename #3586

Closed
wants to merge 1 commit into from

Conversation

kjetilho
Copy link
Contributor

@kjetilho kjetilho commented Nov 29, 2021

The summary statistics in JSON form will be appended to the specified
filename. This works independently of verbosity level and reporter
(json or text) chosen.

What does this PR change? What problem does it solve?

Collecting statistics from the jobs which have run is currently relatively awkward. This patch allows statistics and metadata to be collected and processed by other scripts.

It uses append mode and prepares the JSON string so it is written atomically to the file, so no locking should be required.

A simple example run:

[kjetilho@scribus ~/git/github.com/restic/restic]; ./restic backup doc --summary-filename /tmp/restic-log.json
repository fff7b667 opened successfully, password is correct
no parent snapshot found, will read all files

Files:          75 new,     0 changed,     0 unmodified
Dirs:            8 new,     0 changed,     0 unmodified
Added to the repo: 2.506 MiB

processed 75 files, 2.466 MiB in 0:00
snapshot 53fafae6 saved
: [kjetilho@scribus ~/git/github.com/restic/restic]; cat /tmp/restic-log.json
{"message_type":"summary","files_new":75,"files_changed":0,"files_unmodified":0,"dirs_new":8,"dirs_changed":0,"dirs_unmodified":0,"data_blobs":75,"tree_blobs":9,"data_added":2627290,"total_files_processed":75,"total_bytes_processed":2585467,"total_duration":0.348978869,"snapshot_id":"53fafae6"}
: [kjetilho@scribus ~/git/github.com/restic/restic]; date > doc/new-file
: [kjetilho@scribus ~/git/github.com/restic/restic]; ./restic backup doc --summary-filename /tmp/restic-log.json
repository fff7b667 opened successfully, password is correct
using parent snapshot 53fafae6

Files:           1 new,     0 changed,    75 unmodified
Dirs:            0 new,     1 changed,     7 unmodified
Added to the repo: 16.688 KiB

processed 76 files, 2.466 MiB in 0:00
snapshot dc5ef1b8 saved
: [kjetilho@scribus ~/git/github.com/restic/restic]; cat /tmp/restic-log.json
{"message_type":"summary","files_new":75,"files_changed":0,"files_unmodified":0,"dirs_new":8,"dirs_changed":0,"dirs_unmodified":0,"data_blobs":75,"tree_blobs":9,"data_added":2627290,"total_files_processed":75,"total_bytes_processed":2585467,"total_duration":0.348978869,"snapshot_id":"53fafae6"}
{"message_type":"summary","files_new":1,"files_changed":0,"files_unmodified":75,"dirs_new":0,"dirs_changed":1,"dirs_unmodified":7,"data_blobs":1,"tree_blobs":2,"data_added":17088,"total_files_processed":76,"total_bytes_processed":2585500,"total_duration":0.286814645,"snapshot_id":"dc5ef1b8"}

If this idea is accepted, I would like to add the timestamp for when the backup job started to the metadata.

Ideally this metadata would be available from the snapshot sub-command, but it seems like a chicken and egg problem to add that information as part of the snapshot, so I gave up on that idea.

Was the change previously discussed in an issue or on the forum?

It has been discussed loosely on IRC from time to time.

Checklist

  • I have read the contribution guidelines.
  • I have enabled maintainer edits.
  • I have added tests for all code changes.
  • I have added documentation for relevant changes (in the manual).
  • There's a new file in changelog/unreleased/ that describes the changes for our users (see template).
  • I have run gofmt on the code in all commits.
  • All commit messages are formatted in the same style as the other commits in the repo.
  • I'm done! This pull request is ready for review.

The summary statistics in JSON form will be appeded to the specified
filename.  This works independently of verbosity level and reporter
(json or text) chosen.
@rawtaz
Copy link
Contributor

rawtaz commented Nov 29, 2021

Can you please include a usage example that shows how this is used and solves a problem? You can edit your initial post to add it.

@rawtaz
Copy link
Contributor

rawtaz commented Nov 30, 2021

I feel like an idiot right now, because I don't see at all what this PR does that you can't already very easily do. Here are two examples that as far as I can tell yields the same result as your usage example above (one with a silly grep which can be replaced with various other commands, and one with jq which is arguably a cleaner solution):

$ cat /tmp/restic-log.json
$ ./restic -r apa/ --json backup foo | grep '{"message_type":"summary",' >> /tmp/restic-log.json
$ cat /tmp/restic-log.json
{"message_type":"summary","files_new":0,"files_changed":0,"files_unmodified":5,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":6,"data_blobs":0,"tree_blobs":0,"data_added":0,"total_files_processed":5,"total_bytes_processed":498,"total_duration":0.319408791,"snapshot_id":"daeea9d7"}
$ date > foo/new-file
$ ./restic -r apa/ --json backup foo | grep '{"message_type":"summary",' >> /tmp/restic-log.json
$ cat /tmp/restic-log.json
{"message_type":"summary","files_new":0,"files_changed":0,"files_unmodified":5,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":6,"data_blobs":0,"tree_blobs":0,"data_added":0,"total_files_processed":5,"total_bytes_processed":498,"total_duration":0.319408791,"snapshot_id":"daeea9d7"}
{"message_type":"summary","files_new":1,"files_changed":0,"files_unmodified":5,"dirs_new":0,"dirs_changed":1,"dirs_unmodified":5,"data_blobs":1,"tree_blobs":2,"data_added":2482,"total_files_processed":6,"total_bytes_processed":527,"total_duration":0.400511429,"snapshot_id":"168ca35f"}
$ cat /tmp/restic-log.json
$ ./restic -r apa/ --json backup foo | jq -c 'select(.message_type=="summary")' >> /tmp/restic-log.json 
$ cat /tmp/restic-log.json
{"message_type":"summary","files_new":0,"files_changed":0,"files_unmodified":6,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":6,"data_blobs":0,"tree_blobs":0,"data_added":0,"total_files_processed":6,"total_bytes_processed":527,"total_duration":0.294950148,"snapshot_id":"262253fb"}
$ date > foo/new-file
$ ./restic -r apa/ --json backup foo | jq -c 'select(.message_type=="summary")' >> /tmp/restic-log.json 
$ cat /tmp/restic-log.json
{"message_type":"summary","files_new":0,"files_changed":0,"files_unmodified":6,"dirs_new":0,"dirs_changed":0,"dirs_unmodified":6,"data_blobs":0,"tree_blobs":0,"data_added":0,"total_files_processed":6,"total_bytes_processed":527,"total_duration":0.294950148,"snapshot_id":"262253fb"}
{"message_type":"summary","files_new":0,"files_changed":1,"files_unmodified":5,"dirs_new":0,"dirs_changed":1,"dirs_unmodified":5,"data_blobs":1,"tree_blobs":2,"data_added":2485,"total_files_processed":6,"total_bytes_processed":527,"total_duration":0.375253515,"snapshot_id":"d67c508f"}

So why should we complicate restic with another option/flag for such a specific use, when this is super simple to do with existing tooling? I just don't get it. Restic already outputs summary information in a structured format - how you use or store that information is something that is best kept as a separate concern IMO (similar to UNIX philosophy).

@kjetilho
Copy link
Contributor Author

Yes, but then you don't get any progress indication. You can have more events added to the JSON log, but then it becomes extremely verbose and it is not very human friendly. The built-in terminal based progress indicator in Restic is very nice, and it would be a shame to have to re-implement it in an external script which parses the stream of JSON events.

@kjetilho
Copy link
Contributor Author

Ideally this metadata would be available from the snapshot sub-command, but it seems like a chicken and egg problem to add that information as part of the snapshot, so I gave up on that idea.

You know the code, do you think this is a possible/reasonable thing to do? I would definitely prefer it over maintaining the history outside Restic itself.

Copy link
Member

@MichaelEischer MichaelEischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a few comments below, independent of whether we should add that flag or not.

You know the code, do you think this is a possible/reasonable thing to do? I would definitely prefer it over maintaining the history outside Restic itself.

There has been some discussion on that point in #874 (comment) . Adding backup statistics to a snapshot would definitely be possible. The only downside would be that only new restic versions would create and be able to interpret these (restic cat snapshot 12345678 will work though).

@@ -186,3 +186,25 @@ func (b *TextProgress) Finish(snapshotID restic.ID, start time.Time, summary *Su
formatDuration(time.Since(start)),
)
}

// Return finishing stats in a struct.
func (b *TextProgress) FinishSummary(snapshotID restic.ID, start time.Time, summary *Summary, dryRun bool) summaryOutput {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not require any data stored in the TextProgress struct. Thus please extract it into a separate function and remove the duplication with the JsonProgress.

if err := json.NewEncoder(buf).Encode(progressReporter.FinishSummary(id)); err != nil {
return errors.Errorf("encoding summary failed: %v", err)
}
fmt.Fprintf(sf, "%s", buf.String())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use sf.Write(buf.Bytes()) instead. There's no need for format string interpolation and multiple String <-> []byte conversions.

@@ -710,7 +713,21 @@ func runBackup(opts BackupOptions, gopts GlobalOptions, term *termstatus.Termina
if !gopts.JSON && !opts.DryRun {
progressPrinter.P("snapshot %s saved\n", id.Str())
}
if !success {
if opts.SummaryFilename != "" {
sf, err := os.OpenFile(opts.SummaryFilename, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0666)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to extract that code into a separate function. runBackup is already far too long.

@aawsome
Copy link
Contributor

aawsome commented Nov 4, 2022

see also #693 where I posted how rustic saves this information within the snapshot. (I still think this is a better solution than externally processing summary info)

@MichaelEischer
Copy link
Member

@aawsome Was it really necessary to advertise rustic here, in #693 and #874? #693 would have been enough.

Besides that I nevertheless agree with the assessment that storing the backup summary information in the snapshot is the way forward.

@aawsome
Copy link
Contributor

aawsome commented Nov 4, 2022

@MichaelEischer Sorry if it came to you as advertisement of rustic. I only intended to be clear about what I'm talking and prevent confusions as already happened in the forum - so I decided to always put the link when talking about rustic

If you are complaining about advertising other issue solutions which I prefer - well, this is of course intended 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants