Additional metric: last snapshot date/timestamp (per repository) #256
Comments
Exporting a metric with the last time a snapshot was written during the lifetime of the process would not be hard to add. Exporting it for repositories that have not seen a backup since the last rest-server restart would be harder, as it would require scanning the disk for all repositories on startup. We have so far avoided that. This part is probably not necessary to make this useful for monitoring.
The only way to implement this would be for the
This could be useful, but doing this per repository would require scanning all repositories, which rest-server currently does not do.
The rest-server cannot read this data.
The rest-server cannot tell when specific commands were run. Creating a new snapshot has the side effect of creating a new file in the snapshots directory, which makes this easy, but this is not true for these commands. This issue is related to #50.
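Because a new snapshot always shows up as a new file under `snapshots/`, the newest file modification time there is a cheap proxy for "last snapshot written". A minimal sketch of that idea, assuming GNU findutils; the function name `latest_snapshot_mtime` is hypothetical. Note that this is the time the file was written on the server, not the snapshot time restic stores inside the (encrypted) snapshot file, and it can be wrong if files are copied or restored without preserving timestamps.

```shell
# Sketch (assumes GNU findutils): print the modification time, in Unix
# seconds, of the newest file in a repository's snapshots/ directory.
# Prints nothing if the directory is missing or empty.
latest_snapshot_mtime() {
    repo="$1"
    [ -d "${repo}/snapshots" ] || return 0
    # %T@ prints each mtime as seconds since the epoch with a fractional
    # part; sort numerically, keep the largest value, drop the fraction.
    find "${repo}/snapshots" -type f -printf '%T@\n' \
        | sort -n \
        | tail -n 1 \
        | cut -d . -f 1
}
```

A caller should skip the metric line when the function prints nothing, since an empty sample value is not valid in the Prometheus text format.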
I actually have metrics for most of these things already, using a simple shell script, a systemd timer and prometheus-node-exporter, which picks up the textfile the script produces:

```shell
#!/bin/sh
set +e  # keep going if one repository cannot be read (also the default)
set +x
NAMESPACE=restic
BACKUP_FOLDER=/mnt/restic
OUTPUT=""
# Each top-level directory of BACKUP_FOLDER is assumed to be one repository.
for dir in $(find "${BACKUP_FOLDER}" -maxdepth 1 -mindepth 1 -type d); do
  # Size on disk in bytes (GNU du).
  total_size=$(du -bs "${dir}" | cut -f 1)
  # Snapshot files, newest first; sed drops the "total" header line of ls -l.
  snapshots_raw=$(ls -t -l --full-time "${dir}/snapshots" | sed 1d)
  # Count non-empty lines so that an empty snapshots/ directory yields 0, not 1.
  snapshots_count=$(printf '%s' "${snapshots_raw}" | grep -c .)
  lock_count=$(ls -1 "${dir}/locks" | wc -l)
  # Fields 6 and 7 of `ls --full-time` are the modification date and time.
  latest_snapshot=$(echo "${snapshots_raw}" | head -n 1 | awk '{ print $6 " " $7 }')
  latest_snapshot_unix=$(date -d "${latest_snapshot}" +"%s")
  OUTPUT="${OUTPUT}${NAMESPACE}_repository_size_bytes{repository=\"${dir}\"} ${total_size}\n"
  OUTPUT="${OUTPUT}${NAMESPACE}_snapshots_count{repository=\"${dir}\"} ${snapshots_count}\n"
  OUTPUT="${OUTPUT}${NAMESPACE}_latest_snapshot_time_seconds{repository=\"${dir}\"} ${latest_snapshot_unix}\n"
  OUTPUT="${OUTPUT}${NAMESPACE}_lock_count{repository=\"${dir}\"} ${lock_count}\n"
done
# printf '%b' expands the \n escapes in any POSIX shell, unlike plain echo.
printf '%b' "${OUTPUT}" | sort
```

This does make quite a few assumptions, however; for example, it won't work with restic repositories in subdirectories. But it has served me very well so far.
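To lift the subdirectory limitation, repositories can be discovered at any depth by looking for the layout every restic repository has at its root: a `config` file next to a `snapshots/` directory. A minimal sketch, assuming GNU findutils (`-printf '%h'` prints the directory containing each match); `find_repositories` is a hypothetical name:

```shell
# Sketch (assumes GNU findutils): list every directory below a base path
# that looks like a restic repository, i.e. contains a regular file named
# "config" and a "snapshots" directory. Unlike a -maxdepth 1 loop, this
# also finds repositories nested in subdirectories.
find_repositories() {
    base="$1"
    find "${base}" -type f -name config -printf '%h\n' \
        | while IFS= read -r dir; do
            if [ -d "${dir}/snapshots" ]; then
                printf '%s\n' "${dir}"
            fi
        done \
        | sort
}
```

Its output could drive the main loop, e.g. `find_repositories /mnt/restic | while IFS= read -r dir; do ...; done | sort`, printing each metric line inside the loop rather than accumulating it in a variable (the piped loop runs in a subshell, so a variable set inside it is lost afterwards). This form also copes with paths containing spaces. On large repositories the walk through `data/` can take a while; pruning those directories would be a possible refinement.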
The REST server only implements the protocol and can't tap into the commands themselves; is that correct? I can see how that would be problematic and would probably put such statistics well out of scope. Would there be any way for a client script or similar to (optionally) communicate such data to the server, or would there even be interest in a solution like that? I was thinking of trying to bridge or include https://github.com/ngosang/restic-exporter with this project, but if there is no real way for the server to hold that data, each client would have to run its own exporter, which doesn't really seem desirable.
It would solve a large part of our metrics/alerting needs, so I would be very much in favor of that. It should be optional, probably even opt-in, since I would think the majority of people don't use metrics.
The existing metrics for that could still be improved; a latest snapshot timestamp, for example, would be very helpful. Currently, if I run automated forgetting, it would be hard to distinguish between no backup running and a backup plus one expired snapshot, since both would result in the same reported change in snapshot count (±0), right?
Output of `rest-server --version`
Not relevant.
What should rest-server do differently?
Export the timestamp of the last successful snapshot as part of the Prometheus metrics (ideally more as well; I added a few ideas at the end, but the last snapshot timestamp is the most critical).
What are you trying to do? What is your use case?
The Prometheus metrics are perfect for setting up a monitoring system that alerts when backups are not running. They would allow monitoring the actual result of the backup job, which is much better than, say, having the backup job itself send alerts on failure: if the job doesn't run at all, for example, it might never send out a notification. Watching the REST server's metrics, on the other hand, would always confirm that, everything else aside, the snapshot made it to the repository.
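Once a last-snapshot timestamp is exported, the alert itself only needs to compare its age with the expected backup interval. Unlike a snapshot count, the timestamp keeps moving forward even when `forget` removes as many snapshots as `backup` adds. A minimal sketch in shell, with a hypothetical function name and thresholds; in Prometheus the equivalent would be an alerting rule along the lines of `time() - restic_latest_snapshot_time_seconds > 86400` (metric name taken from the script in the comments above).

```shell
# Sketch: decide whether a repository's newest snapshot is too old.
# Arguments: latest snapshot time (Unix seconds), maximum allowed age in
# seconds, and optionally the current time (defaults to `date +%s`).
# Returns 0 (stale) if the snapshot is older than the allowed age or if
# no timestamp is known at all, and 1 otherwise.
snapshot_is_stale() {
    latest="$1"
    max_age="$2"
    now="${3:-$(date +%s)}"
    # An empty timestamp means no snapshot was ever seen: treat as stale.
    [ -n "${latest}" ] || return 0
    [ $((now - latest)) -gt "${max_age}" ]
}
```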
Did rest-server help you today? Did it make you happy in any way?
It's fantastic, and I am currently working on switching a large part of my personal and professional life to back up to a Restic-REST server we run internally (as a rootless Podman service, which is ever so nice) and it's very exciting to have such a clean backup interface. Thank you guys!
Additional metrics that may be useful, some of which I suspect would need the repository's credentials. I am not sure whether the REST server would have the capability to hook into that. Maybe it could generate metrics while the actual `backup` command runs and then store them for the metrics export later, since it can't very well open the repository for each metrics request?
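Until the server can provide this, the client can record the result itself, which is the "each client runs its own exporter" approach discussed in the comments. A minimal sketch, assuming node_exporter's textfile collector is used; the metric name `restic_backup_last_success_timestamp_seconds` and the function `run_and_record` are hypothetical. It writes the metric only after the wrapped command exits successfully, and renames a temporary file into place so the collector never reads a partially written file:

```shell
# Sketch: run a backup command and, only if it succeeds, record the time
# of success as a Prometheus textfile metric. The metric name is a
# hypothetical example.
run_and_record() {
    prom_file="$1"
    shift
    # Run the wrapped command; on failure, return its exit status unchanged.
    "$@" || return
    # Write to a temporary file in the same directory, then rename it, so
    # the collector only ever sees a complete file.
    tmp_file="${prom_file}.$$.tmp"
    {
        echo '# TYPE restic_backup_last_success_timestamp_seconds gauge'
        printf 'restic_backup_last_success_timestamp_seconds %s\n' "$(date +%s)"
    } > "${tmp_file}" && mv "${tmp_file}" "${prom_file}"
}
```

For example, `run_and_record /var/lib/node_exporter/textfile_collector/restic_backup.prom restic backup /home`, where the directory is a placeholder that must match node_exporter's `--collector.textfile.directory` flag and the file name must end in `.prom` to be picked up.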