Prometheus metric resticprofile_backup_status is 2 even when backups fail #332

Open
deviantintegral opened this issue Feb 27, 2024 · 3 comments
Labels: bug (Something isn't working), feedback (Need some more feedback)

Comments

@deviantintegral

To test alerting on the resticprofile_backup_status metric, I changed my AWS access key to an invalid value and triggered a backup. Although the job errored out, I see a fresh metric for resticprofile_backup_status with a status of 2.

Luckily, the Last Backup timestamp isn't changed, so I can probably alert on that. However, I expected the status to be 0.
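As a rough illustration of the timestamp-based workaround mentioned above, here is a minimal sketch of a staleness check. The function name, the 26-hour threshold, and the idea of feeding it the last-backup Unix timestamp are all assumptions made for illustration; they are not part of resticprofile.

```python
import time
from typing import Optional

# Hypothetical threshold: a daily schedule plus two hours of slack.
MAX_AGE_SECONDS = 26 * 3600


def backup_is_stale(last_backup_ts: float,
                    max_age_seconds: float = MAX_AGE_SECONDS,
                    now: Optional[float] = None) -> bool:
    """Return True when the last recorded backup is older than max_age_seconds."""
    if now is None:
        now = time.time()
    return (now - last_backup_ts) > max_age_seconds
```

An alert built this way fires whenever the timestamp stops advancing, regardless of what the status metric reports.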

@creativeprojects
Owner

You're right, I wouldn't expect the status to be 2 🤔
Can you please post your profile configuration (with any repository information redacted) so I can get a better idea of what is happening?

@creativeprojects creativeprojects added the feedback Need some more feedback label Mar 4, 2024
@deviantintegral
Author

Sure, here it is. I have several other backup sets but they all have the same config.

version: "1"

global:
  scheduler: crond
  priority: low

base:
  initialize: true
  password-file: key
  prometheus-push: "http://metrics-docker.lan:9091/"
  prometheus-save-to-file: "{{ .Profile.Name }}.prom"
  prometheus-labels:
    - host: {{ .Hostname }}
  backup:
    exclude-caches: true
    one-file-system: true
    check-before: true
    extended-status: true
  retention:
    after-backup: true
    keep-daily: 30
    keep-weekly: 4
    keep-monthly: 13
    prune: true

photos:
  inherit: base
  lock: /tmp/photos.lock
  force-inactive-lock: true
  restic-stale-lock-age: 5m
  repository: REDACTED-S3-ENDPOINT-ON-B2
  env:
    AWS_ACCESS_KEY_ID: REDACTED_ACCESS_KEY
    AWS_SECRET_ACCESS_KEY: REDACTED_SECRET_KEY
  backup:
    source:
      - '/source/photos'
    schedule: "04:00"
    schedule-permission: system

@creativeprojects
Owner

Right, I see what's happening:

  • the check command fails immediately since the repository is not available
  • resticprofile stops after the check failed, without trying to run a backup

However, only the backup command generates Prometheus metrics, so at that point resticprofile keeps the existing metrics and does not generate new ones.
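The behaviour described here can be modelled with a small simulation. This is only a sketch of the control flow as explained in this thread, not resticprofile's actual code; `run_profile` and its arguments are hypothetical, and the status values (0 for failure, 2 for success) follow the numbers used in this issue.

```python
def run_profile(metrics: dict, check_ok: bool, backup_ok: bool) -> dict:
    """Hypothetical model of a run with check-before enabled."""
    if not check_ok:
        # The run stops here. The backup step never executes,
        # so nothing overwrites the previously stored status.
        return metrics
    updated = dict(metrics)
    updated["resticprofile_backup_status"] = 2 if backup_ok else 0
    return updated


# Metrics left over from the last successful run.
previous = {"resticprofile_backup_status": 2}
after_failed_check = run_profile(previous, check_ok=False, backup_ok=False)
```

In this model `after_failed_check` still carries a status of 2, which matches the symptom reported in this issue.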

To fix this, I think we would need to generate a status line for each command (check, forget, etc.).
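One possible shape for such per-command status lines, written in the Prometheus text format, is sketched below. The metric names (`resticprofile_check_status` and so on) and the `profile` label are assumptions for illustration only; nothing in this thread confirms what names a fix would use.

```python
def render_status_lines(profile: str, results: dict) -> str:
    """Render one status line per command.

    results maps a command name to 0 (failed) or 2 (succeeded).
    The metric naming scheme here is hypothetical.
    """
    lines = []
    for command, status in results.items():
        lines.append(
            f'resticprofile_{command}_status{{profile="{profile}"}} {status}'
        )
    return "\n".join(lines)


# A run where the check failed, so no later command produced a status.
output = render_status_lines("photos", {"check": 0})
```

With a line like this, an alert could watch the check status directly instead of relying on the backup metric being refreshed.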

@creativeprojects creativeprojects added the bug Something isn't working label Mar 9, 2024