Not showing lock held or backup in progress from info command #2346

Open
DaveSchile-Zonar opened this issue May 7, 2024 · 6 comments

@DaveSchile-Zonar

DaveSchile-Zonar commented May 7, 2024

Please provide the following information when submitting an issue (feature requests or general comments can skip this):

  1. pgBackRest version: 2.49

  2. PostgreSQL version: 9.6

  3. Operating system/version:
    DB: CentOS 7
    pgBackRest (repo host): AlmaLinux 9

  4. Did you install pgBackRest from source or from a package? From a package: YUM packages on both hosts.

  5. Please attach the following as applicable:

    • pgbackrest.conf file(s)
$ cat /etc/pgbackrest.conf
[global]

  repo1-type=gcs
  repo1-path=/pgbackrest-dev
  repo1-gcs-bucket=pgbackrest-gcs-backups-dev-na
  repo1-gcs-key-type=auto
  repo1-retention-full-type=count
  repo1-retention-full=2
  repo1-retention-diff=24
  repo1-retention-archive=24
  log-path=/var/log/pgbackrest/ 
  log-level-console=info
  log-level-file=info
  process-max=4
  compress-type=zst 
  compress-level=6
  start-fast=y
  archive-async=y
  spool-path=/var/spool/pgbackrest
  buffer-size=4194304
  delta=y 
  link-all=y
  archive-check=y 
  archive-copy=y 
  resume=y
  checksum-page=n
  archive-get-queue-max=2TB

[dev-gtc-db-1027]
  pg1-host=10.131.40.90
  pg1-path=/var/lib/pgsql/9.6/pgcluster
  pg1-host-config-path=/var/lib/pgsql/9.6/pgbackrest
  pg1-port=6432
  pg1-user=postgres
  pg1-socket-path=/var/run/postgresql

[dev-gtc-db-1029]
  pg1-host=10.131.40.117
  pg1-path=/var/lib/pgsql/9.6/pgcluster
  pg1-host-config-path=/var/lib/pgsql/9.6/pgbackrest
  pg1-port=6432
  pg1-user=postgres
  pg1-socket-path=/var/run/postgresql

[dev-gtc-db-1031]
  pg1-host=10.131.40.118
  pg1-path=/var/lib/pgsql/9.6/pgcluster
  pg1-host-config-path=/var/lib/pgsql/9.6/pgbackrest
  pg1-port=6432
  pg1-user=postgres
  pg1-socket-path=/var/run/postgresql
  6. Describe the issue:
    When a backup is running, I see a lock file at /tmp/pgbackrest/<stanza>-backup.lock. However, when I run info, the status shows as ok, as if no backup were running. The status block from info --output json looks like:
 "status": {
      "code": 0,
      "lock": {
        "backup": {
          "held": false
        }
      },
      "message": "ok"
    }

I am trying to set up monitoring for pgBackRest. Shouldn't info show that a backup is in progress, and its status?

Thank you

@pgstef
Member

pgstef commented May 8, 2024

Hi,

First of all, some background on the JSON output: #2343 (comment)

What exact command are you running, and on which host? The running backup should only be visible on the repository host.

A quick test (1 repo-host + 1 pg-host) showed me that the info output works as expected, reflecting what's inside the lock:

$ pgbackrest info --stanza=ro9pg
stanza: ro9pg
    status: ok (backup/expire running - 34.12% complete)

$ cat /tmp/pgbackrest/ro9pg-backup.lock 
{"execId":"11623-8a6d5ca8","pctCplt":3412,"pid":11623,"sz":9439150710,"szCplt":3221225472}
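Judging from the lock above and the status line it produced, pctCplt appears to be the percentage multiplied by 100 (3412 shown as 34.12%), and it is consistent with szCplt / sz. A minimal Python sketch, assuming that encoding (inferred from this thread, not from pgBackRest's source), that rebuilds the status suffix from the lock content; lock_status_text is a hypothetical name:

```python
import json


def lock_status_text(lock_content: str) -> str:
    """Rebuild the info status suffix from the contents of a backup lock file.

    Assumption inferred from this thread, not from pgBackRest's source:
    pctCplt is the percentage multiplied by 100, and it may be missing.
    """
    lock = json.loads(lock_content)
    text = "backup/expire running"
    pct_cplt = lock.get("pctCplt")
    if pct_cplt is not None:
        # e.g. 3412 -> "34.12"
        text += f" - {pct_cplt / 100:.2f}% complete"
    return text
```

For a lock without pctCplt, such as {"execId":"355116-37557181","pid":355116} later in this thread, it returns just "backup/expire running".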

Since you have multiple stanzas defined, did you specify the --stanza arg? With only 1 stanza I don't see any issue:

$ pgbackrest info
stanza: ro9pg
    status: ok (backup/expire running)

$ pgbackrest info --stanza=ro9pg
stanza: ro9pg
    status: ok (backup/expire running)

$ pgbackrest info --output=json
"status":{"code":0,"lock":{"backup":{"held":true}},"message":"ok"}

$ pgbackrest info --stanza=ro9pg --output=json
"status":{"code":0,"lock":{"backup":{"held":true}},"message":"ok"}

So it could be useful to specify that argument and to display the content of the lock file and the info output at the same time, for comparison.
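A small Python sketch of that side-by-side capture. It assumes the /tmp/pgbackrest lock directory seen in this thread and that the info JSON is a list of stanza objects with name and status keys, as in the outputs here. Treating the mere presence of the lock file as "locked" is itself an assumption. compare_lock_and_info and snapshot are hypothetical names:

```python
import json
import os
import subprocess
from typing import Optional

LOCK_DIR = "/tmp/pgbackrest"  # assumption: lock directory as seen in this thread


def compare_lock_and_info(lock_content: Optional[str], info_json: str, stanza: str) -> dict:
    """Compare a backup lock file's presence with what info reports for the stanza."""
    held = None
    for entry in json.loads(info_json):
        if entry.get("name") == stanza:
            held = entry.get("status", {}).get("lock", {}).get("backup", {}).get("held")
    present = lock_content is not None
    lock_data = json.loads(lock_content) if lock_content and lock_content.strip() else None
    return {
        "stanza": stanza,
        "lock_file_present": present,
        "lock_file": lock_data,
        "info_held": held,
        # True when the file exists but info says no lock (or the reverse)
        "mismatch": present != bool(held),
    }


def snapshot(stanza: str) -> dict:
    """Read the lock file, then immediately run info for the same stanza."""
    path = os.path.join(LOCK_DIR, f"{stanza}-backup.lock")
    try:
        with open(path) as f:
            lock_content = f.read()
    except FileNotFoundError:
        lock_content = None
    info_json = subprocess.run(
        ["pgbackrest", "info", f"--stanza={stanza}", "--output=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return compare_lock_and_info(lock_content, info_json, stanza)
```

Logging snapshot(...) periodically on the repository host would record exactly when the two views disagree.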

Monitoring the locks directly is a good idea anyway IMHO. A lock file staying there for too long might indicate a problem (backup too long or wrongly interrupted for example).
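A rough Python sketch of such a lock-based check. The max-age threshold, and using the lock file's modification time as a proxy for "staying there too long", are illustrative assumptions rather than pgBackRest behavior; if the lock is rewritten on progress updates, mtime measures time since the last update. The pid check only makes sense on the host that owns the lock (POSIX only). classify_lock and check_backup_lock are hypothetical names:

```python
import json
import os
import time
from typing import Callable, Optional


def pid_alive(pid: int) -> bool:
    """True if a process with this pid exists on this host (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def classify_lock(raw: Optional[str], mtime: float, now: float, max_age_s: float,
                  alive: Callable[[int], bool] = pid_alive) -> dict:
    """Classify lock content as idle, running, stale or orphaned."""
    if raw is None:
        return {"state": "idle"}
    lock = json.loads(raw) if raw.strip() else {}
    pid = lock.get("pid")
    age = now - mtime
    result = {"state": "running", "age_s": age, "pid": pid, "exec_id": lock.get("execId")}
    if pid is not None and not alive(pid):
        result["state"] = "orphaned"  # lock left behind, e.g. by an interrupted backup
    elif age > max_age_s:
        result["state"] = "stale"  # still there, but not updated for too long
    return result


def check_backup_lock(path: str, max_age_s: float) -> dict:
    """Read a lock file (if present) and classify it."""
    try:
        mtime = os.path.getmtime(path)
        with open(path) as f:
            raw = f.read()
    except FileNotFoundError:
        return {"state": "idle"}
    return classify_lock(raw, mtime, time.time(), max_age_s)
```

A scheduler could call check_backup_lock for each stanza's lock file and alert on the stale and orphaned states only.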

Kind Regards

@DaveSchile-Zonar
Author

Thanks for your response. Here are some commands and output:

$ cat /tmp/pgbackrest/dev-gtc-db-1027-backup.lock 
{"execId":"320806-6a9b38ca","pctCplt":2779,"pid":320806,"sz":63318134880,"szCplt":17599045632}

$ pgbackrest info --stanza dev-gtc-db-1027
stanza: dev-gtc-db-1027
    status: ok
    cipher: none

$ pgbackrest info --stanza dev-gtc-db-1027 --output json
"status": {
      "code": 0,
      "lock": {
        "backup": {
          "held": false
        }
      },
      "message": "ok"
    }

These are all run from the repo host (the host running pgBackRest).
After looking at your output, perhaps I misunderstood what the info command provides. Your info --output json output shows that a lock is held, but it contains no backup progress data (bytes_complete, bytes_total, etc.). I thought the info command was meant to provide this, since the code appears to read the lock struct.

Either way, my info command is incorrectly reporting no lock held when a backup is actually running and a lock file is in place.

Thank you for looking!

@DaveSchile-Zonar
Author

Here's some more output. After the backup has been running for a while, the lock file changes: the sz attributes are removed. I'm not sure what's going on with the backup at that point, but the info output then changes to indicate that a lock is held.

$ cat /tmp/pgbackrest/dev-gtc-db-1027-backup.lock 
{"execId":"355116-37557181","pid":355116}

$ pgbackrest info --stanza dev-gtc-db-1027
stanza: dev-gtc-db-1027
    status: ok (backup/expire running)
    cipher: none

$ pgbackrest info --stanza dev-gtc-db-1027 --output json | jq
    "status": {
      "code": 0,
      "lock": {
        "backup": {
          "held": true
        }
      },
      "message": "ok"
    }

@pgstef
Member

pgstef commented May 13, 2024

Hi,

Adding --log-level-console=debug might help spot something obvious, but I don't really know what else to suggest since I can't reproduce your issue. I even tried with multiple stanzas, etc.

size and size-cplt are displayed in the info JSON output if they are available in the backup lock:

$ cat /tmp/pgbackrest/d11pg-backup.lock
{"execId":"11986-437f5fd4","pctCplt":1606,"pid":11986,"sz":967412859,"szCplt":155385856}

$ pgbackrest info --stanza=d11pg
    status: ok (backup/expire running - 16.06% complete)

$ pgbackrest info --stanza=d11pg --output=json
"status":{"code":0,"lock":{"backup":{"held":true,"size":967412859,"size-cplt":155385856}},"message":"ok"}
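For anyone who still wants to read these fields from info, a minimal Python sketch that extracts them per stanza. It assumes the output is a JSON list of stanza objects with name and status keys, as in the outputs in this thread, and treats size and size-cplt as optional since they only appear when the lock carries them. The percentage is derived here, not reported by info. backup_lock_info is a hypothetical name:

```python
import json
from typing import Optional


def backup_lock_info(info_json: str) -> dict:
    """Map each stanza name to its backup-lock fields from info --output=json.

    Assumes a JSON list of stanza objects with "name" and "status" keys.
    size / size-cplt are optional: they only appear when the lock has them.
    """
    result = {}
    for stanza in json.loads(info_json):
        backup = stanza.get("status", {}).get("lock", {}).get("backup", {})
        size: Optional[int] = backup.get("size")
        size_cplt: Optional[int] = backup.get("size-cplt")
        pct = None
        if size and size_cplt is not None:
            pct = round(100.0 * size_cplt / size, 2)  # derived here, not reported by info
        result[stanza["name"]] = {
            "held": bool(backup.get("held", False)),
            "size": size,
            "size_cplt": size_cplt,
            "pct": pct,
        }
    return result
```

For the output above this gives held True with a derived pct of about 16.06; for a lock without size fields, size, size_cplt and pct are all None.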

Sure, using the info command to get the list of backups in the repositories is useful. But to be honest, I wouldn't use the info command to monitor the progress of running commands anyway; the lock files are there exactly for that purpose.

@DaveSchile-Zonar
Author

I appreciate your looking at this. Thank you.
My team and I are continuing to test and evaluate pgbackrest. As for this issue:
We are using the lock files themselves for monitoring of backup status and activity. Unfortunately I am still not seeing lock confirmation from info when there is a lock, and I am never seeing percentage complete data from info, whether the lock file has it or not.
@pgstef, I'm a bit confused by your first comment above. In the first code block, the info command returns the percentage complete, but in the second code block the same info command only shows that backup/expire is running, with no percentage complete. What's the difference between the two?

@pgstef
Member

pgstef commented May 22, 2024

You can have a look at this PR if you want to see the details of the implementation.
The lock content is updated in many places, and the percentage should basically only be present during the file-copy step. It reflects what you could grep from the logs.
IMHO, you shouldn't include that field in monitoring code.
