blobxfer not uploading the backup files correctly #307

Open
Bennik2000 opened this issue Dec 9, 2023 · 1 comment

@Bennik2000

Summary

I am trying to back up a Postgres database to Azure using blobxfer. On the first run everything works fine and the backup files are uploaded. On subsequent runs the uploaded backup files have the correct file size but do not contain any useful data; they are filled with null bytes. The checksum files still contain readable checksum data.

The checksum of one broken data file:
2a1a8a40495c8f95bb6cc64fb4db71274d83cc3c pgsql_hhzb_auth_db_20231209-143122.sql.gz

The checksum stored in the related checksum file:
730d0bba4d54f9725b6bc930a0f99aab4dce5bdd pgsql_hhzb_auth_db_20231209-143122.sql.gz
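
For reference, a mismatch like this can be reproduced with a quick manual check (assuming the data file and its .sha1 sidecar have been fetched into the current directory; the sidecar name follows the naming convention visible in the logs below):

# inspect the data file: it has the right size but contains only null bytes instead of gzip data
hexdump -C pgsql_hhzb_auth_db_20231209-143122.sql.gz | head
# compare the actual checksum with the one stored in the sidecar
sha1sum pgsql_hhzb_auth_db_20231209-143122.sql.gz
cat pgsql_hhzb_auth_db_20231209-143122.sql.gz.sha1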

The backup is stuck at "Moving backup to external storage with blobxfer".

Steps to reproduce

  • Create a Postgres database and configure a backup job using blobxfer
  • Run the backup job -> Works correctly
  • Stop the database and backup containers
  • Restart the database and backup containers (see the command sketch below)
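
With the docker-compose.yaml from the Environment section below, that cycle is roughly (illustrative commands; the service names auth_db and db_backup are taken from the compose file):

docker compose up -d                       # first scheduled backup run uploads correctly
docker compose stop auth_db db_backup
docker compose start auth_db db_backup     # backups after the restart are null-filled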

What is the expected correct behavior?

The expected behavior is that the files are uploaded correctly on every run, not just the first one.

Relevant logs and/or screenshots

db_backup                      | 2023-12-09.14:31:22 [NOTICE] ** [01-auth_db__hhzb] Dumping PostgresSQL database: 'hhzb' and compressing with 'gzip'
db_backup                      | 2023-12-09.14:31:22 [INFO] ** [01-auth_db__hhzb] DB Backup of 'pgsql_hhzb_auth_db_20231209-143122.sql.gz' completed successfully
db_backup                      | 2023-12-09.14:31:22 [NOTICE] ** [01-auth_db__hhzb] Generating SHA1 sum for 'pgsql_hhzb_auth_db_20231209-143122.sql.gz'
db_backup                      | 2023-12-09.14:31:22 [NOTICE] ** [01-auth_db__hhzb] Backup of 'pgsql_hhzb_auth_db_20231209-143122.sql.gz' created with the size of 1953026 bytes
db_backup                      | 2023-12-09.14:31:22 [INFO] ** [01-auth_db__hhzb] Synchronize local storage from S3 Bucket with blobxfer
db_backup                      | 2023-12-09 14:31:22.774 DEBUG - credential: account=XXXX endpoint=core.windows.net is_sas=False can_create_containers=True can_list_container_objects=True can_read_object=True can_write_object=True
db_backup                      | 2023-12-09 14:31:22.778 INFO - 
db_backup                      | ============================================
db_backup                      |          Azure blobxfer parameters
db_backup                      | ============================================
db_backup                      |          blobxfer version: 1.11.0
db_backup                      |                  platform: Linux-6.5.6-76060506-generic-x86_64-with
db_backup                      |                components: CPython=3.11.6-64bit azstor.blob=2.1.0 azstor.file=2.1.0 crypt=41.0.7 req=2.31.0
db_backup                      |        transfer direction: Azure -> local
db_backup                      |                   workers: disk=4 xfer=3 (msoc=8) md5=0 crypto=0
db_backup                      |                  log file: None
db_backup                      |                   dry run: False
db_backup                      |               resume file: None
db_backup                      |                   timeout: connect=10 read=200 max_retries=1000
db_backup                      |                      mode: StorageModes.File
db_backup                      |                   skip on: fs_match=False lmt_ge=False md5=False
db_backup                      |                    delete: extraneous=True only=False
db_backup                      |                 overwrite: True
db_backup                      |                 recursive: True
db_backup                      |             rename single: False
db_backup                      |          strip components: 0
db_backup                      |          chunk size bytes: 0
db_backup                      |          compute file md5: False
db_backup                      |        restore properties: attr=False lmt=False
db_backup                      |           rsa private key: None
db_backup                      |         local destination: /backup
db_backup                      | ============================================
db_backup                      | 2023-12-09 14:31:22.778 INFO - blobxfer start time: 2023-12-09 14:31:22.778579+01:00
db_backup                      | 2023-12-09 14:31:22.778 DEBUG - dest is_dir=True for 1 specs
db_backup                      | 2023-12-09 14:31:22.778 INFO - downloading blobs/files to local path: /backup
db_backup                      | 2023-12-09 14:31:22.779 DEBUG - spawning 3 transfer threads
db_backup                      | 2023-12-09 14:31:22.795 DEBUG - spawning 4 disk threads
db_backup                      | 2023-12-09 14:31:23.436 DEBUG - 0 files 0.0000 MiB filesize, lmt_ge, or no overwrite skipped
db_backup                      | 2023-12-09 14:31:23.436 DEBUG - 2 remote files processed, waiting for download completion of approx. 1.8626 MiB
db_backup                      | 2023-12-09 14:31:23.562 INFO - MD5: SKIPPED, hhzb-test-db-backup/pgsql_hhzb_auth_db_20231209-143000.sql.gz.sha1 None <L..R> None
db_backup                      | 2023-12-09 14:31:24.600 INFO - MD5: SKIPPED, hhzb-test-db-backup/pgsql_hhzb_auth_db_20231209-143000.sql.gz None <L..R> None
db_backup                      | 2023-12-09 14:31:24.601 INFO - attempting to delete 0 extraneous files
db_backup                      | 2023-12-09 14:31:24.601 INFO - elapsed download + verify time and throughput of 0.0018 GiB: 1.497 sec, 9.9520 Mbps (1.244 MiB/sec)
db_backup                      | 2023-12-09 14:31:24.601 INFO - blobxfer end time: 2023-12-09 14:31:24.601382+01:00 (elapsed: 1.823 sec)
db_backup                      | 2023-12-09.14:31:24 [INFO] ** [01-auth_db__hhzb] Moving backup to external storage with blobxfer

Environment

  • Image ID: c59afd3e3bf8
  • Host OS: Ubuntu

docker-compose.yaml:

version: '3.8'
services:
  auth_db:
    image: postgres:latest
    environment:
      POSTGRES_USER: hhzb
      POSTGRES_PASSWORD: hhzb
      POSTGRES_DB: hhzb
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - db
  db_backup:
    container_name: db_backup
    image: tiredofit/db-backup
    environment:
      - TIMEZONE=Europe/Berlin
      - CONTAINER_ENABLE_MONITORING=FALSE
      - DEFAULT_BACKUP_LOCATION=blobxfer
      - DEFAULT_BLOBXFER_STORAGE_ACCOUNT=XXX
      - DEFAULT_BLOBXFER_STORAGE_ACCOUNT_KEY=XXX
      - DEFAULT_BLOBXFER_REMOTE_PATH=/hhzb-test-db-backup
      - DB01_TYPE=postgres
      - DB01_HOST=auth_db
      - DB01_NAME=hhzb
      - DB01_USER=hhzb
      - DB01_PASS=hhzb
      - DB01_DUMP_INTERVAL=5
      - DB01_DUMP_BEGIN=+1
      - DB01_CLEANUP_TIME=60
      - DB01_CHECKSUM=SHA1
      - DB01_COMPRESSION=GZ

    restart: always
    networks:
      - db

volumes:
  pgdata:

networks:
  db:

Possible fixes

The root cause could be that blobxfer does not upload the files correctly. I conclude this because the checksum of the uploaded file does not match the checksum in the .sha1 file.
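
A rough way to narrow this down (hypothetical commands: they assume the dump is still sitting in the container's /backup directory and reuse the remote path and redacted storage credentials from the compose file above):

# checksum of the local copy that db-backup uploaded
docker exec db_backup sha1sum /backup/pgsql_hhzb_auth_db_20231209-143122.sql.gz
# re-download the remote copy and checksum it as well
blobxfer download --mode file --remote-path /hhzb-test-db-backup --storage-account XXX --storage-account-key XXX --local-path ./verify
find ./verify -name '*.sql.gz' -exec sha1sum {} \;
# if the local checksum matches the .sha1 file but the re-downloaded copy does not,
# the corruption happens during the upload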

Bennik2000 added the bug label Dec 9, 2023
@tiredofit
Owner

This is a troublesome report. My apologies if you have experienced some sort of data loss.

I'm not entirely familiar with blobxfer as it's outside of my actual use cases, but I can break down what happens.

After the backup is created locally on the system in a folder called TEMP_PATH (an env variable), the local download folder is synchronized from your Azure storage with this command:

blobxfer download --mode file --remote-path <BLOBXFER_REMOTE_PATH> --storage-account <BLOBXFER_STORAGE_ACCOUNT> --storage-account-key <BLOBXFER_STORAGE_ACCOUNT_KEY> --local-path <FILESYSTEM_PATH> --delete

Then, after completion, db-backup moves the newly made backup from TEMP_PATH to FILESYSTEM_PATH (usually /backup).
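
In effect it is something like this (illustrative only, not the actual script):

mv "${TEMP_PATH}"/pgsql_hhzb_auth_db_*.sql.gz* "${FILESYSTEM_PATH}"/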

Then we run blobxfer again and synchronize the changes from FILESYSTEM_PATH up to the remote Azure storage:

blobxfer upload --mode file --remote-path <BLOBXFER_REMOTE_PATH> --storage-account <BLOBXFER_STORAGE_ACCOUNT> --storage-account-key <BLOBXFER_STORAGE_ACCOUNT_KEY> --local-path <FILESYSTEM_PATH>

Then the backup is complete.

Then, there is a cleanup phase:

We look in FILESYSTEM_PATH for files older than _CLEANUP_TIME (in minutes) that match the filename pattern of the backups:

find "${backup_job_filesystem_path}"/ -type f -mmin +"${backup_job_cleanup_time}" -iname "${backup_job_filename_base}*" -exec rm -f {} \;

and then, following that, we synchronize with the remote storage again via blobxfer, this time including the delete flags so that only what has changed is deleted:

blobxfer upload --mode file --remote-path <BLOBXFER_REMOTE_PATH> --storage-account <BLOBXFER_STORAGE_ACCOUNT> --storage-account-key <BLOBXFER_STORAGE_ACCOUNT_KEY> --local-path <FILESYSTEM_PATH> --delete --delete-only

Maybe you can see with these steps where this could be going sideways?

Setting DEBUG_CLEANUP_OLD_DATA=TRUE and DEBUG_MOVE_DBBACKUP=TRUE might give you some output with hints, or you can try to rule out the cleanup by not setting a cleanup time.
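
For example, in the compose file above that would mean adding these lines to the db_backup environment (and leaving DB01_CLEANUP_TIME unset while testing):

      - DEBUG_CLEANUP_OLD_DATA=TRUE
      - DEBUG_MOVE_DBBACKUP=TRUE
      # - DB01_CLEANUP_TIME=60   # leave commented out to rule out the cleanup phase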
