[FEATURE] Support periodic or on-demand full backups to enhance backup reliability #7070

Open
derekbit opened this issue Nov 9, 2023 · 3 comments
Labels
area/backup-store Remote backup store related area/resilience System or volume resilience highlight Important feature/issue to highlight kind/feature Feature request, new feature priority/0 Must be fixed in this release (managed by PO) require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation require/lep Require adding/updating enhancement proposal

@derekbit
Member

derekbit commented Nov 9, 2023

Is your feature request related to a problem? Please describe (👍 if you like this request)

In the existing Longhorn backup system, the initial backup is a full backup, while subsequent backups are incremental. If any block becomes corrupted, every backup revision that relies on that block is corrupted as well. One way to address this is to perform a full backup after every N incremental backups. This limits how many backups depend on any single stored block, decreasing the likelihood of backup corruption and improving the overall reliability of the backup process.
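The effect of periodic full backups on how many restore points a single bad block can break can be sketched with a toy model. It assumes, as in the conceptual picture of this proposal, that every full backup stores its own fresh copy of each block and that the corrupted block's data never changes afterwards; a deduplicating backupstore may instead reuse or overwrite blocks. `affected_backups` is a hypothetical helper, not Longhorn code.

```python
def affected_backups(total, written_at, full_every):
    """Return the 0-based backups whose restore reads a block stored by backup `written_at`.

    Toy model (assumption): the block's data never changes afterwards, and every
    full backup stores a fresh copy of all blocks. full_every == 0 means
    incrementals forever after the initial full backup.
    """
    affected = []
    for i in range(written_at, total):
        # A later full backup starts a new chain that no longer needs the old copy.
        if i > written_at and full_every > 0 and i % full_every == 0:
            break
        affected.append(i)
    return affected


print(affected_backups(10, 0, 0))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(affected_backups(10, 0, 4))  # [0, 1, 2, 3]
print(affected_backups(10, 5, 4))  # [5, 6, 7]
```

Under this model, with incrementals forever a bad block written by the first backup breaks all ten restore points, while with a full backup every 4 backups at most 4 restore points depend on any one stored copy.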

[Figure: Current implementation]

[Figure: A possible solution]

Ref: https://www.architecting.it/blog/incrementals-forever-or-synthetic-fulls/


@derekbit derekbit added kind/feature Feature request, new feature require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated require/doc Require updating the longhorn.io documentation require/lep Require adding/updating enhancement proposal area/resilience System or volume resilience area/backup-store Remote backup store related labels Nov 9, 2023
@innobead innobead added this to the v1.7.0 milestone Nov 9, 2023
@derekbit
Member Author

I would highlight this feature, since it can help improve resilience to silent corruption on the backup server.
cc @innobead

@longhorn-io-github-bot

longhorn-io-github-bot commented Mar 20, 2024

Pre Ready-For-Testing Checklist

Test

  • Create a 10 MB volume and write 4 MB of data
  • Create an NFS backupstore

Test 1: Full Backup

  • Take a backup from the UI
  • In the NFS backupstore, replace the content of one of the blocks with random data
  • Restoring the volume should fail
  • Create a full backup through the Longhorn API:
    • Open http://${LONGHORN_UI_ENDPOINT}/v1/volumes/${VOLUME}
    • Click snapshotBackup with
      • name: the previous snapshot name
      • parameters: {"backup-mode": "full"}
  • Restoring the volume should now succeed
  • The first backup
    • .Status.NewlyUploadedDataSize: "4194304"
    • .Status.ReUploadedDataSize: "0"
  • The second backup
    • .Status.NewlyUploadedDataSize: "0"
    • .Status.ReUploadedDataSize: "4194304"
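The expected sizes follow from how a content-addressed backupstore can treat a full backup. The sketch below is a simplified model, not Longhorn's backupstore code: it assumes blocks are keyed by a SHA-512 checksum of their content, a 2 MiB block size, and that a full backup uploads every block again even when a block with that checksum already exists, overwriting a corrupted copy. `ToyBackupStore` and its methods are hypothetical.

```python
import hashlib

BLOCK_SIZE = 2 * 1024 * 1024  # assumed backup block size (2 MiB)


class ToyBackupStore:
    """Hypothetical content-addressed store: checksum -> stored block bytes."""

    def __init__(self):
        self.blocks = {}

    def backup(self, data, full=False):
        """Upload `data` block by block and report the upload counters."""
        newly, re_uploaded, refs = 0, 0, []
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE]
            checksum = hashlib.sha512(chunk).hexdigest()
            refs.append(checksum)
            if checksum not in self.blocks:
                self.blocks[checksum] = chunk  # block never seen before
                newly += len(chunk)
            elif full:
                self.blocks[checksum] = chunk  # assumed full mode: re-upload and overwrite
                re_uploaded += len(chunk)
        return {"newly": newly, "re_uploaded": re_uploaded, "refs": refs}

    def restore(self, refs):
        """Rebuild the data, refusing any stored block whose checksum no longer matches."""
        out = bytearray()
        for checksum in refs:
            chunk = self.blocks[checksum]
            if hashlib.sha512(chunk).hexdigest() != checksum:
                raise ValueError("corrupted block " + checksum[:12])
            out += chunk
        return bytes(out)


data = b"\x01" * BLOCK_SIZE + b"\x02" * BLOCK_SIZE  # 4 MiB written to the volume
store = ToyBackupStore()
first = store.backup(data)
store.blocks[first["refs"][0]] = b"random data"  # simulate silent corruption
try:
    store.restore(first["refs"])
    restore_failed = False
except ValueError:
    restore_failed = True
second = store.backup(data, full=True)
print(restore_failed)                          # True
print(first["newly"], first["re_uploaded"])    # 4194304 0
print(second["newly"], second["re_uploaded"])  # 0 4194304
print(store.restore(second["refs"]) == data)   # True
```

In this model the full backup does not create an independent chain; it repairs the stored block in place, which is why restoring succeeds again and why the second backup reports all 4 MiB as re-uploaded rather than newly uploaded.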

Test 2: Recurring Full Backup - Always incremental

  • Clean up all Backup/BackupVolume resources
  • Prepare the RecurringJob YAML (runs every minute)
    apiVersion: longhorn.io/v1beta1
    kind: RecurringJob
    metadata:
      name: backup-job
      namespace: longhorn-system
    spec:
      cron: "* * * * ?"
      task: "backup"
      groups:
      - default
      retain: 100
      concurrency: 1
      parameters:
        full-backup-interval: "2"
  • Change the interval to 0: full-backup-interval: "0"
  • Create the RecurringJob
  • Wait for 2 backups to be created
  • k describe lhb -n longhorn-system | grep -A 2 "Parameters"
    • Both should be empty
  • k describe lhb -n longhorn-system | grep -A 5 "Newly"
    • The first one should be "4194304"
    • The second one should be "0"
  • k describe lhb -n longhorn-system | grep -A 5 "Re Uploaded Data Size"
    • The first one should be "0"
    • The second one should be "0"
  • Clean up all Backup/BackupVolume resources

Test 3: Recurring Full Backup - Always Full

  • Change the interval to 1: full-backup-interval: "1"
  • Create the RecurringJob
  • Wait for 2 backups to be created
  • k describe lhb -n longhorn-system | grep -A 2 "Parameters"
    • Both should be {"backup-mode": "full"}
  • k describe lhb -n longhorn-system | grep -A 5 "Newly"
    • The first one should be "4194304"
    • The second one should be "0"
  • k describe lhb -n longhorn-system | grep -A 5 "Re Uploaded Data Size"
    • The first one should be "0"
    • The second one should be "4194304"
  • Clean up all Backup/BackupVolume resources

Test 4: Recurring Full Backup - Every N times

  • Change the interval to 2: full-backup-interval: "2"
  • Create the RecurringJob
  • Wait for 4 backups to be created
  • k describe lhb -n longhorn-system | grep -A 2 "Parameters"
    • The 1st and 3rd should be {"backup-mode": "full"}
    • The 2nd and 4th should be empty
  • k describe lhb -n longhorn-system | grep -A 5 "Newly"
    • The 1st should be "4194304"
    • The 2nd, 3rd and 4th should be "0"
  • k describe lhb -n longhorn-system | grep -A 5 "Re Uploaded Data Size"
    • The 1st, 2nd and 4th should be "0"
    • The 3rd should be "4194304"
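The Parameters expected in Tests 2 to 4 are consistent with a simple rule: the k-th backup (0-based) taken by the recurring job is labeled full when full-backup-interval is greater than 0 and k is a multiple of it. The sketch below encodes that rule as inferred from the checklist; `expected_parameters` is a hypothetical helper, not Longhorn's scheduling code. With an interval of 0, the first backup is still effectively full because nothing has been uploaded yet, but it carries no parameter.

```python
def expected_parameters(index, full_backup_interval):
    """Parameters the 0-based `index`-th backup of the recurring job is expected to carry."""
    if full_backup_interval > 0 and index % full_backup_interval == 0:
        return {"backup-mode": "full"}
    return {}  # incremental backup: no parameters recorded


for interval in (0, 1, 2):
    modes = ["full" if expected_parameters(i, interval) else "incremental" for i in range(4)]
    print(interval, modes)
# 0 ['incremental', 'incremental', 'incremental', 'incremental']
# 1 ['full', 'full', 'full', 'full']
# 2 ['full', 'incremental', 'full', 'incremental']
```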

@derekbit
Member Author

derekbit commented Mar 21, 2024

The current implementation adds a backup-mode label to a recurring job, so users must set up two separate recurring jobs, one for incremental and one for full backups, to control how often full backups happen. This complicates management.

However, after discussing with @ChanYiLin and @c3y1huang, we could instead record the frequency, period, or count in the recurring job or the backup volume. This would simplify the configuration so that only a single recurring job is required.


4 participants