doc(enhancement): add recurring and manual full backup support
ref: longhorn/longhorn 7070

Signed-off-by: Jack Lin <jack.lin@suse.com>
ChanYiLin committed Mar 15, 2024
1 parent 42c0757 commit 29bc52f
File: enhancements/20240314-recurring-and-manual-full-backup-support.md (152 additions, 0 deletions)
# 20240314-recurring-and-manual-full-backup-support

## Summary

This feature enables Longhorn to create full backups of a volume, either through a **recurring job** or by **manually triggering** one.

### Related Issues

- Community issue: https://github.com/longhorn/longhorn/issues/7069
- Improvement issue: https://github.com/longhorn/longhorn/issues/7070

## Motivation

Longhorn always performs incremental backups, which upload only the newly updated blocks.
Blocks uploaded by previous backups may become corrupted on the backupstore. In that case, users can no longer restore the volume: Longhorn aborts the restoration when it finds a block whose checksum does not match, and later incremental backups do not re-upload the unchanged blocks, so the corruption persists.

### Goals

- Add a new reserved label `longhorn.io/longhorn-backup-mode: full` to the Backup CR to trigger a full backup.
- With this label, users can create a **recurring full backup job** or **manually trigger** a full backup of the volume.
- When doing a full backup, Longhorn backs up **all the current blocks** of the volume and **overwrites them** on the backupstore, even if those blocks already exist there.
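A minimal Go sketch of how a component could check whether a Backup requests a full backup. Only the label key and value come from this proposal; the constant and function names are hypothetical, not Longhorn's actual identifiers.

```go
package main

import "fmt"

// Hypothetical names; only the key/value pair comes from this proposal.
const (
	BackupModeLabelKey  = "longhorn.io/longhorn-backup-mode"
	BackupModeFullValue = "full"
)

// IsFullBackupRequested reports whether a Backup's labels request a full backup.
// A missing label, or any value other than "full", keeps the default incremental mode.
// Indexing a nil map is safe in Go and yields "".
func IsFullBackupRequested(labels map[string]string) bool {
	return labels[BackupModeLabelKey] == BackupModeFullValue
}

func main() {
	fmt.Println(IsFullBackupRequested(map[string]string{BackupModeLabelKey: "full"})) // true
	fmt.Println(IsFullBackupRequested(nil))                                          // false
}
```

Treating any other value as incremental keeps existing Backups, which never carry this label, on the current code path.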


## Proposal

### User Stories

### User Experience In Detail

#### Recurring Full Backup

1. Create a RecurringJob with task type `backup` and the label `longhorn.io/longhorn-backup-mode: full`, and assign it to the volume:
   ```yaml
   apiVersion: longhorn.io/v1beta2
   kind: RecurringJob
   metadata:
     name: recurring-full-backup-per-min
     namespace: longhorn-system
   spec:
     concurrency: 1
     cron: '* * * * *'
     groups: []
     labels:
       longhorn.io/longhorn-backup-mode: full
     name: recurring-full-backup-per-min
     retain: 0
     task: backup
   ```
2. The RecurringJob runs and creates a full backup of the volume.
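This flow assumes that the RecurringJob's `spec.labels` end up on every Backup the job creates, so the full-backup label reaches the backup path. A minimal Go sketch of that merge, under that assumption; `backupLabelsForJob` is a hypothetical helper, and letting job labels win on key conflicts is a choice of this sketch only.

```go
package main

import (
	"fmt"
	"sort"
)

// backupLabelsForJob is a hypothetical helper: it returns the labels for a Backup
// created by a recurring job by copying the job's spec.labels on top of base labels.
// Neither input map is modified.
func backupLabelsForJob(base, jobLabels map[string]string) map[string]string {
	out := make(map[string]string, len(base)+len(jobLabels))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range jobLabels {
		out[k] = v // assumption of this sketch: job labels win on key conflicts
	}
	return out
}

func main() {
	job := map[string]string{"longhorn.io/longhorn-backup-mode": "full"}
	labels := backupLabelsForJob(map[string]string{"app": "db"}, job)

	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // Go map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, labels[k])
	}
}
```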

#### Manual Full Backup

1. When creating a backup, users can add the label `longhorn.io/longhorn-backup-mode: full`.
2. Longhorn then performs a full backup.
3. The UI may be adjusted to make this process simpler.

## Design

### Implementation Overview

#### UI (May need some adjustment)

1. When creating a backup, users can add the label `longhorn.io/longhorn-backup-mode: full`.
2. Longhorn then performs a full backup.

#### CRD

1. **Backup**: add a new reserved Longhorn label `longhorn.io/longhorn-backup-mode: full`. (No CRD schema change is needed.)
   - When a Backup CR has this label, Longhorn performs a full backup.
   - Using a label makes it easy to filter Backups by label and distinguish full backups from normal backups.
   - Longhorn already stores the labels in the backupstore when doing a backup. Thus, when the Backup is pulled from the backupstore in a new cluster, the label is pulled as well.
   - The label only takes effect when the backup is taken, where it tells the engine/replica to perform a full backup.
   - `Label` is already passed through the gRPC call chain `longhorn-manager` -> `longhorn-instance-manager` -> `longhorn-engine/replica` -> `backupstore`, so little modification is needed.
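Because the mode is carried as a label, full and normal backups can be told apart with a plain label lookup. A minimal Go sketch, assuming a trimmed-down, hypothetical `backupInfo` view of a Backup (name plus labels) rather than Longhorn's real types.

```go
package main

import (
	"fmt"
	"sort"
)

const backupModeLabelKey = "longhorn.io/longhorn-backup-mode" // from this proposal

// backupInfo is a hypothetical, trimmed-down view of a Backup: its name and labels.
type backupInfo struct {
	Name   string
	Labels map[string]string
}

// splitByMode partitions backups into full and normal (incremental) backups
// using only the reserved label, and returns each group's names sorted.
func splitByMode(backups []backupInfo) (full, normal []string) {
	for _, b := range backups {
		if b.Labels[backupModeLabelKey] == "full" {
			full = append(full, b.Name)
		} else {
			normal = append(normal, b.Name)
		}
	}
	sort.Strings(full)
	sort.Strings(normal)
	return full, normal
}

func main() {
	full, normal := splitByMode([]backupInfo{
		{Name: "backup-b", Labels: map[string]string{backupModeLabelKey: "full"}},
		{Name: "backup-a"}, // nil Labels: treated as a normal backup
		{Name: "backup-c", Labels: map[string]string{"app": "db"}},
	})
	fmt.Println(full, normal) // [backup-b] [backup-a backup-c]
}
```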

Backup CR example:
```yaml
apiVersion: longhorn.io/v1beta2
kind: Backup
metadata:
  name: backup-abcde1234
  namespace: longhorn-system
spec:
  snapshotName: fake-snapshot
  labels:
    longhorn.io/longhorn-backup-mode: full
```

#### Backupstore

1. Currently, if the volume has a `lastBackup`, Longhorn always performs an incremental backup.
2. With this change, if the label `longhorn.io/longhorn-backup-mode: full` exists in the Backup's labels, Longhorn:
   - pretends the last backup does not exist and forces a full backup, and
   - overwrites each block on the backupstore even if it already exists.
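The two rules above can be sketched in Go against a hypothetical in-memory block store keyed by block checksum. `planBackup`, `uploadBlocks`, and `blockStore` are illustrative names, not backupstore functions, and the SHA-512 keying is an assumption of this sketch.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

const backupModeLabelKey = "longhorn.io/longhorn-backup-mode" // from this proposal

// blockStore is a hypothetical in-memory stand-in for the backupstore's blocks,
// keyed by the hex SHA-512 checksum of each block.
type blockStore map[string][]byte

// planBackup returns the backup to diff against ("" means none) and whether
// this run is a full backup. A volume with no last backup is effectively full.
func planBackup(lastBackup string, labels map[string]string) (base string, full bool) {
	if labels[backupModeLabelKey] == "full" {
		return "", true // pretend the last backup does not exist
	}
	return lastBackup, lastBackup == ""
}

// uploadBlocks writes blocks and returns how many were written. In incremental
// mode a block whose checksum already exists is skipped; in full mode every
// block is written, overwriting any existing copy.
func uploadBlocks(store blockStore, blocks [][]byte, full bool) int {
	written := 0
	for _, b := range blocks {
		sum := sha512.Sum512(b)
		key := hex.EncodeToString(sum[:])
		if _, exists := store[key]; exists && !full {
			continue // reuse the existing copy as-is
		}
		store[key] = append([]byte(nil), b...) // store a private copy
		written++
	}
	return written
}

func main() {
	store := blockStore{}
	blocks := [][]byte{[]byte("block-0"), []byte("block-1")}

	base, full := planBackup("", nil)
	fmt.Println(base == "", full, uploadBlocks(store, blocks, full)) // true true 2

	base, full = planBackup("backup-1", nil)
	fmt.Println(base, full, uploadBlocks(store, blocks, full)) // backup-1 false 0

	base, full = planBackup("backup-1", map[string]string{backupModeLabelKey: "full"})
	fmt.Println(base == "", full, uploadBlocks(store, blocks, full)) // true true 2
}
```

The second call writes nothing because every block already exists, which is exactly why an incremental backup cannot repair a corrupted block; the third call rewrites both.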

### Test plan

#### Manual Full Backup

1. Create a 4 MiB volume and fill it with data.
2. Create a backup of the volume.
3. Intentionally corrupt the first block (2 MiB) on the backupstore by replacing its content.
4. Restore the volume. The restoration fails with error logs like the following:
   ```
   [pvc-XXXXXX] time="XXXX" level=error msg="Backup data restore Error Found in Server[gzip: invalid checksum]"
   ```
5. Create a full backup from the UI with the label `longhorn.io/longhorn-backup-mode: full`.
6. Restore the backup. This time the restoration should succeed.

#### Recurring Job Full Backup

1. Create a 4 MiB volume and fill it with data.
2. Create a backup of the volume.
3. Intentionally corrupt the first block (2 MiB) on the backupstore by replacing its content.
4. Restore the volume. The restoration fails with error logs like the following:
   ```
   [pvc-XXXXXX] time="XXXX" level=error msg="Backup data restore Error Found in Server[gzip: invalid checksum]"
   ```
5. Create a RecurringJob with task type `backup` and the label `longhorn.io/longhorn-backup-mode: full`, and assign it to the volume:
   ```yaml
   apiVersion: longhorn.io/v1beta2
   kind: RecurringJob
   metadata:
     name: recurring-full-backup-per-min
     namespace: longhorn-system
   spec:
     concurrency: 1
     cron: '* * * * *'
     groups: []
     labels:
       longhorn.io/longhorn-backup-mode: full
     name: recurring-full-backup-per-min
     retain: 0
     task: backup
   ```
6. Wait for the recurring job to finish.
7. Restore the backup. This time the restoration should succeed.


#### Concurrent Backup

1. Create a 4 MiB volume and fill it with data.
2. Create three recurring jobs that run every minute: two for normal incremental backups and one for full backups.
3. All three recurring jobs are triggered at the same time.
4. Wait for all three backups to finish.
5. Restore the latest backup of the BackupVolume.
6. The restored content should be identical to the original volume.
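A toy Go model of this test: three goroutines back up the same volume concurrently (two incremental, one full) into a shared, mutex-guarded in-memory store, and every resulting backup is then restored and compared with the original. The `lockedStore` type and its methods are hypothetical and say nothing about how the real backupstore coordinates concurrent backups; in this model, full and incremental runs write identical bytes under the same checksum key, so their interleaving cannot change the restored data.

```go
package main

import (
	"bytes"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"sync"
)

// lockedStore is a hypothetical in-memory backupstore shared by concurrent backups.
type lockedStore struct {
	mu     sync.Mutex
	blocks map[string][]byte
}

func checksum(b []byte) string {
	s := sha512.Sum512(b)
	return hex.EncodeToString(s[:])
}

// backup writes the volume's blocks (skipping existing ones unless full)
// and returns the backup's block index.
func (s *lockedStore) backup(volume [][]byte, full bool) []string {
	index := make([]string, len(volume))
	for i, b := range volume {
		key := checksum(b)
		s.mu.Lock()
		if _, ok := s.blocks[key]; !ok || full {
			s.blocks[key] = append([]byte(nil), b...)
		}
		s.mu.Unlock()
		index[i] = key
	}
	return index
}

// restore concatenates the blocks referenced by index.
func (s *lockedStore) restore(index []string) []byte {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out bytes.Buffer
	for _, key := range index {
		out.Write(s.blocks[key])
	}
	return out.Bytes()
}

func main() {
	st := &lockedStore{blocks: map[string][]byte{}}
	volume := [][]byte{[]byte("block-0"), []byte("block-1")}
	modes := []bool{false, false, true} // two incremental jobs, one full job

	indexes := make([][]string, len(modes))
	var wg sync.WaitGroup
	for i, full := range modes {
		wg.Add(1)
		go func(i int, full bool) {
			defer wg.Done()
			indexes[i] = st.backup(volume, full) // each goroutine writes its own slot
		}(i, full)
	}
	wg.Wait() // step 4: wait for all three backups

	original := bytes.Join(volume, nil)
	for i, idx := range indexes {
		fmt.Printf("backup %d restores original: %v\n", i, bytes.Equal(st.restore(idx), original))
	}
}
```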


### Upgrade strategy

No need.

## Note [optional]

None.
