doc(enhancement): add recurring and manual full backup support
ref: longhorn/longhorn 7070 Signed-off-by: Jack Lin <jack.lin@suse.com>
enhancements/20240314-recurring-and-manual-full-backup-support.md
# 20240314-recurring-and-manual-full-backup-support

## Summary

This feature enables Longhorn to create a **recurring job** for full backups or to **manually trigger** a full backup of a volume.

### Related Issues

- Community issue: https://github.com/longhorn/longhorn/issues/7069
- Improvement issue: https://github.com/longhorn/longhorn/issues/7070

## Motivation

Longhorn always performs incremental backups, which upload only the blocks that have changed since the previous backup.
Blocks from earlier backups may become corrupted on the backupstore. When that happens, users can no longer restore the volume, because Longhorn aborts the restoration as soon as it finds a block whose checksum does not match.

### Goals

- Add a new field `Parameters` to `RecurringJob` and `Backup`
  - `backup-mode`: used in the `Backup` CR to trigger a full backup (options: `"full"`, `"incremental"`; defaults to `"incremental"`, meaning always incremental)
  - `full-backup-interval`: used in `RecurringJob` with the `Backup` task type to execute a full backup every N incremental backups (defaults to 0, meaning always incremental)
- When doing a full backup, Longhorn backs up **all the current blocks** of the volume and **overwrites them** on the backupstore, even if those blocks already exist there.
- Collect metrics for `newly upload data size` and `overwritten data size` so that users can better understand the cost.

## Proposal

### User Stories

### User Experience In Detail

#### Recurring Full Backup - Always

1. Create a RecurringJob with the `Backup` task type and the parameter `full-backup-interval: 0`, then assign it to the volume.
    ```
    apiVersion: longhorn.io/v1beta2
    kind: RecurringJob
    metadata:
      name: recurring-full-backup-per-min
      namespace: longhorn-system
    spec:
      concurrency: 1
      cron: '* * * * *'
      groups: []
      labels: {}
      parameters:
        full-backup-interval: 0
      name: recurring-full-backup-per-min
      retain: 0
      task: backup
    ```
2. The RecurringJob runs and performs a full backup of the volume every time.

#### Recurring Full Backup - Every N Incremental Backups

1. Create a RecurringJob with the `Backup` task type and the parameter `full-backup-interval: 5`, then assign it to the volume.
    ```
    apiVersion: longhorn.io/v1beta2
    kind: RecurringJob
    metadata:
      name: recurring-full-backup-per-min
      namespace: longhorn-system
    spec:
      concurrency: 1
      cron: '* * * * *'
      groups: []
      labels: {}
      parameters:
        full-backup-interval: 5
      name: recurring-full-backup-per-min
      retain: 0
      task: backup
    ```
2. The RecurringJob runs and performs a full backup of the volume every 5 incremental backups.

#### Manual Full Backup

1. When creating a backup, users can check the `Full Backup: []` checkbox or add the parameter `backup-mode: full` to the spec.
2. The resulting backup is a full backup.
3. The UI may be adjusted to make the process simpler.

## Design

### Implementation Overview

#### Metrics

1. Add two new fields, `new upload data size` and `reupload data size`, to the Backup Status.

#### UI

1. In **Volume Page** >> **Create Backup**, add a checkbox `Full Backup: []`
    - If it is checked, automatically add the parameter `backup-mode: full` to the request payload
    - For example:
        ```
        POST /v1/volumes/${VOLUME_NAME}?action=snapshotBackup HTTP/1.1
        Host: localhost:8080
        Accept: application/json
        Content-Type: application/json
        Content-Length: 55

        {
          "parameters": {
            "backup-mode": "full"
          },
          "name": "${BACKUP_NAME}"
        }
        ```

2. In **Recurring Job** >> **Create Recurring Job**, add a new section where users can fill in the parameters when the task is a `Backup`-related task.
    - Currently the only supported parameter is:
        - `full-backup-interval`

3. In **Backup** >> **${BackupVolume}**, add a new field `Backup Mode`
    - If the backup has the parameter `backup-mode: full`, show `full`; otherwise show `incremental`
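
The checkbox behaviour in item 1 can be illustrated with a minimal Python sketch that builds, but does not send, the `snapshotBackup` request. The helper name `build_snapshot_backup_request` and its arguments are hypothetical; only the URL shape, the `name` field, and the `backup-mode` parameter come from the example above.

```python
import json
import urllib.request


def build_snapshot_backup_request(base_url, volume_name, name, full_backup):
    """Build (without sending) the snapshotBackup request for a volume.

    Hypothetical helper: `full_backup` mirrors the state of the
    `Full Backup` checkbox in the UI.
    """
    payload = {"name": name}
    if full_backup:
        # Only a checked box adds the parameter; otherwise the backup
        # falls back to the default incremental mode.
        payload["parameters"] = {"backup-mode": "full"}
    return urllib.request.Request(
        url=f"{base_url}/v1/volumes/{volume_name}?action=snapshotBackup",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )
```

Leaving the parameter out when the box is unchecked keeps existing clients working unchanged, since incremental is the default mode.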

#### CRD

1. **BackupVolume**: add a new status field `.Status.BackupCount` to record how many backups have been created.

2. **Backup**: add a new field `parameters` to pass the backup options.
    - `backup-mode`: set to `"full"` to trigger a full backup. Defaults to `"incremental"` for an incremental backup

3. **RecurringJob**: add a new field `parameters`.
    - `full-backup-interval`: only used in `Backup`-related tasks. Executes a full backup every N incremental backups. Defaults to 0, meaning always incremental

Backup CR Example
```yaml
apiVersion: longhorn.io/v1beta2
kind: Backup
metadata:
  name: backup-abcde1234
  namespace: longhorn-system
spec:
  snapshot: fake-snapshot
  parameters:
    backup-mode: full
```
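
The proposal does not spell out how `.Status.BackupCount` and `full-backup-interval` combine into a decision. One plausible reading, sketched below in Python, is a modulo check: a full backup runs whenever the number of backups already recorded is a multiple of the interval. The function name `choose_backup_mode` and the modulo rule itself are assumptions, not the confirmed implementation.

```python
def choose_backup_mode(backup_count, full_backup_interval):
    """Return "full" or "incremental" for the next recurring backup.

    Assumptions (not confirmed by the proposal):
    - `backup_count` is the value of `.Status.BackupCount` before the backup.
    - A full backup runs when the count is a multiple of the interval.
    """
    if full_backup_interval < 0:
        raise ValueError("full-backup-interval must not be negative")
    if full_backup_interval == 0:
        # Documented default: always incremental.
        return "incremental"
    if backup_count % full_backup_interval == 0:
        return "full"
    return "incremental"
```

Under this reading, an interval of 1 makes every backup a full backup, and an interval of 5 makes the backups taken at counts 0, 5, 10, and so on full backups.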

#### Backupstore
1. Pass `parameters` through the gRPC function call chain.
2. In the current implementation, if the Volume has a `lastBackup`, Longhorn always performs an incremental backup.
3. With this change, if `backup-mode: full` exists in the parameters, Longhorn
    - treats the last backup as if it did not exist, forcing a full backup.
    - overwrites each block on the backupstore even if it already exists.
4. Store the `new upload data size` and `reupload data size` in the Backup Status.
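
The overwrite behaviour and the two size metrics can be sketched with a toy model in Python. Here the backupstore is a plain dictionary keyed by block checksum, and `backup_blocks` is a hypothetical function; the real backupstore compares snapshots and handles compression, which this simplified sketch ignores.

```python
import hashlib


def backup_blocks(blocks, store, backup_mode="incremental"):
    """Upload blocks into `store` (checksum -> bytes) and return size metrics.

    Toy model under stated assumptions: a block counts as "already backed up"
    when its checksum key exists in the store.
    """
    new_upload_size = 0
    reupload_size = 0
    for data in blocks:
        checksum = hashlib.sha256(data).hexdigest()
        exists = checksum in store
        if exists and backup_mode != "full":
            # Incremental: trust the existing block and skip the upload.
            continue
        # Full mode overwrites the block even if it already exists.
        store[checksum] = data
        if exists:
            reupload_size += len(data)
        else:
            new_upload_size += len(data)
    return {
        "new_upload_data_size": new_upload_size,
        "reupload_data_size": reupload_size,
    }
```

In this model an incremental backup leaves a corrupted block in place because its key already exists, while a full backup rewrites it and reports the cost as reuploaded bytes.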

#### Webhook
1. Validate the parameters to catch typos.
2. RecurringJob currently accepts only `full-backup-interval`
3. Backup currently accepts only `backup-mode`
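
The webhook checks above amount to an allow-list per resource kind. A minimal Python sketch follows; the function `validate_parameters` and the value checks (known modes, non-negative integer interval) are assumptions added for illustration, since the proposal only states which keys are accepted.

```python
ALLOWED_PARAMETERS = {
    "RecurringJob": {"full-backup-interval"},
    "Backup": {"backup-mode"},
}
BACKUP_MODES = {"full", "incremental"}


def validate_parameters(kind, parameters):
    """Return a list of error messages; an empty list means accepted."""
    errors = []
    allowed = ALLOWED_PARAMETERS.get(kind, set())
    for key, value in parameters.items():
        if key not in allowed:
            # Catches typos such as "backup-mod" and keys used on the wrong kind.
            errors.append(f"{kind}: unsupported parameter {key!r}")
        elif key == "backup-mode" and value not in BACKUP_MODES:
            errors.append(f"{kind}: invalid backup-mode {value!r}")
        elif key == "full-backup-interval":
            # Assumed value check: the interval must parse as an integer >= 0.
            try:
                interval = int(value)
            except (TypeError, ValueError):
                errors.append(f"{kind}: full-backup-interval must be an integer")
            else:
                if interval < 0:
                    errors.append(f"{kind}: full-backup-interval must be >= 0")
    return errors
```

Returning all errors at once, rather than stopping at the first, lets a user fix every typo in a single round trip.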

### Test plan

#### Manual Full Backup
1. Create a 4MB Volume and fill it with content.
2. Create a Backup of the Volume.
3. Intentionally replace the content of the first block (2MB) on the backupstore.
4. Restore the Volume. The restoration fails with error logs like the following:
    ```
    [pvc-XXXXXX] time="XXXX" level=error msg="Backup data restore Error Found in Server[gzip: invalid checksum]"
    ```
5. Create a full backup with the parameter `backup-mode: full`.
6. Restore the backup. This time the restoration should succeed.

#### Recurring Job Full Backup
1. Create a 4MB Volume and fill it with content.
2. Create a Backup of the Volume.
3. Intentionally replace the content of the first block (2MB) on the backupstore.
4. Restore the Volume. The restoration fails with error logs like the following:
    ```
    [pvc-XXXXXX] time="XXXX" level=error msg="Backup data restore Error Found in Server[gzip: invalid checksum]"
    ```
5. Create a RecurringJob with the `backup` task type and the parameter `full-backup-interval: 1`, then assign it to the volume.
    ```
    apiVersion: longhorn.io/v1beta2
    kind: RecurringJob
    metadata:
      name: recurring-full-backup-per-min
      namespace: longhorn-system
    spec:
      concurrency: 1
      cron: '* * * * *'
      groups: []
      labels: {}
      parameters:
        full-backup-interval: 1
      name: recurring-full-backup-per-min
      retain: 0
      task: backup
    ```
6. Wait for the recurring job to finish.
7. Restore the backup. This time the restoration should succeed.

#### Concurrent Backup

1. Create a 4MB Volume and fill it with content.
2. Create 3 recurring jobs that run every minute: two for normal incremental backups and one for full backups.
3. These 3 recurring jobs should be triggered at the same time.
4. Wait for the 3 backups to finish.
5. Restore the last backup of the BackupVolume.
6. The content should be the same as that of the original Volume.

### Upgrade strategy

No need.

## Note [optional]

None.