[controller] Do not delete true backup version during Repush #945

Open
wants to merge 6 commits into
base: main

Contributor

@majisourav99 majisourav99 commented Apr 15, 2024

Do not delete true backup version during Repush

During a repush, Venice deletes the true backup version instead of the version that was current before the repush.
For example, suppose a store has versions v9 and v10. When a repush starts, it creates v11 and deletes v9 on the assumption that v9 is the backup version, whereas the repush actually copies data from v10 to v11.
If the user later wants to roll back to the backup version, they can only roll back to v10, which is identical to v11, because v9 was deleted.

This PR fixes that. It relies on StoreBackupVersionCleanupService to delete the backup version v10 asynchronously. Currently it checks controller.backup.version.metadata.fetch.cleanup.enabled to do a safer delete of the previous current version, which validates that there are no reads to v10.
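
To make the v9/v10/v11 example concrete, here is a minimal sketch of the two roles described above. It is an illustration only, not Venice code: `RepushVersionRoles` and its methods are hypothetical names, version numbers are modeled as plain integers, and the repush source is assumed to be the version that was current when the repush started.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration; none of these names come from Venice. */
final class RepushVersionRoles {
  /** The repush source is assumed to be the version that was current before the repush. */
  static int repushSource(int currentBeforeRepush) {
    return currentBeforeRepush;
  }

  /** The true backup is the newest version older than the repush source. */
  static Optional<Integer> trueBackup(List<Integer> versions, int currentBeforeRepush) {
    return versions.stream()
        .filter(v -> v < currentBeforeRepush)
        .max(Comparator.naturalOrder());
  }

  public static void main(String[] args) {
    List<Integer> versions = List.of(9, 10, 11); // v11 was repushed from v10
    System.out.println("repush source: v" + repushSource(10));
    System.out.println("true backup: " + trueBackup(versions, 10));
  }
}
```

With versions v9, v10, v11 and v10 current before the repush, the source is v10 and the true backup is v9, so v9 is the version a rollback needs to keep.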

How was this PR tested?

CI

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to explain your proposed changes and call out the behavior change.
    This changes the definition of the backup version for repush: the backup version is the current version before the push instead of the version before that.

versionList.add(v);
});
doReturn(versionList).when(store).getVersions();

if (!versionList.isEmpty()) {
doReturn(Optional.of(versionList.get(0))).when(store).getVersion(currentVersion);

Should this always return the first item in the list, or should it correspond to currentVersion-1?

Comment on lines 288 to 294
private boolean readyToBeRemoved(Version v, boolean isRepush, int currentVersionNum) {
  if (isRepush) {
    return v.getCreatedTime() == Duration.ZERO.toMillis();
  }
  return v.getNumber() < currentVersionNum;
}

Contributor

I wonder whether this logic will work well.
Let us say we have two versions, v1 and v2, and v3 is the version repushed from v2.

So when cleaning up the store versions, v2 will be removed as its created time is 0, and the true backup version will never be removed as its createdTime is not 0, right?
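
The scenario above can be walked through with a small stand-alone sketch. It is not Venice code: `CreatedTimeCleanupSketch` and its nested `V` class are hypothetical stand-ins, the timestamps are made up, and it assumes the cleanup applies the quoted predicate to every version except the current one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the scenario; not Venice's Version class. */
final class CreatedTimeCleanupSketch {
  static final class V {
    final int number;
    final long createdTime;

    V(int number, long createdTime) {
      this.number = number;
      this.createdTime = createdTime;
    }
  }

  /** Mirrors the predicate quoted in the diff above. */
  static boolean readyToBeRemoved(V v, boolean isRepush, int currentVersionNum) {
    if (isRepush) {
      return v.createdTime == 0L;
    }
    return v.number < currentVersionNum;
  }

  /** Assumption: the current version itself is never a removal candidate. */
  static List<Integer> removable(List<V> versions, boolean isRepush, int currentVersionNum) {
    List<Integer> result = new ArrayList<>();
    for (V v : versions) {
      if (v.number != currentVersionNum && readyToBeRemoved(v, isRepush, currentVersionNum)) {
        result.add(v.number);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // v2 was the repush source, so its created time has been reset to 0.
    List<V> versions = List.of(
        new V(1, 1_700_000_000_000L),
        new V(2, 0L),
        new V(3, 1_700_000_100_000L));
    System.out.println("after repush: " + removable(versions, true, 3));
    System.out.println("after regular push: " + removable(versions, false, 3));
  }
}
```

Under these assumptions, the repush branch selects only v2, and v1 becomes removable only once the non-repush branch runs.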

Contributor Author

It will be removed in a later iteration, after a regular push, right?

Contributor

Well, what if there is no regular push in the following month?
This comment is related to the following comment. The main cause is that we can't tell which version is the source version of the repushed current version. If we had that information, the logic would be very straightforward:
the source version cleanup would be independent of the true backup version cleanup.

Comment on lines 2501 to 2505
// update the age of the current version to 0 for deletion in StoreBackupVersionCleanupService
if (isRepush && multiClusterConfigs.getControllerConfig(clusterName)
    .isBackupVersionMetadataFetchBasedCleanupEnabled()) {
  store.getVersion(currentVersionBeforePush).get().setCreatedTime(Duration.ZERO);
}

Contributor

This may not be fully correct in all scenarios.
Let us say:

1. User runs a batch push job, which produces a version: v10.
2. The job succeeds in two regions with: v9, v10.
3. The job fails in one region: v9.
4. We trigger a repush based on v10.
5. Two good regions will have: v9, v10, v11 and the bad region will have v9, v11.
6. Based on this logic, the v10 in good regions will be marked with created time: 0 and the v9 in the bad region will be marked in the same way.
7. v10 will be removed from good regions and v9 will be removed from the bad region.

The above doesn't sound good and is not consistent.

I think we should introduce a new version-level field, called the re-push source version, so that we know explicitly which versions to clean up.
Right now, I feel the logic is a little tricky and error-prone.
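
One possible shape for the proposed field, applied to the multi-region scenario above, is sketched below. Everything here is an assumption made for illustration: the field is modeled as a hypothetical map from a version number to its repush source, and `RepushSourceCleanupSketch` and `sourceVersionToDelete` are invented names, not Venice APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of cleanup driven by an explicit repush source version. */
final class RepushSourceCleanupSketch {
  /**
   * Returns the version to delete as the repush source of the current version,
   * or empty if the current version is not a repush or this region never had the source.
   */
  static Optional<Integer> sourceVersionToDelete(
      Map<Integer, Integer> repushSourceByVersion,
      List<Integer> regionVersions,
      int currentVersion) {
    Integer source = repushSourceByVersion.get(currentVersion);
    if (source == null || !regionVersions.contains(source)) {
      return Optional.empty();
    }
    return Optional.of(source);
  }

  public static void main(String[] args) {
    Map<Integer, Integer> repushSourceByVersion = Map.of(11, 10); // v11 was repushed from v10
    List<Integer> goodRegion = List.of(9, 10, 11);
    List<Integer> badRegion = List.of(9, 11); // the v10 push failed here

    System.out.println("good region: " + sourceVersionToDelete(repushSourceByVersion, goodRegion, 11));
    System.out.println("bad region: " + sourceVersionToDelete(repushSourceByVersion, badRegion, 11));
  }
}
```

In this sketch the good regions select v10 and the bad region selects nothing, so v9 survives everywhere without depending on the created time or on a minimum-version-count check.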

Contributor Author

Hmm, v9 won't be deleted in the bad region, since we have a check that keeps a minimum number of versions.

Contributor

Can you show me the logic regarding ^?
Also v9's created time will be updated to be 0 based on the new logic IIUC.
I think it is confusing, and it is better to make the logic clean to avoid tricky bugs.

@@ -74,14 +76,18 @@ private Store mockStore(
doReturn(latestVersionPromoteToCurrentTimestamp).when(store).getLatestVersionPromoteToCurrentTimestamp();
doReturn(currentVersion).when(store).getCurrentVersion();
List<Version> versionList = new ArrayList<>();
AtomicInteger i = new AtomicInteger();

Contributor

I think we need some E2E integration tests.


3 participants