[controller] Do not delete true backup version during Repush #945
versionList.add(v);
});
doReturn(versionList).when(store).getVersions();

if (!versionList.isEmpty()) {
  doReturn(Optional.of(versionList.get(0))).when(store).getVersion(currentVersion);
Should this always return the first item in the list, or should it correspond to currentVersion-1?
private boolean readyToBeRemoved(Version v, boolean isRepush, int currentVersionNum) {
  if (isRepush) {
    return v.getCreatedTime() == Duration.ZERO.toMillis();
  }
  return v.getNumber() < currentVersionNum;
}
I wonder whether this logic will work well.
Let us say we have two versions:
v1,
v2,
and v3 is the version repushed from v2.
When cleaning up the store versions, v2 will be removed because its created time is 0, and the true backup version (v1) will never be removed because its createdTime is not 0, right?
It will be removed in a later iteration, after a regular push, right?
Well, what if there is no regular push in the following month?
This comment is related to the following comment. The main cause is that we can't tell which version is the source version for the repushed current version; if we had that info, the logic would be very straightforward.
The source version cleanup would then be independent of the true backup version cleanup.
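To make the suggestion concrete, here is a minimal sketch of what independent cleanup could look like if each version recorded its repush source. Everything in it is an assumption for illustration: SketchVersion, the repushSourceVersion field, and removableVersions are hypothetical names and are not part of Venice's Version API. The no-reads validation and any minimum-number-of-versions check are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified stand-in for a store version; NOT Venice's Version class. */
final class SketchVersion {
  static final int NO_REPUSH_SOURCE = -1;

  final int number;
  final long createdTimeMs;
  /** Hypothetical field: the version this one was repushed from, or NO_REPUSH_SOURCE. */
  final int repushSourceVersion;

  SketchVersion(int number, long createdTimeMs, int repushSourceVersion) {
    this.number = number;
    this.createdTimeMs = createdTimeMs;
    this.repushSourceVersion = repushSourceVersion;
  }
}

final class RepushSourceCleanupSketch {
  /**
   * Returns the version numbers that may be removed. The repush source of the
   * current version is removable right away; any other older version is
   * removable only once its age reaches the backup retention.
   */
  static List<Integer> removableVersions(
      List<SketchVersion> versions, int currentVersionNum, long nowMs, long backupRetentionMs) {
    // Look up which version (if any) the current version was repushed from.
    int repushSource = SketchVersion.NO_REPUSH_SOURCE;
    for (SketchVersion v : versions) {
      if (v.number == currentVersionNum) {
        repushSource = v.repushSourceVersion;
      }
    }
    List<Integer> removable = new ArrayList<>();
    for (SketchVersion v : versions) {
      if (v.number >= currentVersionNum) {
        continue; // never remove the current version or anything newer
      }
      boolean isRepushSource = v.number == repushSource;
      boolean isExpiredBackup = nowMs - v.createdTimeMs >= backupRetentionMs;
      if (isRepushSource || isExpiredBackup) {
        removable.add(v.number);
      }
    }
    return removable;
  }

  public static void main(String[] args) {
    // v1 and v2 come from regular pushes; v3 is a repush of v2.
    List<SketchVersion> versions = Arrays.asList(
        new SketchVersion(1, 0L, SketchVersion.NO_REPUSH_SOURCE),
        new SketchVersion(2, 1_000L, SketchVersion.NO_REPUSH_SOURCE),
        new SketchVersion(3, 2_000L, 2));
    // Shortly after the repush only the source version v2 is removable.
    System.out.println(removableVersions(versions, 3, 2_500L, 10_000L)); // prints [2]
    // Once v1 outlives the retention it is removable too, with no regular push needed.
    System.out.println(removableVersions(versions, 3, 20_000L, 10_000L)); // prints [1, 2]
  }
}
```

With an explicit source field, the createdTime of the source version no longer has to be overwritten, so the true backup can age out on its own schedule.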
// update the age of the current version to 0 for deletion in StoreBackupVersionCleanupService
if (isRepush && multiClusterConfigs.getControllerConfig(clusterName)
    .isBackupVersionMetadataFetchBasedCleanupEnabled()) {
  store.getVersion(currentVersionBeforePush).get().setCreatedTime(Duration.ZERO);
}
This may not be fully correct in all scenarios.
Let us say:
1. User runs a batch push job, which produces a version: v10.
2. The job succeeds in two regions with: v9, v10.
3. The job fails in one region: v9.
4. We trigger a repush based on v10.
5. Two good regions will have: v9, v10, v11 and the bad region will have v9, v11.
6. Based on this logic, the v10 in good regions will be marked with created time: 0 and the v9 in the bad region will be marked in the same way.
7. v10 will be removed from good regions and v9 will be removed from the bad region.
The above doesn't sound good and is not consistent across regions.
I think we should introduce a new version-level field called re-push source version, so that we explicitly know which versions to clean up.
Right now, I feel the logic is a little tricky and error-prone.
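The scenario above can be replayed in a few lines. This is only an illustration under stated assumptions: heuristicTarget models "mark whichever version was current in the region before the repush", and explicitTarget models the proposed re-push source version field; both method names are hypothetical and neither is taken from the Venice code base.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class RegionConsistencySketch {
  /** Heuristic under discussion: target whichever version was current in this region before the repush. */
  static int heuristicTarget(int currentVersionBeforePushInRegion) {
    return currentVersionBeforePushInRegion;
  }

  /** Hypothetical alternative: target the recorded repush source, and only if this region has it. */
  static Optional<Integer> explicitTarget(List<Integer> regionVersions, int repushSourceVersion) {
    return regionVersions.contains(repushSourceVersion)
        ? Optional.of(repushSourceVersion)
        : Optional.empty();
  }

  public static void main(String[] args) {
    // State after the repush (v11) from steps 1 to 5 above.
    List<Integer> goodRegion = Arrays.asList(9, 10, 11); // v10 was current before the repush
    List<Integer> badRegion = Arrays.asList(9, 11); // v10 failed here, so v9 was current
    int repushSource = 10;

    System.out.println(heuristicTarget(10)); // good region: prints 10
    System.out.println(heuristicTarget(9)); // bad region: prints 9, the only real backup there
    System.out.println(explicitTarget(goodRegion, repushSource)); // prints Optional[10]
    System.out.println(explicitTarget(badRegion, repushSource)); // prints Optional.empty
  }
}
```

With the explicit field, every region agrees on which version is the repush source, and a region that never had that version simply has nothing to clean up.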
Hmm, v9 won't be deleted in the bad region, since we have a check that keeps a minimum number of versions.
Can you show me the logic regarding ^?
Also, v9's created time will be updated to 0 based on the new logic, IIUC.
I think it is confusing, and it is better to keep the logic clean to avoid tricky bugs.
@@ -74,14 +76,18 @@ private Store mockStore(
doReturn(latestVersionPromoteToCurrentTimestamp).when(store).getLatestVersionPromoteToCurrentTimestamp();
doReturn(currentVersion).when(store).getCurrentVersion();
List<Version> versionList = new ArrayList<>();
AtomicInteger i = new AtomicInteger();
I think we need some E2E integration tests.
Do not delete true backup version during Repush
During a repush, Venice deletes the true backup version instead of the version that was current before the repush.
For example, suppose a store has versions v9 and v10. When a repush starts, it creates v11 and deletes v9 on the assumption that v9 is a backup version, whereas the repush actually copies data from v10 to v11.
If a user later wants to roll back to the backup version, they can only roll back to v10, which is exactly the same as v11, because v9 was deleted.
This PR fixes that. It relies on StoreBackupVersionCleanupService to delete the backup version v10 asynchronously. Currently it checks controller.backup.version.metadata.fetch.cleanup.enabled to do a safer delete of the previous current version, which validates that there are no reads to v10.
How was this PR tested?
CI
Does this PR introduce any user-facing changes?
The definition of the backup version changes for repush: the backup version is the current version before the push, instead of the version before that.
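As a rough illustration of that user-facing change, the sketch below compares the end state of the v9/v10 example under the old and new behavior. The class and method names are hypothetical, and the asynchronous deletion done by StoreBackupVersionCleanupService is collapsed into a single step for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class RepushBackupSketch {
  /**
   * Versions left once a repush and its cleanup have finished.
   * deleteRepushSource = true models the behavior described in this PR;
   * false models the old behavior of dropping the version before the previous current.
   */
  static List<Integer> versionsAfterRepush(
      List<Integer> before, int currentBeforePush, boolean deleteRepushSource) {
    List<Integer> remaining = new ArrayList<>(before);
    remaining.add(Collections.max(before) + 1); // the repush creates the next version
    if (deleteRepushSource) {
      remaining.remove(Integer.valueOf(currentBeforePush));
    } else {
      // Old behavior: find the newest version older than the previous current one.
      Integer olderBackup = null;
      for (Integer v : before) {
        if (v < currentBeforePush && (olderBackup == null || v > olderBackup)) {
          olderBackup = v;
        }
      }
      if (olderBackup != null) {
        remaining.remove(olderBackup);
      }
    }
    return remaining;
  }

  /** The version a rollback would land on: the newest one below the current version, or -1. */
  static int rollbackTarget(List<Integer> versions) {
    int current = Collections.max(versions);
    int target = -1;
    for (int v : versions) {
      if (v < current && v > target) {
        target = v;
      }
    }
    return target;
  }

  public static void main(String[] args) {
    List<Integer> before = Arrays.asList(9, 10);
    List<Integer> oldBehavior = versionsAfterRepush(before, 10, false);
    List<Integer> newBehavior = versionsAfterRepush(before, 10, true);
    System.out.println(oldBehavior + " rollback to v" + rollbackTarget(oldBehavior)); // [10, 11] rollback to v10
    System.out.println(newBehavior + " rollback to v" + rollbackTarget(newBehavior)); // [9, 11] rollback to v9
  }
}
```

Under the old behavior the rollback target v10 holds the same data as v11; under the new behavior the rollback target is the true backup v9.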