Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reveal hook to allow operators to restore just to the most recent snapshot #1035

Merged
merged 8 commits into from Mar 29, 2023

Conversation

mattl-netflix
Copy link
Contributor

No description provided.

Copy link
Contributor

@akhaku akhaku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, gotta say I love all of the cleanup you're doing; extraneous comments, vertical space, and declared exceptions are in particular some of my pet peeves

.findMetaFiles(dateRange)
.stream()
.filter(meta -> metaProxy.isMetaFileValid(meta).valid)
.findFirst();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful

@@ -133,22 +129,12 @@ public final boolean isFiltered(String keyspace, String columnFamilyDir) {
if (excludeFilter.containsKey(keyspace)
&& (excludeFilter.get(keyspace).isEmpty()
|| excludeFilter.get(keyspace).contains(columnFamilyName))) {
logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's fine and you have a better understanding around whether or not we ever need those log messages but in general I'm not opposed to debug logs that I can enable to help debug an issue

return true;
}
if (includeFilter != null)
if (!(includeFilter.containsKey(keyspace)
return !(includeFilter.containsKey(keyspace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: could collapse this further, return includeFilter != null && !includeFilter.containsKey(keyspace) && foo || bar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. However, I wanted to leave code in methods unrelated to the goal of the PR unchanged. I accepted this because it was done automatically by Intellij. If I change this I'd need to add the appropriate tests and I think that is beyond scope and best left to a follow-up.

@@ -17,6 +17,7 @@

package com.netflix.priam.backup;

import com.google.api.client.util.Lists;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally not a fan of using (non-shaded) guava in a library since it causes so many compatibility issues. I've gone as far as created my own Preconditions class. This usage below looks like it might be able to be replaced with a vanilla ArrayList constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -58,14 +59,13 @@ public static Optional<AbstractBackupPath> getLatestValidMetaPath(
.findFirst();
}

public static List<AbstractBackupPath> getAllFiles(
public static List<AbstractBackupPath> getMostRecentSnapshotPaths(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming the method was a great idea, makes it more precise (now that it's different) and also makes it impossible to forget to update a callsite

allFiles.addAll(
BackupRestoreUtil.getIncrementalPaths(
latestValidMetaFile.get(), dateRange, metaProxy));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it we don't have to update the callsite in BackupServletV2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it there as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused - do we need to use config.skipIncrementalRestore there too? I didn't see that there in this PR, perhaps it isn't required?

@mattl-netflix mattl-netflix merged commit 36321c4 into 3.11 Mar 29, 2023
1 check passed
@mattl-netflix mattl-netflix deleted the feature/restore_to_snapshot branch March 29, 2023 21:04
mattl-netflix added a commit that referenced this pull request Apr 18, 2023
…pshot (#1035)

* Remove unused code.

* Remove redundant comments and vertical whitespace.

* Remove debug comments and now-redundant logger, simplify if-else and tighten error message for code style.

* Use final where applicable and remove it where redundant.

* Remove redundant BackupRestoreException from getIncrementals method signature.

* Split getting incremental files and snapshot files into separate methods.

* Reveal hook to allow operators to restore to the last valid snapshot.

* Remove added non-shaded Guava dependency pursuant to review comments.
mattl-netflix added a commit that referenced this pull request Apr 18, 2023
* Fix backup verification race condition causing missing notifications (#1034)

* Remove metaproxy validation it is never null in practice.

* Remove DateRange validation. It is never null in practice.

* Remove debug logging.

* Remove latest backup metadata validation. It is never null in practice.

* Consolidate repeated code into private verifyBackup.

* Change method names to better reflect what they do.

* Update latestResult wherever possible.

* Rewrite logic in findLatestVerfiedBackup to make it look more like verifyBackupsInRange.

* Change signature of BackupNotificationMgr.notify to not depend on BackupVerificationResult.

* Return all verified BackupMetadata instead of BackupVerificationResult when verifying en masse. It has enough information to skip the call to find the most recently verified backup.

Also, fix some tests that broke in this process: remove the check for the snapshot time in TestBackupVerification that only makes sense when the Path is for a file that does not exist. Also, mock the appropriate functions in MockBackupVerification in TestBackupVerificationTask.

* Rename findLatestVerifiedBackup responding to review comments.

* Reveal hook to allow operators to restore just to the most recent snapshot (#1035)

* Remove unused code.

* Remove redundant comments and vertical whitespace.

* Remove debug comments and now-redundant logger, simplify if-else and tighten error message for code style.

* Use final where applicable and remove it where redundant.

* Remove redundant BackupRestoreException from getIncrementals method signature.

* Split getting incremental files and snapshot files into separate methods.

* Reveal hook to allow operators to restore to the last valid snapshot.

* Remove added non-shaded Guava dependency pursuant to review comments.

* minor code modifications to simplify the nfpriam spring boot migration

* make the constructor public

* remove the instance info from the DI (#1042)

* Always TTL backups. (#1038)

* Fix Github CI by explicitly creating necessary directories. (#1045)

---------

Co-authored-by: Cheng Wang <chengw@netflix.com>
Co-authored-by: Cheng Wang <107727158+chengw-netflix@users.noreply.github.com>
mattl-netflix added a commit that referenced this pull request Sep 6, 2023
…pshot (#1035)

* Remove unused code.

* Remove redundant comments and vertical whitespace.

* Remove debug comments and now-redundant logger, simplify if-else and tighten error message for code style.

* Use final where applicable and remove it where redundant.

* Remove redundant BackupRestoreException from getIncrementals method signature.

* Split getting incremental files and snapshot files into separate methods.

* Reveal hook to allow operators to restore to the last valid snapshot.

* Remove added non-shaded Guava dependency pursuant to review comments.
mattl-netflix added a commit that referenced this pull request Sep 15, 2023
* Remove redundant interfaces and swap log and notification lines (#1019)

* Remove EventGenerator Interface.

* Remove EventObserver Interface.

* Remove BackupEvent Interface.

* Send SNS notification on backup after logging to account for the possibility of an Exception while trying to notify.

* Use synchronized list for thread-safety (#1018)

This list of PartETag is modified on multiple threads, so it
needs to be thread-safe.
S3FileSystem already uses a synchronized list, so do the
same here.

* Log backup failures rather than ignoring them. (#1025)

* Update CHANGELOG in advance of 3.11.95

* Print cleaner stack trace on failure to upload. (#1027)

* Switch from com.google.inject to JSR-330 javax.inject annotations for better compatibility

* Update CHANGELOG.md

* Reveal property to enable auto_snapshot. (#1031)

* Fix backup verification race condition causing missing notifications (#1034)

* Remove metaproxy validation it is never null in practice.

* Remove DateRange validation. It is never null in practice.

* Remove debug logging.

* Remove latest backup metadata validation. It is never null in practice.

* Consolidate repeated code into private verifyBackup.

* Change method names to better reflect what they do.

* Update latestResult wherever possible.

* Rewrite logic in findLatestVerfiedBackup to make it look more like verifyBackupsInRange.

* Change signature of BackupNotificationMgr.notify to not depend on BackupVerificationResult.

* Return all verified BackupMetadata instead of BackupVerificationResult when verifying en masse. It has enough information to skip the call to find the most recently verified backup.

Also, fix some tests that broke in this process: remove the check for the snapshot time in TestBackupVerification that only makes sense when the Path is for a file that does not exist. Also, mock the appropriate functions in MockBackupVerification in TestBackupVerificationTask.

* Rename findLatestVerifiedBackup responding to review comments.

* Reveal hook to allow operators to restore just to the most recent snapshot (#1035)

* Remove unused code.

* Remove redundant comments and vertical whitespace.

* Remove debug comments and now-redundant logger, simplify if-else and tighten error message for code style.

* Use final where applicable and remove it where redundant.

* Remove redundant BackupRestoreException from getIncrementals method signature.

* Split getting incremental files and snapshot files into separate methods.

* Reveal hook to allow operators to restore to the last valid snapshot.

* Remove added non-shaded Guava dependency pursuant to review comments.

* minor code modifications to simplify the nfpriam spring boot migration

* Update CHANGELOG.md

* Update CHANGELOG.md

* make the constructor public

* Update CHANGELOG.md

* remove the instance info from the DI (#1042)

* Update CHANGELOG.md

* Always TTL backups. (#1038)

* Fix Github CI by explicitly creating necessary directories. (#1045)

* Change the interface of PriamScheduler (#1049)

Change the interface of PriamScheduler

* minor name change (#1051)

* Update CHANGELOG.md

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)

* Increment cross regional duplicate tokens to replicate the policy we have been applying manually. Throw when duplicate tokens are created in region because that would be an obvious error and we should not add two nodes in the same region so closely together.

* Improve error message on intra-regional duplicate token.

* Update CHANGELOG in advance of 3.11.101

* Rollback #1042: Change the interface of EC2RoleAssumptionCredential (#1052)

* Fix snapshot location regression in SNS messages. (#1054)

* Update CHANGELOG in advance of 3.11.103

* change the CassandraMonitor to public (#1056)

* Update CHANGELOG.md

* Add new constructor (#1064)

* Update CHANGELOG.md

* Add disk_failure_policy config (#1065)

* Update CHANGELOG.md

* fix Gson serilization issue (#1067)

* Update CHANGELOG.md

* Make block_for_peers_timeout_in_secs a first-class tunable. (#1069)

* Update CHANGELOG in advance of 3.11.108

* Fix TokenRetrieverTest

---------

Co-authored-by: Ammar Khaku <akhaku@users.noreply.github.com>
Co-authored-by: Cheng Wang <chengw@netflix.com>
Co-authored-by: Cheng Wang <107727158+chengw-netflix@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants