Releases: Netflix/Priam

Increment cross regional duplicate tokens.

25 Apr 21:59

V1 Backups will be removed in the next release
Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)

auto_snapshot, restore to snapshot, always ttl backups, avoid cross-regional duplicate tokens.

25 Apr 21:58

V1 Backups will be removed in the next release
Increment cross regional duplicate tokens to replicate the policy we have been applying manually. (#1048)
Fix Github CI by explicitly creating necessary directories. (#1045)
Always TTL backups. (#1038)
Fix backup verification race condition causing missing notifications (#1034)
Reveal hook to allow operators to restore just to the most recent snapshot (#1035)
Reveal property to enable auto_snapshot. (#1031)

3.11.100

24 Apr 17:52
68103bb

What's Changed

Full Changelog: 3.11.99...3.11.100

3.11.99

11 Apr 22:49
c107cf3

What's Changed

Full Changelog: 3.11.98...3.11.99

3.11.98

31 Mar 05:55
373069a

What's Changed

Full Changelog: 3.11.97...3.11.98

3.11.97

30 Mar 18:21
b4fbf0c

What's Changed

Full Changelog: 3.11.96...3.11.97

Switch from com.google.inject to JSR-330 javax.inject annotations for better compatibility

28 Feb 17:15
bf69a1d

Switch from com.google.inject to JSR-330 javax.inject annotations for better compatibility for 3.11 branch

Switch from com.google.inject to JSR-330 javax.inject annotations for better compatibility

28 Feb 02:49
068dc51

Priam to support Cassandra 4.1

23 Feb 18:36
e4a6a13
Porting 3.0 changes to 4.0 branch (#1024)

* Backup TTL Service. Add encryption information in backup file messages, META_V2. Fix dateutils

* Introduction of MetaProxy layers for common functions.

Update the last modified time of manifest file to snapshot time.

* Remove deprecated code

* Backup Verification refactor

* Backup 2.0 Restore

* Add metric on CassandraConfig resource calls (#766)

* Add metric on CassandraConfig resource calls

* Support configure/tune complex parameters in cassandra.yaml.

* Add Cass SNAPSHOT JMX status, snapshot version, last validated timestamp. Changes to Servlet API.

* Add backup verification `force` field, ttl and info methods.

* Add backup version as enum, minor pr feedback

* changelog

* Do not throw NPE when no backup is found for the requested date.

* update to netflixoss 7.0.0

* Do not check existence of file if it is not SST_V2.
Note: AmazonServiceException can mean either that Amazon was unable to process the request or that it is asking us to slow down. In either case, it is better to try to upload.

* Change the exception scope

* ninja fix from #766

ninja fix from https://github.com/Netflix/Priam/pull/766

* Put a cache in front of remote file system call of objectExist.
Add a rate limiter to s3 objectExist so we don't get slow down from S3.

* Provide an overwrite method to force Priam to replace a particular ip (#783)

This allows us to work around the A->B->C replacement problem where
Priam gets confused about who to replace

* BackupVerificationService (#784)

* move cassandra monitor to services

* Move JMX to connection package.

* Moving files to packages. No new functionality (#787)

* move cassandra monitor to services

* Move JMX to connection package.

* Output from the servlet should be JSON

* S3 - BucketLifecycleConfiguration has `prefix` method removed from latest library.

* Fix for ForgottenFiles when there is a long-running compaction job or Cassandra is still reading from the files. (#789)

* Fix for ForgottenFiles when there is a long-running compaction job or Cassandra is still reading from the files.

* cleanup code

* Use older API for prefix filtering, if prefix is available.

* Send notification only if we upload the file.

* No new functionality: Moving files

* move cassandra operations to connection

* Move to service module for Backups.

* move scheduling recurring jobs to service layer

* Hotfix: TTL for backup version 2.

Cassandra can flush some component files, such as Index.db, to the file system first and the other component files later (e.g. 30 minutes later). If a snapshot is taken in between, this "single" component file will not be part of the snapshot, because the SSTable is not yet part of Cassandra's "view". This would not be a problem if Cassandra provided strong file system guarantees such that:
1. All components are flushed to disk as real SSTables only if they are part of the view. Until that happens, all the files are "tmp" files.
2. All flushed components have the same "last modified" time, i.e. that of the first flush. Stats.db can change over time and that is OK.

Since this is not the case, the TTL may end up deleting this file even though it is part of the next snapshot. To avoid this, we add a grace period (based on how long a compaction can run) when we delete the files.
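The grace-period rule described above can be sketched roughly as follows. This is a minimal illustration, not Priam's actual code: the class name `TtlGracePeriodSketch`, the method `isDeletable`, and all of its parameters are hypothetical, and the sketch assumes the caller already knows whether the newest snapshot references the file.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only: every name here is hypothetical and not taken from Priam. */
final class TtlGracePeriodSketch {
    private TtlGracePeriodSketch() {}

    /**
     * A file may be deleted only if the newest snapshot does not reference it AND it was
     * last modified before (latestSnapshotTime - gracePeriod). The grace period protects
     * component files that were flushed shortly before the snapshot but whose SSTable
     * was not yet visible to it.
     */
    static boolean isDeletable(
            Instant fileLastModified,
            boolean referencedByLatestSnapshot,
            Instant latestSnapshotTime,
            Duration gracePeriod) {
        if (referencedByLatestSnapshot) {
            return false;
        }
        Instant cutoff = latestSnapshotTime.minus(gracePeriod);
        return fileLastModified.isBefore(cutoff);
    }
}
```

With a six-hour grace period, a file modified 30 minutes before the snapshot is retained even when the snapshot does not list it, while an unreferenced file modified 12 hours earlier is eligible for deletion.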

* delete the files from internal cache when doing delete operation.

* increment backup verification failure metric

* Fix X->Y->Z issue. Replace nodes when gossip actually converges.

* Run backup TTL task on a simple timer instead of CRON to avoid bombarding S3 with delete calls at the same time and S3 choking on us.

* Run backup verification or TTL only when we are not doing restore.

* Add API to clear the filesystem cache if required.

* expose api to get list of files

* Disable backups

* Priam will check Cassandra gossip information while grabbing pre-assigned token to decide if it should start Cassandra in bootstrap mode or in replace mode. Moved token owner inferring logic based on Cassandra gossip into a util class. Refactored InstanceIdentity.init() method. Fixed DeadTokenRetriever to use the IP from token database when minimum number of instances are not reachable.

* Changing the list in TokenRetrievalUtils to use wildcards.

* replacing replace_address with replace_address_first_boot, which doesn't try to bootstrap the node in replace mode if it has already bootstrapped successfully.

* rolling back the changes to use gossip info while grabbing pre-assigned and dead tokens.

* Change to gh-pages documentation instead of wiki

* Bug fix: It should be AND so that we don't go and try to fetch meta which do not exist.

(cherry picked from commit 015a9578dd495f6d3c119f3ef937a93e9cccf694)

* Removing functionality of incremental manifest file as no one uses it.

(cherry picked from commit 54d8b01147b1a3d6d49574bccd2222cfda790cbb)

* Fix the gossip status information.

(cherry picked from commit 3c1a8d2ef0b986a4cd025f65182648decb9dd281)

* Migrate travis to openjdk8 instead of oraclejdk8 (available in trusty only)

* changelog update

* update dependencies.

* bug fixes. 1> Change return type of partitioner call to text. 2> C* sometimes includes files which are 0 bytes long. 3> Chunk size needs to consider files which `inflate` after compression.

* Move flush and compactions to Service

(cherry picked from commit 6b4ac135258c9269a846b780fbffa2079eb7b801)

* Direct pinning to AWS Services

* Send "SNAPSHOT_VERIFIED" message when the snapshot is successfully verified by the backup system. This will ensure that downward dependencies can start consuming the backup.

(cherry picked from commit 2f1c7f26d3c663b54a2bb80c47e6f72c6d679dd0)

* A single consolidated commit with all changes required to support filtering notifications for certain BackupFileTypes.

* Fixing the issue with bootstrapping logic for updating the BackUpNotification component set.

* Updating ChangeLog and the config name to "priam.backupNotifyComponentIncludeList"

* Add hook in StandardTuner to allow for subclasses to add custom Cassandra parameters

* First commit of the changes necessary to support verifying all backups in the specified date range.

* First pass of the changes for verifying all backups in the specified date range.

* Fixing some broken tests.

* Addressing review comments that we should only verify any unverified backups in the given date range.

* Addressing some of the review comments.

* Adding some tests to address code coverage.

* Adding more tests to improve coverage and some code clean up in TestBackupVerificationTask.

* Code formatting.

* Adding another test to improve Project code coverage.

* Code formatting.

* Changelog for the changes related to Verifying all backups in the desired date range.

* Committing an update to change log to include the changes made by Matt for adding a hook to the StandardTuner to allow subclasses to add custom Cassandra parameters.

* CASS-1731 Handling the case when there are no backups taken yet and hence no verification results are available.

* CASS-1731 Handling the case when there are no backups to verify.

* CASS-1731 Addressing the review comments, we now have separated the scenarios where we would get empty backup verification results into 2 checks, 1) Due to all verified backups in our SLO date range, 2) Due to no backups in our SLO date range. We will continue to page for 2, but not for 1.

* CASS-1731 Adding a test for the get accessor of BackupMetrics in the BackUpVerificationTask class.

* CASS-1731 Modifying the logic based on review comments to 1) Notify for every verified backup, 2) If there are no verified backups in our SLO window we would page.

* CASS-1731 Removing a comment that is no longer needed.

* CASS-1731 Fixing the code formatting errors.

* CASS-1730 Changes to disable lifecycle rule if backup1.0 is disabled.

* CASS-1730 and CASS-1731 updating changelog.

* CASS-1799 Fixing the Priam config endpoints after they were broken due to a dependency update.

* CASS-1799 Addressing review comments no need to use TEXT_PLAIN, in this case since we already serialize to JSON.

* CASS-1799 Adding some tests for PriamConfig endpoints.

* CASS-1799 Changelog related commit.

* CASS-1799 Updating changelog commit date.

* CASS-1799 Changelog update for re-release of v3.11.60

* CASS-1799 Fixing the other Priam endpoints that were missed in the previous release. Adding tests for BackupServletV2 resource.

* CASS-1799 Attempting to address patch code coverage failure.

* CASS-1799 Fixing other Priam endpoints changelog related commit.

* Changing cass log dir env variable to converge with Cassandra upstream (#884)

Co-authored-by: Sumanth Pasupuleti <sumanth.pasupuleti.is@gmail.com>

* Support Tuning arbitrary Property Files in 3.11

* Update CHANGELOG for release 3.11.63

* Get CHANGELOG caught up to 3.11.66

* Update CHANGELOG for 3.11.67

* CASS-1752 First pass of changes to throw an exception when a node is bootstrapping to an existing token. Per current thoughts this change is going to be held as a PR and only committed after the root cause is addressed in Cassandra, if any.

* CASS-1752 Changelog related commit for PR #891.

* Make BackupVerificationTask log and emit when there is no verified backup within SLO. Cease requiring the backup to be fully in S3. Plus tidying tweaks.

* Fix the inferTokenOwnership method. Expose the results out so caller identity can make decision accordingly.

* Update CHANGELOG in advance of 3.11.69

* Throw when gossip unanimously says token is already owned by a live node.
C* will throw anyway in these cases and there is some chance that the current owner has yet to be taken offline.

* Update CHANGELOG in advance of 3.11.70

* Remove redundant log statements from CassandraAdmin (#902)

* CASS-1870 remove dead code, clean imports

* CASS-1870 correct typos

* CASS-1870 Remove noisy log statements from CassandraAdmin. All removed logs are redundant with what
JMXNodeTool produces and sometimes print entire stack traces when the Exception is not a surprise
(i.e., C* is not up yet)

* Update CHANGELOG in advance of 3.11.71

* CASS-1937 cease filtering out OpsCenter keyspace. (#908)

* CASS-1870 Quieter logs when we try to create a JMX connection before C* has started. (#910)

* CASS-1870 remove dead code, clean imports

* CASS-1870 Remove noisy log statements from CassandraAdmin. All removed logs are redundant with what
JMXNodeTool produces and sometimes print entire stack traces when the Exception is not a surprise
(i.e., C* is not up yet)

* CASS-1870 Quieter logs any time CassandraTunerService.updateServicePost is run before C* starts.

* Update CHANGELOG in advance of 3.11.72

* Migrate to travis-ci.com per their recommendations.

* Backup Secondary Indexes (#911)

* CASS-1906 Move CompressionAlgorithm and EncryptionAlgorithm to separate classes.

* CASS-1906 Remove dead code from AbstractBackupPath

* CASS-1906 Style tweaks

* CASS-1906 Add support for secondary index file type

* CASS-1906 Remove redundant methods from FileUploadResult.

* CASS-1906 Remove redundant methods from ColumnFamilyResult

* CASS-1906 Inline PrefixGenerator's only method, unnest try-catch blocks, Use ImmutableSetMultimap where applicable.

* CASS-1906 Upload secondary index files for V2 backups

* CASS-1906 Append '_V2' to 'SECONDARY_INDEX' enum value in BackupFileType to adhere to existing conventions.

* CASS-1906 Fix file depth bug found in e2e testing.

* Update CHANGELOG in advance of 3.11.73

* Remove redundant information about bootstrapping from InstanceState (#915)

* Adding support for custom override for role_manager (#918)

Co-authored-by: Sumanth Pasupuleti <spasupuleti@netflix.com>

* Upgrade nebula.netflixoss to replace bintray publication and update TravisCI Secrets

* Upgrade nebula.netflixoss to replace bintray publication and update TravisCI Secrets (#920)

* Revert "Adding support for custom override for role_manager (#918)"

This reverts commit 243afcf1f19ddc052f372108aac25d795d98e4d1.

* Revert "Revert "Adding support for custom override for role_manager (#918)""

This reverts commit b1b9b11e3183f987b6f06691d3d2dca9b465d8f1.

* CHANGELOG commit for 3.11.76

* Store private IPs in the token database. (#913)

* CASS-1828 Consolidate PriamInstance fetching into a single interface.

* CASS-1828 Remove DeadTokenRetriever interface.

* CASS-1828 Remove IPreGeneratedTokenRetriever interface

* CASS-1828 Remove INewTokenRetriever interface

* CASS-1828 remove populateRacMap

* CASS-1828 Remove sameHostPredicate

* CASS-1828 Move calls to gossip when finding preassigned token to separate function.

* CASS-1828 Change PriamInstance toString to include IP and shrink a log statement.

* CASS-1828 tighten up grabPreassignedToken.

* CASS-1828 make TokenRetriever use the same logic to get rac instances both when generating a dead token and a pregenerated token.

* CASS-1828 move gossip check from grabDeadToken to function.

* CASS-1828 Move deletion to separate function.

* CASS-1828 move token claiming to separate function

* CASS-1828 Remove redundant comments and log statements, plus some minor rearranging.

* CASS-1828 combine grabDeadToken and grabPreGeneratedToken

* CASS-1828 add tests of new token generation

* CASS-1828 Remove redundant method from ITokenManager interface and tighten up new token generation logic

* CASS-1828 use nullity of replace ip to determine whether to replace.

* CASS-1828 Ensure that pregenerated token is claimed when available and the dead token fails gossip check. This corrects a bug introduced when combining the erstwhile grabDeadToken and grabPregeneratedToken methods.

* CASS-1828 stop marking tokens dead and deleting them, begin updating atomically and reading consistently with SimpleDB.

* CASS-1828 Compare against both IPs when checking Gossip in assigned token case

* CASS-1828 update database when getting preassigned tokens

* CASS-1828 make DoubleRing inject InstanceInfo instead of InstanceIdentity.

* CASS-1828 use private IP when dictated by configuration.

* CASS-1828 remove redundant attachVolumes method from PriamInstanceFactory

* CASS-1828 remove redundant sort method from PriamInstanceFactory

* CASS-1828 remove redundant generics in PriamInstanceFactory

* CASS-1828 Make PriamInstanceFactory return a Set of PriamInstances rather than a List. More generally, use a Set of PriamInstances where applicable.

* CASS-1828 make IMembership return an ImmutableSet of IPs not a list.

* CASS-1828 make UpdateSecuritySettings always add current instance's IP to account for possibility of stale data in instance database. Remove extra call to instance database as well.

* Update CHANGELOG in advance of 3.11.77

* Replace JCenter with Maven Central

* Skip backup compression based on configuration (#922)

* CASS-1264 remove dead code.

* CASS-1264 Move buffer size constraint to the only place it is used and simplify its usage.

* CASS-1264 simplify getChunkSize()

* CASS-1264 style tweaks

* CASS-1264 remove redundant comments

* CASS-1264 change IBackupFileSystem download api to include compression information

* CASS-1264 pushing AbstractBackupPath (read compression info) further down. Plus change the getFileSize method to take a String to better accommodate existing usage patterns.

* CASS-1264 remove option to delete file after upload. It is always deleted in practice.

* CASS-1264 remove redundant Path parameters from upload api. They are embedded in the AbstractBackupPath in practice.

* CASS-1264 push compression info all the way down to where it is needed.

* CASS-1264 skip uncompression on download.

* CASS-1264 Skip compression on backup based on desires in path.

* CASS-1264 choose compression behavior on backup based on fast property rather than just defaulting to Snappy always.

* CASS-1264 a little reorganization in advance of making metafiles aware of varying compression.

* CASS-1264 Make metafile display compression algorithm properly

* CASS-1264 Limit conditional compression to V2 backups.

* CASS-1264 Addressing review comments.

* CASS-1264 ensure distinction between creation time and last modified in FileUploadResult. They may not always be the same in practice.

* CASS-1264 rename BackupsToCompress.UNCOMPRESSED to IF_REQUIRED per review feedback

* CASS-1264 Rename CompressionAlgorithm enum to CompressionType per review comments

* Update secrets from 3.x branch (#942)

* Verify thrift listening on port before returning its status (#943)

* Make disk_access_mode tunable via fast property (#944)

* Ingress rule configurability (#940)

* Improve the configurability of setting ingress rules

* CASS-1608 derive public ip, toggle ingressing public ips exclusively

* CASS-1608 Responding to review comments. Publish a metric with the ACL size, publish log warnings when we receive unexpected http status codes when trying to update acls.

* Fix file descriptor leak introduced in PR #922 (#945)

* Delete secondary index backup directories after uploading files (#941)

* CASS-2201 Consolidate logic to find secondary index directories

* CASS-2201 use correct file type for secondary index files in IncrementalBackup

* CASS-2201 Delete empty secondary index backup directories

* Responding to review comments and ensuring we wait for completion in IncrementalBackup.

* Changelog commit in advance of 3.11.79

* Revert behavior back to truncating milliseconds in backups last modified time. (#948)
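Truncating a last-modified timestamp to whole seconds, as the entry above mentions, might look like the following. This is only a sketch using the standard `java.time` API; the helper name `LastModifiedSketch.truncateMillis` is hypothetical and the release notes do not say how Priam performs the truncation.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Illustrative only: the class and method names are hypothetical. */
final class LastModifiedSketch {
    private LastModifiedSketch() {}

    /** Drops the sub-second part so that 12:00:01.500 becomes 12:00:01.000. */
    static Instant truncateMillis(Instant lastModified) {
        return lastModified.truncatedTo(ChronoUnit.SECONDS);
    }
}
```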

* CHANGELOG commit in advance of 3.11.80

* Fix overflow problem in restore. (#952)

* Updating CHANGELOG in advance of 3.11.81

* Use different versions of some preconditions checks to keep code consistent with 2.1 branch. (#958)

* CASS-2153 Don't throw on failure to delete old backup directories. (#956)

* Only upload secondary index directories if they exist in the main data directory. (#955)

* Preliminary refactoring in advance of fixing si directory name bug

* Fix filtering issue with secondary index directory names.

* Switch to secondary index backup policy of uploading all readable directories that start with a '.'

* Update CHANGELOG in advance of 3.11.82

* Cease printing column family names in metafiles in place of keyspace names. (#960)

* Update CHANGELOG in advance of 3.11.83

* rotate TravisCI secrets

* Remove TravisCI and use Github Actions

* CASS-2201 Ensure SI backup directories are deleted when empty (#961)

* Update CHANGELOG in advance of 3.11.84 release (ensure secondary index directories are deleted once empty)

* Return private host name and ip in cases where public versions do not exist. Do this instead of throwing. (#970)

* Throw in case of gossip mismatch when determining replace_ip in assigned token case. (#973)

* CASS-1752 Throw in case of gossip mismatch when determining replace_ip in assigned token case.

* CASS-1752 Toggle throw-on-mismatch with a property.

* Add test of two-node mismatch case

* Update CHANGELOG in advance of 3.11.85

* Use IMDSV2 to get instance metadata (#977)

Co-authored-by: ayushis <ayushis@netflix.com>
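For readers unfamiliar with IMDSv2, the protocol is a two-step exchange: a PUT to the token endpoint with a TTL header, then a GET of the metadata path carrying the returned token. The sketch below only builds the two requests with the JDK 11+ `java.net.http` API; the class name `Imdsv2Sketch`, its method names, and the two-second timeout are assumptions for illustration and are not taken from Priam.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Illustrative only: class and method names are hypothetical, not Priam's. */
final class Imdsv2Sketch {
    static final String BASE = "http://169.254.169.254/latest";

    private Imdsv2Sketch() {}

    /** Step 1: request a session token that is valid for ttlSeconds. */
    static HttpRequest tokenRequest(int ttlSeconds) {
        return HttpRequest.newBuilder(URI.create(BASE + "/api/token"))
                .header("X-aws-ec2-metadata-token-ttl-seconds", Integer.toString(ttlSeconds))
                .timeout(Duration.ofSeconds(2)) // assumed timeout
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Step 2: read a metadata path, presenting the token obtained in step 1. */
    static HttpRequest metadataRequest(String path, String token) {
        return HttpRequest.newBuilder(URI.create(BASE + "/meta-data/" + path))
                .header("X-aws-ec2-metadata-token", token)
                .timeout(Duration.ofSeconds(2)) // assumed timeout
                .GET()
                .build();
    }
}
```

Sending both requests with `HttpClient.send(request, HttpResponse.BodyHandlers.ofString())` and passing the body of the first response as the token of the second completes the exchange.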

* Update CHANGELOG in advance of 3.11.86

* compute region if not cached (#979)

Co-authored-by: ayushis <ayushis@netflix.com>

* Update CHANGELOG in advance of 3.11.87

* Update CHANGELOG in advance of 3.11.88

* Add Cassandra 4.0

* relocate jvm.options file and remove deprecated functions

* Updates to Healthchecks and Admin for thrift removal

* Remove deprecated Cassandra.yaml options from standard tuner

* changes in test health

* clean up obsolete info in cassandra.yaml

* references to thrift removed, changes related to nodetool commands

* Adding google in the repo list for dependencies

* Remove TokenRetrieverBase (#981)

* Dynamic Rate Limiting of Snapshots (#975)

* CASS-2011 update interface to accommodate dependency change

* CASS-2011 move backup cleaning outside of getTimer method

* CASS-2011 introduce a target time for upload completion. Currently always the epoch which implies a no-op

* CASS-2011 Change AbstractFileSystem API to pass target instant to fileUploadImpl

* CASS-2011 Delete already-uploaded files to get accurate estimates of remaining bytes to upload.

* CASS-2011 create a rate limiter that dynamically adjusts its throttle based on the bytes still to upload in all remaining snapshots and a user-specified target time. Adjust the throttle only when we've deviated by a user-configurable threshold to ensure the rate limiter is not constantly adjusted. In that case, it would be redundant as it does not throttle the subsequent file after an adjustment. Ensure that the target does not exceed the earlier of the next scheduled snapshot or the time at which we would fail to meet our backup verification SLO.

* CASS-2011 Add unit tests and associated refactoring for BackupDynamicRateLimiter.
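The throttle adjustment described in the CASS-2011 entries can be sketched as follows, assuming the required rate is simply the bytes remaining divided by the seconds until the target. This is not `BackupDynamicRateLimiter` itself: the class `DynamicThrottleSketch`, its `update` method, and the use of -1 to signal "unthrottled" are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only: class, field, and method names are hypothetical, not Priam's. */
final class DynamicThrottleSketch {
    private final double adjustmentThreshold; // relative deviation, e.g. 0.1 for 10%
    private boolean throttled = false;
    private double bytesPerSecond = 0.0;

    DynamicThrottleSketch(double adjustmentThreshold) {
        this.adjustmentThreshold = adjustmentThreshold;
    }

    /** Returns the active rate in bytes/second, or -1 when uploads are unthrottled. */
    double update(long bytesRemaining, Instant now, Instant target) {
        long secondsLeft = Duration.between(now, target).getSeconds();
        if (bytesRemaining <= 0 || secondsLeft <= 0) {
            // A target in the past (such as the epoch) disables throttling.
            throttled = false;
            return -1.0;
        }
        double desired = (double) bytesRemaining / secondsLeft;
        if (!throttled) {
            throttled = true;
            bytesPerSecond = desired;
        } else if (Math.abs(desired - bytesPerSecond) / bytesPerSecond > adjustmentThreshold) {
            // Re-tune only when the required rate has drifted past the threshold.
            bytesPerSecond = desired;
        }
        return bytesPerSecond;
    }
}
```

With a 10% threshold, a drift from 10.0 to roughly 10.6 bytes per second leaves the rate untouched, while a jump to 20.0 triggers an adjustment.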

* Optionally Add Content-MD5 header to uploads of files smaller than the backupChunkSize. (#985)

* Preliminary refactoring in advance of adding Content-MD5 header to single part uploads. That header is required when objects have a retention period.

* Optionally add md5 header for direct puts.
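A Content-MD5 value is the Base64 encoding of the raw 16-byte MD5 digest of the payload. The sketch below computes it with the JDK only; the class `ContentMd5Sketch`, the method names, and the `enabled` flag are hypothetical, and the size check simply mirrors the "smaller than the backupChunkSize" wording of the entry above rather than Priam's actual condition.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Illustrative only: class and method names are hypothetical, not Priam's. */
final class ContentMd5Sketch {
    private ContentMd5Sketch() {}

    /** Base64 of the raw MD5 digest, which is the format the Content-MD5 header expects. */
    static String contentMd5(byte[] payload) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return Base64.getEncoder().encodeToString(md5.digest(payload));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available in this JVM", e);
        }
    }

    /** Assumed rule: attach the header only to single-part uploads, and only when enabled. */
    static boolean shouldAttachMd5(long fileSize, long backupChunkSize, boolean enabled) {
        return enabled && fileSize < backupChunkSize;
    }
}
```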

* Update CHANGELOG in advance of 3.11.89

* Update comment with crucial part of contract. (#989)

* Capitalize 'f' in ColumnfamilyResult pursuant to old review comment. (#991)

* CASS-2805 Add hook to optionally skip deletion for files added within a certain window. (#992)

* Update CHANGELOG in advance of 3.11.90

* streaming_socket_timeout_in_ms removed

* Fix racy behavior and remove redundant system call to get the time. (#995)

* Revert "CASS-2805 Add hook to optionally skip deletion for files added within a certain window. (#992)" (#997)

This reverts commit 57b26a4f42dc5457ade7b0ebf68b8fcada2f1743.

* Remove UpdateSecuritySettings (#966)

* Use IP to get gossip info rather than hostname. (#1001)

* Spread deletes over the interval depending on where the node is in the ring. (#1002)
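One way to spread deletes by ring position is to offset each node's start time by a fraction of the interval proportional to its index in the ring. The release notes do not give the formula, so the sketch below is an assumption; `DeleteSpreadSketch.startOffset` and its parameters are hypothetical names.

```java
import java.time.Duration;

/** Illustrative only: the formula and all names are assumptions, not Priam's code. */
final class DeleteSpreadSketch {
    private DeleteSpreadSketch() {}

    /**
     * Returns how long a node should wait into the interval before issuing deletes.
     * ringPosition is the node's zero-based index among ringSize nodes.
     */
    static Duration startOffset(Duration interval, int ringPosition, int ringSize) {
        if (ringSize <= 0 || ringPosition < 0 || ringPosition >= ringSize) {
            throw new IllegalArgumentException("ringPosition must be in [0, ringSize)");
        }
        return interval.multipliedBy(ringPosition).dividedBy(ringSize);
    }
}
```

Under this assumption, the node at index 3 of a six-node ring with a 60-minute interval would start its deletes 30 minutes in.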

* Update CHANGELOG in advance of 3.11.91

* RandomPartitioner creates tokens that don't always fit into a long so we use BigInteger to store and compare them. (#1005)
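To illustrate why a `long` is not enough: RandomPartitioner tokens are on the order of 2^127, far beyond `Long.MAX_VALUE` (2^63 - 1). The sketch below shows the overflow check and a comparison on token strings; the class `TokenCompareSketch` and its methods are hypothetical helpers, not Priam's API.

```java
import java.math.BigInteger;

/** Illustrative only: class and method names are hypothetical, not Priam's. */
final class TokenCompareSketch {
    private TokenCompareSketch() {}

    /** True when the token can be stored in a signed 64-bit long without overflow. */
    static boolean fitsInLong(BigInteger token) {
        return token.bitLength() <= 63;
    }

    /** Compares two tokens given in their decimal string form. */
    static int compare(String left, String right) {
        return new BigInteger(left).compareTo(new BigInteger(right));
    }
}
```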

* Update CHANGELOG in advance of 3.11.92

* Pass millis to sleep instead of seconds. (#1009)
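`Thread.sleep` takes milliseconds, so a value expressed in seconds has to be converted first. The helper below is a hypothetical illustration of that conversion and does not reproduce the code changed in #1009.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: class and method names are hypothetical. */
final class SleepUnitsSketch {
    private SleepUnitsSketch() {}

    /** Converts a wait expressed in seconds to the milliseconds Thread.sleep expects. */
    static long toSleepMillis(long seconds) {
        return TimeUnit.SECONDS.toMillis(seconds);
    }

    /** Sleeps for the given number of seconds. */
    static void pause(long seconds) throws InterruptedException {
        Thread.sleep(toSleepMillis(seconds));
    }
}
```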

* Identify incrementals, optionally skip metafile compression, and upload data files last. (#999)

* Reveal whether an sstable file is an incremental backup in SNS notification.

* Remove retry parameter from upload method. It is always 10 in practice and there is no way to change it without a code change.

* Move upload logic out of AbstractBackup into an interface called BackupHelper.

* Upload data files last.

* Don't compress anything when backupsToCompress is set to None.

* Adding ability to add more keys in the message attributes for backup notification messages (#1006)

* Adding ability to add more keys in the message attributes for backup notification messages

* Formatting as per google java format

* Incorporating review comments

* Fixing failing tests in TestBackupNotificationMgr

* Use ImmutableSet instead of ImmutableList to store additional message attributes. (#1010)

* Use an ImmutableSet rather than an ImmutableList of additional message attributes when sending SNS backup notifications.

* Use ImmutableSet rather than ImmutableList to store additional message attributes.

* Update CHANGELOG in advance of 3.11.93

* Ensure SI files get into meta file properly. (#1012)

* Create operator-specifiable time such that if a backup file was written before then it is automatically compressed using SNAPPY before upload. (#1013)

* Update CHANGELOG in advance of 3.11.94

* rely only on replace_address_first_boot for replacement

* rely only on replace_address_first_boot for replacement

* cassandra-all version changed to 4.1.0

* changes for 4.x

---------

Co-authored-by: Arun Agrawal <aagrawal@netflix.com>
Co-authored-by: Arun Agrawal <arunagrawal.84@gmail.com>
Co-authored-by: Vinay Chella <vinaykumarcse@gmail.com>
Co-authored-by: Chandrasekhar Thumuluru <chandra.thumuluru@gmail.com>
Co-authored-by: Chandrasekhar Thumuluru <7935907+cthumuluru@users.noreply.github.com>
Co-authored-by: Chandrasekhar Thumuluru <chandra.thumuluru+github@gmail.com>
Co-authored-by: Sreeram Chakrovorthy <schakrovorthy@netflix.com>
Co-authored-by: schakrovorthy <12820070+schakrovorthy@users.noreply.github.com>
Co-authored-by: Sumanth Pasupuleti <sumanth.pasupuleti.is@gmail.com>
Co-authored-by: Joseph Lynch <josephl@netflix.com>
Co-authored-by: Sumanth Pasupuleti <spasupuleti@netflix.com>
Co-authored-by: Roberto Perez Alcolea <rperezalcolea@netflix.com>
Co-authored-by: Martin Chalupa <chalimartines@gmail.com>
Co-authored-by: ayushisingh29 <ayushi.a29s@gmail.com>
Co-authored-by: ayushis <ayushis@netflix.com>
Co-authored-by: Satyajit Thadeshwar <sthadeshwar@users.noreply.github.com>

Priam to support Cassandra 4.1

23 Feb 18:37
e4a6a13
Porting 3.0 changes to 4.0 branch (#1024)

* Backup TTL Service. Add encryption information in backup file messages, META_V2. Fix dateutils

* Introduction of MetaProxy layers for common functions.

Update the last modified time of manifest file to snapshot time.

* Remove deprecated code

* Backup Verification refactor

* Backup 2.0 Restore

* Add metric on CassandraConfig resource calls (#766)

* Add metric on CassandraConfig resource calls

* Support configure/tune complex parameters in cassandra.yaml.

* Add Cass SNAPSHOT JMX status, snapshot version, last validated timestamp. Changes to Servlet API.

* Add backup verification `force` field, ttl and info methods.

* Add backup version as enum, minor pr feedback

* changelog

* Do not throw NPE when no backup is found for the requested date.

* update to netflixoss 7.0.0

* Do not check existence of file if it is not SST_V2.
Note: AmazonServiceException can mean either that Amazon was unable to process the request or that it is asking us to slow down. In either case, it is better to try to upload.

* Change the exception scope

* ninja fix from #766

ninja fix from https://github.com/Netflix/Priam/pull/766

* Put a cache in front of remote file system call of objectExist.
Add a rate limiter to s3 objectExist so we don't get slow down from S3.

* Provide an overwrite method to force Priam to replace a particular ip (#783)

This allows us to work around the A->B->C replacement problem where
Priam gets confused about who to replace

* BackupVerificationService (#784)

* move cassandra monitor to services

* Move JMX to connection package.

* Moving files to packages. No new functionality (#787)

* move cassandra monitor to services

* Move JMX to connection package.

* Output from the servlet should be JSON

* S3 - BucketLifecycleConfiguration has `prefix` method removed from latest library.

* Fix for ForgottenFiles when there is a long-running compaction job or Cassandra is still reading from the files. (#789)

* Fix for ForgottenFiles when there is a long-running compaction job or Cassandra is still reading from the files.

* cleanup code

* Use older API for prefix filtering, if prefix is available.

* Send notification only if we upload the file.

* No new functionality: Moving files

* move cassandra operations to connection

* Move to service module for Backups.

* move scheduling recurring jobs to service layer

* Hotfix: TTL for backup version 2.

Cassandra can flush some component files, such as Index.db, to the file system first and the other component files later (e.g. 30 minutes later). If a snapshot is taken in between, this "single" component file will not be part of the snapshot, because the SSTable is not yet part of Cassandra's "view". This would not be a problem if Cassandra provided strong file system guarantees such that:
1. All components are flushed to disk as real SSTables only if they are part of the view. Until that happens, all the files are "tmp" files.
2. All flushed components have the same "last modified" time, i.e. that of the first flush. Stats.db can change over time and that is OK.

Since this is not the case, the TTL may end up deleting this file even though it is part of the next snapshot. To avoid this, we add a grace period (based on how long a compaction can run) when we delete the files.

* delete the files from internal cache when doing delete operation.

* increment backup verification failure metric

* Fix X->Y->Z issue. Replace nodes when gossip actually converges.

* Run backup TTL task on a simple timer instead of CRON to avoid bombarding S3 with delete calls at the same time and S3 choking on us.

* Run backup verification or TTL only when we are not doing restore.

* Add API to clear the filesystem cache if required.

* expose api to get list of files

* Disable backups

* Priam will check Cassandra gossip information while grabbing pre-assigned token to decide if it should start Cassandra in bootstrap mode or in replace mode. Moved token owner inferring logic based on Cassandra gossip into a util class. Refactored InstanceIdentity.init() method. Fixed DeadTokenRetriever to use the IP from token database when minimum number of instances are not reachable.

* Changing the list in TokenRetrievalUtils to use wildcards.

* replacing replace_address with replace_address_first_boot, which doesn't try to bootstrap the node in replace mode if it has already bootstrapped successfully.

* rolling back the changes to use gossip info while grabbing pre-assigned and dead tokens.

* Change to gh-pages documentation instead of wiki

* Bug fix: It should be AND so that we don't go and try to fetch meta which do not exist.

(cherry picked from commit 015a9578dd495f6d3c119f3ef937a93e9cccf694)

* Removing functionality of incremental manifest file as no one uses it.

(cherry picked from commit 54d8b01147b1a3d6d49574bccd2222cfda790cbb)

* Fix the gossip status information.

(cherry picked from commit 3c1a8d2ef0b986a4cd025f65182648decb9dd281)

* Migrate travis to openjdk8 instead of oraclejdk8 (available in trusty only)

* changelog update

* update dependencies.

* bug fixes. 1> Change return type of partitioner call to text. 2> C* sometimes includes files which are 0 bytes long. 3> Chunk size needs to consider files which `inflate` after compression.

* Move flush and compactions to Service

(cherry picked from commit 6b4ac135258c9269a846b780fbffa2079eb7b801)

* Direct pinning to AWS Services

* Send "SNAPSHOT_VERIFIED" message when the snapshot is successfully verified by the backup system. This will ensure that downward dependencies can start consuming the backup.

(cherry picked from commit 2f1c7f26d3c663b54a2bb80c47e6f72c6d679dd0)

* A single consolidated commit with all changes required to support filtering notifications for certain BackupFileTypes.

* Fixing the issue with bootstrapping logic for updating the BackUpNotification component set.

* Updating ChangeLog and the config name to "priam.backupNotifyComponentIncludeList"

* Add hook in StandardTuner to allow for subclasses to add custom Cassandra parameters

* First commit of the changes necessary to support verifying all backups in the specified date range.

* First pass of the changes for verifying all backups in the specified date range.

* Fixing some broken tests.

* Addressing review comments that we should only verify any unverified backups in the given date range.

* Addressing some of the review comments.

* Adding some tests to address code coverage.

* Adding more tests to improve coverage and some code clean up in TestBackupVerificationTask.

* Code formatting.

* Adding another test to improve Project code coverage.

* Code formatting.

* Changelog for the changes related to Verifying all backups in the desired date range.

* Committing an update to change log to include the changes made by Matt for adding a hook to the StandardTuner to allow subclasses to add custom Cassandra parameters.

* CASS-1731 Handling the case when there are no backups taken yet and hence no verification results are available.

* CASS-1731 Handling the case when there are no backups to verify.

* CASS-1731 Addressing the review comments, we now have separated the scenarios where we would get empty backup verification results into 2 checks, 1) Due to all verified backups in our SLO date range, 2) Due to no backups in our SLO date range. We will continue to page for 2, but not for 1.

* CASS-1731 Adding a test for the get accessor of BackupMetrics in the BackUpVerificationTask class.

* CASS-1731 Modifying the logic based on review comments to 1) Notify for every verified backup, 2) If there are no verified backups in our SLO window we would page.

* CASS-1731 Removing a comment that is no longer needed.

* CASS-1731 Fixing the code formatting errors.

* CASS-1730 Changes to disable lifecycle rule if backup1.0 is disabled.

* CASS-1730 and CASS-1731 updating changelog.

* CASS-1799 Fixing the Priam config endpoints after they were broken due to a dependency update.

* CASS-1799 Addressing review comments no need to use TEXT_PLAIN, in this case since we already serialize to JSON.

* CASS-1799 Adding some tests for PriamConfig endpoints.

* CASS-1799 Changelog related commit.

* CASS-1799 Updating changelog commit date.

* CASS-1799 Changelog update for re-release of v3.11.60

* CASS-1799 Fixing the other Priam endpoints that were missed in the previous release. Adding tests for BackupServletV2 resource.

* CASS-1799 Attempting to address patch code coverage failure.

* CASS-1799 Fixing other Priam endpoints changelog related commit.

* Changing cass log dir env variable to converge with Cassandra upstream (#884)

Co-authored-by: Sumanth Pasupuleti <sumanth.pasupuleti.is@gmail.com>

* Support Tuning arbitrary Property Files in 3.11

* Update CHANGELOG for release 3.11.63

* Get CHANGELOG caught up to 3.11.66

* Update CHANGELOG for 3.11.67

* CASS-1752 First pass of changes to throw an exception when a node is bootstrapping to an existing token. Per current thoughts, this change is going to be held as a PR and only committed after the root cause, if any, is addressed in Cassandra.

* CASS-1752 Changelog related commit for PR #891.

* Make BackupVerificationTask log and emit when there is no verified backup within SLO. Cease requiring the backup to be fully in S3. Plus tidying tweaks.

* Fix the inferTokenOwnership method. Expose the results out so caller identity can make decision accordingly.

* Update CHANGELOG in advance of 3.11.69

* Throw when gossip unanimously says token is already owned by a live node.
C* will throw anyway in these cases and there is some chance that the current owner has yet to be taken offline.

* Update CHANGELOG in advance of 3.11.70

* Remove redundant log statements from CassandraAdmin (#902)

* CASS-1870 remove dead code, clean imports

* CASS-1870 correct typos

* CASS-1870 Remove noisy log statements from CassandraAdmin. All removed logs are redundant with what
JMXNodeTool produces and sometimes print entire stack traces when the Exception is not a surprise
(i.e., C* is not up yet)

* Update CHANGELOG in advance of 3.11.71

* CASS-1937 cease filtering out OpsCenter keyspace. (#908)

* CASS-1870 Quieter logs when we try to create a JMX connection before C* has started. (#910)

* CASS-1870 remove dead code, clean imports

* CASS-1870 Remove noisy log statements from CassandraAdmin. All removed logs are redundant with what
JMXNodeTool produces and sometimes print entire stack traces when the Exception is not a surprise
(i.e., C* is not up yet)

* CASS-1870 Quieter logs any time CassandraTunerService.updateServicePost is run before C* starts.

* Update CHANGELOG in advance of 3.11.72

* Migrate to travis-ci.com per their recommendations.

* Backup Secondary Indexes (#911)

* CASS-1906 Move CompressionAlgorithm and EncryptionAlgorithm to separate classes.

* CASS-1906 Remove dead code from AbstractBackupPath

* CASS-1906 Style tweaks

* CASS-1906 Add support for secondary index file type

* CASS-1906 Remove redundant methods from FileUploadResult.

* CASS-1906 Remove redundant methods from ColumnFamilyResult

* CASS-1906 Inline PrefixGenerator's only method, unnest try-catch blocks, Use ImmutableSetMultimap where applicable.

* CASS-1906 Upload secondary index files for V2 backups

* CASS-1906 Append '_V2' to 'SECONDARY_INDEX' enum value in BackupFileType to adhere to existing conventions.

* CASS-1906 Fix file depth bug found in e2e testing.

* Update CHANGELOG in advance of 3.11.73

* Remove redundant information about bootstrapping from InstanceState (#915)

* Adding support for custom override for role_manager (#918)

Co-authored-by: Sumanth Pasupuleti <spasupuleti@netflix.com>

* Upgrade nebula.netflixoss to replace bintray publication and update TravisCI Secrets

* Upgrade nebula.netflixoss to replace bintray publication and update TravisCI Secrets (#920)

* Revert "Adding support for custom override for role_manager (#918)"

This reverts commit 243afcf1f19ddc052f372108aac25d795d98e4d1.

* Revert "Revert "Adding support for custom override for role_manager (#918)""

This reverts commit b1b9b11e3183f987b6f06691d3d2dca9b465d8f1.

* CHANGELOG commit for 3.11.76

* Store private IPs in the token database. (#913)

* CASS-1828 Consolidate PriamInstance fetching into a single interface.

* CASS-1828 Remove DeadTokenRetriever interface.

* CASS-1828 Remove IPreGeneratedTokenRetriever interface

* CASS-1828 Remove INewTokenRetriever interface

* CASS-1828 remove populateRacMap

* CASS-1828 Remove sameHostPredicate

* CASS-1828 Move calls to gossip when finding preassigned token to separate function.

* CASS-1828 Change PriamInstance toString to include IP and shrink a log statement.

* CASS-1828 tighten up grabPreassignedToken.

* CASS-1828 make TokenRetriever use the same logic to get rac instances both when generating a dead token and a pregenerated token.

* CASS-1828 move gossip check from grabDeadToken to function.

* CASS-1828 Move deletion to separate function.

* CASS-1828 move token claiming to separate function

* CASS-1828 Remove redundant comments and log statements, plus some minor rearranging.

* CASS-1828 combine grabDeadToken and grabPreGeneratedToken

* CASS-1828 add tests of new token generation

* CASS-1828 Remove redundant method from ITokenManager interface and tighten up new token generation logic

* CASS-1828 use nullity of replace ip to determine whether to replace.

* CASS-1828 Ensure that pregenerated token is claimed when available and the dead token fails gossip check. This corrects a bug introduced when combining the erstwhile grabDeadToken and grabPregeneratedToken methods.

* CASS-1828 stop marking tokens dead and deleting them, begin updating atomically and reading consistently with SimpleDB.

* CASS-1828 Compare against both IPs when checking Gossip in assigned token case

* CASS-1828 update database when getting preassigned tokens

* CASS-1828 make DoubleRing inject InstanceInfo instead of InstanceIdentity.

* CASS-1828 use private IP when dictated by configuration.

* CASS-1828 remove redundant attachVolumes method from PriamInstanceFactory

* CASS-1828 remove redundant sort method from PriamInstanceFactory

* CASS-1828 remove redundant generics in PriamInstanceFactory

* CASS-1828 Make PriamInstanceFactory return a Set of PriamInstances rather than a List. More generally, use a Set of PriamInstances where applicable.

* CASS-1828 make IMembership return an ImmutableSet of IPs not a list.

* CASS-1828 make UpdateSecuritySettings always add current instance's IP to account for possibility of stale data in instance database. Remove extra call to instance database as well.

* Update CHANGELOG in advance of 3.11.77

* Replace JCenter with Maven Central

* Skip backup compression based on configuration (#922)

* CASS-1264 remove dead code.

* CASS-1264 Move buffer size constraint to the only place it is used and simplify its usage.

* CASS-1264 simplify getChunkSize()

* CASS-1264 style tweaks

* CASS-1264 remove redundant comments

* CASS-1264 change IBackupFileSystem download api to include compression information

* CASS-1264 pushing AbstractBackupPath (read compression info) further down. Plus change the getFileSize method to take a String to better accommodate existing usage patterns.

* CASS-1264 remove option to delete file after upload. It is always deleted in practice.

* CASS-1264 remove redundant Path parameters from upload api. They are embedded in the AbstractBackupPath in practice.

* CASS-1264 push compression info all the way down to where it is needed.

* CASS-1264 skip uncompression on download.

* CASS-1264 Skip compression on backup based on desires in path.

* CASS-1264 choose compression behavior on backup based on fast property rather than just defaulting to Snappy always.

* CASS-1264 a little reorganization in advance of making metafiles aware of varying compression.

* CASS-1264 Make metafile display compression algorithm properly

* CASS-1264 Limit conditional compression to V2 backups.

* CASS-1264 Addressing review comments.

* CASS-1264 ensure distinction between creation time and last modified in FileUploadResult. They may not always be the same in practice.

* CASS-1264 rename BackupsToCompress.UNCOMPRESSED to IF_REQUIRED per review feedback

* CASS-1264 Rename CompressionAlgorithm enum to CompressionType per review comments

* Update secrets from 3.x branch (#942)

* Verify thrift listening on port before returning its status (#943)

* Make disk_access_mode tunable via fast property (#944)

* Ingress rule configurability (#940)

* Improve the configurability of setting ingress rules

* CASS-1608 derive public ip, toggle ingressing public ips exclusively

* CASS-1608 Responding to review comments. Publish a metric with the ACL size, publish log warnings when we receive unexpected http status codes when trying to update acls.

* Fix file descriptor leak introduced in PR #922 (#945)

* Delete secondary index backup directories after uploading files (#941)

* CASS-2201 Consolidate logic to find secondary index directories

* CASS-2201 use correct file type for secondary index files in IncrementalBackup

* CASS-2201 Delete empty secondary index backup directories

* Responding to review comments and ensuring we wait for completion in IncrementalBackup.

* Changelog commit in advance of 3.11.79

* Revert behavior back to truncating milliseconds in backups last modified time. (#948)
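
As a rough illustration of the truncation mentioned in the entry above, the sketch below drops the sub-second part of a timestamp with `java.time`. The class and method names are hypothetical and are not taken from Priam.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch; not Priam's actual code. */
public class LastModifiedSketch {
    /** Drops milliseconds (and anything finer) from a last-modified timestamp. */
    public static Instant truncateToSeconds(Instant lastModified) {
        return lastModified.truncatedTo(ChronoUnit.SECONDS);
    }
}
```

Two timestamps that differ only in their millisecond part compare as equal after this truncation.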

* CHANGELOG commit in advance of 3.11.80

* Fix overflow problem in restore. (#952)

* Updating CHANGELOG in advance of 3.11.81

* Use different versions of some preconditions checks to keep code consistent with 2.1 branch. (#958)

* CASS-2153 Don't throw on failure to delete old backup directories. (#956)

* Only upload secondary index directories if they exist in the main data directory. (#955)

* Preliminary refactoring in advance of fixing si directory name bug

* Fix filtering issue with secondary index directory names.

* Switch to secondary index backup policy of uploading all readable directories that start with a '.'

* Update CHANGELOG in advance of 3.11.82

* Cease printing column family names in metafiles in place of keyspace names. (#960)

* Update CHANGELOG in advance of 3.11.83

* rotate TravisCI secrets

* Remove TravisCI and use Github Actions

* CASS-2201 Ensure SI backup directories are deleted when empty (#961)

* Update CHANGELOG in advance of 3.11.84 release (ensure secondary index directories are deleted once empty)

* Return private host name and ip in cases where public versions do not exist. Do this instead of throwing. (#970)

* Throw in case of gossip mismatch when determining replace_ip in assigned token case. (#973)

* CASS-1752 Throw in case of gossip mismatch when determining replace_ip in assigned token case.

* CASS-1752 Toggle throw-on-mismatch with a property.

* Add test of two-node mismatch case

* Update CHANGELOG in advance of 3.11.85

* Use IMDSV2 to get instance metadata (#977)

Co-authored-by: ayushis <ayushis@netflix.com>

* Update CHANGELOG in advance of 3.11.86

* compute region if not cached (#979)

Co-authored-by: ayushis <ayushis@netflix.com>

* Update CHANGELOG in advance of 3.11.87

* Update CHANGELOG in advance of 3.11.88

* Add Cassandra 4.0

* relocate jvm.options file and remove deprecated functions

* Updates to Healthchecks and Admin for thrift removal

* Remove deprecated Cassandra.yaml options from standard tuner

* changes in test health

* clean up obsolete info in cassandra.yaml

* references to thrift removed, changes related to nodetool commands

* Adding google in the repo list for dependencies

* Remove TokenRetrieverBase (#981)

* Dynamic Rate Limiting of Snapshots (#975)

* CASS-2011 update interface to accommodate dependency change

* CASS-2011 move backup cleaning outside of getTimer method

* CASS-2011 introduce a target time for upload completion. Currently always the epoch which implies a no-op

* CASS-2011 Change AbstractFileSystem API to pass target instant to fileUploadImpl

* CASS-2011 Delete already-uploaded files to get accurate estimates of remaining bytes to upload.

* CASS-2011 create a rate limiter that dynamically adjusts its throttle based on the bytes still to upload in all remaining snapshots and a user-specified target time. Adjust the throttle only when we've deviated by a user-configurable threshold to ensure the rate limiter is not constantly adjusted. In that case, it would be redundant as it does not throttle the subsequent file after an adjustment. Ensure that the target does not exceed the earlier of the next scheduled snapshot or the time at which we would fail to meet our backup verification SLO.

* CASS-2011 Add unit tests and associated refactoring for BackupDynamicRateLimiter.
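
The throttle adjustment described in the CASS-2011 entries above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Priam's `BackupDynamicRateLimiter`: the class `ThrottleSketch`, its method names, and the form of the threshold (a relative deviation) are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch; not Priam's actual BackupDynamicRateLimiter. */
public class ThrottleSketch {
    // Assumed form of the threshold: 0.1 means "re-tune only on a deviation above 10%".
    private final double adjustmentThreshold;
    private double currentBytesPerSecond = Double.MAX_VALUE; // start unthrottled

    public ThrottleSketch(double adjustmentThreshold) {
        this.adjustmentThreshold = adjustmentThreshold;
    }

    /** Caps the requested target at the earlier of the next snapshot and the SLO deadline. */
    public static Instant clampTarget(Instant requested, Instant nextSnapshot, Instant sloDeadline) {
        Instant limit = nextSnapshot.isBefore(sloDeadline) ? nextSnapshot : sloDeadline;
        return requested.isAfter(limit) ? limit : requested;
    }

    /** Rate needed to upload the remaining bytes by the target instant. */
    public static double requiredRate(long bytesRemaining, Instant now, Instant target) {
        long millisLeft = Duration.between(now, target).toMillis();
        if (millisLeft <= 0 || bytesRemaining <= 0) {
            return Double.MAX_VALUE; // target in the past (e.g. the epoch): do not throttle
        }
        return bytesRemaining / (millisLeft / 1000.0);
    }

    /** Returns the rate to use, changing it only when the deviation exceeds the threshold. */
    public double update(long bytesRemaining, Instant now, Instant target) {
        double required = requiredRate(bytesRemaining, now, target);
        double deviation = Math.abs(required - currentBytesPerSecond) / required;
        if (deviation > adjustmentThreshold) {
            currentBytesPerSecond = required;
        }
        return currentBytesPerSecond;
    }
}
```

In this sketch a target at the epoch always yields an unthrottled rate, which mirrors the no-op behavior the entries above describe for the default target.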

* Optionally Add Content-MD5 header to uploads of files smaller than the backupChunkSize. (#985)

* Preliminary refactoring in advance of adding Content-MD5 header to single part uploads. That header is required when objects have a retention period.

* Optionally add md5 header for direct puts.
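
For reference, a Content-MD5 value is the Base64 encoding of the raw 128-bit MD5 digest of the request body. The sketch below computes it with the JDK only; the class and method names are hypothetical, and how Priam attaches the header to its S3 requests is not shown here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical sketch; not Priam's actual upload code. */
public class ContentMd5Sketch {
    /** Base64 of the raw MD5 digest, the format the Content-MD5 header expects. */
    public static String contentMd5(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(body);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }
}
```

The digest covers the whole body, which fits the entries above limiting the header to single-part uploads of files smaller than the backupChunkSize.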

* Update CHANGELOG in advance of 3.11.89

* Update comment with crucial part of contract. (#989)

* Capitalize 'f' in ColumnfamilyResult pursuant to old review comment. (#991)

* CASS-2805 Add hook to optionally skip deletion for files added within a certain window. (#992)

* Update CHANGELOG in advance of 3.11.90

* streaming_socket_timeout_in_ms removed

* Fix racy behavior and remove redundant system call to get the time. (#995)

* Revert "CASS-2805 Add hook to optionally skip deletion for files added within a certain window. (#992)" (#997)

This reverts commit 57b26a4f42dc5457ade7b0ebf68b8fcada2f1743.

* Remove UpdateSecuritySettings (#966)

* Use IP to get gossip info rather than hostname. (#1001)

* Spread deletes over the interval depending on where the node is in the ring. (#1002)
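
One way to picture the entry above is to scale the delete interval by the node's relative token position, so that nodes at different ring positions start deleting at different times. The sketch below is an assumption for illustration only: the class, the method, and the linear mapping are hypothetical, and the actual formula Priam uses is not stated here.

```java
import java.math.BigInteger;
import java.time.Duration;

/** Hypothetical sketch; the linear mapping is an assumption, not Priam's actual logic. */
public class DeleteOffsetSketch {
    /** Maps a token in [minToken, maxToken] to a delay in [0, interval]. */
    public static Duration offsetFor(
            BigInteger token, BigInteger minToken, BigInteger maxToken, Duration interval) {
        BigInteger span = maxToken.subtract(minToken);
        if (span.signum() <= 0) {
            throw new IllegalArgumentException("maxToken must be greater than minToken");
        }
        BigInteger position = token.subtract(minToken);
        BigInteger millis =
                BigInteger.valueOf(interval.toMillis()).multiply(position).divide(span);
        return Duration.ofMillis(millis.longValueExact());
    }
}
```

Under this mapping, a node halfway around the ring would start its deletes halfway through the interval.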

* Update CHANGELOG in advance of 3.11.91

* RandomPartitioner creates tokens that don't always fit into a long so we use BigInteger to store and compare them. (#1005)
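
To illustrate the entry above: RandomPartitioner tokens can be as large as roughly 2^127, while a Java `long` tops out at 2^63 - 1. The sketch below shows a comparison done with `BigInteger`; the class and method names are hypothetical and are not Priam's.

```java
import java.math.BigInteger;

/** Hypothetical sketch; not Priam's actual token handling. */
public class TokenCompareSketch {
    /** Compares two tokens given as decimal strings without narrowing them to long. */
    public static int compareTokens(String a, String b) {
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    /** True when the token can be represented by a signed 64-bit long. */
    public static boolean fitsInLong(BigInteger token) {
        return token.bitLength() <= 63;
    }
}
```

Narrowing a large token with `BigInteger.longValue()` keeps only the low-order 64 bits, so comparisons on the narrowed values can silently give the wrong order.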

* Update CHANGELOG in advance of 3.11.92

* Pass millis to sleep instead of seconds. (#1009)

* Identify incrementals, optionally skip metafile compression, and upload data files last. (#999)

* Reveal whether an sstable file is an incremental backup in SNS notification.

* Remove retry parameter from upload method. It is always 10 in practice and there is no way to change it without a code change.

* Move upload logic out of AbstractBackup into an interface called BackupHelper.

* Upload data files last.

* Don't compress anything when backupsToCompress is set to None.

* Adding ability to add more keys in the message attributes for backup notification messages (#1006)

* Adding ability to add more keys in the message attributes for backup notification messages

* Formatting as per google java format

* Incorporating review comments

* Fixing failing tests in TestBackupNotificationMgr

* Use ImmutableSet instead of ImmutableList to store additional message attributes. (#1010)

* Use an ImmutableSet rather than an ImmutableList of additional message attributes when sending SNS backup notifications.

* Use ImmutableSet rather than ImmutableList to store additional message attributes.

* Update CHANGELOG in advance of 3.11.93

* Ensure SI files get into meta file properly. (#1012)

* Create operator-specifiable time such that if a backup file was written before then it is automatically compressed using SNAPPY before upload. (#1013)

* Update CHANGELOG in advance of 3.11.94

* rely only on replace_address_first_boot for replacement

* rely only on replace_address_first_boot for replacement

* cassandra-all version changed to 4.1.0

* changes for 4.x

---------

Co-authored-by: Arun Agrawal <aagrawal@netflix.com>
Co-authored-by: Arun Agrawal <arunagrawal.84@gmail.com>
Co-authored-by: Vinay Chella <vinaykumarcse@gmail.com>
Co-authored-by: Chandrasekhar Thumuluru <chandra.thumuluru@gmail.com>
Co-authored-by: Chandrasekhar Thumuluru <7935907+cthumuluru@users.noreply.github.com>
Co-authored-by: Chandrasekhar Thumuluru <chandra.thumuluru+github@gmail.com>
Co-authored-by: Sreeram Chakrovorthy <schakrovorthy@netflix.com>
Co-authored-by: schakrovorthy <12820070+schakrovorthy@users.noreply.github.com>
Co-authored-by: Sumanth Pasupuleti <sumanth.pasupuleti.is@gmail.com>
Co-authored-by: Joseph Lynch <josephl@netflix.com>
Co-authored-by: Sumanth Pasupuleti <spasupuleti@netflix.com>
Co-authored-by: Roberto Perez Alcolea <rperezalcolea@netflix.com>
Co-authored-by: Martin Chalupa <chalimartines@gmail.com>
Co-authored-by: ayushisingh29 <ayushi.a29s@gmail.com>
Co-authored-by: ayushis <ayushis@netflix.com>
Co-authored-by: Satyajit Thadeshwar <sthadeshwar@users.noreply.github.com>