Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSANDRA-19472 - CEP-15 (C*) Integrate accord with repair #3174

Closed
wants to merge 1 commit into from

Conversation

bdeggleston
Copy link
Member

Wires accord repair into the normal repair path and removes the distinction between an accord repair job and a cassandra repair job, this makes both repair types coexist

{
accordRepair = paxosRepair.flatMap(unused -> {
logger.info("{} {}.{} starting accord repair", session.previewKind.logPrefix(session.getId()), desc.keyspace, desc.columnFamily);
IPartitioner partitioner = desc.ranges.iterator().next().left.getPartitioner();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
IPartitioner partitioner = desc.ranges.iterator().next().left.getPartitioner();
IPartitioner partitioner = metadata.getPartitioner();

rather than walking ranges, we know the table why not pull there?

@@ -265,7 +265,7 @@ private AccordService(Id localId)
{
Invariants.checkState(localId != null, "static localId must be set before instantiating AccordService");
logger.info("Starting accord with nodeId {}", localId);
AccordAgent agent = new AccordAgent();
AccordAgent agent = FBUtilities.construct(System.getProperty("cassandra.accord.agent", AccordAgent.class.getName()), "AccordAgent");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will no longer build as we don't allow random system properties in the code anymore; check style should block this. Need to migrate to the related properties enum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if this is for testing and not for pluggability, maybe cassandra.test prefix?

IPartitioner partitioner = desc.ranges.iterator().next().left.getPartitioner();
this.ranges = AccordTopology.toAccordRanges(desc.keyspace, desc.ranges);
this.splitter = partitioner.accordSplitter().apply(ranges);
this.listener = listener != null ? listener : Listener.NOOP;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listener is currently null, we do the following

AccordRepair repair = new AccordRepair(null, partitioner, desc.keyspace, desc.ranges);

This logic dropped metrics in favor of listeners, but doesn't have the listeners do anything yet...

throw new RuntimeException(t);
}
finally
{
cfs.metric.rangeMigration.addNano(start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why drop metric? Also I think it forgot to do end - start?

Copy link
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big improvement having the Accord repair sequenced before the regular repair so that regular repair can ensure the accord stuff is applied at CL.ALL since it will only do it at quorum.

Some correctness issues primarily around Paxos repair alone not being sufficient for migration.

src/java/org/apache/cassandra/repair/RepairJob.java Outdated Show resolved Hide resolved

public static ConsensusMigrationRepairResult fromPaxosOnlyRepair(Epoch minEpoch, boolean deadNodesExcluded)
{
return fromRepair(minEpoch, true, false, deadNodesExcluded);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is mixing up how Paxos repair needs to work unless I am wrong. Paxos repair doesn't finish at ALL and the requirement for Accord for migration is that it be done at ALL so that eventually Accord can safely switch to single replica reads.

That is why previously the Paxos only repair was not marked as migration eligible.

If Paxos repair actually applies everything at all then this would work, but I recall that it doesn't because that wouldn't be highly available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks. Fixed

@@ -105,6 +105,9 @@ public class Repair extends NodeToolCmd
@Option(title = "paxos-only", name = {"-paxos-only", "--paxos-only"}, description = "If the --paxos-only flag is included, no table data is repaired, only paxos operations..")
private boolean paxosOnly = false;

@Option(title = "accord-only", name = {"-accord-only", "--accord-only"}, description = "If the --accord-only flag is included, no table data is repaired, only accord operations..")
private boolean accordOnly = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last we discussed this you actually pushed to not expose this to users in repair and I still agree with that. An accord-only repair isn't something useful to end users and we added a separate finish migration step to the JMX interface to handle invoking the Accord only repair.

We definitely should as part of repair do the Accord repairs to help maintain the contract people rely on where the repaired set is safe to read at CL=1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my objection was exposing the initial iteration of accord repair as a repair, since it didn't sync data on disk between all nodes. An accord only repair is a little different, it's basically the accord version of a paxos-only repair. Both paxos and accord only repairs only complete pending operations and do not sync data.

I'm also don't agree that accord-only repairs wouldn't be useful to end users. For instance, one of the use cases of paxos repairs is to establish a lower bound for applied paxos operations so that data can be exported to an ldap system for processing. Having said that, the accord repair is not on par with paxos repair since it issues a barrier, but does not wait for it to be applied on all nodes, and I think that's something I should add before committing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, but just to make sure we are on the same page, Barriers used for repair should wait for application, they should call ApplyThenWaitUntilApplied after gathering dependencies. There is the exclusive sync points which don't wait, and instead populate that map that rejects transactions that are too old.

@@ -120,7 +135,7 @@ private void repairRange(TokenRange range) throws Throwable
BigInteger remaining = rangeSize.subtract(offset);
BigInteger length = remaining.min(rangeStep);

long start = ctx.clock().nanoTime();
long start = Clock.Global.nanoTime();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
long start = Clock.Global.nanoTime();
long start = ctx.clock().nanoTime();

throw new RuntimeException(t);
}
finally
{
cfs.metric.rangeMigration.addNano(start);
long end = Clock.Global.nanoTime();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
long end = Clock.Global.nanoTime();
long end = ctx.clock().nanoTime();

Copy link
Contributor

@dcapwell dcapwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finished review, only blocking comment is the usage of Clock.Global

Copy link
Contributor

@aweisberg aweisberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I am basically +1 I just have one concern.

RequiredResponseTracker might not have the same nodes passed in from repair that Accord ends up contacting. It really seems like the node list should come from Accord based on what it actually decides to contact?

The other thing is just that requiring operators to precisely know which consensus system is managing which table/range at any given point in time in order to invoke repair is a big burden. We should probably handle it for them.

if (options.paxosOnly() && !metadata.supportsPaxosOperations())
throw new IllegalArgumentException(String.format("Cannot run paxos only repair on %s.%s, which isn't configured for paxos operations", cfs.keyspace.getName(), cfs.name));

if (options.accordOnly() && !metadata.requiresAccordSupport())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an operator this might be painful because it requires you to know which kind you should invoke when in reality you just want to run the active consensus system's repair and you don't care which. I think we should split the ranges and then run the appropriate repair for each.

If we can make existing external tooling continue to "just work" we definitely should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a little harder about this being the X only case which is unusual, it still seems like we shouldn't expose the need to determine which ranges belong to which system outside of C*.

If you want to make this a TODO and file a JIRA we can kick the can down the road on this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/java/org/apache/cassandra/repair/RepairSession.java Outdated Show resolved Hide resolved
@@ -313,21 +307,23 @@ else if (ranges.isEmpty())
private final boolean repairPaxos;
private final boolean paxosOnly;

private final boolean accordRepair;
private final boolean accordOnly;
private final boolean isConsensusMigration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isConsensusMigration might be too declarative at the level of specifying what a repair should do. Maybe it should just indicate that this is all vs quorum for Accord.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really agree. Understanding what consistency repair should perform a given operation at seems is a bit specialized for someone reading the code for repair or consensus migration tool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally reasonable. I think a comment somewhere explaining that consensus migration wants to be highly available so it can occur at QUORUM while repair wants to ensure visibility of data before the repair started at all replicas.

Really migration being highly available is just icing on the cake since we can theoretically stay migrating indefinitely and it is fine. It just works that way because it's additional work to add all which you have now done.

I think it's worth keeping just because we don't really know the downsides of being in migration yet.

@@ -105,6 +105,9 @@ public class Repair extends NodeToolCmd
@Option(title = "paxos-only", name = {"-paxos-only", "--paxos-only"}, description = "If the --paxos-only flag is included, no table data is repaired, only paxos operations..")
private boolean paxosOnly = false;

@Option(title = "accord-only", name = {"-accord-only", "--accord-only"}, description = "If the --accord-only flag is included, no table data is repaired, only accord operations..")
private boolean accordOnly = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, but just to make sure we are on the same page, Barriers used for repair should wait for application, they should call ApplyThenWaitUntilApplied after gathering dependencies. There is the exclusive sync points which don't wait, and instead populate that map that rejects transactions that are too old.

Patch by Blake Eggleston; Reviewed by Ariel Weisberg and David Capwell for CASSANDRA-19472
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants