Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IGNITE-22130 Fix retries. #3704

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

ascherbakoff
Copy link
Contributor

@ascherbakoff ascherbakoff commented May 6, 2024

  1. Retries both for implicit and explicit transactions are submitted to separate pool with a delay to avoid recursion in embedded mode and give some time to wait for lock release.
  2. Added retries for reads and data streamer.
  3. Removed wrapReplicationException method

@@ -210,7 +225,14 @@ private <R> CompletableFuture<R> sendToReplica(String targetNodeConsistentId, Re
return null;
});
} else {
res.completeExceptionally(errResp.throwable());
if (retryExecutor != null && matchAny(unwrapCause(errResp.throwable()), ACQUIRE_LOCK_ERR, REPLICA_MISS_ERR)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that using exception classes is better than error codes, in general.
By the way, should ACQUIRE_LOCK_TIMEOUT_ERR be taken into account as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that using exception classes is better than error codes, in general.

I disagree. The whole exception design is based on error codes. Checking error codes is more clean then comparing exception classes.

By the way, should ACQUIRE_LOCK_TIMEOUT_ERR be taken into account as well?

No it should not. This error code and related functionality should be removed, because we got retries from client side. I plan to create a ticket for this.

Copy link
Contributor

@sk0x50 sk0x50 May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. The whole exception design is based on error codes. Checking error codes is more clean then comparing exception classes.

I disagree, using Java classes that represent an exception is a widespread practice. Error codes are a way to provide the user with an additional clue on critical situations, especially in the case of thin clients that are not supported exceptions.

No it should not. This error code and related functionality should be removed, because we got retries from client side. I plan to create a ticket for this.

Ok, I got it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, using Java classes that represent an exception is a widespread practice. Error codes are a way to provide the user with an additional clue on critical situations, especially in the case of thin clients that are not supported exceptions.

Looks like we have no agreement here, because changing error code checking to instanceof doesn't make much sense to me. This however should not block PR changes. Or is this a merge blocker ? We can fix it later then we have strict rules on working with errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely not a blocker.

@@ -59,6 +68,8 @@ public class ReplicaService {

private final ReplicationConfiguration replicationConfiguration;

private final ScheduledExecutorService retryExecutor;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add @Nullable here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🆗

/**
* Transaction module exception mapper.
*/
@AutoService(IgniteExceptionMappersProvider.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

retryExecutor.schedule(
// Need to resubmit again to pool which is valid for synchronous IO execution.
() -> partitionOperationsExecutor.execute(() -> res.completeExceptionally(errResp.throwable())),
RETRY_TIMEOUT_MILLIS, MILLISECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood it correctly,, we are holding response for several milliseconds to prevent instantaneous retry.
What will change if we don't do that? Because it is incorrect to delay a response instead of retrying, especially in cases where we don't make the retry again.

if (full) { // Full transaction retries are handled in postEnlist.
return replicaSvc.invoke(primaryReplicaAndConsistencyToken.get1(), request);
} else {
if (write) { // Track only write requests from explicit transactions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} else if (write) {
This code format is acceptable here.

@@ -1022,6 +1028,10 @@ private static boolean allSchemaVersionsSame(Collection<? extends BinaryRow> row
boolean first = true;

for (BinaryRow row : rows) {
if (row == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a bug, how is it happening?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants