IGNITE-22130 Fix retries. #3704

ascherbakoff · 2024-05-06T11:33:20Z

Retries for both implicit and explicit transactions are submitted to a separate pool with a delay, to avoid recursion in embedded mode and to give locks some time to be released.
Added retries for reads and the data streamer.
Removed the wrapReplicationException method.
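
To illustrate the first point, here is a minimal sketch of a retry that is rescheduled on a separate pool after a delay instead of being re-invoked inline. It is not the code from this PR: `DelayedRetrySketch`, `withRetries`, `RETRY_DELAY_MILLIS` and the attempt limit are hypothetical names and values chosen for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class DelayedRetrySketch {
    /** Hypothetical delay between attempts; the real value is not stated in the description. */
    static final long RETRY_DELAY_MILLIS = 20;

    /** Runs op and retries failed attempts on retryPool until attempts are exhausted. */
    static <T> CompletableFuture<T> withRetries(
            Supplier<CompletableFuture<T>> op, ScheduledExecutorService retryPool, int attempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(op, retryPool, attempts, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> op, ScheduledExecutorService retryPool,
            int attemptsLeft, CompletableFuture<T> result) {
        op.get().whenComplete((value, err) -> {
            if (err == null) {
                result.complete(value);
            } else if (attemptsLeft <= 1) {
                result.completeExceptionally(err);
            } else {
                // Do not call attempt() inline: if op completes synchronously (as it may
                // in embedded mode), inline retries would recurse on the same stack.
                // The delay also leaves time for a conflicting lock to be released.
                retryPool.schedule(
                        () -> attempt(op, retryPool, attemptsLeft - 1, result),
                        RETRY_DELAY_MILLIS, TimeUnit.MILLISECONDS);
            }
        });
    }

    /** Fails twice, then succeeds; returns the number of attempts that were made. */
    static int demo() {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        Supplier<CompletableFuture<String>> op = () -> {
            if (calls.incrementAndGet() < 3) {
                return CompletableFuture.failedFuture(new IllegalStateException("lock busy"));
            }
            return CompletableFuture.completedFuture("ok");
        };
        try {
            withRetries(op, pool, 5).join();
            return calls.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("attempts: " + demo());
    }
}
```

Because every retry is a fresh task on the pool, the stack depth stays constant no matter how many attempts are needed.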

sk0x50 · 2024-05-08T10:51:07Z

modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaService.java

@@ -210,7 +225,14 @@ private <R> CompletableFuture<R> sendToReplica(String targetNodeConsistentId, Re
                            return null;
                        });
                    } else {
-                        res.completeExceptionally(errResp.throwable());
+                        if (retryExecutor != null && matchAny(unwrapCause(errResp.throwable()), ACQUIRE_LOCK_ERR, REPLICA_MISS_ERR)) {


I still think that using exception classes is better than error codes, in general.
By the way, should ACQUIRE_LOCK_TIMEOUT_ERR be taken into account as well?

ascherbakoff · 2024-05-08

> I still think that using exception classes is better than error codes, in general.

I disagree. The whole exception design is based on error codes. Checking error codes is cleaner than comparing exception classes.

> By the way, should ACQUIRE_LOCK_TIMEOUT_ERR be taken into account as well?

No, it should not. This error code and the related functionality should be removed, because we now have retries on the client side. I plan to create a ticket for this.

sk0x50 · 2024-05-11

> I disagree. The whole exception design is based on error codes. Checking error codes is cleaner than comparing exception classes.

I disagree. Using Java classes to represent exceptions is a widespread practice. Error codes are a way to give the user an additional clue in critical situations, especially for thin clients that do not support exceptions.

> No, it should not. This error code and the related functionality should be removed, because we now have retries on the client side. I plan to create a ticket for this.

Ok, I got it.

ascherbakoff · 2024-05-13

> I disagree. Using Java classes to represent exceptions is a widespread practice. Error codes are a way to give the user an additional clue in critical situations, especially for thin clients that do not support exceptions.

Looks like we have no agreement here, because changing error code checking to instanceof doesn't make much sense to me. This, however, should not block the PR changes. Or is this a merge blocker? We can fix it later, when we have strict rules on working with errors.

sk0x50 · 2024-05-13

This is definitely not a blocker.
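
For readers following the disagreement, the two styles can be put side by side in a small sketch. Everything here is a stand-in: the exception classes, the numeric code values, and the bodies of `matchAny` and `unwrapCause` are hypothetical and are not Ignite's actual implementations; only the helper names and the two error code names come from the diff above.

```java
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

final class ErrorMatchSketch {
    // Hypothetical numeric values; only the constant names appear in the diff.
    static final int ACQUIRE_LOCK_ERR = 1;
    static final int REPLICA_MISS_ERR = 2;
    static final int OTHER_ERR = 3;

    /** Hypothetical base class for exceptions that carry an error code. */
    static class CodedException extends RuntimeException {
        final int code;

        CodedException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    /** Hypothetical subclass used for the class-based check. */
    static class LockException extends CodedException {
        LockException(String message) {
            super(ACQUIRE_LOCK_ERR, message);
        }
    }

    /** Strips the wrappers that CompletableFuture and Future add around the real cause. */
    static Throwable unwrapCause(Throwable t) {
        while ((t instanceof CompletionException || t instanceof ExecutionException)
                && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    /** Style 1: the error is retriable if its code is one of the given codes. */
    static boolean matchAny(Throwable t, int... codes) {
        if (!(t instanceof CodedException)) {
            return false;
        }
        int actual = ((CodedException) t).code;
        for (int code : codes) {
            if (code == actual) {
                return true;
            }
        }
        return false;
    }

    /** Style 2: the error is retriable if it is an instance of a known class. */
    static boolean isRetriableByClass(Throwable t) {
        return t instanceof LockException;
    }

    public static void main(String[] args) {
        Throwable wrapped = new CompletionException(new LockException("lock busy"));
        Throwable cause = unwrapCause(wrapped);
        System.out.println(matchAny(cause, ACQUIRE_LOCK_ERR, REPLICA_MISS_ERR));
        System.out.println(isRetriableByClass(cause));
    }
}
```

In this sketch both checks agree on the same exception. The code-based check also matches any exception class that reports a listed code, while the class-based check lets the compiler verify the type names.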

denis-chudov · 2024-05-08T07:23:32Z

modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaService.java

@@ -59,6 +68,8 @@ public class ReplicaService {

    private final ReplicationConfiguration replicationConfiguration;

+    private final ScheduledExecutorService retryExecutor;


pls add @Nullable here

sk0x50 · 2024-05-08T11:30:12Z

...sactions/src/main/java/org/apache/ignite/internal/tx/TransactionExceptionMapperProvider.java

+/**
+ * Transaction module exception mapper.
+ */
+@AutoService(IgniteExceptionMappersProvider.class)


vldpyatkov · 2024-05-13T15:14:40Z

modules/replicator/src/main/java/org/apache/ignite/internal/replicator/ReplicaService.java

+                            retryExecutor.schedule(
+                                    // Need to resubmit again to pool which is valid for synchronous IO execution.
+                                    () -> partitionOperationsExecutor.execute(() -> res.completeExceptionally(errResp.throwable())),
+                                    RETRY_TIMEOUT_MILLIS, MILLISECONDS);


If I understood it correctly, we are holding the response for several milliseconds to prevent an instantaneous retry.
What will change if we don't do that? I ask because it is incorrect to delay a response instead of retrying, especially in cases where we don't retry again.
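
The pattern being questioned can be reduced to the following sketch, which holds back the failure of a future for a fixed time and then completes it on a second executor. It is a simplified stand-in for the quoted snippet: `failLater`, the 20 ms value of `RETRY_TIMEOUT_MILLIS` and the single-threaded executors are assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class DelayedFailureSketch {
    /** Hypothetical value; the diff only shows the constant name. */
    static final long RETRY_TIMEOUT_MILLIS = 20;

    /** Completes res exceptionally after the delay, on operationsExecutor rather than the timer thread. */
    static <R> void failLater(
            CompletableFuture<R> res, Throwable err,
            ScheduledExecutorService retryExecutor, Executor operationsExecutor) {
        retryExecutor.schedule(
                // The timer thread only hands the task over, so callbacks attached to res
                // run on an executor where blocking operations are acceptable.
                () -> operationsExecutor.execute(() -> res.completeExceptionally(err)),
                RETRY_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS);
    }

    /** Returns true if the failure reached the caller no earlier than the configured delay. */
    static boolean demo() {
        ScheduledExecutorService retryExecutor = Executors.newSingleThreadScheduledExecutor();
        ExecutorService operationsExecutor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> res = new CompletableFuture<>();
            long start = System.nanoTime();
            failLater(res, new IllegalStateException("lock conflict"), retryExecutor, operationsExecutor);
            try {
                res.join();
                return false; // The future must not complete normally.
            } catch (CompletionException e) {
                long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                return elapsedMillis >= RETRY_TIMEOUT_MILLIS
                        && e.getCause() instanceof IllegalStateException;
            }
        } finally {
            retryExecutor.shutdownNow();
            operationsExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("failure was delayed: " + demo());
    }
}
```

In this shape every caller pays the delay, including one that never retries, which is the concern raised in the comment.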

vldpyatkov · 2024-05-14T14:32:25Z

...le/src/main/java/org/apache/ignite/internal/table/distributed/storage/InternalTableImpl.java

+        if (full) { // Full transaction retries are handled in postEnlist.
+            return replicaSvc.invoke(primaryReplicaAndConsistencyToken.get1(), request);
+        } else {
+            if (write) { // Track only write requests from explicit transactions.


} else if (write) {
This code format is acceptable here.

vldpyatkov · 2024-05-14T14:41:40Z

...le/src/main/java/org/apache/ignite/internal/table/distributed/storage/InternalTableImpl.java

@@ -1022,6 +1028,10 @@ private static boolean allSchemaVersionsSame(Collection<? extends BinaryRow> row
        boolean first = true;

        for (BinaryRow row : rows) {
+            if (row == null) {


It looks like a bug. How does this happen?
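
The body of the added branch is not visible in the diff. As a rough sketch only, a null-tolerant version of such a check could skip null entries, as below. The `Row` interface and the decision to skip rather than fail are assumptions, not the behavior confirmed by this PR.

```java
import java.util.Arrays;
import java.util.Collection;

final class SchemaVersionCheckSketch {
    /** Hypothetical stand-in for BinaryRow, reduced to the one accessor this check needs. */
    interface Row {
        int schemaVersion();
    }

    /** Returns true if all non-null rows share one schema version. */
    static boolean allSchemaVersionsSame(Collection<? extends Row> rows) {
        int version = -1;
        boolean first = true;

        for (Row row : rows) {
            if (row == null) {
                continue; // Assumption: null entries carry no schema version and are skipped.
            }

            if (first) {
                version = row.schemaVersion();
                first = false;
            } else if (row.schemaVersion() != version) {
                return false;
            }
        }

        return true;
    }

    public static void main(String[] args) {
        Row v1 = () -> 1;
        Row v2 = () -> 2;
        System.out.println(allSchemaVersionsSame(Arrays.asList(v1, null, v1)));
        System.out.println(allSchemaVersionsSame(Arrays.asList(v1, null, v2)));
    }
}
```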

ascherbakoff added 6 commits May 6, 2024 14:29

IGNITE-22130 Fix retries.

f4c36dc

IGNITE-22130 Fix tests.

00c429b

IGNITE-22130 Remove wrapReplicationException.

d0d0bb9

IGNITE-22130 Remove wrapReplicationException.

6175e95

IGNITE-22130 Remove wrapReplicationException.

e00fb94

IGNITE-22130 Fix style.

8e1147b

sk0x50 reviewed May 8, 2024

denis-chudov reviewed May 8, 2024

sk0x50 reviewed May 8, 2024

IGNITE-22130 Add nullable.

b9754dc

denis-chudov approved these changes May 9, 2024

vldpyatkov reviewed May 14, 2024
