Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanner.connection.it.ITReadWriteAutocommitSpannerTest: test03_MultipleStatements_WithTimeouts failed #938

Closed
flaky-bot bot opened this issue Mar 7, 2021 · 2 comments · Fixed by #999
Assignees
Labels
api: spanner Issues related to the googleapis/java-spanner API. flakybot: flaky Tells the Flaky Bot not to close or comment on this issue. flakybot: issue An issue filed by the Flaky Bot. Should not be added manually. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@flaky-bot
Copy link

flaky-bot bot commented Mar 7, 2021

This test failed!

To configure my behavior, see the Flaky Bot documentation.

If I'm commenting on this issue too often, add the flakybot: quiet label and
I will stop commenting.


commit: 6d4382c
buildURL: Build Status, Sponge
status: failed

Test output
com.google.cloud.spanner.SpannerException: CANCELLED: com.google.api.gax.rpc.CancelledException: io.grpc.StatusRuntimeException: CANCELLED: Read/query was cancelled due to the enclosing transaction being invalidated by a later transaction in the same session.
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:283)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:61)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionForCancellation(SpannerExceptionFactory.java:199)
	at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:179)
	at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.get(GapicSpannerRpc.java:1665)
	at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.executeQuery(GapicSpannerRpc.java:1417)
	at com.google.cloud.spanner.TransactionRunnerImpl$TransactionContextImpl.executeUpdate(TransactionRunnerImpl.java:598)
	at com.google.cloud.spanner.connection.SingleUseTransaction$3$1.run(SingleUseTransaction.java:363)
	at com.google.cloud.spanner.connection.SingleUseTransaction$3$1.run(SingleUseTransaction.java:360)
	at com.google.cloud.spanner.TransactionRunnerImpl$1.call(TransactionRunnerImpl.java:902)
	at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:105)
	at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
	at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted(SpannerRetryHelper.java:65)
	at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted(SpannerRetryHelper.java:56)
	at com.google.cloud.spanner.TransactionRunnerImpl.runInternal(TransactionRunnerImpl.java:962)
	at com.google.cloud.spanner.TransactionRunnerImpl.run(TransactionRunnerImpl.java:861)
	at com.google.cloud.spanner.SessionPool$SessionPoolTransactionRunner.run(SessionPool.java:944)
	at com.google.cloud.spanner.connection.SingleUseTransaction$3.call(SingleUseTransaction.java:359)
	at com.google.cloud.spanner.connection.SingleUseTransaction$3.call(SingleUseTransaction.java:353)
	at io.grpc.Context$2.call(Context.java:596)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: com.google.cloud.spanner.connection.AbstractBaseUnitOfWork$SpannerAsyncExecutionException: Execution failed for statement: DELETE FROM TEST WHERE ID=1000
		at com.google.cloud.spanner.connection.AbstractBaseUnitOfWork.executeStatementAsync(AbstractBaseUnitOfWork.java:226)
		at com.google.cloud.spanner.connection.AbstractBaseUnitOfWork.executeStatementAsync(AbstractBaseUnitOfWork.java:151)
		at com.google.cloud.spanner.connection.SingleUseTransaction.executeTransactionalUpdateAsync(SingleUseTransaction.java:374)
		at com.google.cloud.spanner.connection.SingleUseTransaction.executeUpdateAsync(SingleUseTransaction.java:306)
		at com.google.cloud.spanner.connection.ConnectionImpl.internalExecuteUpdateAsync(ConnectionImpl.java:958)
		at com.google.cloud.spanner.connection.ConnectionImpl.executeUpdate(ConnectionImpl.java:846)
		at com.google.cloud.spanner.connection.it.ITReadWriteAutocommitSpannerTest.test03_MultipleStatements_WithTimeouts(ITReadWriteAutocommitSpannerTest.java:107)
		at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
		at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
		at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
		at java.lang.reflect.Method.invoke(Method.java:498)
		at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
		at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
		at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
		at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
		at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
		at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
		at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
		at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
		at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
		at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
		at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
		at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
		at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
		at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
		at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
		at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
		at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
		at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
		at org.junit.rules.RunRules.evaluate(RunRules.java:20)
		at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
		at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
		at org.junit.runners.Suite.runChild(Suite.java:128)
		at org.junit.runners.Suite.runChild(Suite.java:27)
		at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
		at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
		at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
		at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
		at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
		at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
		at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
		at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55)
		at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137)
		at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeLazy(JUnitCoreWrapper.java:119)
		at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:87)
		at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75)
		at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158)
		at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:377)
		at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:138)
		at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:465)
		at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:451)
Caused by: java.util.concurrent.ExecutionException: com.google.api.gax.rpc.CancelledException: io.grpc.StatusRuntimeException: CANCELLED: Read/query was cancelled due to the enclosing transaction being invalidated by a later transaction in the same session.
	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:566)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:547)
	at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.get(GapicSpannerRpc.java:1659)
	... 22 more
Caused by: com.google.api.gax.rpc.CancelledException: io.grpc.StatusRuntimeException: CANCELLED: Read/query was cancelled due to the enclosing transaction being invalidated by a later transaction in the same session.
	at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:43)
	at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
	at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
	at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
	at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1041)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1215)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
	at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:563)
	at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:533)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose(SpannerErrorInterceptor.java:100)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:553)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:68)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:739)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:718)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	... 3 more
Caused by: io.grpc.StatusRuntimeException: CANCELLED: Read/query was cancelled due to the enclosing transaction being invalidated by a later transaction in the same session.
	at io.grpc.Status.asRuntimeException(Status.java:535)
	... 18 more
@flaky-bot flaky-bot bot added flakybot: issue An issue filed by the Flaky Bot. Should not be added manually. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Mar 7, 2021
@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/java-spanner API. label Mar 7, 2021
@flaky-bot
Copy link
Author

flaky-bot bot commented Mar 8, 2021

Looks like this issue is flaky. 😟

I'm going to leave this open and stop commenting.

A human should fix and close this.


When run at the same commit (6d4382c), this test passed in one build (Build Status, Sponge) and failed in another build (Build Status, Sponge).

@flaky-bot flaky-bot bot added the flakybot: flaky Tells the Flaky Bot not to close or comment on this issue. label Mar 8, 2021
@olavloite olavloite self-assigned this Mar 11, 2021
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Mar 14, 2021
@olavloite olavloite added priority: p2 Moderately-important priority. Fix may not be included in next release. priority: p3 Desirable enhancement or fix. May not be included in next release. and removed 🚨 This issue needs some love. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Mar 19, 2021
@olavloite
Copy link
Collaborator

I think that what might have happened here is the following (all below statements are executed in autocommit, which means that each statement will include a BeginTransaction option, and then execute a separate Commit RPC afterwards if the initial statement succeeded):

  1. The statement UPDATE TEST SET NAME='test18' WHERE ID=1000 is executed with a very short timeout. The statement was sent to Spanner and timed out in the client before it reached Spanner. No Commit was sent.
  2. The statement DELETE FROM TEST WHERE ID=1000 was executed with a normal timeout. The statement was sent to Spanner and overtook the statement from step 1. One of the two statements will be cancelled by Spanner, and in this case it happened to be statement 2. Statement 1 was executed successfully, but never committed.

This could in theory happen in other cases as well when a session is returned to the pool while a statement is still actively running on the backend, but the client has abandoned the statement because of a timeout or otherwise cancelled statement. The best solution is probably to retry the transaction in the specific case when the first statement of a transaction that did include a BeginTransaction option fails with the specific error CANCELLED: Read/query was cancelled due to the enclosing transaction being invalidated by a later transaction in the same session..

olavloite added a commit that referenced this issue Mar 19, 2021
If the first statement of a read/write transaction fails with a CANCELLED error
and the error message is `Read/query was cancelled due to the enclosing transaction
being invalidated by a later transaction in the same session.`, then the transaction
should be retried, as the error could be caused by a previous statement that was
abandoned by the client but still executed by the backend. This could be the case
if the statement timed out (on the client) or was cancelled.

Fixes #938
gcf-merge-on-green bot pushed a commit that referenced this issue Mar 22, 2021
If the first statement of a read/write transaction fails with a `CANCELLED` error and the error message is `Read/query was cancelled due to the enclosing transaction being invalidated by a later transaction in the same session.`, then the transaction should be retried, as the error could be caused by a previous statement that was abandoned by the client but still executed by the backend. This could be the case if the statement timed out (on the client) or was cancelled.

Fixes #938
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/java-spanner API. flakybot: flaky Tells the Flaky Bot not to close or comment on this issue. flakybot: issue An issue filed by the Flaky Bot. Should not be added manually. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants