Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experienced '504 Deadline Exceeded' error when tried to delete large number rows by partitioned dml #199

Closed
FengnaLiu opened this issue May 13, 2020 · 17 comments · Fixed by #203 or #232
Assignees
Labels
api: spanner Issues related to the googleapis/java-spanner API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@FengnaLiu
Copy link

There are 1 billion old records need to be cleaned. According the documents, It seems Partitioned DML should be the good choice. I choose java client because we can set timeout by using setPartitionedDmlTimeout(Duration.ofHours(24L)). However, I still experienced the '504 Deadline Exceeded' after a few hours that shorter than 24 hours I set. Why setPartitionedDmlTimeout did not work? How should I solve the problem? Is it not possible to delete so large number of data by using Partitioned DML.

Following are the details.

Environment details

  1. Spanner
    3 nodes, 2 billion records in a table and 1 billion old records need to be cleaned
  2. OS type and version: MacOs version 10.15.3
  3. Java version: java 8
  4. spanner version(s): the latest version

Steps to reproduce

  1. Write 2 billion records with timestamp in to the spanner table with dataflow
  2. Run a clean job to delete the old 1 billion records with Partitioned DML by java client
  3. Set the timeout to be 24 hours by using setPartitionedDmlTimeout(Duration.ofHours(24L))
  4. Run the java code

Code example

SpannerOptions options = SpannerOptions.newBuilder().setPartitionedDmlTimeout(Duration.ofHours(24L)).build();

        Spanner spanner = options.getService();
        try {

            DatabaseId db = DatabaseId.of(options.getProjectId(), args[0], args[1]);
            // [END init_client]
            //This will return the default project id based on the environment.
            String clientProject = spanner.getOptions().getProjectId();
            if (!db.getInstanceId().getProject().equals(clientProject)) {
                System.err.println(
                        "Invalid project specified. Project in the database id should match the"
                                + "project name set in the environment variable GOOGLE_CLOUD_PROJECT. Expected: "
                                + clientProject);
            }

            // [START init_client]
            DatabaseClient dbClient = spanner.getDatabaseClient(db);

            SimpleDateFormat formatter = new SimpleDateFormat("dd/MM/yyyy hh:mm:ss.SSS");
            String sql="DELETE FROM MyTable WHERE updateTimestamp<'2020-05-02T03:00:00Z'";
            Date currentDate = new Date();
            System.out.println(String.format("Delete Start:%s",formatter.format(currentDate)));

            long lStartTime = Instant.now().toEpochMilli();
            deleteUsingPartitionedDml(dbClient,sql);
            long lEndTime = Instant.now().toEpochMilli();
            long elaspedTime = lEndTime - lStartTime;
            System.out.println("Elapsed time in : " + elaspedTime/1000);

        } finally {
            spanner.close();
        }
        // [END init_client]
        System.out.println("Closed client");
    }
    private static void deleteUsingPartitionedDml(DatabaseClient dbClient,String sql) {
        long rowCount = dbClient.executePartitionedUpdate(Statement.of(sql));

        System.out.printf("%d records deleted.\n", rowCount);
    }
@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/java-spanner API. label May 13, 2020
@olavloite olavloite self-assigned this May 13, 2020
@olavloite
Copy link
Collaborator

@FengnaLiu Thanks for the detailed report. I'll look into it.

Could you also share the stacktrace of the exception that you get? The reason I'm asking is that I noticed that you wrote that you get a 504 Deadline Exceeded error, which indicates that it is an HTTP error and not a gRPC error. That could indicate that the timeout is occurring on the server and not in the client. The client would normally throw a DEADLINE_EXCEEDED exception with gRPC status code 4.

@olavloite
Copy link
Collaborator

@FengnaLiu I have a couple of additional questions in addition to the one above.

  1. Could it be that the Partitioned DML statement is at least partly executed? In other words: If you execute a SELECT COUNT(*) FROM MyTable before and after running the above sample, does it return the same value?
  2. Does your network infrastructure include a HTTP proxy?

olavloite added a commit that referenced this issue May 13, 2020
Setting a timeout value for Partitioned DML would not be respected if the timeout
value was higher than the timeout value set for the ExecuteSql RPC on the SpannerStub.
Lower timeout values would be respected.

Fixes #199
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label May 13, 2020
@skuruppu skuruppu added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed triage me I really want to be triaged. labels May 14, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label May 14, 2020
olavloite added a commit that referenced this issue May 14, 2020
Setting a timeout value for Partitioned DML would not be respected if the timeout
value was higher than the timeout value set for the ExecuteSql RPC on the SpannerStub.
Lower timeout values would be respected.

Fixes #199
olavloite added a commit that referenced this issue May 14, 2020
* fix: Partitioned DML timeout was not always respected

Setting a timeout value for Partitioned DML would not be respected if the timeout
value was higher than the timeout value set for the ExecuteSql RPC on the SpannerStub.
Lower timeout values would be respected.

Fixes #199

* fix: add ignored changes + InternalApi

* tests: add test for retry on UNAVAILABLE
@FengnaLiu
Copy link
Author

@olavloite I tried to run the delete job again. I got the 'DEADLINE_EXCEEDED' instead of '504 DEADLINE_EXCEEDED' about four hours later after started the delete job.

Following are the details.

  1. Half of the data was deleted.

  2. The stacktrace of the exception i got
    com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: com.google.api.gax.rpc.DeadlineExceededException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599999079514ns. [buffered_nanos=186035342, remote_addr=spanner.googleapis.com/172.217.175.106:443] at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted (SpannerExceptionFactory.java:182) at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException (SpannerExceptionFactory.java:45) at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException (SpannerExceptionFactory.java:139) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.get (GapicSpannerRpc.java:691) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.executePartitionedDml (GapicSpannerRpc.java:543) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:87) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:69) at com.google.api.gax.retrying.DirectRetryingExecutor.submit (DirectRetryingExecutor.java:105) at com.google.cloud.RetryHelper.run (RetryHelper.java:76) at com.google.cloud.RetryHelper.runWithRetries (RetryHelper.java:50) at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted (SpannerRetryHelper.java:47) at com.google.cloud.spanner.PartitionedDMLTransaction.executePartitionedUpdate (PartitionedDMLTransaction.java:91) at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate (SessionImpl.java:103) at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate (SessionPool.java:647) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:199) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:196) at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry (DatabaseClientImpl.java:213) at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate (DatabaseClientImpl.java:194) at jp.co.sej.central.spanner.DeleteOldStoreRecords.deleteUsingPartitionedDml (DeleteOldStoreRecords.java:93) at jp.co.sej.central.spanner.DeleteOldStoreRecords.main (DeleteOldStoreRecords.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282) at java.lang.Thread.run (Thread.java:748) Caused by: java.util.concurrent.ExecutionException: com.google.api.gax.rpc.DeadlineExceededException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599999079514ns. [buffered_nanos=186035342, remote_addr=spanner.googleapis.com/172.217.175.106:443] at com.google.common.util.concurrent.AbstractFuture.getDoneValue (AbstractFuture.java:552) at com.google.common.util.concurrent.AbstractFuture.get (AbstractFuture.java:533) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.get (GapicSpannerRpc.java:685) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.executePartitionedDml (GapicSpannerRpc.java:543) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:87) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:69) at com.google.api.gax.retrying.DirectRetryingExecutor.submit (DirectRetryingExecutor.java:105) at com.google.cloud.RetryHelper.run (RetryHelper.java:76) at com.google.cloud.RetryHelper.runWithRetries (RetryHelper.java:50) at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted (SpannerRetryHelper.java:47) at com.google.cloud.spanner.PartitionedDMLTransaction.executePartitionedUpdate (PartitionedDMLTransaction.java:91) at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate (SessionImpl.java:103) at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate (SessionPool.java:647) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:199) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:196) at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry (DatabaseClientImpl.java:213) at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate (DatabaseClientImpl.java:194) at jp.co.sej.central.spanner.DeleteOldStoreRecords.deleteUsingPartitionedDml (DeleteOldStoreRecords.java:93) at jp.co.sej.central.spanner.DeleteOldStoreRecords.main (DeleteOldStoreRecords.java:81) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282) at java.lang.Thread.run (Thread.java:748) Caused by: com.google.api.gax.rpc.DeadlineExceededException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599999079514ns. [buffered_nanos=186035342, remote_addr=spanner.googleapis.com/172.217.175.106:443] at com.google.api.gax.rpc.ApiExceptionFactory.createException (ApiExceptionFactory.java:51) at com.google.api.gax.grpc.GrpcApiExceptionFactory.create (GrpcApiExceptionFactory.java:72) at com.google.api.gax.grpc.GrpcApiExceptionFactory.create (GrpcApiExceptionFactory.java:60) at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure (GrpcExceptionCallable.java:97) at com.google.api.core.ApiFutures$1.onFailure (ApiFutures.java:68) at com.google.common.util.concurrent.Futures$CallbackListener.run (Futures.java:982) at com.google.common.util.concurrent.DirectExecutor.execute (DirectExecutor.java:30) at com.google.common.util.concurrent.AbstractFuture.executeListener (AbstractFuture.java:1138) at com.google.common.util.concurrent.AbstractFuture.complete (AbstractFuture.java:957) at com.google.common.util.concurrent.AbstractFuture.setException (AbstractFuture.java:748) at io.grpc.stub.ClientCalls$GrpcFuture.setException (ClientCalls.java:520) at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose (ClientCalls.java:495) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose (SpannerErrorInterceptor.java:100) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose (CensusStatsModule.java:700) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose (CensusTracingModule.java:399) at io.grpc.internal.ClientCallImpl.closeObserver (ClientCallImpl.java:510) at io.grpc.internal.ClientCallImpl.access$300 (ClientCallImpl.java:66) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close (ClientCallImpl.java:630) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700 (ClientCallImpl.java:518) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal (ClientCallImpl.java:692) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext (ClientCallImpl.java:681) at io.grpc.internal.ContextRunnable.run (ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run (SerializingExecutor.java:123) at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511) at java.util.concurrent.FutureTask.run (FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201 (ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) at java.lang.Thread.run (Thread.java:748) Caused by: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: deadline exceeded after 3599999079514ns. [buffered_nanos=186035342, remote_addr=spanner.googleapis.com/172.217.175.106:443] at io.grpc.Status.asRuntimeException (Status.java:533) at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose (ClientCalls.java:495) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose (SpannerErrorInterceptor.java:100) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose (CensusStatsModule.java:700) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose (CensusTracingModule.java:399) at io.grpc.internal.ClientCallImpl.closeObserver (ClientCallImpl.java:510) at io.grpc.internal.ClientCallImpl.access$300 (ClientCallImpl.java:66) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close (ClientCallImpl.java:630) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700 (ClientCallImpl.java:518) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal (ClientCallImpl.java:692) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext (ClientCallImpl.java:681) at io.grpc.internal.ContextRunnable.run (ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run (SerializingExecutor.java:123) at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511) at java.util.concurrent.FutureTask.run (FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201 (ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) at java.lang.Thread.run (Thread.java:748)

  3. My network infrastructure does not include a HTTP proxy

@olavloite
Copy link
Collaborator

@FengnaLiu Thanks for the additional information. I have been able to reproduce this, and a fix has been merged. You could try it with that by checking out the latest version of the master branch of this repository, or wait for the next release.

@iamacleish
Copy link

@olavloite Any idea when the next release will be?

@olavloite olavloite added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed triage me I really want to be triaged. labels May 15, 2020
gcf-merge-on-green bot pushed a commit that referenced this issue May 19, 2020
🤖 I have created a release \*beep\* \*boop\* 
---
## [1.55.0](https://www.github.com/googleapis/java-spanner/compare/v1.54.0...v1.55.0) (2020-05-19)


### Features

* mark when a Spanner client is closed ([#198](https://www.github.com/googleapis/java-spanner/issues/198)) ([50cb174](https://www.github.com/googleapis/java-spanner/commit/50cb1744e7ede611758d3ff63b3df77a1d3682eb))


### Bug Fixes

* make it possible to override backups methods ([#195](https://www.github.com/googleapis/java-spanner/issues/195)) ([2d19c25](https://www.github.com/googleapis/java-spanner/commit/2d19c25ba32847d116194565e67e1b1276fcb9f8))
* Partitioned DML timeout was not always respected ([#203](https://www.github.com/googleapis/java-spanner/issues/203)) ([13cb37e](https://www.github.com/googleapis/java-spanner/commit/13cb37e55ddfd1ff4ec22b1dcdc20c4832eee444)), closes [#199](https://www.github.com/googleapis/java-spanner/issues/199)
* partitionedDml stub was not closed ([#213](https://www.github.com/googleapis/java-spanner/issues/213)) ([a2d9a33](https://www.github.com/googleapis/java-spanner/commit/a2d9a33fa31f7467fc2bfbef5a29c4b3f5aea7c8))
* reuse clientId for invalidated databases ([#206](https://www.github.com/googleapis/java-spanner/issues/206)) ([7b4490d](https://www.github.com/googleapis/java-spanner/commit/7b4490dfb61fbc81b5bd6be6c9a663b36b5ce402))
* use nanos to prevent truncation errors ([#204](https://www.github.com/googleapis/java-spanner/issues/204)) ([a608460](https://www.github.com/googleapis/java-spanner/commit/a60846043dc0ca47e1970d8ab99380b6d725c7a9)), closes [#200](https://www.github.com/googleapis/java-spanner/issues/200)


### Dependencies

* update dependency com.google.cloud:google-cloud-shared-dependencies to v0.3.1 ([#190](https://www.github.com/googleapis/java-spanner/issues/190)) ([ad41a0d](https://www.github.com/googleapis/java-spanner/commit/ad41a0d4b0cc6a2c0ae0611c767652f64cfb2fb7))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please).
@olavloite
Copy link
Collaborator

@iamacleish A new release has now been published.

@FengnaLiu
Copy link
Author

FengnaLiu commented May 20, 2020

@olavloite Thanks for your information.
I tried the new version. But I encountered the new error [com.google.cloud.spanner.SpannerException: INTERNAL: ] after 1 hour running.

Details information as following.

  1. The stacktrace of the exception i got
    com.google.cloud.spanner.SpannerException: INTERNAL: com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: Internal error encountered. at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted (SpannerExceptionFactory.java:210) at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException (SpannerExceptionFactory.java:56) at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException (SpannerExceptionFactory.java:150) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.get (GapicSpannerRpc.java:1214) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.executePartitionedDml (GapicSpannerRpc.java:1060) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:86) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:68) at com.google.api.gax.retrying.DirectRetryingExecutor.submit (DirectRetryingExecutor.java:105) at com.google.cloud.RetryHelper.run (RetryHelper.java:76) at com.google.cloud.RetryHelper.runWithRetries (RetryHelper.java:50) at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted (SpannerRetryHelper.java:57) at com.google.cloud.spanner.PartitionedDMLTransaction.executePartitionedUpdate (PartitionedDMLTransaction.java:90) at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate (SessionImpl.java:108) at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate (SessionPool.java:688) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:205) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:202) at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry (DatabaseClientImpl.java:219) at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate (DatabaseClientImpl.java:200) at jp.co.sej.central.spanner.DeleteOldStoreRecords.deleteUsingPartitionedDml (DeleteOldStoreRecords.java:56) at jp.co.sej.central.spanner.DeleteOldStoreRecords.main (DeleteOldStoreRecords.java:44) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282) at java.lang.Thread.run (Thread.java:748) Caused by: java.util.concurrent.ExecutionException: com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: Internal error encountered. at com.google.common.util.concurrent.AbstractFuture.getDoneValue (AbstractFuture.java:564) at com.google.common.util.concurrent.AbstractFuture.get (AbstractFuture.java:545) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.get (GapicSpannerRpc.java:1208) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.executePartitionedDml (GapicSpannerRpc.java:1060) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:86) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:68) at com.google.api.gax.retrying.DirectRetryingExecutor.submit (DirectRetryingExecutor.java:105) at com.google.cloud.RetryHelper.run (RetryHelper.java:76) at com.google.cloud.RetryHelper.runWithRetries (RetryHelper.java:50) at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted (SpannerRetryHelper.java:57) at com.google.cloud.spanner.PartitionedDMLTransaction.executePartitionedUpdate (PartitionedDMLTransaction.java:90) at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate (SessionImpl.java:108) at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate (SessionPool.java:688) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:205) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:202) at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry (DatabaseClientImpl.java:219) at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate (DatabaseClientImpl.java:200) at jp.co.sej.central.spanner.DeleteOldStoreRecords.deleteUsingPartitionedDml (DeleteOldStoreRecords.java:56) at jp.co.sej.central.spanner.DeleteOldStoreRecords.main (DeleteOldStoreRecords.java:44) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282) at java.lang.Thread.run (Thread.java:748) Caused by: com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: Internal error encountered. at com.google.api.gax.rpc.ApiExceptionFactory.createException (ApiExceptionFactory.java:67) at com.google.api.gax.grpc.GrpcApiExceptionFactory.create (GrpcApiExceptionFactory.java:72) at com.google.api.gax.grpc.GrpcApiExceptionFactory.create (GrpcApiExceptionFactory.java:60) at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure (GrpcExceptionCallable.java:97) at com.google.api.core.ApiFutures$1.onFailure (ApiFutures.java:68) at com.google.common.util.concurrent.Futures$CallbackListener.run (Futures.java:1050) at com.google.common.util.concurrent.DirectExecutor.execute (DirectExecutor.java:30) at com.google.common.util.concurrent.AbstractFuture.executeListener (AbstractFuture.java:1176) at com.google.common.util.concurrent.AbstractFuture.complete (AbstractFuture.java:969) at com.google.common.util.concurrent.AbstractFuture.setException (AbstractFuture.java:760) at io.grpc.stub.ClientCalls$GrpcFuture.setException (ClientCalls.java:526) at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose (ClientCalls.java:501) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose (SpannerErrorInterceptor.java:100) at io.grpc.internal.ClientCallImpl.closeObserver (ClientCallImpl.java:426) at io.grpc.internal.ClientCallImpl.access$500 (ClientCallImpl.java:66) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close (ClientCallImpl.java:689) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900 (ClientCallImpl.java:577) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal (ClientCallImpl.java:751) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext (ClientCallImpl.java:740) at io.grpc.internal.ContextRunnable.run (ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run (SerializingExecutor.java:123) at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511) at java.util.concurrent.FutureTask.run (FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201 (ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) at java.lang.Thread.run (Thread.java:748) Caused by: io.grpc.StatusRuntimeException: INTERNAL: Internal error encountered. at io.grpc.Status.asRuntimeException (Status.java:533) at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose (ClientCalls.java:501) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose (SpannerErrorInterceptor.java:100) at io.grpc.internal.ClientCallImpl.closeObserver (ClientCallImpl.java:426) at io.grpc.internal.ClientCallImpl.access$500 (ClientCallImpl.java:66) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close (ClientCallImpl.java:689) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900 (ClientCallImpl.java:577) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal (ClientCallImpl.java:751) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext (ClientCallImpl.java:740) at io.grpc.internal.ContextRunnable.run (ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run (SerializingExecutor.java:123) at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511) at java.util.concurrent.FutureTask.run (FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201 (ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) at java.lang.Thread.run (Thread.java:748)

  2. pom.xml

`



com.google.cloud
libraries-bom
5.4.0
pom
import


UTF-8 UTF-8
<!--testing dependencies-->
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.13</version>
  <scope>test</scope>
</dependency>

<dependency>
  <groupId>org.hamcrest</groupId>
  <artifactId>hamcrest-library</artifactId>
  <version>2.2</version>
  <scope>test</scope>
</dependency>

<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.11.0</version>
</dependency>

<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-spanner</artifactId>
  <version>1.55.0</version>
</dependency>

`

@olavloite
Copy link
Collaborator

@FengnaLiu That's surprising. I'll look into it right away. Did you set any specific timeout value?

@FengnaLiu
Copy link
Author

@olavloite I did set the timeout to be 24 hours.
SpannerOptions options = SpannerOptions.newBuilder().setPartitionedDmlTimeout(Duration.ofHours(24L)).build();

@FengnaLiu
Copy link
Author

@olavloite I ran the job again and then the error seems different.

com.google.cloud.spanner.AbortedException: ABORTED: com.google.api.gax.rpc.AbortedException: io.grpc.StatusRuntimeException: ABORTED: Aborted due to transient fault at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted (SpannerExceptionFactory.java:196) at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException (SpannerExceptionFactory.java:56) at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException (SpannerExceptionFactory.java:150) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.get (GapicSpannerRpc.java:1214) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.executePartitionedDml (GapicSpannerRpc.java:1060) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:86) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:68) at com.google.api.gax.retrying.DirectRetryingExecutor.submit (DirectRetryingExecutor.java:105) at com.google.cloud.RetryHelper.run (RetryHelper.java:76) at com.google.cloud.RetryHelper.runWithRetries (RetryHelper.java:50) at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted (SpannerRetryHelper.java:57) at com.google.cloud.spanner.PartitionedDMLTransaction.executePartitionedUpdate (PartitionedDMLTransaction.java:90) at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate (SessionImpl.java:108) at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate (SessionPool.java:688) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:205) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:202) at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry (DatabaseClientImpl.java:219) at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate (DatabaseClientImpl.java:200) at jp.co.sej.central.spanner.DeleteOldStoreRecords.deleteUsingPartitionedDml (DeleteOldStoreRecords.java:56) at jp.co.sej.central.spanner.DeleteOldStoreRecords.main (DeleteOldStoreRecords.java:44) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282) at java.lang.Thread.run (Thread.java:748) Caused by: java.util.concurrent.ExecutionException: com.google.api.gax.rpc.AbortedException: io.grpc.StatusRuntimeException: ABORTED: Aborted due to transient fault at com.google.common.util.concurrent.AbstractFuture.getDoneValue (AbstractFuture.java:564) at com.google.common.util.concurrent.AbstractFuture.get (AbstractFuture.java:545) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.get (GapicSpannerRpc.java:1208) at com.google.cloud.spanner.spi.v1.GapicSpannerRpc.executePartitionedDml (GapicSpannerRpc.java:1060) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:86) at com.google.cloud.spanner.PartitionedDMLTransaction$1.call (PartitionedDMLTransaction.java:68) at com.google.api.gax.retrying.DirectRetryingExecutor.submit (DirectRetryingExecutor.java:105) at com.google.cloud.RetryHelper.run (RetryHelper.java:76) at com.google.cloud.RetryHelper.runWithRetries (RetryHelper.java:50) at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted (SpannerRetryHelper.java:57) at com.google.cloud.spanner.PartitionedDMLTransaction.executePartitionedUpdate (PartitionedDMLTransaction.java:90) at com.google.cloud.spanner.SessionImpl.executePartitionedUpdate (SessionImpl.java:108) at com.google.cloud.spanner.SessionPool$PooledSession.executePartitionedUpdate (SessionPool.java:688) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:205) at com.google.cloud.spanner.DatabaseClientImpl$3.apply (DatabaseClientImpl.java:202) at com.google.cloud.spanner.DatabaseClientImpl.runWithSessionRetry (DatabaseClientImpl.java:219) at com.google.cloud.spanner.DatabaseClientImpl.executePartitionedUpdate (DatabaseClientImpl.java:200) at jp.co.sej.central.spanner.DeleteOldStoreRecords.deleteUsingPartitionedDml (DeleteOldStoreRecords.java:56) at jp.co.sej.central.spanner.DeleteOldStoreRecords.main (DeleteOldStoreRecords.java:44) at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke (Method.java:498) at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282) at java.lang.Thread.run (Thread.java:748) Caused by: com.google.api.gax.rpc.AbortedException: io.grpc.StatusRuntimeException: ABORTED: Aborted due to transient fault at com.google.api.gax.rpc.ApiExceptionFactory.createException (ApiExceptionFactory.java:61) at com.google.api.gax.grpc.GrpcApiExceptionFactory.create (GrpcApiExceptionFactory.java:72) at com.google.api.gax.grpc.GrpcApiExceptionFactory.create (GrpcApiExceptionFactory.java:60) at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure (GrpcExceptionCallable.java:97) at com.google.api.core.ApiFutures$1.onFailure (ApiFutures.java:68) at com.google.common.util.concurrent.Futures$CallbackListener.run (Futures.java:1050) at com.google.common.util.concurrent.DirectExecutor.execute (DirectExecutor.java:30) at com.google.common.util.concurrent.AbstractFuture.executeListener (AbstractFuture.java:1176) at com.google.common.util.concurrent.AbstractFuture.complete (AbstractFuture.java:969) at com.google.common.util.concurrent.AbstractFuture.setException (AbstractFuture.java:760) at io.grpc.stub.ClientCalls$GrpcFuture.setException (ClientCalls.java:526) at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose (ClientCalls.java:501) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose (SpannerErrorInterceptor.java:100) at io.grpc.internal.ClientCallImpl.closeObserver (ClientCallImpl.java:426) at io.grpc.internal.ClientCallImpl.access$500 (ClientCallImpl.java:66) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close (ClientCallImpl.java:689) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900 (ClientCallImpl.java:577) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal (ClientCallImpl.java:751) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext (ClientCallImpl.java:740) at io.grpc.internal.ContextRunnable.run (ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run (SerializingExecutor.java:123) at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511) at java.util.concurrent.FutureTask.run (FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201 (ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) at java.lang.Thread.run (Thread.java:748) Caused by: io.grpc.StatusRuntimeException: ABORTED: Aborted due to transient fault at io.grpc.Status.asRuntimeException (Status.java:533) at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose (ClientCalls.java:501) at io.grpc.PartialForwardingClientCallListener.onClose (PartialForwardingClientCallListener.java:39) at io.grpc.ForwardingClientCallListener.onClose (ForwardingClientCallListener.java:23) at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose (ForwardingClientCallListener.java:40) at com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor$1$1.onClose (SpannerErrorInterceptor.java:100) at io.grpc.internal.ClientCallImpl.closeObserver (ClientCallImpl.java:426) at io.grpc.internal.ClientCallImpl.access$500 (ClientCallImpl.java:66) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close (ClientCallImpl.java:689) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900 (ClientCallImpl.java:577) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal (ClientCallImpl.java:751) at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext (ClientCallImpl.java:740) at io.grpc.internal.ContextRunnable.run (ContextRunnable.java:37) at io.grpc.internal.SerializingExecutor.run (SerializingExecutor.java:123) at java.util.concurrent.Executors$RunnableAdapter.call (Executors.java:511) at java.util.concurrent.FutureTask.run (FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201 (ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run (ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) at java.lang.Thread.run (Thread.java:748)

@olavloite
Copy link
Collaborator

@FengnaLiu Thanks for the additional information. That is indeed a different error. The Partitioned DML transaction has been aborted, and normally that should lead to a retry by the client, but it seems that that is not happening in this case. I'll try to figure out why that is.

olavloite added a commit that referenced this issue May 20, 2020
The PartitionedDML retry settings were only applied for the RPC, and not
for the generic retryer that would retry the PDML transaction if it was
aborted by Spanner. This could cause long-running PDML transactions to
fail with an Aborted exception.

Fixes #199
skuruppu pushed a commit that referenced this issue May 20, 2020
* fix: PDML retry settings were not applied for aborted tx

The PartitionedDML retry settings were only applied for the RPC, and not
for the generic retryer that would retry the PDML transaction if it was
aborted by Spanner. This could cause long-running PDML transactions to
fail with an Aborted exception.

Fixes #199

* fix: add ignored diff to clirr
@olavloite
Copy link
Collaborator

olavloite commented May 21, 2020

@FengnaLiu The last problem you ran into seems to be caused by the following scenario:

  1. The Partitioned DML transaction and statement is started.
  2. The transaction is aborted by Cloud Spanner.
  3. The client starts an automatic retry of the PDML transaction. The retry and timeout settings that were used by the automatic retryer for the retried transaction were the default retry settings, and not the specific PDML retry settings that have been introduced.

The last point above has now also been fixed in #232

@FengnaLiu
Copy link
Author

@olavloite I really appreciate your quick response.
There are still something that I am not clear.

  1. Does it mean I have to wait for next release to apply the changes?
  2. And how about the first problem [com.google.cloud.spanner.SpannerException: INTERNAL: ]?
    Did it also have been fixed?

@FengnaLiu
Copy link
Author

@olavloite I really sorry to bother you again. I tried version 1.55.1 and 1.55.0 for several delete patterns. And I found out that with version 1.55.1 it did experience no timeout error but the job kept running even the records have been all deleted. Additionally, it appears to deleting much far more slower than the older version.

Details as followings:

  1. delete 1 billion records.
    Version 1.55.0 - The job experiences timeout error after a few hours running , and timeout I set is 24 hours
    Version 1.55.1 - the job keeps running even after 24 hours( the timeout I set). Additionally, I checked the left records during the job and it appears to stop deleting records after 5-6 hours.

  2. delete 100million records.
    Version 1.55.0 and Version 1.46.0 - Job Succeeded after 30 minutes.
    Version 1.55.1 - After 2 hours the job is still running and I checked the left record number through query is 0. I manually stopped the job after 3 hours.

  3. delete 200 million records
    Version 1.55.1 - Job succeeded after 6 hours. I was not able to check when the records were all deleted because I ran the job during the night.
    Version 1.55.0 - Not checked yet. I will find out today.

@olavloite
Copy link
Collaborator

Hi @FengnaLiu

I really sorry to bother you again.
Don't worry about that. This is valuable information for improving the client library.

Thanks for the detailed information and examples. I don't think the new version is literally deleting the records slower, but it might be that it is slower and less reliable in detecting when it has finished.

The big challenge here is that the Partitioned DML RPC is defined as an RPC that returns its response when the operation has finished. That works well for operations that normally takes seconds or maybe a couple of minutes, but not so well if the operation can take hours as any network hick-up during this time might cause unexpected errors. That is why for example RPCs to update the DDL of a database respond with a long-running operation that can be polled for its progress, instead of waiting to respond until the operation has finished on the backend.

I'll start a discussion about this with a couple of others to see if we can come up with a better solution for this. One solution might be to create a kind of long-running operation in the client library that handles all this for you. In the meantime, I'm afraid that your best option is to retry the delete operation with reasonable chunks of the table as you are already doing.

I'll get back to you on this as soon as I can.

@iamacleish
Copy link

Hey @olavloite !
Any updates on this?

@skuruppu
Copy link
Contributor

skuruppu commented Jun 4, 2020

@olavloite and I spoke yesterday about whether we can update the client library to support this case. I have a couple of issues with this:

  1. Does it make sense for us to retry these transactions on the client library? There may be some queries where this doesn't make sense?
  2. Under what errors should we retry?

I'm a bit concerned about making the client library logic more complicated than it already is. It also means that we'll have an inconsistency between how Java handles this vs client libraries for other languages.

Ideally the backend would return a LRO here instead of a standard response which would make it easier for us to consistently handle this type of issue across all the client libraries. But I don't see a change like this being made anytime soon.

rajatbhatta pushed a commit to rajatbhatta/java-spanner that referenced this issue Nov 17, 2022
…cies to v0.9.0 (googleapis#199)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [com.google.cloud:google-cloud-shared-dependencies](https://togithub.com/googleapis/java-shared-dependencies) | minor | `0.8.6` -> `0.9.0` |

---

### Release Notes

<details>
<summary>googleapis/java-shared-dependencies</summary>

### [`v0.9.0`](https://togithub.com/googleapis/java-shared-dependencies/blob/master/CHANGELOG.md#&#8203;090-httpswwwgithubcomgoogleapisjava-shared-dependenciescompare086v090-2020-08-31)

[Compare Source](https://togithub.com/googleapis/java-shared-dependencies/compare/v0.8.6...v0.9.0)

##### Bug Fixes

-   temporarily disable reporting to unblock releases ([#&#8203;129](https://www.github.com/googleapis/java-shared-dependencies/issues/129)) ([7fff6f2](https://www.github.com/googleapis/java-shared-dependencies/commit/7fff6f2c94a19ba998d8cd47e8be5a6333808df8))

##### Dependencies

-   update dependency com.google.protobuf:protobuf-bom to v3.13.0 ([#&#8203;126](https://www.github.com/googleapis/java-shared-dependencies/issues/126)) ([908063f](https://www.github.com/googleapis/java-shared-dependencies/commit/908063f9b820dd3195b15537189e45be0d4acbbb))
-   update dependency io.grpc:grpc-bom to v1.31.1 ([#&#8203;124](https://www.github.com/googleapis/java-shared-dependencies/issues/124)) ([553a339](https://www.github.com/googleapis/java-shared-dependencies/commit/553a3393f5bede0e90e16e2d0d87daa2b936ab32))
-   update google.common-protos.version to v1.18.1 ([#&#8203;119](https://www.github.com/googleapis/java-shared-dependencies/issues/119)) ([05ad27e](https://www.github.com/googleapis/java-shared-dependencies/commit/05ad27e35fe082e148d377315b10943b187b5670))
-   update google.core.version to v1.93.8 ([f72cef3](https://www.github.com/googleapis/java-shared-dependencies/commit/f72cef3d78a036e2b2434bf08b0a75d57b2fa139))
-   update iam.version to v0.15.0 ([#&#8203;118](https://www.github.com/googleapis/java-shared-dependencies/issues/118)) ([1409a16](https://www.github.com/googleapis/java-shared-dependencies/commit/1409a16826d3fd4a5d9cbcbe46ea4b4af8687a5c))
-   update iam.version to v1 (major) ([#&#8203;120](https://www.github.com/googleapis/java-shared-dependencies/issues/120)) ([a6243a0](https://www.github.com/googleapis/java-shared-dependencies/commit/a6243a02129e42fec804b5769fb8e3f334ba84ce))

##### [0.8.6](https://www.github.com/googleapis/java-shared-dependencies/compare/v0.8.5...v0.8.6) (2020-08-07)

##### Dependencies

-   update gax to v1.58.2 ([#&#8203;115](https://www.github.com/googleapis/java-shared-dependencies/issues/115)) ([84b48b4](https://www.github.com/googleapis/java-shared-dependencies/commit/84b48b4e77a4b5b4a2db6030609abe241d5ee2e1))

##### [0.8.5](https://www.github.com/googleapis/java-shared-dependencies/compare/v0.8.4...v0.8.5) (2020-08-07)

##### Dependencies

-   update gax to v1.58.1 ([#&#8203;111](https://www.github.com/googleapis/java-shared-dependencies/issues/111)) ([93a1691](https://www.github.com/googleapis/java-shared-dependencies/commit/93a16915b863a610ffdabe0e23aec58c4ae5f4f1))

##### [0.8.4](https://www.github.com/googleapis/java-shared-dependencies/compare/v0.8.3...v0.8.4) (2020-08-04)

##### Dependencies

-   update core dependencies ([#&#8203;104](https://www.github.com/googleapis/java-shared-dependencies/issues/104)) ([5046818](https://www.github.com/googleapis/java-shared-dependencies/commit/504681803d2bba888404acffe9a8853746501358))
-   update dependency com.google.api:api-common to v1.10.0 ([#&#8203;101](https://www.github.com/googleapis/java-shared-dependencies/issues/101)) ([6472fac](https://www.github.com/googleapis/java-shared-dependencies/commit/6472face89700e3f2f982c04a5e88801876580be))
-   update dependency com.google.protobuf:protobuf-bom to v3.12.4 ([#&#8203;103](https://www.github.com/googleapis/java-shared-dependencies/issues/103)) ([885bd0e](https://www.github.com/googleapis/java-shared-dependencies/commit/885bd0ef3c9e344bd1fc60e0f3264995064001d9))

##### [0.8.3](https://www.github.com/googleapis/java-shared-dependencies/compare/v0.8.2...v0.8.3) (2020-07-09)

##### Dependencies

-   update core dependencies ([#&#8203;96](https://www.github.com/googleapis/java-shared-dependencies/issues/96)) ([978e69e](https://www.github.com/googleapis/java-shared-dependencies/commit/978e69e9b5999630354ea204c034be2d6b8a2d80))
-   update dependency com.google.api-client:google-api-client-bom to v1.30.10 ([#&#8203;99](https://www.github.com/googleapis/java-shared-dependencies/issues/99)) ([65c9cce](https://www.github.com/googleapis/java-shared-dependencies/commit/65c9ccea34275fa6f9599043d6e06df169bc433a))
-   update dependency com.google.api:api-common to v1.9.3 ([#&#8203;91](https://www.github.com/googleapis/java-shared-dependencies/issues/91)) ([2a23a50](https://www.github.com/googleapis/java-shared-dependencies/commit/2a23a50ee5c73b581a02171048e20b14f999949f))

##### [0.8.2](https://www.github.com/googleapis/java-shared-dependencies/compare/v0.8.1...v0.8.2) (2020-07-01)

##### Dependencies

-   update dependency com.google.auth:google-auth-library-bom to v0.21.0 ([#&#8203;86](https://www.github.com/googleapis/java-shared-dependencies/issues/86)) ([dc9d717](https://www.github.com/googleapis/java-shared-dependencies/commit/dc9d717fdec4f0962141ba34e98f5737ec3bc57a))
-   update dependency com.google.http-client:google-http-client-bom to v1.36.0 ([#&#8203;89](https://www.github.com/googleapis/java-shared-dependencies/issues/89)) ([12437d7](https://www.github.com/googleapis/java-shared-dependencies/commit/12437d7392a430299c3372d18d2650b62be60eaf))
-   update dependency io.grpc:grpc-bom to v1.30.1 ([#&#8203;82](https://www.github.com/googleapis/java-shared-dependencies/issues/82)) ([a5199a0](https://www.github.com/googleapis/java-shared-dependencies/commit/a5199a02d5bde75e86349b37c1efae64a6379a40))
-   update dependency io.grpc:grpc-bom to v1.30.2 ([#&#8203;85](https://www.github.com/googleapis/java-shared-dependencies/issues/85)) ([084d18d](https://www.github.com/googleapis/java-shared-dependencies/commit/084d18daab010c6b0be04e67b42ca8ab8ba5f3d5))

##### [0.8.1](https://www.github.com/googleapis/java-shared-dependencies/compare/v0.8.0...v0.8.1) (2020-06-13)

##### Reverts

-   Revert "feat: mark javax annotations scope as provided ([#&#8203;70](https://togithub.com/googleapis/java-shared-dependencies/issues/70))" ([#&#8203;75](https://togithub.com/googleapis/java-shared-dependencies/issues/75)) ([d2f23ec](https://www.github.com/googleapis/java-shared-dependencies/commit/d2f23ecae56a1ec07b1217f0fca5347dd0f0406b)), closes [#&#8203;70](https://www.github.com/googleapis/java-shared-dependencies/issues/70) [#&#8203;75](https://www.github.com/googleapis/java-shared-dependencies/issues/75)

</details>

---

### Renovate configuration

:date: **Schedule**: At any time (no schedule defined).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

:no_bell: **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#github/googleapis/java-spanner-jdbc).
rajatbhatta pushed a commit to rajatbhatta/java-spanner that referenced this issue Nov 17, 2022
🤖 I have created a release \*beep\* \*boop\* 
---
### [1.17.1](https://www.github.com/googleapis/java-spanner-jdbc/compare/v1.17.0...v1.17.1) (2020-09-21)


### Dependencies

* update dependency com.google.cloud:google-cloud-shared-dependencies to v0.9.0 ([googleapis#199](https://www.github.com/googleapis/java-spanner-jdbc/issues/199)) ([59a7d07](https://www.github.com/googleapis/java-spanner-jdbc/commit/59a7d07c284210033bd1d587b09c44d9c271a52e))
---


This PR was generated with [Release Please](https://github.com/googleapis/release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/java-spanner API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
5 participants