Experienced '504 Deadline Exceeded' error when tried to delete large number rows by partitioned dml #199
Comments
@FengnaLiu Thanks for the detailed report. I'll look into it. Could you also share the stacktrace of the exception that you get? The reason I'm asking is that I noticed that you wrote that you get a |
@FengnaLiu I have a couple of questions in addition to the one above.
Setting a timeout value for Partitioned DML would not be respected if the timeout value was higher than the timeout value set for the ExecuteSql RPC on the SpannerStub. Lower timeout values would be respected. Fixes #199
* fix: Partitioned DML timeout was not always respected

  Setting a timeout value for Partitioned DML would not be respected if the timeout value was higher than the timeout value set for the ExecuteSql RPC on the SpannerStub. Lower timeout values would be respected. Fixes #199
* fix: add ignored changes + InternalApi
* tests: add test for retry on UNAVAILABLE
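The behavior described in this fix can be modelled in a few lines. This is only an illustration of the reported symptom, not the client library's actual code: the class and method names are hypothetical, and the 1-hour ExecuteSql timeout is an assumed value chosen for the example.

```java
import java.time.Duration;

public class PdmlTimeoutModel {
    /** Behavior reported before the fix: the smaller of the two timeouts wins. */
    public static Duration effectiveBeforeFix(Duration pdmlTimeout, Duration executeSqlTimeout) {
        return pdmlTimeout.compareTo(executeSqlTimeout) < 0 ? pdmlTimeout : executeSqlTimeout;
    }

    /** Behavior intended by the fix: the Partitioned DML timeout is always used. */
    public static Duration effectiveAfterFix(Duration pdmlTimeout, Duration executeSqlTimeout) {
        return pdmlTimeout;
    }

    public static void main(String[] args) {
        Duration executeSqlTimeout = Duration.ofHours(1); // assumed value, for illustration only
        Duration requested = Duration.ofHours(24);        // the value used in this issue
        System.out.println("before fix: " + effectiveBeforeFix(requested, executeSqlTimeout));
        System.out.println("after fix:  " + effectiveAfterFix(requested, executeSqlTimeout));
    }
}
```

Under this model a 24-hour setting was silently capped by the shorter RPC timeout, while a setting below the RPC timeout already worked as expected.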
@olavloite I tried to run the delete job again. I got 'DEADLINE_EXCEEDED' instead of '504 DEADLINE_EXCEEDED' about four hours after starting the delete job. The details are as follows.
@FengnaLiu Thanks for the additional information. I have been able to reproduce this, and a fix has been merged. You could try the fix by checking out the latest version of the master branch of this repository, or wait for the next release.
@olavloite Any idea when the next release will be?
🤖 I have created a release \*beep\* \*boop\*

---

## [1.55.0](https://www.github.com/googleapis/java-spanner/compare/v1.54.0...v1.55.0) (2020-05-19)

### Features

* mark when a Spanner client is closed ([#198](https://www.github.com/googleapis/java-spanner/issues/198)) ([50cb174](https://www.github.com/googleapis/java-spanner/commit/50cb1744e7ede611758d3ff63b3df77a1d3682eb))

### Bug Fixes

* make it possible to override backups methods ([#195](https://www.github.com/googleapis/java-spanner/issues/195)) ([2d19c25](https://www.github.com/googleapis/java-spanner/commit/2d19c25ba32847d116194565e67e1b1276fcb9f8))
* Partitioned DML timeout was not always respected ([#203](https://www.github.com/googleapis/java-spanner/issues/203)) ([13cb37e](https://www.github.com/googleapis/java-spanner/commit/13cb37e55ddfd1ff4ec22b1dcdc20c4832eee444)), closes [#199](https://www.github.com/googleapis/java-spanner/issues/199)
* partitionedDml stub was not closed ([#213](https://www.github.com/googleapis/java-spanner/issues/213)) ([a2d9a33](https://www.github.com/googleapis/java-spanner/commit/a2d9a33fa31f7467fc2bfbef5a29c4b3f5aea7c8))
* reuse clientId for invalidated databases ([#206](https://www.github.com/googleapis/java-spanner/issues/206)) ([7b4490d](https://www.github.com/googleapis/java-spanner/commit/7b4490dfb61fbc81b5bd6be6c9a663b36b5ce402))
* use nanos to prevent truncation errors ([#204](https://www.github.com/googleapis/java-spanner/issues/204)) ([a608460](https://www.github.com/googleapis/java-spanner/commit/a60846043dc0ca47e1970d8ab99380b6d725c7a9)), closes [#200](https://www.github.com/googleapis/java-spanner/issues/200)

### Dependencies

* update dependency com.google.cloud:google-cloud-shared-dependencies to v0.3.1 ([#190](https://www.github.com/googleapis/java-spanner/issues/190)) ([ad41a0d](https://www.github.com/googleapis/java-spanner/commit/ad41a0d4b0cc6a2c0ae0611c767652f64cfb2fb7))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please).
@iamacleish A new release has now been published.
@olavloite Thanks for the information. The details are as follows.
@FengnaLiu That's surprising. I'll look into it right away. Did you set any specific timeout value?
@olavloite I did set the timeout to 24 hours.
@olavloite I ran the job again, and this time the error seems different.
@FengnaLiu Thanks for the additional information. That is indeed a different error. The Partitioned DML transaction has been aborted, and normally that should lead to a retry by the client, but it seems that that is not happening in this case. I'll try to figure out why that is.
The PartitionedDML retry settings were only applied for the RPC, and not for the generic retryer that would retry the PDML transaction if it was aborted by Spanner. This could cause long-running PDML transactions to fail with an Aborted exception. Fixes #199
* fix: PDML retry settings were not applied for aborted tx

  The PartitionedDML retry settings were only applied for the RPC, and not for the generic retryer that would retry the PDML transaction if it was aborted by Spanner. This could cause long-running PDML transactions to fail with an Aborted exception. Fixes #199
* fix: add ignored diff to clirr
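The idea of a generic retryer that honors the overall timeout can be sketched as follows. This is a simplified stand-in, not the library's retryer: `AbortedException` and `runWithRetryOnAborted` are hypothetical names, and a real retryer would also apply backoff between attempts.

```java
import java.time.Duration;
import java.util.function.Supplier;

public class AbortedRetrySketch {
    /** Hypothetical stand-in for the Aborted error returned by Spanner. */
    public static class AbortedException extends RuntimeException {
        public AbortedException(String message) {
            super(message);
        }
    }

    /**
     * Re-runs the attempt while it fails with AbortedException and gives up
     * once totalTimeout has elapsed since the first attempt started.
     */
    public static <T> T runWithRetryOnAborted(Supplier<T> attempt, Duration totalTimeout) {
        long deadlineNanos = System.nanoTime() + totalTimeout.toNanos();
        while (true) {
            try {
                return attempt.get();
            } catch (AbortedException e) {
                if (System.nanoTime() - deadlineNanos >= 0) {
                    throw e; // overall deadline passed: surface the last abort
                }
                // Deadline not reached yet: loop and start a fresh attempt.
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated transaction that is aborted twice before succeeding.
        long rows = runWithRetryOnAborted(() -> {
            if (++calls[0] < 3) {
                throw new AbortedException("transaction aborted");
            }
            return 42L;
        }, Duration.ofSeconds(5));
        System.out.println("attempts=" + calls[0] + ", rows=" + rows);
    }
}
```

The point of the fix is that the deadline used by this outer loop should come from the Partitioned DML timeout, not from a shorter default.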
@FengnaLiu The last problem you ran into seems to be caused by the following scenario:
The last point above has now also been fixed in #232
@olavloite I really appreciate your quick response.
@olavloite I'm really sorry to bother you again. I tried versions 1.55.1 and 1.55.0 with several delete patterns. I found that with version 1.55.1 there was no timeout error, but the job kept running even after all the records had been deleted. Additionally, it appears to be deleting much more slowly than the older version. The details are as follows:
Hi @FengnaLiu
Thanks for the detailed information and examples. I don't think the new version is literally deleting the records slower, but it might be that it is slower and less reliable in detecting when it has finished. The big challenge here is that the Partitioned DML RPC is defined as an RPC that returns its response when the operation has finished. That works well for operations that normally take seconds or maybe a couple of minutes, but not so well if the operation can take hours, as any network hiccup during this time might cause unexpected errors. That is why, for example, RPCs to update the DDL of a database respond with a long-running operation that can be polled for its progress, instead of waiting to respond until the operation has finished on the backend. I'll start a discussion about this with a couple of others to see if we can come up with a better solution for this. One solution might be to create a kind of long-running operation in the client library that handles all this for you. In the meantime, I'm afraid that your best option is to retry the delete operation with reasonable chunks of the table as you are already doing. I'll get back to you on this as soon as I can.
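A minimal sketch of the chunked workaround mentioned above, under stated assumptions: the table has a non-negative integer key column, and the `executor` function stands in for whatever actually runs the statement. All class, method, table, and column names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

public class ChunkedDeleteSketch {
    /** Builds one DELETE per half-open key range [start, end); assumes non-negative keys. */
    public static List<String> planChunks(String table, String keyColumn,
                                          long minKey, long maxKeyExclusive, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> statements = new ArrayList<>();
        long start = minKey;
        while (start < maxKeyExclusive) {
            // Written as a subtraction so that start + chunkSize cannot overflow.
            long end = (maxKeyExclusive - start <= chunkSize) ? maxKeyExclusive : start + chunkSize;
            statements.add("DELETE FROM " + table + " WHERE " + keyColumn + " >= " + start
                    + " AND " + keyColumn + " < " + end);
            start = end;
        }
        return statements;
    }

    /** Runs each statement, retrying a failed chunk up to maxAttempts times. */
    public static long deleteInChunks(List<String> statements,
                                      ToLongFunction<String> executor, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long total = 0;
        for (String sql : statements) {
            RuntimeException last = null;
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    total += executor.applyAsLong(sql);
                    done = true;
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try this chunk again
                }
            }
            if (!done) {
                throw last;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> plan = planChunks("Records", "Id", 0, 25, 10); // hypothetical table and key
        long deleted = deleteInChunks(plan, sql -> 10L, 3);          // fake executor for the demo
        System.out.println(plan.size() + " chunks, " + deleted + " rows reported");
    }
}
```

With the Java client, the executor could be something like `sql -> dbClient.executePartitionedUpdate(Statement.of(sql))`, where `dbClient` is an existing `DatabaseClient`. Re-running a chunk is safe because a range delete is idempotent, which Partitioned DML statements need to be anyway.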
Hey @olavloite!
@olavloite and I spoke yesterday about whether we can update the client library to support this case. I have a couple of issues with this:
I'm a bit concerned about making the client library logic more complicated than it already is. It also means that we'll have an inconsistency between how Java handles this vs client libraries for other languages. Ideally the backend would return a LRO here instead of a standard response which would make it easier for us to consistently handle this type of issue across all the client libraries. But I don't see a change like this being made anytime soon. |
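For context, the kind of client-side wrapper being discussed might have roughly the following shape. This is only a sketch with hypothetical names: it runs a blocking call on a background thread and lets the caller poll for completion. It does not change the fact that the underlying RPC still blocks until the backend finishes, which is why a backend-provided LRO would be the cleaner solution.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class ClientSideOperationSketch {
    /** Starts the blocking call on the executor and returns a handle that can be polled. */
    public static CompletableFuture<Long> start(Supplier<Long> blockingCall, ExecutorService executor) {
        return CompletableFuture.supplyAsync(blockingCall, executor);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Simulated long-running statement; a real call would block on the RPC instead.
            CompletableFuture<Long> operation = start(() -> {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return 1000L;
            }, executor);

            // Poll until the operation reports completion.
            while (!operation.isDone()) {
                System.out.println("still running...");
                Thread.sleep(50);
            }
            System.out.println("rows affected: " + operation.join());
        } finally {
            executor.shutdown();
        }
    }
}
```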
There are 1 billion old records that need to be cleaned up. According to the documentation, Partitioned DML seems to be the right choice. I chose the Java client because it lets us set a timeout with setPartitionedDmlTimeout(Duration.ofHours(24L)). However, I still got a '504 Deadline Exceeded' error after a few hours, well short of the 24 hours I set. Why did setPartitionedDmlTimeout not work? How can I solve this problem? Is it not possible to delete such a large amount of data with Partitioned DML?
The details are as follows.
Environment details
3 nodes; 2 billion records in one table, of which 1 billion old records need to be cleaned up
Steps to reproduce
Code example