Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement aborted transactions retry mechanism #1
feat: implement aborted transactions retry mechanism #1
Changes from 11 commits
b5a38ec
57a1382
0550de9
da12f15
d181f4b
da45f4b
896c9bd
365271f
91f69ed
24c1030
5350338
37cce49
37096a3
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The java version actually consumes the results up to the same point that the user did before retrying before doing the comparison: https://github.com/googleapis/java-spanner/blob/059ef1ef1f03e80f4ff2705b45a62c553bb4e83d/google-cloud-spanner/src/main/java/com/google/cloud/spanner/connection/ChecksumResultSet.java#L145-L153. Here we actually yield the rows used in the comparison. How come this doesn't result in the user getting the same rows twice after an internal retry?
It's also a bit confusing to see this do the checksum comparison with each row. I know
_compare_checksums
passes if it sees fewer retried rows than original, but that's not what I'd expect it to do without reading it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olavloite, @skuruppu mentioned you might be able to review and compare this to the java client behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me a user will not use the rows read within a transaction in operations outside the transaction until the transaction is finished. It probably sounds hard, so consider this case:
User starts a transaction, reads several rows, then calls his own (not Spanner) API which is using these rows to write some data somewhere. Then he calls more functions in the same transaction and commits it. It's incorrect way of use - one should not do anything with data which the transaction is using at that moment, as transaction can fail and all the data/changes will be rollbacked, but it'll not rollback the records made in the user API, so the user will get the data desynchronization.
With this, the data that was read within transaction are considered as a temporary inexact-until-transaction-commit data, and if we're retrying a transaction, all that data should be re-read and recalculated, that's how I see it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's something I'm not 100% like too, but retry logic looks too apportioned across classes and functions to me, so I've decided to make many things deep under the hood. Thus, in the transaction class we're not thinking about results number of a result set of one of the operations of this transaction, we're just calling
==
and it takes care about such a things deeper in the result set itself. Result set object is much more closer to these details than transaction.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have absolutely no Python experience, so bear with me if I misread or misunderstood (parts of) the code here, but if I understand it correctly the suggested implementation for Python here does the following (during the initial transaction):
__iter__
method will normally (i.e. in the original transaction) iterate over thePartialResultSet
s that are received from Spanner, and for each of these iterate over the rows in thatPartialResultSet
.ResultSet
has a checksum, it will update the running checksum with the contents of the current row._original_results_checksum
will beNone
and the comparison will always succeed.If a transaction is retried, the following happens:
__iter__
method will iterate over thePartialResultSet
s that are received from Spanner for the retried query, and for each of these iterate over the rows in thatPartialResultSet
.ResultSet
has a checksum, it will update the running checksum with the contents of the current row.Assume that a query returns the following rows during the original transaction:
During the original transaction the entire result set was consumed so the running checksum was calculated based on all the rows:
C1
C2
So the original checksum is now
C2
.During a retry the query will be retried and the checksum will be calculated again (I assume that the results during the retry are equal to the original query):
C1
. That is compared to the original checksum that isC2
and the comparison will fail.The checksum should therefore not be compared for each row, but after consuming the same number of rows during the retry as during original query.
(But I get a feeling that there is a more fundamental misunderstanding about this retry logic. I'll elaborate on that in a separate comment.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @olavloite
Yeah, that's the thing @c24t mentioned as something that we don't expect. In fact I'm redefining
==
operator forResultsChecksum
object, so that it compares checksums in this way:At the first line it checks if we're in retried transaction - if not, than we have nothing to compare.
The second checks if checksums are equal, considering their length (number of consumed results).
And the third line checks if the retried transaction came to the original transaction failure point.
So, if the given checksums have different length, it'll be considered like there are no problems so far, 'cause a) retried transaction is not at the point, where the original failed, so it's too soon to compare checksums; b) retried transaction went farther than the original transaction = the checksums were compared = they were equal at the original transaction failure point).
Looking at this, I think there can be a problem when retried transaction went farther than the original. I'll double check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c24t, hm-m, I think the logic here can be little bit simpler, if I'll use a
__len__
magic method.UPDATE: yes, it looks better, PTAL