Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential bug fix for supporting sorted-indexes for partial-upsert #12544

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

tibrewalpratik17
Copy link
Contributor

@tibrewalpratik17 tibrewalpratik17 commented Mar 2, 2024

label:
bugfix
upsert

Potential fix for #12397

As called out in the issue:

With Partial Upserts, say we had 4 events for a given primary-key in the Mutable Segment: R0, R1, R2, R3. Let's also assume that the comparison column value of these events is: R0 < R1 < (R2 = R3).

If after applying the sorted column, their order changes to: R0, R1, R3, R2, then the UMM will start pointing to R2 as the valid doc.

Now based on the assumption that R3's docId will be greater than R2's docId, raising this fix as an echancement to fix in #12395 where in case of same comparison column value we also check the docID value of the old mutable segment to know for a particular record which one to keep.

We keep the max-doc-id of older segment in making this decision. Say, in the example above; we will ensure to dedup R3 as the only record.

Tested in our local cluster by adding sorted-index column to a table with constant comparison column. Ensured that we are deduping the correct record and persisting it. Added UTs for updating the resolvingComparisonTies method.

cc @ankitsultana @Jackie-Jiang

@codecov-commenter
Copy link

codecov-commenter commented Mar 2, 2024

Codecov Report

Attention: Patch coverage is 64.70588% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 62.17%. Comparing base (59551e4) to head (f0d82d2).
Report is 444 commits behind head on master.

Files Patch % Lines
...cal/upsert/BasePartitionUpsertMetadataManager.java 50.00% 0 Missing and 3 partials ⚠️
...local/indexsegment/mutable/MutableSegmentImpl.java 85.71% 1 Missing ⚠️
...t/ConcurrentMapPartitionUpsertMetadataManager.java 66.66% 1 Missing ⚠️
...ava/org/apache/pinot/segment/spi/IndexSegment.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12544      +/-   ##
============================================
+ Coverage     61.75%   62.17%   +0.42%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2515      +79     
  Lines        133233   137875    +4642     
  Branches      20636    21341     +705     
============================================
+ Hits          82274    85729    +3455     
- Misses        44911    45765     +854     
- Partials       6048     6381     +333     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.14% <64.70%> (+0.43%) ⬆️
java-21 62.06% <64.70%> (+0.44%) ⬆️
skip-bytebuffers-false 62.15% <64.70%> (+0.40%) ⬆️
skip-bytebuffers-true 62.04% <64.70%> (+34.32%) ⬆️
temurin 62.17% <64.70%> (+0.42%) ⬆️
unittests 62.17% <64.70%> (+0.42%) ⬆️
unittests1 46.74% <0.00%> (-0.15%) ⬇️
unittests2 27.80% <64.70%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if (comparisonResult > 0) {
return recordInfo;
}
if (comparisonResult == 0 && recordInfo.getDocId() > maxComparisonValueRecordInfo.getDocId()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect.

recordInfoIterator and this function iterates over the new segment. We can't pick the record with the latest doc-id in the new segment.

To fix this, for each docId in the new segment, you need to get the docId in the old segment, and keep the record which had the highest docId in the old segment.

That is a bit of an involved problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect.

recordInfoIterator and this function iterates over the new segment. We can't pick the record with the latest doc-id in the new segment.

Yeah i misunderstood the sortedIndex implementation i thought we preserve the old docId values for a record. Seems we do not.

To fix this, for each docId in the new segment, you need to get the docId in the old segment, and keep the record which had the highest docId in the old segment.

That is a bit of an involved problem.

Hmm looks like we have the mapping available here:

int[] sortedDocIds = _columnIndicesForRealtimeTable.getSortedColumn() != null

Somehow we need to make it available here while resolving comparison ties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix this, for each docId in the new segment, you need to get the docId in the old segment, and keep the record which had the highest docId in the old segment.

@ankitsultana updated to maintain this in a list and use it to resolve ties. Please review the updated patch! Tested in one of our clusters and worked as expected. cc @Jackie-Jiang

@tibrewalpratik17 tibrewalpratik17 marked this pull request as draft March 3, 2024 10:23
@tibrewalpratik17 tibrewalpratik17 force-pushed the partial_upsert_sorted_index_fix branch from 5a5878c to 3313e6c Compare May 13, 2024 17:20
@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review May 14, 2024 18:32
@tibrewalpratik17 tibrewalpratik17 changed the title Potential bug fix for partial-upsert with sorted-indexes Potential bug fix for supporting sorted-indexes for partial-upsert May 14, 2024
return recordInfo;
} else if (comparisonResult == 0) {
// Check if mappings are empty or the current record has a higher doc ID position
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in this scenario: say we have two replicas of a consuming segment R0 and R1. Then we initiate a commit flow:

  • R0 is selected as the committer and R1 is told to wait.
  • R0 sorts the records and does a local addOrReplaceSegment. Say the sort order for the records is not unique.
  • R1 is told to discard its local copy and download from peer.
  • Would the addOrReplaceSegment call in that case be made with R1's old MutableSegment and R0's ImmutableSegment? If so, we can't use this mapping from the Mutable Segment because the two replicas could have come to different orderings.

I haven't thought about this in a while, but based on my previous assessment, supporting sorted index for Realtime tables generally is a bit of an involved problem and requires us to take a holistic look at the entire design. Even for Vanilla Realtime tables we can end up with CRC mismatch due to comparison ties.

I think the simplest recommendation we should share with customers for now is: ensure that you don't have comparison column ties.

cc: @Jackie-Jiang for his thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens in this scenario: say we have two replicas of a consuming segment R0 and R1. Then we initiate a commit flow:
R0 is selected as the committer and R1 is told to wait.
R0 sorts the records and does a local addOrReplaceSegment. Say the sort order for the records is not unique.
R1 is told to discard its local copy and download from peer.
Would the addOrReplaceSegment call in that case be made with R1's old MutableSegment and R0's ImmutableSegment? If so, we can't use this mapping from the Mutable Segment because the two replicas could have come to different orderings.

In this scenario we would use R0's ImmutableSegment to replace. In this case, we wouldn't have the original docID mapping anymore. If you think about it, we would hit the scenario during other similar events like resetting segment. In order to resolve it properly, we need to persist the original docID order somewhere like segment-metadata or a virtual column. Having a virtual column like $originalDocID would also make sense and especially for upserts we can refer that to resolve comparison ties.

Even for Vanilla Realtime tables we can end up with CRC mismatch due to comparison ties.

This shouldn't be an issue for append-only tables as we are not concerned about data-correctness. In case of upserts, data-correctness becomes a major issue.

I think the simplest recommendation we should share with customers for now is: ensure that you don't have comparison column ties.

A lot of folks use constant comparison column to ensure upsert happens for every new record for a given key. Maybe we can try to brainstorm possible solutions and see how involved the solution actually is and accordingly make a call.

@Jackie-Jiang any suggestions how we can support this ^

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @klsince

This approach won't work because there is no guarantee that the segment is sealed locally. If segment is downloaded, _originalDocIDPositionMapping won't be generated.
One potential solution would be to persist the original doc id into a column, then use that to break tie

@tibrewalpratik17
Copy link
Contributor Author

One potential solution would be to persist the original doc id into a column, then use that to break tie

Yes that's what I suggested in one of the threads above.
Having a virtual column like $originalDocID would also make sense and especially for upserts we can refer that to resolve comparison ties.
For now, i can think of only this as a possible solution or to persist the original docID in segment-metadata somewhere but that doesn't seem optimal. Maintaining a virtual column still makes more sense.
cc @klsince what are your thoughts on this?

@ankitsultana
Copy link
Contributor

Can we consider adding the stream offset as a virtual column? Then we could allow users to set that virtual column as the comparison column.

@tibrewalpratik17
Copy link
Contributor Author

tibrewalpratik17 commented May 16, 2024

Can we consider adding the stream offset as a virtual column? Then we could allow users to set that virtual column as the comparison column.

Setting offset field as a comparison column might be another discussion we are getting into. But we can use offsets virtual column to break ties in general (very similar logic to originalDocID field). Other than higher memory / disk usage compared to originalDocID field i don't see any other challenges. But yeah having offset as a column also comes with pros like easier debugging and observability around data.

@klsince
Copy link
Contributor

klsince commented May 16, 2024

to my best understanding on this issue, I'd +1 to keep original docId in a virtualColumn to help break tie.

for the idea of using stream offset, I'd prefer to put stream offset in a real column, assuming we may create some ingestion transformation to extract the msg offset and put into the column. Then we can use this column as one of the comparison columns, to prevent comparison ties from happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants