Potential bug fix for supporting sorted-indexes for partial-upsert #12544

tibrewalpratik17 · 2024-03-02T23:18:31Z

label:
bugfix
upsert

Potential fix for #12397

As called out in the issue:

With Partial Upserts, say we had 4 events for a given primary-key in the Mutable Segment: R0, R1, R2, R3. Let's also assume that the comparison column value of these events is: R0 < R1 < (R2 = R3).

If after applying the sorted column, their order changes to: R0, R1, R3, R2, then the UMM will start pointing to R2 as the valid doc.

Now based on the assumption that R3's docId will be greater than R2's docId, raising this fix as an echancement to fix in #12395 where in case of same comparison column value we also check the docID value of the old mutable segment to know for a particular record which one to keep.

We keep the max-doc-id of older segment in making this decision. Say, in the example above; we will ensure to dedup R3 as the only record.

Tested in our local cluster by adding sorted-index column to a table with constant comparison column. Ensured that we are deduping the correct record and persisting it. Added UTs for updating the resolvingComparisonTies method.

cc @ankitsultana @Jackie-Jiang

codecov-commenter · 2024-03-02T23:53:12Z

Codecov Report

Attention: Patch coverage is 64.70588% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 62.17%. Comparing base (59551e4) to head (f0d82d2).
Report is 444 commits behind head on master.

Files	Patch %	Lines
...cal/upsert/BasePartitionUpsertMetadataManager.java	50.00%	0 Missing and 3 partials ⚠️
...local/indexsegment/mutable/MutableSegmentImpl.java	85.71%	1 Missing ⚠️
...t/ConcurrentMapPartitionUpsertMetadataManager.java	66.66%	1 Missing ⚠️
...ava/org/apache/pinot/segment/spi/IndexSegment.java	0.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #12544      +/-   ##
============================================
+ Coverage     61.75%   62.17%   +0.42%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2515      +79     
  Lines        133233   137875    +4642     
  Branches      20636    21341     +705     
============================================
+ Hits          82274    85729    +3455     
- Misses        44911    45765     +854     
- Partials       6048     6381     +333

Flag	Coverage Δ
custom-integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration	`<0.01% <0.00%> (-0.01%)`	⬇️
integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration2	`0.00% <0.00%> (ø)`
java-11	`62.14% <64.70%> (+0.43%)`	⬆️
java-21	`62.06% <64.70%> (+0.44%)`	⬆️
skip-bytebuffers-false	`62.15% <64.70%> (+0.40%)`	⬆️
skip-bytebuffers-true	`62.04% <64.70%> (+34.32%)`	⬆️
temurin	`62.17% <64.70%> (+0.42%)`	⬆️
unittests	`62.17% <64.70%> (+0.42%)`	⬆️
unittests1	`46.74% <0.00%> (-0.15%)`	⬇️
unittests2	`27.80% <64.70%> (+0.07%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ankitsultana · 2024-03-03T00:21:40Z

.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java

+            if (comparisonResult > 0) {
+              return recordInfo;
+            }
+            if (comparisonResult == 0 && recordInfo.getDocId() > maxComparisonValueRecordInfo.getDocId()) {


I think this is incorrect.

recordInfoIterator and this function iterates over the new segment. We can't pick the record with the latest doc-id in the new segment.

To fix this, for each docId in the new segment, you need to get the docId in the old segment, and keep the record which had the highest docId in the old segment.

That is a bit of an involved problem.

I think this is incorrect.

recordInfoIterator and this function iterates over the new segment. We can't pick the record with the latest doc-id in the new segment.

Yeah i misunderstood the sortedIndex implementation i thought we preserve the old docId values for a record. Seems we do not.

To fix this, for each docId in the new segment, you need to get the docId in the old segment, and keep the record which had the highest docId in the old segment.

That is a bit of an involved problem.

Hmm looks like we have the mapping available here:

pinot/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/converter/RealtimeSegmentConverter.java

Line 122 in 43dadbf

int[] sortedDocIds = _columnIndicesForRealtimeTable.getSortedColumn() != null

Somehow we need to make it available here while resolving comparison ties.

To fix this, for each docId in the new segment, you need to get the docId in the old segment, and keep the record which had the highest docId in the old segment.

@ankitsultana updated to maintain this in a list and use it to resolve ties. Please review the updated patch! Tested in one of our clusters and worked as expected. cc @Jackie-Jiang

ankitsultana · 2024-05-14T21:10:39Z

.../src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java

              return recordInfo;
+            } else if (comparisonResult == 0) {
+              // Check if mappings are empty or the current record has a higher doc ID position


What happens in this scenario: say we have two replicas of a consuming segment R0 and R1. Then we initiate a commit flow:

R0 is selected as the committer and R1 is told to wait.

R0 sorts the records and does a local addOrReplaceSegment. Say the sort order for the records is not unique.

R1 is told to discard its local copy and download from peer.

Would the addOrReplaceSegment call in that case be made with R1's old MutableSegment and R0's ImmutableSegment? If so, we can't use this mapping from the Mutable Segment because the two replicas could have come to different orderings.

I haven't thought about this in a while, but based on my previous assessment, supporting sorted index for Realtime tables generally is a bit of an involved problem and requires us to take a holistic look at the entire design. Even for Vanilla Realtime tables we can end up with CRC mismatch due to comparison ties.

I think the simplest recommendation we should share with customers for now is: ensure that you don't have comparison column ties.

cc: @Jackie-Jiang for his thoughts.

What happens in this scenario: say we have two replicas of a consuming segment R0 and R1. Then we initiate a commit flow:
R0 is selected as the committer and R1 is told to wait.
R0 sorts the records and does a local addOrReplaceSegment. Say the sort order for the records is not unique.
R1 is told to discard its local copy and download from peer.
Would the addOrReplaceSegment call in that case be made with R1's old MutableSegment and R0's ImmutableSegment? If so, we can't use this mapping from the Mutable Segment because the two replicas could have come to different orderings.

In this scenario we would use R0's ImmutableSegment to replace. In this case, we wouldn't have the original docID mapping anymore. If you think about it, we would hit the scenario during other similar events like resetting segment. In order to resolve it properly, we need to persist the original docID order somewhere like segment-metadata or a virtual column. Having a virtual column like $originalDocID would also make sense and especially for upserts we can refer that to resolve comparison ties.

Even for Vanilla Realtime tables we can end up with CRC mismatch due to comparison ties.

This shouldn't be an issue for append-only tables as we are not concerned about data-correctness. In case of upserts, data-correctness becomes a major issue.

I think the simplest recommendation we should share with customers for now is: ensure that you don't have comparison column ties.

A lot of folks use constant comparison column to ensure upsert happens for every new record for a given key. Maybe we can try to brainstorm possible solutions and see how involved the solution actually is and accordingly make a call.

@Jackie-Jiang any suggestions how we can support this ^

Jackie-Jiang

cc @klsince

This approach won't work because there is no guarantee that the segment is sealed locally. If segment is downloaded, _originalDocIDPositionMapping won't be generated.
One potential solution would be to persist the original doc id into a column, then use that to break tie

tibrewalpratik17 · 2024-05-15T19:27:10Z

One potential solution would be to persist the original doc id into a column, then use that to break tie

Yes that's what I suggested in one of the threads above.
Having a virtual column like $originalDocID would also make sense and especially for upserts we can refer that to resolve comparison ties.
For now, i can think of only this as a possible solution or to persist the original docID in segment-metadata somewhere but that doesn't seem optimal. Maintaining a virtual column still makes more sense.
cc @klsince what are your thoughts on this?

ankitsultana · 2024-05-16T03:43:19Z

Can we consider adding the stream offset as a virtual column? Then we could allow users to set that virtual column as the comparison column.

tibrewalpratik17 · 2024-05-16T06:27:49Z

Can we consider adding the stream offset as a virtual column? Then we could allow users to set that virtual column as the comparison column.

Setting offset field as a comparison column might be another discussion we are getting into. But we can use offsets virtual column to break ties in general (very similar logic to originalDocID field). Other than higher memory / disk usage compared to originalDocID field i don't see any other challenges. But yeah having offset as a column also comes with pros like easier debugging and observability around data.

klsince · 2024-05-16T22:40:07Z

to my best understanding on this issue, I'd +1 to keep original docId in a virtualColumn to help break tie.

for the idea of using stream offset, I'd prefer to put stream offset in a real column, assuming we may create some ingestion transformation to extract the msg offset and put into the column. Then we can use this column as one of the comparison columns, to prevent comparison ties from happening.

ankitsultana requested changes Mar 3, 2024

View reviewed changes

tibrewalpratik17 marked this pull request as draft March 3, 2024 10:23

tibrewalpratik17 added 2 commits May 13, 2024 22:49

Potential bug fix for partial-upsert with sorted-indexes

fbd4928

add mapping of old doc ids to new doc ids

3313e6c

tibrewalpratik17 force-pushed the partial_upsert_sorted_index_fix branch from 5a5878c to 3313e6c Compare May 13, 2024 17:20

add UT

f0d82d2

tibrewalpratik17 marked this pull request as ready for review May 14, 2024 18:32

tibrewalpratik17 changed the title ~~Potential bug fix for partial-upsert with sorted-indexes~~ Potential bug fix for supporting sorted-indexes for partial-upsert May 14, 2024

tibrewalpratik17 requested a review from ankitsultana May 14, 2024 18:35

ankitsultana reviewed May 14, 2024

View reviewed changes

Jackie-Jiang added upsert bugfix labels May 15, 2024

Jackie-Jiang reviewed May 15, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential bug fix for supporting sorted-indexes for partial-upsert #12544

Potential bug fix for supporting sorted-indexes for partial-upsert #12544

tibrewalpratik17 commented Mar 2, 2024 •

edited

codecov-commenter commented Mar 2, 2024 •

edited

ankitsultana Mar 3, 2024

tibrewalpratik17 Mar 3, 2024

tibrewalpratik17 May 14, 2024

ankitsultana May 14, 2024

tibrewalpratik17 May 15, 2024

Jackie-Jiang left a comment

tibrewalpratik17 commented May 15, 2024

ankitsultana commented May 16, 2024

tibrewalpratik17 commented May 16, 2024 •

edited

klsince commented May 16, 2024

Potential bug fix for supporting sorted-indexes for partial-upsert #12544

Are you sure you want to change the base?

Potential bug fix for supporting sorted-indexes for partial-upsert #12544

Conversation

tibrewalpratik17 commented Mar 2, 2024 • edited

codecov-commenter commented Mar 2, 2024 • edited

Codecov Report

ankitsultana Mar 3, 2024

Choose a reason for hiding this comment

tibrewalpratik17 Mar 3, 2024

Choose a reason for hiding this comment

tibrewalpratik17 May 14, 2024

Choose a reason for hiding this comment

ankitsultana May 14, 2024

Choose a reason for hiding this comment

tibrewalpratik17 May 15, 2024

Choose a reason for hiding this comment

Jackie-Jiang left a comment

Choose a reason for hiding this comment

tibrewalpratik17 commented May 15, 2024

ankitsultana commented May 16, 2024

tibrewalpratik17 commented May 16, 2024 • edited

klsince commented May 16, 2024

tibrewalpratik17 commented Mar 2, 2024 •

edited

codecov-commenter commented Mar 2, 2024 •

edited

tibrewalpratik17 commented May 16, 2024 •

edited