[WIP] add locking logic to get consistent table view for upsert tables #12976
base: master
Conversation
Force-pushed from 5e6acce to de05748
Codecov Report
Attention: Patch coverage is

@@             Coverage Diff              @@
##             master   #12976      +/-  ##
============================================
+ Coverage     61.75%   62.14%    +0.39%
+ Complexity      207      198        -9
============================================
  Files          2436     2502       +66
  Lines        133233   136615     +3382
  Branches      20636    21147      +511
============================================
+ Hits          82274    84902     +2628
- Misses        44911    45429      +518
- Partials       6048     6284      +236

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
  return _lastUpsertViewRefreshTimeMs + _upsertViewRefreshIntervalMs > nowMs;
}

private void doBatchRefreshUpsertView(long upsertViewFreshnessMs) {
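For context, the freshness check quoted above gates how often the batch refresh runs. A minimal sketch of that pattern follows; only the two field names come from the diff, while the class name and the refresh method are illustrative placeholders, not Pinot's actual API:

```java
// Sketch of the batch-refresh freshness gate discussed in this PR.
// _lastUpsertViewRefreshTimeMs / _upsertViewRefreshIntervalMs appear in the
// diff; everything else here is a hypothetical stand-in.
public class UpsertViewRefresher {
  private volatile long _lastUpsertViewRefreshTimeMs = 0;
  private final long _upsertViewRefreshIntervalMs;

  public UpsertViewRefresher(long refreshIntervalMs) {
    _upsertViewRefreshIntervalMs = refreshIntervalMs;
  }

  // The cached view is still fresh if the last refresh happened within the interval.
  boolean isViewFresh(long nowMs) {
    return _lastUpsertViewRefreshTimeMs + _upsertViewRefreshIntervalMs > nowMs;
  }

  // Rebuild the cached view only when it has gone stale, so refreshes are
  // amortized across many queries instead of done per query.
  void maybeRefresh(long nowMs) {
    if (!isViewFresh(nowMs)) {
      // ... rebuild the cached validDocIds snapshot here ...
      _lastUpsertViewRefreshTimeMs = nowMs;
    }
  }
}
```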
Instead of doing this at the partition metadata manager level, can we move it to the table data manager level? We can maintain a map of segment -> SegmentContext and do the refresh there. This way we can easily extend the scope of SegmentContext in the future as well.
I was trying something like this -- https://github.com/apache/pinot/compare/master...tibrewalpratik17:pinot:fix_upsert_inconsistency?expand=1
Thanks for sharing the code changes; I read them through. The refresh logic is similar to the batch mode here, although it's done at the table manager level. But your changes don't allow query threads to force-refresh the view to ensure data freshness.
I think it's more intuitive to handle data consistency in the table partition manager, as the table partition is the unit for managing upsert states in an upsert table.
Today, the table data manager provides the segment contexts and the table partition manager enriches them, so there shouldn't be any limitation on extending segment contexts later. In fact, hiding the consistency logic away from the table manager would make it easier to iterate in the future.
public void setSegmentContexts(List<SegmentContext> segmentContexts) {
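The enrichment flow described above can be sketched as follows. SegmentContext is a real Pinot class, but this simplified stand-in and the manager class around it are hypothetical, meant only to show the handoff of table-manager-created contexts to the partition manager:

```java
import java.util.List;

// Illustrative sketch: the table data manager creates one context per segment,
// then the partition manager enriches each context with upsert state in one
// pass, so all bitmaps come from the same consistent view. Names other than
// SegmentContext/setSegmentContexts are placeholders, not Pinot's actual API.
class SegmentContext {
  final String _segmentName;
  Object _validDocIdsSnapshot;  // stands in for the validDocIds bitmap

  SegmentContext(String segmentName) {
    _segmentName = segmentName;
  }
}

class PartitionUpsertManagerSketch {
  // Enrich the contexts handed over by the table data manager.
  public void setSegmentContexts(List<SegmentContext> segmentContexts) {
    for (SegmentContext ctx : segmentContexts) {
      // In Pinot this would be the segment's validDocIds bitmap snapshot.
      ctx._validDocIdsSnapshot = "bitmap-for-" + ctx._segmentName;
    }
  }
}
```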
private boolean _enableUpsertView;

@JsonPropertyDescription("Whether to enable batch refresh mode to keep consistent view for upsert table")
private boolean _enableUpsertViewBatchRefresh;
imo this should be enabled by default. At present, this is more of a bugfix than a feature to be kept behind a config. It can stay behind a config initially to be on the safer side, but we should enable it by default in the long run.
This PR adds support for a consistent table view when querying upsert tables. Consistency is at the table-partition level, which can span many segments. Today, updates that touch two segments' bitmaps are not atomic, so queries can see an inconsistent table view, e.g. return fewer PKs than expected.
The high-level idea is either to synchronize the query threads with the upsert threads (the consuming thread or HelixTaskExecutor threads) so queries get a consistent set of segments' validDocIds bitmaps, or to let the upsert threads keep a copy of the bitmaps and refresh it periodically. Both modes are added in this PR, as they can be wired up with a single R/W lock.
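The lock-based mode described above can be sketched with a standard `ReentrantReadWriteLock`: upsert threads hold the write lock while touching both segments' bitmaps, so query threads, which collect bitmaps under the read lock, never observe a half-applied update. The class and the counters below are illustrative stand-ins for the real bitmap bookkeeping, not Pinot's actual code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the R/W-lock wiring: writes to the two segments' state happen
// atomically with respect to readers. Counters stand in for validDocIds
// bitmaps; names are hypothetical.
class ConsistentUpsertView {
  private final ReentrantReadWriteLock _lock = new ReentrantReadWriteLock();
  private int _oldSegmentValidDocs = 10;
  private int _newSegmentValidDocs = 0;

  // Upsert thread: invalidate a PK's doc in the old segment and validate it
  // in the new one, under the write lock so the pair is atomic.
  void replaceDoc() {
    _lock.writeLock().lock();
    try {
      _oldSegmentValidDocs--;
      _newSegmentValidDocs++;
    } finally {
      _lock.writeLock().unlock();
    }
  }

  // Query thread: read both segments' state under the read lock, so the total
  // never shows the "fewer PKs than expected" symptom this PR fixes.
  int totalValidDocs() {
    _lock.readLock().lock();
    try {
      return _oldSegmentValidDocs + _newSegmentValidDocs;
    } finally {
      _lock.readLock().unlock();
    }
  }
}
```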
Configs:
TODO: add tests