Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] add locking logic to get consistent table view for upsert tables #12976

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

klsince
Copy link
Contributor

@klsince klsince commented Apr 19, 2024

This PR tries to support consistent table view for querying upsert tables. The consistency is at table partition level, which can contain many segments. Today, updates that involve two segments' bitmaps are not atomic, thus causing queries to see inconsistent table view, e.g. to return less than expected PKs.

The high level idea is either to synchronize the query threads and upsert threads (including consuming thread or HelixTaskExecutor threads) for queries to get a consistent set of segments' validDocIds bitmaps; or let upsert threads keep a copy of bitmaps and refresh it regularly. Both modes are added in this PR, as they can be wired up using one R/W lock.

Configs:

  1. by default, the feature is disabled.
  2. if _enableUpsertView=true, the upsert threads take the WLock when the upsert involves two segments' bitmaps; and the query threads take the RLock when getting bitmaps for all its selected segments. This is the sync mode, for best data freshness, but queries may block data ingestion. Mainly for low qps or low ingestion cases.
  3. if _enableUpsertViewBatchRefresh=true, the query threads don't need to take lock when getting bitmaps. The query threads access a copy of bitmaps that are kept updated by upsert thread periodically. In this batch mode, if data freshness is concerned, the query can specify a query option as called upsertViewFreshnessMs (e.g. 3s) to set its tolerance on data freshness, and the query thread can fresh the bitmap copies immediately if they are not fresh enough. For high qps and high ingestion cases.

TODO: add tests

@klsince klsince force-pushed the consistent_view_segment_context branch from 5e6acce to de05748 Compare April 19, 2024 21:28
@codecov-commenter
Copy link

codecov-commenter commented Apr 19, 2024

Codecov Report

Attention: Patch coverage is 43.65079% with 71 lines in your changes are missing coverage. Please review.

Project coverage is 62.14%. Comparing base (59551e4) to head (de05748).
Report is 345 commits behind head on master.

Files Patch % Lines
...cal/upsert/BasePartitionUpsertMetadataManager.java 20.73% 55 Missing and 10 partials ⚠️
...e/pinot/common/utils/config/QueryOptionsUtils.java 0.00% 2 Missing ⚠️
...ata/manager/realtime/RealtimeTableDataManager.java 0.00% 1 Missing ⚠️
...t/ConcurrentMapPartitionUpsertMetadataManager.java 88.88% 1 Missing ⚠️
...psert/ConcurrentMapTableUpsertMetadataManager.java 0.00% 1 Missing ⚠️
...ache/pinot/segment/local/upsert/UpsertContext.java 92.30% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12976      +/-   ##
============================================
+ Coverage     61.75%   62.14%   +0.39%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2502      +66     
  Lines        133233   136615    +3382     
  Branches      20636    21147     +511     
============================================
+ Hits          82274    84902    +2628     
- Misses        44911    45429     +518     
- Partials       6048     6284     +236     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.10% <43.65%> (+0.39%) ⬆️
java-21 62.03% <43.65%> (+0.40%) ⬆️
skip-bytebuffers-false 62.13% <43.65%> (+0.38%) ⬆️
skip-bytebuffers-true 61.99% <43.65%> (+34.26%) ⬆️
temurin 62.14% <43.65%> (+0.39%) ⬆️
unittests 62.14% <43.65%> (+0.39%) ⬆️
unittests1 46.66% <7.14%> (-0.23%) ⬇️
unittests2 27.95% <36.50%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@tibrewalpratik17 tibrewalpratik17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @klsince for the fix as discussed in #12667.

return _lastUpsertViewRefreshTimeMs + _upsertViewRefreshIntervalMs > nowMs;
}

private void doBatchRefreshUpsertView(long upsertViewFreshnessMs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this at partition metadata manager level can we move this to table data manager level? We can maintain a map of segment -> segmentContext and do a refresh there. This way we can easily extend the scope of segment context in future as well.
I was trying on something like this -- https://github.com/apache/pinot/compare/master...tibrewalpratik17:pinot:fix_upsert_inconsistency?expand=1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing the code changes, and I read it through. The refreshing logic is similar with the batch mode here although done at table mgr level. But your changes didn't allow query threads to force update the view to ensure data freshness.

I think it's more intuitive to handle data consistency in table partition mgr, as table partition is the unit for managing upsert states in upsert table.

Today, the table partition mgr enriches the segment contexts, which are provided by table partition mgr, so there shouldn't be limitation to extending segment contexts later. In fact hiding the consistency logic away from table mgr would make it easier to iterate in future.

public void setSegmentContexts(List<SegmentContext> segmentContexts) {

private boolean _enableUpsertView;

@JsonPropertyDescription("Whether to enable batch refresh mode to keep consistent view for upsert table")
private boolean _enableUpsertViewBatchRefresh;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo this should be enabled by default. At present, this is more of a bugfix rather than a feature to be kept behind a config. It can be kept behind a config initially to be on the safer side but then we should enable it by default in the long run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants