Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Supports topic partition increase. #115

Merged
merged 8 commits into from Mar 19, 2021

Conversation

jiangmichaellll
Copy link
Contributor

This adds support for topic partition increase for both micro batch and continuous mode.

CachedPartitionCountReader is used to cache the number of topic partitions and fetches once every 10s, that should be well within the limit (admin read limit is 600/min). Spark doesn't need a consistent read for it to work as long as it's eventually consistent.

For micro batch, the CachedPartitionCountReader is embedded inside HeadOffsetReader, and inside the lifecycle of each batch, as soon as the topic partition is read, this will serve as the topic partition across the whole lifecycle of this batch. It's implicitly embedded in the endOffset.

For continuous, a topic partition number is set once a ContinuousReader, and once needsReconfiguration() detects an updated value, Spark will reconstruct a new ContinuousReader with the updated value.

@jiangmichaellll jiangmichaellll requested a review from a team as a code owner March 11, 2021 23:02
@product-auto-label product-auto-label bot added the api: pubsublite Issues related to the googleapis/java-pubsublite-spark API. label Mar 11, 2021
@google-cla google-cla bot added the api: pubsublite Issues related to the googleapis/java-pubsublite-spark API. label Mar 11, 2021
@codecov
Copy link

codecov bot commented Mar 11, 2021

Codecov Report

Merging #115 (e408b3e) into master (d75274e) will decrease coverage by 0.50%.
The diff coverage is 63.09%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #115      +/-   ##
============================================
- Coverage     59.46%   58.96%   -0.51%     
- Complexity       82       92      +10     
============================================
  Files            17       18       +1     
  Lines           528      580      +52     
  Branches         18       24       +6     
============================================
+ Hits            314      342      +28     
- Misses          210      232      +22     
- Partials          4        6       +2     
Impacted Files Coverage Δ Complexity Δ
...d/pubsublite/spark/CachedPartitionCountReader.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...m/google/cloud/pubsublite/spark/PslDataSource.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...oud/pubsublite/spark/LimitingHeadOffsetReader.java 70.58% <44.44%> (-8.73%) 5.00 <2.00> (+1.00) ⬇️
...le/cloud/pubsublite/spark/PslContinuousReader.java 58.53% <75.00%> (-0.44%) 8.00 <2.00> (+1.00) ⬇️
.../pubsublite/spark/MultiPartitionCommitterImpl.java 83.33% <88.23%> (+3.33%) 14.00 <11.00> (+8.00)
...le/cloud/pubsublite/spark/PslMicroBatchReader.java 86.27% <88.88%> (+0.27%) 11.00 <6.00> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d75274e...e408b3e. Read the comment docs.

Copy link

@palmere-google palmere-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this MJ! Mostly looks good, a few comments

@jiangmichaellll
Copy link
Contributor Author

Hi Tianzi, can you help review the clirr-ignored-differences.xml change? Thanks

Copy link
Collaborator

@anguillanneuf anguillanneuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'm not sure how to use it.

@jiangmichaellll jiangmichaellll merged commit 20f3366 into master Mar 19, 2021
@jiangmichaellll jiangmichaellll deleted the jiangmichael-up-size branch March 19, 2021 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsublite Issues related to the googleapis/java-pubsublite-spark API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants