feat: Supports topic partition increase. #115

jiangmichaellll · 2021-03-11T23:02:20Z

This adds support for topic partition increases in both micro-batch and continuous mode.

CachedPartitionCountReader caches the number of topic partitions and refreshes it once every 10s, which should be well within the admin read limit (600/min). Spark doesn't need a consistent read here; an eventually consistent one is enough for it to work.
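To make the caching behaviour concrete, here is a minimal sketch of a time-based cache, not the code in this PR. The class name `CachedCountSketch`, its constructor, and the injectable clock are assumptions made for illustration; the fetcher stands in for the admin API read. At one fetch per 10s, a single reader issues at most 6 reads per minute.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch: caches a partition count and refetches at most once per TTL. */
final class CachedCountSketch {
  private final IntSupplier fetcher;    // stands in for the admin API read (assumption)
  private final LongSupplier nanoClock; // injectable so expiry can be tested without sleeping
  private final long ttlNanos;
  private boolean loaded = false;
  private long loadedAtNanos;
  private int cached;

  CachedCountSketch(IntSupplier fetcher, long ttlNanos, LongSupplier nanoClock) {
    this.fetcher = fetcher;
    this.ttlNanos = ttlNanos;
    this.nanoClock = nanoClock;
  }

  /** Returns the cached count, fetching only on first use or after the TTL has elapsed. */
  synchronized int getPartitionCount() {
    long now = nanoClock.getAsLong();
    if (!loaded || now - loadedAtNanos >= ttlNanos) {
      cached = fetcher.getAsInt();
      loadedAtNanos = now;
      loaded = true;
    }
    return cached;
  }
}
```

In production use the clock would be `System::nanoTime` and the TTL 10 seconds expressed in nanoseconds; callers may see a count that is up to one TTL stale, which matches the eventual-consistency assumption above.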

For micro batch, the CachedPartitionCountReader is embedded inside HeadOffsetReader. Within the lifecycle of each batch, once the topic partition count is read, that value serves as the partition count for the whole lifecycle of the batch; it is implicitly embedded in the endOffset.
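The idea that the end offset pins the partition count for a batch can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `MicroBatchSketch`, `readEndOffset`, and `plannedPartitions` are hypothetical names, the end offset is modelled as a plain map from partition to offset, and the head offsets are placeholders.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntSupplier;

/** Hypothetical sketch: a batch's partition count is whatever its end offset recorded. */
final class MicroBatchSketch {
  private MicroBatchSketch() {}

  /** Reads the partition count once and builds an end offset with one entry per partition. */
  static Map<Integer, Long> readEndOffset(IntSupplier partitionCount) {
    int count = partitionCount.getAsInt(); // the only count read for this batch
    Map<Integer, Long> endOffset = new TreeMap<>();
    for (int partition = 0; partition < count; partition++) {
      endOffset.put(partition, 0L); // placeholder head offset (assumption)
    }
    return endOffset;
  }

  /** Plans one input partition per end-offset entry; never re-reads the count. */
  static int plannedPartitions(Map<Integer, Long> endOffset) {
    return endOffset.size();
  }
}
```

Because planning depends only on the end offset, a partition increase that lands mid-batch is picked up by the next batch rather than changing the current one.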

For continuous mode, the topic partition count is fixed when a ContinuousReader is created. Once needsReconfiguration() detects an updated value, Spark reconstructs a new ContinuousReader with the updated value.
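A minimal sketch of that reconfiguration check, assuming the reader holds the count it was built with plus a source for the latest count. `ContinuousReaderSketch` is a hypothetical standalone class; it does not implement Spark's reader interface and only mirrors the `needsReconfiguration()` method name mentioned above.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch: a reader fixed to one partition count that flags when it is stale. */
final class ContinuousReaderSketch {
  private final int topicPartitionCount;          // fixed for the lifetime of this reader
  private final IntSupplier latestPartitionCount; // e.g. a cached count reader (assumption)

  ContinuousReaderSketch(int topicPartitionCount, IntSupplier latestPartitionCount) {
    this.topicPartitionCount = topicPartitionCount;
    this.latestPartitionCount = latestPartitionCount;
  }

  /** True when the latest count differs from the count this reader was built with. */
  boolean needsReconfiguration() {
    return latestPartitionCount.getAsInt() != topicPartitionCount;
  }

  int topicPartitionCount() {
    return topicPartitionCount;
  }
}
```

When the check returns true, the engine is expected to discard this reader and build a new one from the updated count, so the reader itself never mutates its partition count.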

codecov · 2021-03-11T23:08:57Z

Codecov Report

Merging #115 (e408b3e) into master (d75274e) will decrease coverage by 0.50%.
The diff coverage is 63.09%.

@@             Coverage Diff              @@
##             master     #115      +/-   ##
============================================
- Coverage     59.46%   58.96%   -0.51%     
- Complexity       82       92      +10     
============================================
  Files            17       18       +1     
  Lines           528      580      +52     
  Branches         18       24       +6     
============================================
+ Hits            314      342      +28     
- Misses          210      232      +22     
- Partials          4        6       +2

Impacted Files	Coverage Δ	Complexity Δ
...d/pubsublite/spark/CachedPartitionCountReader.java	`0.00% <0.00%> (ø)`	`0.00 <0.00> (?)`
...m/google/cloud/pubsublite/spark/PslDataSource.java	`0.00% <0.00%> (ø)`	`0.00 <0.00> (ø)`
...oud/pubsublite/spark/LimitingHeadOffsetReader.java	`70.58% <44.44%> (-8.73%)`	`5.00 <2.00> (+1.00)`	⬇️
...le/cloud/pubsublite/spark/PslContinuousReader.java	`58.53% <75.00%> (-0.44%)`	`8.00 <2.00> (+1.00)`	⬇️
.../pubsublite/spark/MultiPartitionCommitterImpl.java	`83.33% <88.23%> (+3.33%)`	`14.00 <11.00> (+8.00)`
...le/cloud/pubsublite/spark/PslMicroBatchReader.java	`86.27% <88.88%> (+0.27%)`	`11.00 <6.00> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

palmere-google

Thanks for this MJ! Mostly looks good, a few comments

src/main/java/com/google/cloud/pubsublite/spark/CachedPartitionCountReader.java

src/main/java/com/google/cloud/pubsublite/spark/LimitingHeadOffsetReader.java

src/main/java/com/google/cloud/pubsublite/spark/MultiPartitionCommitterImpl.java

src/main/java/com/google/cloud/pubsublite/spark/PslDataSource.java

src/main/java/com/google/cloud/pubsublite/spark/PslMicroBatchReader.java

src/main/java/com/google/cloud/pubsublite/spark/MultiPartitionCommitterImpl.java

jiangmichaellll · 2021-03-18T23:58:21Z

Hi Tianzi, can you help review the clirr-ignored-differences.xml change? Thanks

anguillanneuf

LGTM, but I'm not sure how to use it.

jiangmichaellll added 3 commits March 9, 2021 20:16

update

4a11f51

update

e031b8f

update

9834992

jiangmichaellll requested a review from a team as a code owner March 11, 2021 23:02

product-auto-label bot added the api: pubsublite Issues related to the googleapis/java-pubsublite-spark API. label Mar 11, 2021

google-cla bot added the api: pubsublite Issues related to the googleapis/java-pubsublite-spark API. label Mar 11, 2021

jiangmichaellll requested a review from palmere-google March 11, 2021 23:02

palmere-google suggested changes Mar 12, 2021

jiangmichaellll added 2 commits March 15, 2021 18:52

update

2ac6fae

update

539970f

jiangmichaellll requested a review from palmere-google March 15, 2021 22:59

update

72e33fa

palmere-google approved these changes Mar 17, 2021

jiangmichaellll added 2 commits March 17, 2021 17:11

update

0eb19b3

udpate

e408b3e

jiangmichaellll requested a review from a team as a code owner March 18, 2021 23:57

jiangmichaellll requested a review from anguillanneuf March 18, 2021 23:57

anguillanneuf approved these changes Mar 19, 2021

jiangmichaellll merged commit 20f3366 into master Mar 19, 2021

jiangmichaellll deleted the jiangmichael-up-size branch March 19, 2021 21:40

jiangmichaellll mentioned this pull request Mar 22, 2021

Support topic partition increase. #110

Closed
