
Enhance Kinesis consumer #12806

Open
Jackie-Jiang wants to merge 3 commits into master from enhance_kinesis_consumer

Conversation

@Jackie-Jiang (Contributor) commented Apr 7, 2024

  • Do not use a separate thread to fetch Kinesis records (this fixes a potential race condition); see the sketch after this list
  • Cache the shard iterator
  • Return the message batch immediately without combining multiple batches (the timeout is ignored)
  • Change the default max records per fetch to 10,000 (the Kinesis default)
  • Remove some unused dependencies
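A minimal sketch of the resulting fetch path, assuming the AWS SDK v2 KinesisClient. The shard-iterator caching fields (_nextShardIterator, _nextStartSequenceNumber) and _config.getNumMaxRecordsToFetch() mirror the diff snippets quoted below; the iterator type, field names, and surrounding structure are illustrative, not the actual KinesisConsumer code.

// Illustrative sketch only, not the actual KinesisConsumer implementation.
private String _nextShardIterator;        // iterator returned by the previous getRecords() call
private String _nextStartSequenceNumber;  // start sequence number the cached iterator corresponds to

private GetRecordsResponse fetch(String startSequenceNumber) {
  String shardIterator;
  if (startSequenceNumber.equals(_nextStartSequenceNumber) && _nextShardIterator != null) {
    // Reuse the cached iterator instead of issuing another GetShardIterator call
    shardIterator = _nextShardIterator;
  } else {
    GetShardIteratorRequest iteratorRequest = GetShardIteratorRequest.builder()
        .streamName(_streamName)
        .shardId(_shardId)
        .shardIteratorType(ShardIteratorType.AFTER_SEQUENCE_NUMBER)  // iterator type is illustrative
        .startingSequenceNumber(startSequenceNumber)
        .build();
    shardIterator = _kinesisClient.getShardIterator(iteratorRequest).shardIterator();
  }
  // Fetch on the caller's thread and return the batch immediately, without combining multiple responses
  GetRecordsRequest recordsRequest = GetRecordsRequest.builder()
      .shardIterator(shardIterator)
      .limit(_config.getNumMaxRecordsToFetch())  // default raised to 10,000, the Kinesis per-call maximum
      .build();
  GetRecordsResponse response = _kinesisClient.getRecords(recordsRequest);
  // Cache the next iterator for the following call; the caller advances _nextStartSequenceNumber after consuming
  _nextShardIterator = response.nextShardIterator();
  return response;
}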

@codecov-commenter commented Apr 8, 2024

Codecov Report

Attention: Patch coverage is 72.72727% with 12 lines in your changes missing coverage. Please review.

Project coverage is 62.18%. Comparing base (59551e4) to head (4d71bf3).
Report is 438 commits behind head on master.

Files Patch % Lines
...e/pinot/plugin/stream/kinesis/KinesisConsumer.java 72.50% 8 Missing and 3 partials ⚠️
.../stream/kinesis/KinesisStreamMetadataProvider.java 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12806      +/-   ##
============================================
+ Coverage     61.75%   62.18%   +0.43%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2515      +79     
  Lines        133233   137764    +4531     
  Branches      20636    21314     +678     
============================================
+ Hits          82274    85675    +3401     
- Misses        44911    45713     +802     
- Partials       6048     6376     +328     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.16% <72.72%> (+0.45%) ⬆️
java-21 62.06% <72.72%> (+0.43%) ⬆️
skip-bytebuffers-false 62.18% <72.72%> (+0.43%) ⬆️
skip-bytebuffers-true 62.04% <72.72%> (+34.31%) ⬆️
temurin 62.18% <72.72%> (+0.43%) ⬆️
unittests 62.18% <72.72%> (+0.43%) ⬆️
unittests1 46.74% <100.00%> (-0.15%) ⬇️
unittests2 27.80% <68.18%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.

@Jackie-Jiang force-pushed the enhance_kinesis_consumer branch 2 times, most recently from cbb2faf to d588079 on April 8, 2024 at 18:27

// NOTE: Kinesis enforces a limit of 5 getRecords requests per second on each shard from the AWS end, beyond which we
// start getting ProvisionedThroughputExceededException. Rate limit the requests to avoid this.
long currentTimeMs = System.currentTimeMillis();
Contributor:

Do we need our own custom rate limiter here? Does the Kinesis client provide options to do the same thing/handle this, instead of us having this logic?

Contributor Author:

I didn't find one in the Kinesis client; it seems to just throw LimitExceededException.
The RPS limit is currently configured on the Pinot side, so I guess it makes sense to rate limit on the Pinot side.
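For comparison, a purely reactive approach would be to catch the SDK's throttling exception and back off before retrying. A minimal sketch, with an arbitrary 1-second backoff that is not from this PR:

// Illustrative fallback: react to throttling instead of (or in addition to) rate limiting up front.
try {
  return _kinesisClient.getRecords(getRecordRequest);
} catch (ProvisionedThroughputExceededException e) {
  // GetRecords allows 5 calls per second per shard; back off briefly and retry once
  LOGGER.warn("Throttled by Kinesis on getRecords, backing off before retrying", e);
  try {
    Thread.sleep(1000L);
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
  }
  return _kinesisClient.getRecords(getRecordRequest);
}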

} else {
LOGGER.warn(message + ": " + throwable.getMessage());
// TODO: Revisit this logic to see if we always miss the first message when consuming from a new shard
Contributor:

Could you add more explanation to this? Why would we miss the first message?

Contributor Author:

Ack
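
One thing that may be worth checking as part of that TODO (an assumption on my side, not a confirmed root cause): the shard iterator type. In the AWS SDK v2, AT_SEQUENCE_NUMBER includes the record with the given sequence number while AFTER_SEQUENCE_NUMBER starts right after it, so resolving the first sequence number of a new shard with AFTER_SEQUENCE_NUMBER would skip that first record.

// Illustrative comparison of iterator types; not the actual KinesisConsumer logic.
GetShardIteratorRequest atFirst = GetShardIteratorRequest.builder()
    .streamName(streamName)
    .shardId(shardId)
    .shardIteratorType(ShardIteratorType.AT_SEQUENCE_NUMBER)     // starts at the given record (inclusive)
    .startingSequenceNumber(firstSequenceNumber)
    .build();
GetShardIteratorRequest afterFirst = GetShardIteratorRequest.builder()
    .streamName(streamName)
    .shardId(shardId)
    .shardIteratorType(ShardIteratorType.AFTER_SEQUENCE_NUMBER)  // starts after it, i.e. skips the first record
    .startingSequenceNumber(firstSequenceNumber)
    .build();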

// Read records
GetRecordsRequest getRecordRequest =
GetRecordsRequest.builder().shardIterator(shardIterator).limit(_config.getNumMaxRecordsToFetch()).build();
GetRecordsResponse getRecordsResponse = _kinesisClient.getRecords(getRecordRequest);
Contributor:

This can be empty even if the stream has some data, right, given how Kinesis works? We'll return a response even if it's empty.

Contributor Author:

We need some tests to verify the behavior. The consumer can handle an empty message batch, but the consumption lag might be set to 0 because it thinks there are no more messages. Added a TODO to revisit.
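
A small sketch of how an empty batch could be distinguished from actually being caught up, using millisBehindLatest() from the GetRecords response; the lag/metric handling here is hypothetical, not existing Pinot code:

// Illustrative: an empty batch does not necessarily mean the shard has been fully consumed.
GetRecordsResponse getRecordsResponse = _kinesisClient.getRecords(getRecordRequest);
List<Record> records = getRecordsResponse.records();
Long millisBehindLatest = getRecordsResponse.millisBehindLatest();
if (records.isEmpty() && millisBehindLatest != null && millisBehindLatest > 0) {
  // Still behind the tip of the shard: do not report the consumption lag as 0,
  // and optionally bump a metric so this case is visible.
  LOGGER.debug("Empty getRecords batch but still {} ms behind latest", millisBehindLatest);
}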

Contributor:

Good point. We can also add a metric to track this when it happens.

@swaminathanmanish (Contributor) left a comment:

LGTM other than clarifications.

* Kinesis enforces a limit of 5 getRecords requests per second on each shard from the AWS end, beyond which we start
* getting {@link ProvisionedThroughputExceededException}. Rate limit the requests to avoid this.
*/
private void rateLimitRequests() {
Contributor:

Thanks for creating a separate method. I guess since this is a special kind of rate limiter that needs to block until we are ready to fetch again, we cannot leverage off-the-shelf ones like Guava.

If Kinesis has a limit, don't we need to adhere to that limit? So does getRpsLimit() need to match the Kinesis limit?

Contributor Author:

The Kinesis limit is not very straightforward, so I guess we need to iterate on this to get the best settings.
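
For what it's worth, Guava's RateLimiter does block in acquire() until a permit is available, so an off-the-shelf limiter is at least possible; whether its smoothed permits-per-second model fits the hard 5-requests-per-second-per-shard limit is the open question. A sketch under that assumption (field and config names are illustrative):

// Illustrative Guava-based alternative (com.google.common.util.concurrent.RateLimiter); not what this PR implements.
private final RateLimiter _getRecordsRateLimiter = RateLimiter.create(_config.getRpsLimit());

private GetRecordsResponse getRecordsWithRateLimit(GetRecordsRequest request) {
  _getRecordsRateLimiter.acquire();  // blocks until the limiter allows another request
  return _kinesisClient.getRecords(request);
}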

long currentTimeMs = System.currentTimeMillis();
int currentTimeSeconds = (int) TimeUnit.MILLISECONDS.toSeconds(currentTimeMs);
if (currentTimeSeconds == _currentSecond) {
if (_numRequestsInCurrentSecond == _config.getRpsLimit()) {
Contributor:

This can be done later. A log.info or metric would help debug if rate limiting becomes an issue.
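
For reference, a complete per-second window limiter along the lines of the snippet above, with the suggested debug log when it actually has to block; the sleep granularity and log wording are assumptions, not the exact PR code:

private void rateLimitRequests() {
  while (true) {
    long currentTimeMs = System.currentTimeMillis();
    int currentTimeSeconds = (int) TimeUnit.MILLISECONDS.toSeconds(currentTimeMs);
    if (currentTimeSeconds != _currentSecond) {
      // New one-second window: reset the counter and proceed
      _currentSecond = currentTimeSeconds;
      _numRequestsInCurrentSecond = 1;
      return;
    }
    if (_numRequestsInCurrentSecond < _config.getRpsLimit()) {
      _numRequestsInCurrentSecond++;
      return;
    }
    // Limit reached within this second: log (or bump a metric) and wait for the next second
    LOGGER.debug("getRecords rate limit of {} rps reached, blocking until the next second", _config.getRpsLimit());
    try {
      Thread.sleep(1000L - currentTimeMs % 1000L);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return;
    }
  }
}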

// Get the shard iterator
String shardIterator;
if (startSequenceNumber.equals(_nextStartSequenceNumber)) {
shardIterator = _nextShardIterator;
Contributor:

We will need to handle the case here where nextShardIterator has expired (since shard iterators have a time limit of 5 minutes).
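
A hedged sketch of one way to handle that: catch ExpiredIteratorException from getRecords() and refresh the iterator for the same start sequence number before retrying (the refresh helper below is hypothetical):

// Illustrative handling of an expired cached iterator (shard iterators expire 5 minutes after they are issued).
GetRecordsResponse getRecordsResponse;
try {
  getRecordsResponse = _kinesisClient.getRecords(getRecordRequest);
} catch (ExpiredIteratorException e) {
  // The cached _nextShardIterator went stale; fetch a fresh iterator and retry once
  String freshIterator = requestShardIterator(startSequenceNumber);  // hypothetical helper
  getRecordsResponse = _kinesisClient.getRecords(
      GetRecordsRequest.builder().shardIterator(freshIterator).limit(_config.getNumMaxRecordsToFetch()).build());
}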

4 participants