Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref: Move to streaming writes in Rust consumer #5694

Merged
merged 9 commits into from Mar 27, 2024

Conversation

untitaker
Copy link
Member

Refactor the rust consumer so that from within Reduce, the HTTP request is
opened and rows are directly written into the socket.

This requires splitting up the BytesInsertBatch type as it can be in multiple
"states" depending on the stage of the pipeline. It contains a HttpBatch before
writing to clickhouse, and contains none before committing offsets.

The easiest way that I found is to first make it generic over its internal row
data R, then slowly fixing up all the instances where the generic was missing.

In a future PR I would like to see:

  • deleting InsertBatch entirely, and replacing all instances with BytesInsertBatch
  • renaming BytesInsertBatch to InsertBatch
  • splitting up types.rs into multiple files

@untitaker untitaker requested review from a team as code owners March 27, 2024 12:55
Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

Though this very much reminds me of one of my earliest comments about the whole arroyo architecture:

  • Defining things in reverse direction makes things very hard to read and understand
  • The whole architecture would possibly be very well expressed as a Stream using combinators that might just spawn futures onto the runtime, and a couple of bounded channels in between to act as concurrency limiters, etc.

@untitaker
Copy link
Member Author

untitaker commented Mar 27, 2024

The whole architecture would possibly be very well expressed as a Stream using combinators that might just spawn futures onto the runtime, and a couple of bounded channels in between to act as concurrency limiters, etc.

I have a prototype of an entire batching clickhouse consumer + streaming without arroyo, and I agree that arroyo's abstractions tend to get in the way here. However, I found rdkafka's streaming consumer to be basically useless as well. It's easier to un-design by removing some of the intermediate strategies and merging them into one. The biggest hurdle here was not arroyo's design per-se but the fact that we have a strategy for each individual thing we do, and then more strategies within each strategy, meaning a ton of type signatures to change.

Copy link

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.82%. Comparing base (6281a9a) to head (1bbda7f).

✅ All tests successful. No failed tests found ☺️

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5694      +/-   ##
==========================================
- Coverage   89.85%   89.82%   -0.04%     
==========================================
  Files         900      900              
  Lines       43747    43747              
  Branches      301      301              
==========================================
- Hits        39310    39295      -15     
- Misses       4395     4410      +15     
  Partials       42       42              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@untitaker untitaker merged commit 1406965 into master Mar 27, 2024
32 checks passed
@untitaker untitaker deleted the ref/clickhouse-streaming-write-2 branch March 27, 2024 13:27
@williamdes
Copy link

I am wondering if this might have create an even worse issue on getsentry/self-hosted#2876 (comment)
The logs are filled with skipping write of but ClickHouse stats show 14k/s inserts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants