New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref: Move to streaming writes in Rust consumer #5694
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
Though this very much reminds me of one of my earliest comments about the whole arroyo architecture:
- Defining things in reverse direction makes things very hard to read and understand
- The whole architecture would possibly be very well expressed as a
Stream
using combinators that might just spawn futures onto the runtime, and a couple of bounded channels in between to act as concurrency limiters, etc.
I have a prototype of an entire batching clickhouse consumer + streaming without arroyo, and I agree that arroyo's abstractions tend to get in the way here. However, I found rdkafka's streaming consumer to be basically useless as well. It's easier to un-design by removing some of the intermediate strategies and merging them into one. The biggest hurdle here was not arroyo's design per-se but the fact that we have a strategy for each individual thing we do, and then more strategies within each strategy, meaning a ton of type signatures to change. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found Additional details and impacted files@@ Coverage Diff @@
## master #5694 +/- ##
==========================================
- Coverage 89.85% 89.82% -0.04%
==========================================
Files 900 900
Lines 43747 43747
Branches 301 301
==========================================
- Hits 39310 39295 -15
- Misses 4395 4410 +15
Partials 42 42 ☔ View full report in Codecov by Sentry. |
I am wondering if this might have create an even worse issue on getsentry/self-hosted#2876 (comment) |
Refactor the rust consumer so that from within Reduce, the HTTP request is
opened and rows are directly written into the socket.
This requires splitting up the BytesInsertBatch type as it can be in multiple
"states" depending on the stage of the pipeline. It contains a HttpBatch before
writing to clickhouse, and contains none before committing offsets.
The easiest way that I found is to first make it generic over its internal row
data R, then slowly fixing up all the instances where the generic was missing.
In a future PR I would like to see:
InsertBatch
entirely, and replacing all instances withBytesInsertBatch
BytesInsertBatch
toInsertBatch
types.rs
into multiple files