Flink-like Interval joins #426
Hi there, I'm currently trying to develop a bytewax dataflow to join two event streams, and the scenario I'm dealing with requires exactly the semantics implemented in Apache Flink's Interval join feature. Basically, I need to join elements that share a common key, where elements of stream B have timestamps that lie in a relative time interval around the timestamps of elements in stream A. I've been looking at the different window and join types in bytewax, but I haven't figured out a way to implement the same behaviour. Is there any way to implement a similar mechanism with the current bytewax semantics? Are there any future plans to implement this interval join?
Replies: 1 comment 1 reply
We currently don't have anything built into Bytewax which enables interval joins, but we would like to eventually prepackage that type of windower.

FWIW, it should be possible to re-create the basics of that functionality currently, though, using a `unary` and a custom `UnaryLogic` subclass. None of the primitives you'll need are already available in Python, so it will be a relatively advanced project.

You can re-create the "side labeling" logic from our existing non-windowed `join` operator, extract the current timestamps from the incoming values, keep a buffer of items, re-implement the watermarking concept to monitor incoming event timestamps to know when to evict no-longer-needed v…
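
To make the steps above a bit more concrete, here is a minimal plain-Python sketch of the per-key state such a logic class might hold. Everything in it is an assumption made for illustration: `IntervalJoinState`, its `on_item` method, the `"left"`/`"right"` side labels, and the `max_lateness` parameter are hypothetical names, not Bytewax APIs. The wiring into `unary` and `UnaryLogic` is deliberately left out. The watermark here is simply the largest event timestamp seen so far minus an allowed lateness, which is only one possible heuristic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, List, Optional, Tuple


@dataclass
class IntervalJoinState:
    """Hypothetical per-key state for an interval join (not a Bytewax API).

    A left item at time `a` matches a right item at time `b` when
    a + lower <= b <= a + upper.
    """

    lower: timedelta
    upper: timedelta
    max_lateness: timedelta = timedelta(0)
    left: List[Tuple[datetime, Any]] = field(default_factory=list)
    right: List[Tuple[datetime, Any]] = field(default_factory=list)
    watermark: Optional[datetime] = None

    def on_item(self, side: str, ts: datetime, value: Any) -> List[Tuple[Any, Any]]:
        """Buffer one side-labeled item and return any newly joined pairs."""
        if side == "left":
            matches = [
                (value, b)
                for b_ts, b in self.right
                if ts + self.lower <= b_ts <= ts + self.upper
            ]
            self.left.append((ts, value))
        elif side == "right":
            matches = [
                (a, value)
                for a_ts, a in self.left
                if a_ts + self.lower <= ts <= a_ts + self.upper
            ]
            self.right.append((ts, value))
        else:
            raise ValueError(f"unknown side: {side!r}")
        self._advance_watermark(ts)
        return matches

    def _advance_watermark(self, ts: datetime) -> None:
        """Move the watermark forward and evict items that can no longer match."""
        candidate = ts - self.max_lateness
        if self.watermark is None or candidate > self.watermark:
            self.watermark = candidate
        wm = self.watermark
        # A left item can still match right items up to a + upper.
        self.left = [(t, v) for t, v in self.left if t + self.upper >= wm]
        # A right item can still match left items up to b - lower.
        self.right = [(t, v) for t, v in self.right if t - self.lower >= wm]
```

The sketch assumes all timestamps are consistently naive or consistently timezone-aware, and it does not treat late items (those older than the watermark) specially. The linear scans over each buffer are kept for readability; sorted buffers would be a natural refinement for high-volume keys.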