AndrewKuzmin/spark-structured-streaming-examples

Spark Structured Streaming Examples

Spark Structured Streaming examples using Apache Spark 3.4.0.

Support matrix for joins in streaming queries

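| Left Input | Right Input | Join Type   | Example |
|------------|-------------|-------------|---------|
| Static     | Static      | All types   | TBD |
| Stream     | Static      | Inner       | TBD |
| Stream     | Static      | Left Outer  | TBD |
| Stream     | Static      | Right Outer | Not supported |
| Stream     | Static      | Full Outer  | Not supported |
| Stream     | Static      | Left Semi   | TBD |
| Static     | Stream      | Inner       | TBD |
| Static     | Stream      | Left Outer  | Not supported |
| Static     | Stream      | Right Outer | TBD |
| Static     | Stream      | Full Outer  | Not supported |
| Static     | Stream      | Left Semi   | Not supported |
| Stream     | Stream      | Inner       | ..streamstream.InnerJoinApp*, ..streamstream.InnerJoinWithWatermarkingApp* |
| Stream     | Stream      | Left Outer  | ..streamstream.LeftOuterJoinWithWatermarkingApp* |
| Stream     | Stream      | Right Outer | TBD |
| Stream     | Stream      | Full Outer  | TBD |
| Stream     | Stream      | Left Semi   | TBD |
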
*Base package: com.phylosoft.spark.learning.sql.streaming.operations.join
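For the stream-stream outer rows above, Spark needs a watermark on the nullable side plus an event-time range condition before it can emit unmatched rows and clean up state. A minimal sketch of a left outer stream-stream join, assuming Spark 3.4 on the classpath; it uses Spark's internal test source `MemoryStream` (package `org.apache.spark.sql.execution.streaming`, not a public API) in place of a real source such as Kafka. The object `LeftOuterJoinSketch`, the case classes, column names, watermark delays and the one-hour window are hypothetical and are not taken from the repository's `LeftOuterJoinWithWatermarkingApp`, which may differ.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, Row, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.functions.expr

// Hypothetical event types; a real app would read them from Kafka, files, etc.
case class Impression(impressionAdId: Int, impressionTime: Timestamp)
case class Click(clickAdId: Int, clickTime: Timestamp)

object LeftOuterJoinSketch {
  def run(spark: SparkSession): Seq[Row] = {
    import spark.implicits._
    implicit val sqlCtx: SQLContext = spark.sqlContext

    // In-memory test sources (internal Spark API, used here only for illustration).
    val impressions = MemoryStream[Impression]
    val clicks = MemoryStream[Click]

    // Watermarks bound how long each side is kept in the join state.
    // The right (clicks) watermark is required for a left outer join.
    val impressionsDf = impressions.toDF().withWatermark("impressionTime", "2 hours")
    val clicksDf = clicks.toDF().withWatermark("clickTime", "3 hours")

    // Equi-join key plus a time-range condition so old state can be evicted.
    val joined: DataFrame = impressionsDf.join(
      clicksDf,
      expr("""
        clickAdId = impressionAdId AND
        clickTime >= impressionTime AND
        clickTime <= impressionTime + interval 1 hour
      """),
      "leftOuter")

    // Ad 1 gets a click inside its window; ad 2 gets none.
    val t0 = Timestamp.valueOf("2023-01-01 10:00:00")
    val t1 = Timestamp.valueOf("2023-01-01 10:30:00")
    impressions.addData(Impression(1, t0), Impression(2, t0))
    clicks.addData(Click(1, t1))

    val query = joined.writeStream
      .format("memory")
      .queryName("left_outer_join_sketch")
      .outputMode("append")
      .start()
    query.processAllAvailable()
    query.stop()

    spark.table("left_outer_join_sketch").collect().toSeq
  }
}
```

Only the matched pair for ad 1 is returned here. The unmatched impression for ad 2 would be emitted with null click columns only later, once the watermark has advanced past its one-hour match window.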

Use cases of processing modes (trigger modes)

  1. Unspecified (default);
  2. Fixed interval micro-batches;
  3. One-time micro-batch (deprecated);
  4. Available-now micro-batch;
  5. Continuous with fixed checkpoint interval (experimental);
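The modes above map onto the factory methods of `org.apache.spark.sql.streaming.Trigger`. A minimal sketch, assuming Spark 3.4 on the classpath; the object name `TriggerModesSketch`, the mode strings and the intervals are hypothetical. The built-in `rate` source and `console` sink stand in for real ones because both also work in continuous mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

object TriggerModesSketch {
  // Maps a (hypothetical) mode name to a Spark trigger.
  // None means "unspecified": no .trigger(...) call, so each micro-batch
  // starts as soon as the previous one finishes.
  def triggerFor(mode: String): Option[Trigger] = mode match {
    case "default"       => None
    case "fixed"         => Some(Trigger.ProcessingTime("10 seconds"))
    case "once"          => Some(Trigger.Once()) // deprecated in Spark 3.4; prefer AvailableNow
    case "available-now" => Some(Trigger.AvailableNow())
    case "continuous"    => Some(Trigger.Continuous("1 second")) // checkpoint interval; experimental
    case other           => throw new IllegalArgumentException(s"Unknown trigger mode: $other")
  }

  def start(spark: SparkSession, mode: String): StreamingQuery = {
    val writer = spark.readStream
      .format("rate")               // built-in test source producing (timestamp, value) rows
      .option("rowsPerSecond", "5")
      .load()
      .writeStream
      .format("console")
    // Apply the trigger only when one was chosen.
    triggerFor(mode).fold(writer)(t => writer.trigger(t)).start()
  }
}
```

For example, `TriggerModesSketch.start(spark, "fixed")` prints a micro-batch of rate rows to the console every 10 seconds; call `stop()` on the returned query when done.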

Optimizations

  1. Tungsten execution engine;
  2. Catalyst query optimizer;
  3. Cost-based optimizer;
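One way to see these optimizations at work is to print the plans of a running query. A minimal sketch, assuming Spark 3.4 and the internal `MemoryStream` test source; the object and query names are hypothetical. `StreamingQuery.explain(extended = true)` prints the Catalyst logical plans (parsed, analyzed, optimized) and the physical plan of the last micro-batch, where operators marked `*(n)` belong to Tungsten whole-stage code generation. The cost-based optimizer is disabled by default (`spark.sql.cbo.enabled`) and relies on table statistics, so setting it here only shows where the switch lives; the in-memory source gives it nothing to work with.

```scala
import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

object PlanInspectionSketch {
  def run(spark: SparkSession): Map[Int, Long] = {
    import spark.implicits._
    implicit val sqlCtx: SQLContext = spark.sqlContext

    // Cost-based optimizer: off by default; it only helps when table statistics
    // exist (e.g. collected with ANALYZE TABLE), which an in-memory stream lacks.
    spark.conf.set("spark.sql.cbo.enabled", "true")

    // In-memory test source (internal Spark API).
    val input = MemoryStream[Int]
    input.addData(1, 2, 2, 3, 3, 3)

    // A stateful streaming aggregation: count occurrences of each value.
    val counts = input.toDF().groupBy("value").count()

    val query = counts.writeStream
      .format("memory")
      .queryName("plan_inspection_sketch")
      .outputMode("complete")
      .start()
    query.processAllAvailable()

    // Prints the Catalyst plans and the physical plan of the last micro-batch;
    // operators prefixed with *(n) run inside whole-stage generated code.
    query.explain(extended = true)
    query.stop()

    spark.table("plan_inspection_sketch").as[(Int, Long)].collect().toMap
  }
}
```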

Structured Sessionization

  1. KeyValueGroupedDataset.mapGroupsWithState;
  2. KeyValueGroupedDataset.flatMapGroupsWithState;
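A minimal sessionization sketch with `flatMapGroupsWithState`, assuming Spark 3.4 and the internal `MemoryStream` test source. The case classes, the 10-minute inactivity gap, the 1-minute watermark delay and `SessionizationSketch` are hypothetical. Each user keeps one open session in `GroupState`; every new event pushes an event-time timeout to `lastEvent + gap`, and once the watermark passes that timeout the session is emitted and its state removed.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{Dataset, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical input, state and output types.
case class Event(userId: String, eventTime: Timestamp)
case class SessionState(startMs: Long, endMs: Long, numEvents: Int)
case class Session(userId: String, start: Timestamp, end: Timestamp, numEvents: Int)

object SessionizationSketch {
  // Hypothetical inactivity gap that closes a session.
  val GapMs: Long = 10 * 60 * 1000L

  def updateSession(userId: String, events: Iterator[Event],
                    state: GroupState[SessionState]): Iterator[Session] = {
    if (state.hasTimedOut) {
      // The watermark passed lastEvent + gap: emit the closed session and drop its state.
      val s = state.get
      state.remove()
      Iterator(Session(userId, new Timestamp(s.startMs), new Timestamp(s.endMs), s.numEvents))
    } else {
      // Simplification: every new event for the key extends the single open session.
      val times = events.map(_.eventTime.getTime).toList
      val updated = state.getOption match {
        case Some(s) =>
          SessionState(math.min(s.startMs, times.min), math.max(s.endMs, times.max), s.numEvents + times.size)
        case None =>
          SessionState(times.min, times.max, times.size)
      }
      state.update(updated)
      state.setTimeoutTimestamp(updated.endMs + GapMs)
      Iterator.empty
    }
  }

  def run(spark: SparkSession): Seq[Session] = {
    import spark.implicits._
    implicit val sqlCtx: SQLContext = spark.sqlContext

    val input = MemoryStream[Event] // internal test source
    val sessions: Dataset[Session] = input.toDS()
      .withWatermark("eventTime", "1 minute")
      .groupByKey(_.userId)
      .flatMapGroupsWithState[SessionState, Session](
        OutputMode.Append(), GroupStateTimeout.EventTimeTimeout())(updateSession _)

    input.addData(Event("a", Timestamp.valueOf("2023-01-01 10:00:00")),
                  Event("a", Timestamp.valueOf("2023-01-01 10:05:00")))

    val query = sessions.writeStream
      .format("memory")
      .queryName("sessions_sketch")
      .outputMode("append")
      .start()
    query.processAllAvailable()

    // A later event moves the watermark past user "a"'s timeout (10:15), closing that session.
    input.addData(Event("b", Timestamp.valueOf("2023-01-01 11:00:00")))
    query.processAllAvailable()
    query.stop()

    spark.table("sessions_sketch").as[Session].collect().toSeq
  }
}
```

Merging or splitting sessions when events separated by more than the gap arrive in the same batch is left out for brevity. `mapGroupsWithState` would instead return exactly one record per call and is limited to `Update` output mode, which suits emitting the current session state on every batch rather than only closed sessions.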
