nats-io/nats-spark-connector

NATS-SPARK-CONNECTOR

The following are the current flavors of the nats-spark-connector. Each flavor has its own directory structure.

The partitioned connector is no longer actively developed. It is considered legacy, is included for educational purposes only, and will be removed entirely in the future.

  • balanced: In this scenario, NATS uses a single JetStream partition with a durable, queued configuration to load balance messages across Spark threads. Each thread contributes to a single streaming micro-batch DataFrame at each Spark "pull". The current message offset is kept in the NATS queue for fault tolerance (FT). Spark simply acknowledges each message during a micro-batch 'commit', and NATS resends a message if an ack is not received within a preset time frame.

This flavor is contained in the subdirectory 'balanced', which has its own README.md containing further information.
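
The acknowledge-and-resend behavior described above can be illustrated with a small, self-contained toy model. This is only a sketch and does not use the NATS or Spark APIs: the `ToyQueue` class, its `ack_wait` parameter, the tick-based clock, and the message names are all hypothetical stand-ins chosen for illustration.

```python
from collections import deque


class ToyQueue:
    """Toy stand-in for a durable, queued consumer. NOT the NATS API."""

    def __init__(self, messages, ack_wait=2):
        self.pending = deque(messages)  # messages waiting to be delivered
        self.in_flight = {}             # message -> tick at which it was delivered
        self.ack_wait = ack_wait        # hypothetical resend timeout, in ticks
        self.now = 0                    # simple logical clock

    def pull(self, n):
        """Deliver up to n pending messages to whichever worker asks."""
        batch = []
        while self.pending and len(batch) < n:
            msg = self.pending.popleft()
            self.in_flight[msg] = self.now
            batch.append(msg)
        return batch

    def ack(self, msg):
        """Acknowledge a message so it is never resent."""
        self.in_flight.pop(msg, None)

    def advance(self, ticks=1):
        """Move time forward and requeue messages whose ack is overdue."""
        self.now += ticks
        expired = sorted(
            m for m, t in self.in_flight.items() if self.now - t >= self.ack_wait
        )
        for msg in reversed(expired):
            del self.in_flight[msg]
            self.pending.appendleft(msg)


def run_demo():
    queue = ToyQueue(["m0", "m1", "m2", "m3"], ack_wait=2)
    # Two worker threads each pull from the same queue (load balancing).
    batch_a = queue.pull(2)
    batch_b = queue.pull(2)
    # Both contributions form one micro-batch.
    micro_batch = batch_a + batch_b
    # On "commit", every message is acked except m1 (a simulated lost ack).
    for msg in micro_batch:
        if msg != "m1":
            queue.ack(msg)
    # After the ack window passes, only the unacked message is delivered again.
    queue.advance(2)
    redelivered = queue.pull(2)
    return micro_batch, redelivered
```

In this model the queue, not the consumer, tracks what has been processed, which mirrors the idea that the offset lives on the NATS side and Spark only needs to acknowledge.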

LEGACY: USE BALANCED INSTEAD

  • partitioned: In this scenario, NATS uses a number of JetStream partitions named <partition_prefix>-0, <partition_prefix>-1, ..., <partition_prefix>-N, each containing its preconfigured associated subjects. Each partition sends messages to its own Spark affinity thread, and all partition threads contribute to a single streaming micro-batch DataFrame at each Spark "pull". The current offset for each partition is kept in Spark for fault tolerance (FT). There is also an option to always start from each partition's first offset on a Spark restart instead of using the FT configuration.

This flavor is contained in the subdirectory 'partitioned', which has its own README.md containing further information.
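
As a rough illustration of the naming scheme and the two restart behaviors described above, the following sketch builds the partition names and picks a starting offset per partition. It does not use the connector's actual configuration or API: the function names, the `start_from_first` flag, the checkpoint dictionary, and the assumption that the first offset is 1 are all hypothetical.

```python
def partition_names(prefix, last_index):
    """Build <prefix>-0 ... <prefix>-N, where N is last_index."""
    return [f"{prefix}-{i}" for i in range(last_index + 1)]


def resume_offsets(names, checkpoint, start_from_first=False, first_offset=1):
    """Pick the offset each partition resumes from after a restart.

    checkpoint maps a partition name to the next offset to read, as it
    might be saved on the Spark side (hypothetical format).
    """
    if start_from_first:
        # Ignore saved state and replay every partition from its first offset.
        return {name: first_offset for name in names}
    # Fault-tolerant path: use the saved offset, or the first offset if a
    # partition has no saved state yet.
    return {name: checkpoint.get(name, first_offset) for name in names}


names = partition_names("orders", 2)
saved = {"orders-0": 42, "orders-2": 7}
print(resume_offsets(names, saved))
print(resume_offsets(names, saved, start_from_first=True))
```

The difference from the balanced flavor is where the state lives: here the consumer side holds one offset per partition, so a restart has to decide explicitly whether to trust that saved state.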

  • filtered: This is a future scenario, to be determined (TBD).