Skip to content

Latest commit

 

History

History
35 lines (27 loc) · 1.76 KB

README.md

File metadata and controls

35 lines (27 loc) · 1.76 KB

NATS-SPARK-CONNECTOR

The following are all the current flavors of the nats-spark-connector, each flavor has its own directory structure.

At this point, the partitioned connector is not being actively developed anymore. It is now considered legacy and included only for educational purposes only. It will be removed entirely in the future.

  • balanced: In this scenario, NATS utilizes a single JetStream partition, using a durable and queued configuration to load balance messages across Spark threads, each thread contributing to single streaming micro-batch Dataframe at each Spark "pull". Current message offset is kept in the NATS queue for the purpose of fault tolerance (FT). Spark simply acknowledges each message during a micro-batch 'commit', and resends a message if an ack is not received within a pre-set timeframe.

This flavor is contained in the subdirectory 'balanced', which has its own README.md containing further information.

LEGACY: USE BALANCED INSTEAD

  • partitioned: In this scenario, NATS utilizes a number of JetStream partitions named <partition_prefix>-0, <partition_prefix>-1, ..., <partition_prefix>-N, each partition containing pre-configured associated subjects. Each partition sends messages to its own Spark affinity thread, and all partition threads contribute to a single streaming micro-batch Dataframe at each Spark "pull". Current offset for each partition is kept in Spark for the purpose of fault tolerance (FT). There also is an option to always start from each partition's first offset during a Spark restart, instead of the FT configuration.

This flavor is contained in the subdirectory 'partitioned', which has its own README.md containing further information.

  • filtered: This is a future scenario TBD.