FEATURES

Scribengin exists to provide an easily configurable, general solution to the problem of moving big data reliably and with high availability. It is built on top of established, already reliable technologies, which lets it stay flexible and scalable without giving up reliability.

### Flexibility

  • Support for many different kinds of data sources/sinks
  • New kinds of sources and sinks can easily be added
  • Can move data from any source to any sink
  • Custom processors: filter, transform, or copy data within the pipeline
  • Out-of-the-box support for:
    • HDFS
    • Kafka
    • S3
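
The source, processor, and sink model described above can be illustrated with a minimal Python sketch. This is not Scribengin's API: `ListSink`, `run_dataflow`, `drop_debug`, and `add_length` are hypothetical names invented for the example, and an in-memory list stands in for a real source or sink such as HDFS, Kafka, or S3.

```python
from typing import Callable, Iterable, List, Optional

Record = dict
# A processor returns the record to keep (possibly transformed) or None to drop it.
Processor = Callable[[Record], Optional[Record]]


class ListSink:
    """Hypothetical in-memory sink; a real sink would write to HDFS, Kafka, or S3."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def write(self, record: Record) -> None:
        self.records.append(record)


def run_dataflow(source: Iterable[Record], processors: List[Processor], sink: ListSink) -> int:
    """Read each record, apply the processors in order, and write survivors to the sink."""
    written = 0
    for record in source:
        current: Optional[Record] = record
        for process in processors:
            current = process(current)
            if current is None:  # a filtering processor dropped the record
                break
        if current is not None:
            sink.write(current)
            written += 1
    return written


def drop_debug(record: Record) -> Optional[Record]:
    """Filter: discard DEBUG-level records."""
    return None if record.get("level") == "DEBUG" else record


def add_length(record: Record) -> Record:
    """Transform: return a copy of the record with the message length attached."""
    return {**record, "length": len(record["message"])}


source = [
    {"level": "INFO", "message": "started"},
    {"level": "DEBUG", "message": "tick"},
    {"level": "ERROR", "message": "disk full"},
]
sink = ListSink()
count = run_dataflow(source, [drop_debug, add_length], sink)
print(count, sink.records)
```

Because the pipeline only depends on "something iterable" and "something with a `write` method", any source can be paired with any sink, which is the property the list above claims.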

### Scalability

  • Built with YARN
  • Easily configure how many nodes work on any given dataflow
  • Run multiple dataflows simultaneously
  • Chain dataflows together
  • Scalable, Expandable, and Highly Available

### Reliability

  • Data is guaranteed not to be lost, and duplication is kept to a minimum
  • Nodes in the cluster that fail unexpectedly are replaced automatically
  • Dataflows resume automatically after an unexpected failure
  • Dataflows can be paused, stopped, and resumed by an administrator
  • Reliable Kafka writer - improvements to the default Kafka writer that let it recover from Kafka failures
  • Configuration is stored in a central, highly available place for all nodes in the cluster (based on ZooKeeper)
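
One common way to get "no data loss, minimal duplication" is to commit a checkpoint only after a batch has been written, and to make the sink ignore records it has already stored. The Python sketch below illustrates that general technique under stated assumptions; it is not a description of how Scribengin implements its guarantees. `IdempotentSink`, `Checkpoint`, `run`, and the `fail_after_writes` parameter are hypothetical and exist only to simulate a crash and a resume.

```python
from typing import Dict, List, Optional


class IdempotentSink:
    """Hypothetical sink that ignores offsets it has already stored."""

    def __init__(self) -> None:
        self.stored: Dict[int, str] = {}
        self.duplicate_attempts = 0

    def write(self, offset: int, value: str) -> None:
        if offset in self.stored:
            self.duplicate_attempts += 1  # a replayed record, safely ignored
            return
        self.stored[offset] = value


class Checkpoint:
    """Stand-in for a durable, shared store holding the next offset to process."""

    def __init__(self) -> None:
        self.next_offset = 0


def run(records: List[str], sink: IdempotentSink, checkpoint: Checkpoint,
        batch_size: int = 2, fail_after_writes: Optional[int] = None) -> None:
    """Process records in batches, committing the checkpoint after each full batch."""
    writes = 0
    start = checkpoint.next_offset
    while start < len(records):
        end = min(start + batch_size, len(records))
        for offset in range(start, end):
            if fail_after_writes is not None and writes == fail_after_writes:
                raise RuntimeError("simulated worker crash")
            sink.write(offset, records[offset])
            writes += 1
        checkpoint.next_offset = end  # commit only after the whole batch is written
        start = end


records = ["a", "b", "c", "d", "e"]
sink = IdempotentSink()
checkpoint = Checkpoint()

try:
    run(records, sink, checkpoint, fail_after_writes=3)  # crashes mid-batch
except RuntimeError:
    pass

run(records, sink, checkpoint)  # a replacement worker resumes from the checkpoint
print(sorted(sink.stored.items()), sink.duplicate_attempts)
```

The crash happens after offset 2 is written but before its batch is committed, so the resumed run replays that one record. Nothing is lost, and the only duplication is the replay of the uncommitted batch, which the sink discards.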

### Visibility

  • Graphical, real-time metrics monitoring using Kibana
  • Logs stored in an expandable, highly available repository (Elasticsearch)

### Testability

  • Can be deployed locally using Docker images to simulate a working cluster for testing or development
  • Can be deployed in a single JVM for unit testing