estebarb/datachan

Datachan: A MapReduce-like Framework

Datachan is a library for MapReduce-style programming. It lets you chain stages and takes care of launching workers, spilling to disk, and other repetitive, tedious tasks.

Datachan supports the following methods:

  • Source: Transmits data from a channel.
  • Filter: Filters the output of the previous stage.
  • Map: Applies a function to each element of the previous stage.
  • Reduce: Performs a reduction over the previous stage's data. Internally it handles spilling data to disk, so it can process data sets larger than memory.
  • Sort: Sorts the previous stage's output using merge sort. Spills data to disk if required.
  • Combiner: Performs a partial reduce of the previous stage's data. It streams to the next stage when the number of records reaches a threshold.
  • Tee: Duplicates the output of one stage into two new stages.
  • Merge: Merges several stages into a single one.
  • Sink: Transmits output to a channel.
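
To make the idea of chained stages concrete, here is a minimal sketch written with plain Go channels. It does not use Datachan's API: the names `source`, `filter`, `mapStage`, `reduceCount`, and `wordCount` are hypothetical helpers invented for this illustration, and the reduction is done entirely in memory, without the spilling to disk that Datachan provides.

```go
package main

import (
	"fmt"
	"strings"
)

// source emits every item of a slice on a channel, then closes it.
// (Hypothetical helper, not part of Datachan.)
func source(items []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, it := range items {
			out <- it
		}
	}()
	return out
}

// filter forwards only the items for which keep returns true.
func filter(in <-chan string, keep func(string) bool) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for it := range in {
			if keep(it) {
				out <- it
			}
		}
	}()
	return out
}

// mapStage applies fn to every item of the previous stage.
func mapStage(in <-chan string, fn func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for it := range in {
			out <- fn(it)
		}
	}()
	return out
}

// reduceCount drains the channel and counts occurrences per key.
// Everything is kept in memory; there is no spilling to disk here.
func reduceCount(in <-chan string) map[string]int {
	counts := make(map[string]int)
	for it := range in {
		counts[it]++
	}
	return counts
}

// wordCount chains the stages: source -> filter -> map -> reduce.
func wordCount(words []string) map[string]int {
	nonEmpty := filter(source(words), func(s string) bool { return s != "" })
	lowered := mapStage(nonEmpty, strings.ToLower)
	return reduceCount(lowered)
}

func main() {
	counts := wordCount([]string{"Go", "", "go", "chan", "Chan", "GO"})
	fmt.Println(counts["go"], counts["chan"]) // prints: 3 2
}
```

Each stage runs in its own goroutine and closes its output channel when its input is exhausted, which is what lets the final reduction terminate. A library such as Datachan adds the parts this sketch leaves out, such as worker management and spilling to disk.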
