bugthesystem/cerebro

### Finding the Median in Large Sets of Numbers Split Across N Servers Using ZeroMQ and Node.js (experimental)

  • It takes a data set and distributes it equally among the workers.
  • When StatsCollector's getMedian is called, the master first sends a SORT message so that each worker sorts its share of the data.
  • Once every worker has confirmed the sort, the master sends a GET_MEDIAN message to collect each worker's median and stores the median of those medians. This value is an estimate that is likely to be close to the median of the whole data set.
  • After this step, a binary search is applied to find the exact median.
    • As a first step, the median of the workers' medians is used as the mid value of the binary search. Using the counts of values below and above the estimate, the estimate is updated so that the two counts even out.
    • This step is repeated recursively until it converges to the exact median.
    • In each recursive step, the master sends a GET_LOWER_UPPER_COUNTS message to get the counts of values lower and higher than the current estimate.
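The steps above can be sketched as a small in-process simulation. This is not the project's actual code: the workers are plain objects instead of ZeroMQ sockets, and `makeWorker` and `findMedian` are hypothetical names. Only the message names (SORT, GET_MEDIAN, GET_LOWER_UPPER_COUNTS) come from the description above. The sketch assumes a non-empty array of integers, returns the lower median for even-sized input, and lets the master compute the search bounds while it still holds the data.

```javascript
// Hypothetical worker: owns one chunk and answers the three messages.
function makeWorker(chunk) {
  const data = chunk.slice();
  return {
    handle(message, payload) {
      switch (message) {
        case 'SORT':
          data.sort((a, b) => a - b);
          return 'SORTED';
        case 'GET_MEDIAN':
          // Lower-middle element of the sorted chunk, or null if empty.
          return data.length ? data[Math.floor((data.length - 1) / 2)] : null;
        case 'GET_LOWER_UPPER_COUNTS': {
          let lower = 0;
          let upper = 0;
          for (const x of data) {
            if (x < payload) lower++;
            else if (x > payload) upper++;
          }
          return { lower, upper };
        }
        default:
          throw new Error(`unknown message: ${message}`);
      }
    },
  };
}

// Hypothetical master routine (assumes integer input).
function findMedian(data, workerCount) {
  if (data.length === 0) throw new Error('empty data set');

  // Distribute the data round-robin and track the global bounds.
  const chunks = Array.from({ length: workerCount }, () => []);
  let lo = data[0];
  let hi = data[0];
  data.forEach((x, i) => {
    chunks[i % workerCount].push(x);
    if (x < lo) lo = x;
    if (x > hi) hi = x;
  });
  const workers = chunks.map(makeWorker);

  // Step 1: sort on every worker, then take the median of the local medians.
  workers.forEach((w) => w.handle('SORT'));
  const medians = workers
    .map((w) => w.handle('GET_MEDIAN'))
    .filter((m) => m !== null)
    .sort((a, b) => a - b);
  let estimate = medians[Math.floor((medians.length - 1) / 2)];

  // Step 2: binary search on the value, driven by the global counts.
  const n = data.length;
  const k = Math.floor((n - 1) / 2); // 0-based rank of the lower median
  for (;;) {
    let lower = 0;
    let upper = 0;
    for (const w of workers) {
      const c = w.handle('GET_LOWER_UPPER_COUNTS', estimate);
      lower += c.lower;
      upper += c.upper;
    }
    const equal = n - lower - upper; // occurrences of the estimate itself
    if (lower > k) {
      hi = estimate - 1; // too many values below: the median is smaller
    } else if (lower + equal <= k) {
      lo = estimate + 1; // too few values at or below: the median is larger
    } else {
      return estimate; // the estimate occupies rank k
    }
    estimate = Math.floor((lo + hi) / 2);
  }
}

console.log(findMedian([5, 3, 9, 1, 7], 3));
```

Comparing ranks rather than requiring the lower and upper counts to be strictly equal is a design choice of this sketch. It lets the search stop correctly when the data contains repeated values, a case the "Known issues" list names as open in the project itself.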

Improvements

  • The design could be improved by decoupling it from ZeroMQ to allow other transports (e.g. MPI).
  • Manage the number of workers and the distribution of data to them dynamically, and support continuous data processing (streaming).
  • Multi-core processing could be implemented on worker nodes using cluster to improve performance.
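One way the first improvement might look is sketched below, under the assumption that the master only ever needs a request/reply call per worker. `InMemoryTransport`, `register`, `workerIds`, `request`, and `broadcast` are hypothetical names and not part of the project. A ZeroMQ-backed or MPI-backed class exposing the same three methods could then be swapped in without changing the master logic.

```javascript
// Hypothetical transport contract:
//   workerIds() -> array of ids
//   request(workerId, message, payload) -> Promise of the worker's reply
// This in-memory variant is mainly useful for tests.
class InMemoryTransport {
  constructor() {
    this.handlers = new Map();
  }

  // Attach a worker's message handler under an id.
  register(workerId, handler) {
    this.handlers.set(workerId, handler);
  }

  workerIds() {
    return [...this.handlers.keys()];
  }

  async request(workerId, message, payload) {
    const handler = this.handlers.get(workerId);
    if (!handler) throw new Error(`no such worker: ${workerId}`);
    return handler(message, payload);
  }
}

// Master-side helper that depends only on the contract, not on the wire protocol.
async function broadcast(transport, message, payload) {
  const ids = transport.workerIds();
  return Promise.all(ids.map((id) => transport.request(id, message, payload)));
}
```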

Known issues

  • Duplicate values are not handled yet; supporting them requires refactoring.
  • The overall design needs refactoring.

## Usage

### Install Dependencies

On Windows

npm install

On Linux

sudo npm install

Commands

Start App

// Start as many workers as the size set in the config file (for example, 3)
node main.js --role='WORKER'
node main.js --role='WORKER'
node main.js --role='WORKER'

// Start the master
node main.js --role='MASTER'

Test

npm test

Coverage

npm run test-cov

ESLint

npm run lint
