This repository has been archived by the owner on Mar 15, 2021. It is now read-only.

big-data-processor/big-data-processor

The Big Data Processor workbench

This repository is deprecated because the workbench has been renamed to BD-Processor.

Please use the BD-Processor repository instead. Sorry for the inconvenience.

The Big Data Processor (BDP) is a generic web-based workflow management system with the capability of serving fully-customized web pages.

Our goal is to provide a lightweight, yet full-featured data workbench.

This workbench allows developers to write scripts, define tasks and workflows, construct portable packages, and freely design web interfaces for different data analysis workflows. Users who have set up the Big Data Processor can install these portable packages with near-zero configuration.

The BDP workbench already provides built-in web interfaces for users to manage projects, upload files, specify parameters, execute and monitor tasks and workflows, and record the provenance of each task run. Developers can also provide web pages tailored to their packages. These developer-customized pages can guide users through uploading files, specifying parameters, and executing and monitoring tasks and workflows, and, most importantly, let them visualize results interactively.

All of the developer and user actions mentioned above can be performed in a web browser. The BDP workbench is designed to be both user- and developer-friendly.

Documentation site

Please see the bdp-document page.

Installation

To set up the Big Data Processor, follow the steps below (or follow the complete installation guide).

1. Install Node.js, Git, and Docker.

Install Node.js, Git, and Docker if they are not already on your machine.
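
Before moving on, it can help to confirm that the three prerequisites are actually on your PATH. The following is a minimal sketch, not part of BDP: `check_prereqs` is a hypothetical helper name, and it assumes the tools are exposed under their usual command names (`node`, `git`, `docker`).

```shell
# check_prereqs is a hypothetical helper, not something BDP ships.
# It reports each command that is or is not available on the PATH
# and returns a non-zero status if any of them is missing.
check_prereqs() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed command names for the step 1 prerequisites.
check_prereqs node git docker || echo "Install the missing tools before continuing."
```

This only checks that the commands exist; it does not check versions or whether the Docker daemon is running.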

2. Use the following commands to install the server.

git clone https://github.com/big-data-processor/big-data-processor.git
cd big-data-processor
npm install

3. Configure the Big Data Processor

Copy ./configs/server-config-template.yaml to ./configs/server-config.yaml, then edit ./configs/server-config.yaml.

We strongly recommend adjusting this file to suit your own environment. Please see the documentation for detailed information.
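
The copy in this step can be sketched as a small shell function. This is an illustration under assumptions rather than an official script: `init_config` is a hypothetical name, and the choice to leave an existing server-config.yaml untouched is ours, made so that re-running the step does not discard your edits.

```shell
# init_config is a hypothetical helper, not part of BDP.
# It copies the config template to server-config.yaml inside the given
# folder (default: current folder) unless that file already exists.
init_config() {
  dir=${1:-.}
  template="$dir/configs/server-config-template.yaml"
  config="$dir/configs/server-config.yaml"

  if [ ! -f "$template" ]; then
    # Template not found: probably not inside the cloned repository.
    return 1
  fi

  if [ -e "$config" ]; then
    echo "Keeping existing $config"
  else
    cp "$template" "$config"
    echo "Created $config"
  fi
}

# Run from the big-data-processor folder.
init_config . || echo "Template not found; run this from the big-data-processor folder." >&2
```

After the file is created, open it in an editor and adjust the settings before starting the server.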

4. Start the server

Go to the big-data-processor folder, which contains the big-data-processor.js file.

npm start # or node big-data-processor.js

5. Register the first account as the system root.

Open http://localhost:8080 in a web browser (the address depends on your configuration). The Sign In link is at the top right of the landing page.

Components

  1. bdp-server: the server-side repo of BDP.
  2. bdp-client: the client-side repo of BDP.
  3. bdp-document: the document site hosted via github pages.
  4. bdp-page-api
  5. @big-data-processor/task-reader: The task reader parses the workflow playbook to get task/workflow specifications.
  6. @big-data-processor/task-adapter-base: This is the base class that task adapters extend and implement for different runtime environments.
  7. @big-data-processor/default-filters: This is the default filter function set for the task-reader to parse the workflow playbook. Additional filter functions can be developed to extend the capability of the workflow playbook.

Example Task Adapters

Instead of providing official built-in adapters for every kind of runtime environment, we provide an extensible task adapter base class for developers. The following are our example task adapters.

  1. @big-data-processor/task-adapter-local
  2. @big-data-processor/task-adapter-docker
  3. @big-data-processor/task-adapter-pbs
  4. @big-data-processor/task-adapter-ssh-docker
  5. ...

Roadmaps

(coming soon)

LICENSE

The BDP workbench

Licensed under the Apache-2.0 License (see the license file).

The Page API

Licensed under the MIT License (please see the license file).

The Base Class of the Task Adapter

Licensed under the MIT License (please see the license file).