saarthak2002/serverless-mr

Serverless MapReduce

Serverless MapReduce Architecture Diagram

MapReduce is a scalable computation model for batch processing of very large datasets. Serverless computing is a cloud programming paradigm in which software is deployed with resources allocated on demand, without the need to manage server infrastructure. We wanted to bring these two models together in a Serverless MapReduce implementation that offers the advantages of both, such as cost-effectiveness and parallelism. We used AWS Lambda to invoke cloud functions that perform map or reduce tasks, and Amazon S3 as a distributed object store for inputs, outputs, and intermediate data.
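To make the data flow concrete, the following is a minimal in-memory sketch of the map, shuffle, and reduce phases. It is not the repository's code: plain Python lists and dicts stand in for Lambda invocations and S3 objects, and the names `partition`, `run_job`, and `num_reducers` are hypothetical, chosen only for illustration.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer index for a key using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(documents, map_fn, reduce_fn, num_reducers=2):
    """Run map, shuffle, and reduce locally (hypothetical stand-in for Lambda + S3)."""
    # One bucket of intermediate data per reducer: key -> list of values.
    buckets = [defaultdict(list) for _ in range(num_reducers)]

    # Map phase: each input object is handled by one map task.
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            # Shuffle: route every intermediate pair to the reducer that owns its key.
            buckets[partition(key, num_reducers)][key].append(value)

    # Reduce phase: each reducer folds the value list of every key it owns.
    return [
        {key: reduce_fn(key, values) for key, values in bucket.items()}
        for bucket in buckets
    ]


# Word count over two small in-memory "documents".
docs = {"a.txt": "to be or not to be", "b.txt": "to do"}
outputs = run_job(
    docs,
    map_fn=lambda name, text: [(word, 1) for word in text.split()],
    reduce_fn=lambda key, values: sum(values),
)
```

Because every key is routed to exactly one reducer, the per-reducer outputs are disjoint and can be merged without further aggregation. In a deployed job, each map and reduce task would presumably run as a separate Lambda invocation, with the buckets stored as intermediate objects in S3.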

Code files

The files in this repository are described below:

runner.py: Client-side code that the user runs to schedule map and reduce workers.

mapper_lambda_function.py: Code running on the Mapper AWS Lambda instances. The user defines a Map(K, V) → [(K, V)] function here.

reducer_lambda_handler.py: Code running on the Reducer AWS Lambda instances. The user defines a Reduce(K, Val-list) → Output function here.

process_output.py: Test helper that combines the output of the word count example job.
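As an illustration of the Map and Reduce signatures described above, here is a sketch of what user-defined functions for the word count example might look like, together with a helper that merges per-reducer results. The function names (`word_count_map`, `word_count_reduce`, `combine_outputs`) and the tokenization rule are assumptions made for this example and are not taken from the repository's files.

```python
import re
from collections import Counter


def word_count_map(key, value):
    """Map(K, V) -> [(K, V)]: emit (word, 1) for every word in the input text.

    `key` is the input object's name and `value` is its text content.
    """
    return [(word, 1) for word in re.findall(r"[a-z0-9']+", value.lower())]


def word_count_reduce(key, values):
    """Reduce(K, Val-list) -> Output: total the counts collected for one word."""
    return sum(values)


def combine_outputs(partial_outputs):
    """Merge per-reducer word counts into a single dictionary.

    Counts are added, so the merge stays correct even if a word
    happens to appear in more than one partial output.
    """
    total = Counter()
    for partial in partial_outputs:
        total.update(partial)
    return dict(total)


pairs = word_count_map("doc.txt", "The cat. The hat!")
combined = combine_outputs([{"the": 2, "cat": 1}, {"the": 1, "hat": 1}])
```

Here `pairs` holds one `(word, 1)` tuple per word in input order, and `combined` holds the summed counts across both partial outputs.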
