Multilevel Parallel Patterns

Multilevel Parallel Patterns (MPP for short) is a Java library for creating structured parallel applications that can be executed on either shared memory or distributed memory systems.

The intent is to enable parallelism at two levels:

  • Distributed Memory level: Using the single program, multiple data (SPMD) programming model, data can be scattered over a distributed system, with each node (a shared memory system) running an instance of the program. The scattered data is then processed in parallel by the nodes, and the output of each node is combined to give the final result of the parallel application. The distribution of data is achieved using message passing via the MPJExpress library. We have created a wrapper class (MPPDistLib) to provide intuitive access to common MPJExpress methods and fields (see the sketch after this list).
  • Shared Memory level: At the shared memory level, the MPP library provides a top-down approach to structuring parallel programs through the use of algorithmic skeletons, which are high-level programming models that allow for abstract descriptions of parallel programs. Each skeleton represents a generic computation pattern: users simply express the desired parallel computation using these skeletons, and the library takes care of the nitty-gritty execution details.
    Internally, the execution environment uses Java threads to execute computations as per the skeleton nesting.
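
For illustration, the following is a minimal sketch of the SPMD model written directly against MPJExpress's standard mpi.MPI API. It is an assumption about typical usage, not the MPPDistLib wrapper's actual interface (whose method names are not shown here), and the class name is hypothetical.

import mpi.MPI;

public class SpmdSketch {
	public static void main(String[] args) throws Exception {
		MPI.Init(args);                   // join the MPJExpress runtime
		int rank = MPI.COMM_WORLD.Rank(); // this node's identifier
		int size = MPI.COMM_WORLD.Size(); // total number of participating nodes

		// Each node works on its own share of the data; a root node (rank 0)
		// would typically scatter the input and later gather the partial results.
		System.out.println("Node " + rank + " of " + size + " processing its share of the data");

		MPI.Finalize();                   // leave the MPJExpress runtime
	}
}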

This library aims to greatly reduce the complexity involved in creating parallel applications by allowing users to focus on the structure of the application rather than on low-level implementation details.

This is a work in progress; refinements and additions are being made to make the library easier to use.

Prerequisites

JDK 1.8 or above and the latest version of Maven are required to use the shared-memory skeleton library to build parallel applications.

For running applications at the distributed level, the latest version of the MPJExpress library needs to be installed on all participating nodes. The library and its documentation, including installation instructions, can be found here: http://mpj-express.org/

Note: The MPJExpress library can also be installed on a single multi-core system and run in multicore mode, which is useful for building and testing distributed applications when access to a distributed system is limited or unavailable; once a distributed system is available, the application can be deployed over it with minimal effort.
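
For example, with MPJExpress installed, an application can typically be launched in multicore mode with a command along the following lines (mpjrun.sh on Linux/macOS, mpjrun.bat on Windows; MyParallelApp is a hypothetical main class, and the -dev multicore flag selects MPJExpress's shared-memory device):

mpjrun.sh -np 4 -dev multicore -cp target/classes MyParallelApp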

Usage

The following workflow is typically used to create parallel applications using algorithmic skeletons:

  • Express the parallel structure of the application through appropriate skeleton nesting.
  • Provide the application-specific sequential portions of the code as skeleton parameters.
  • Compile and link the resulting code to obtain the runnable parallel program.

For example, in the sample edge detection program provided with the library, three sequential operations are performed in succession on an image. This can be represented by a pipeline skeleton structure of the form Pipe(Seq, Seq, Seq), where each sequential stage wraps one of the three operations.

Each stage of the pipeline can then be executed in parallel for a given set of images. The library provides an interface called Operation that must be implemented to supply the sequential portions of code that are intended to be parallelized.

The following is a snippet of the sequential code for stage 1 of the edge detection program:

public class ImageConvolutionStage1 implements Operation<BufferedImage, double[][][]> {
	@Override
	public double[][][] execute(BufferedImage inputParam) throws Exception {		
		// This method contains the sequential code that is intended to run in parallel with other sequential code
		int width = inputParam.getWidth();
		int height = inputParam.getHeight();
		...	

Each user-defined sequential operation needs to be nested in a SequentialOpSkeleton, which is one of the three Skeleton implementations currently supported by the library.

Operation o1 = new ImageConvolutionStage1(); // the first sequential operation
Operation o2 = new ImageConvolutionStage2(); // the second sequential operation
Skeleton stage1 = new SequentialOpSkeleton<BufferedImage, double[][][]>(o1, double[][][].class); // creating a skeleton for the first sequential operation
Skeleton stage2 = new SequentialOpSkeleton<double[][][], double[][]>(o2, double[][].class); // creating a skeleton for the second sequential operation
...

Now that we have three sequential skeletons ready, we can create the final PipelineSkeleton, which represents the outermost skeleton nesting and to which we supply the input (the list of images):

Skeleton[] stages = { stage1, stage2, stage3 };
Skeleton pipeLine = new PipelineSkeleton(stages, ArrayList.class); // The outermost skeleton nesting
Future<List<File>> outputFuture = pipeLine.submitData(imageList); // Submit input data to the pipeline skeleton

The result of the parallel computation is represented by the returned Future object.
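
The output can then be retrieved by calling get() on this object, which blocks until the pipeline has produced the final result. A minimal sketch, assuming the Future returned by submitData follows the standard java.util.concurrent.Future contract:

List<File> outputImages = outputFuture.get(); // blocks until all images have passed through the pipeline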

The three stages of the pipeline can now be computed in parallel over three threads; the library takes care of thread management.

Examples

Synthetic Benchmarks

A few synthetic benchmarks for the shared memory skeletons are included. They demonstrate the performance of different types of skeleton nesting compared with an equivalent sequential version. Certain parameters control the granularity of the parallel computation and can be adjusted to produce different results depending on the CPU specifications of the system. Ideally, the performance of certain benchmarks should scale with the number of available CPU cores. For example, the execution time of the Task Farm benchmark should decrease as the number of available CPU cores increases, provided the inputSize argument is sufficiently large, numberOfCores and numberOfWorkers are equal to the number of CPU cores, and chunkSize is configured appropriately. It is worth experimenting with these arguments until optimal performance is achieved; for the Task Farm, the sequential execution time divided by the parallel execution time (the speedup) should be close to the number of available CPU cores.
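
As a purely hypothetical illustration: on an 8-core machine, if the sequential benchmark runs in 80 seconds and the Task Farm version runs in 11 seconds, the speedup is 80 / 11 ≈ 7.3, which is close to the ideal value of 8.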

These can be executed as follows:

  • Navigate to the root of the project folder (the one containing the pom.xml file)

  • Compile source code:

    mvn compile
    

The different benchmarks can be run via the following commands. Replace the arguments enclosed in [ ] as follows:

  • [numberOfCores]: the number of processors to run the benchmark with; should match the number of available CPU cores; can be increased or decreased to fine-tune performance.
  • [inputSize]: the number of tasks to be executed; should be increased as the number of cores increases and should preferably be a multiple of 10.
  • [chunkSize]: the number of tasks per worker; should preferably be a multiple of 10 and less than inputSize; can be increased or decreased based on the number of workers and available CPU cores.
  • [numberOfWorkers]: the total number of workers in the farm; can be increased or decreased based on the number of available CPU cores.

  • Three stage pipeline:
    java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.ThreeStagePipeline [numberOfCores] [inputSize]

  • Task farm:
    java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.SimpleTaskFarm [numberOfCores] [inputSize] [chunkSize] [numberOfWorkers]

  • Nesting pipeline in farm:
    java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.NestingPipelineInFarm [numberOfCores] [inputSize] [numberOfWorkers]

  • Nesting farm in pipeline:
    java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.NestingFarmInPipeline [numberOfCores] [inputSize] [numberOfWorkers]
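
For instance, on a machine with 8 CPU cores, the Task Farm benchmark might be invoked as shown below; the argument values (8 cores, 1000 tasks, a chunk size of 10, and 8 workers) are purely illustrative and should be tuned to the actual hardware:

    java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.SimpleTaskFarm 8 1000 10 8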
    

Edge Detection Program

A simple edge detection program is also included in the library as an example. Images can be placed in the
examples/src/main/resources/inputImages/test folder. After executing the program, the resultant images will be stored in the examples/src/main/resources/outputImages folder.

The following commands can be used to run the program in different skeleton configurations. Replace the arguments enclosed in [ ] as follows:

  • [numberOfCores]: the number of processors to run the program with; should match the number of available CPU cores; can be increased or decreased to fine-tune performance.
  • [numberOfWorkers]: the total number of workers in the farm; can be increased or decreased based on the number of available CPU cores.

  • Pipeline:
    java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.ImageConvolutionPipeline [numberOfCores] 3
    
  • Farm in pipeline:
    java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.ImgConvolNestingFarmInPipeline [numberOfCores] 3 [numberOfWorkers]
    
  • Pipeline in farm:
    java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.ImgConvolNestingPipelineInFarm [numberOfCores] 3 [numberOfWorkers]
    

The performance of these parallel versions can be compared against a sequential version of the same program, which can be run with the following command:

java -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.skeletons.ImageConvolutionSequential 3

Running distributed version of the program

MPJExpress needs to be installed and configured in order to run the distributed versions of the synthetic benchmarks and the edge detection program. They can be run in the multicore mode of MPJExpress using the following commands. Replace the arguments enclosed in [ ] as follows:

  • [numberOfSystems]: the number of nodes in the distributed system (in the multicore mode of MPJExpress, this refers to the number of cores).
  • [numberOfCores]: the number of processors at the shared memory level.
  • [numberOfWorkers]: the total number of workers in the farm.

  • Edge Detection with Pipeline Skeleton:

    mpjrun -np [numberOfSystems] -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.mpj.MPPDistImgConvolution [numberOfCores] 3
    
  • Edge Detection with Nested Skeletons:

    mpjrun -np [numberOfSystems] -cp ".\lib\mpj-0.44.jar;.\target\classes" uk.ac.standrews.cs.mpp.examples.mpj.MPPDistImgConvolutionPipeInFarm [numberOfCores] 3 [numberOfWorkers]
    

Sample Outputs

Following is a sample output of the edge detection application:

WIP - Sample outputs to be added soon!

Performance

The following are performance figures for the benchmarks and the edge detection application when executed on a research server with two Intel® Xeon® CPUs clocked at 2.60 GHz with 14 cores each, for a total of 28 cores.

Edge detection program

The maximum speedup for the distributed edge detection program for a total of 488 images (resolutions ranging between 1080p and 600p) is 6.7. The sequential version runs in 97 seconds, while the parallel version, using the Pipe(Seq, Farm(Seq), Seq) skeleton composition with 8 nodes and 4 farm workers per node, runs in 19.3 seconds.

WIP - Supporting graphs and figures to be added soon!

Written with StackEdit.
