
MIPr - MapReduce Image Processing framework for Hadoop

MIPr provides the ability to process images in Hadoop.

MIPr includes:

  • Writable Wrappers for images
  • InputFormat and OutputFormat for images
  • Several Jobs for image processing
  • OpenCV and OpenIMAJ support
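
As a quick taste of the programming model, here is a minimal sketch of a grayscale mapper built on the Java 2D wrapper. Only the name BufferedImageWritable comes from this README; its accessors (getImage(), getFileName(), getFormat()) are assumed by analogy with the OpenCV walkthrough below, as is the use of the plain Hadoop Mapper base class:

    import java.awt.image.BufferedImage;
    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch only: the accessor names on BufferedImageWritable are assumed,
    // mirroring MatImageWritable in the walkthrough below.
    public class Gray2DMapper extends Mapper<NullWritable, BufferedImageWritable, NullWritable, BufferedImageWritable> {
        @Override
        protected void map(NullWritable key, BufferedImageWritable value, Context context)
                throws IOException, InterruptedException {
            BufferedImage in = value.getImage();
            // Redraw the image into a grayscale buffer using plain Java 2D.
            BufferedImage out = new BufferedImage(in.getWidth(), in.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
            out.getGraphics().drawImage(in, 0, 0, null);
            context.write(NullWritable.get(), new BufferedImageWritable(out, value.getFileName(), value.getFormat()));
        }
    }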

Installation

Prerequisites

  • Java 7 (preferably Oracle)
  • Maven 3.2.5

Building

  1. Clone the repository with the MIPr sources

    git clone https://github.com/sozykin/mipr.git

  2. Build the package using Apache Maven

    To build the full package with OpenIMAJ and OpenCV support, run

    mvn package

    Note that the full package is larger than a separate build.

    To build a separate package, run

    mvn package -pl [desired_package] -am

    where desired_package is one of the following (see the concrete example after this list):

    • core_package
    • includes_OpenCV (includes core with OpenCV support)
    • includes_OpenIMAJ (includes core with OpenIMAJ support)
  3. Maven builds the jar file ...-jar-with-dependencies.jar and places it in the target folder.
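
For example, to build only the package with OpenCV support:

    mvn package -pl includes_OpenCV -am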

Running

  1. Copy image files to HDFS:

    $ hadoop fs -copyFromLocal local_image_folder hdfs_image_folder

  2. Run the test MIPr job, which converts color images to grayscale:

    $ hadoop jar mipr-core-0.1-jar-with-dependencies.jar experiments.Img2Gray hdfs_image_folder hdfs_output_folder

    If an error such as "Mkdirs failed to create ..." mentioning META-INF/LICENSE occurs on macOS (see http://stackoverflow.com/questions/10522835/hadoop-java-io-ioexception-mkdirs-failed-to-create-some-path), remove the conflicting license entries from the jar and run the job again:

    $ zip -d mipr-core-0.1-jar-with-dependencies.jar META-INF/LICENSE
    $ zip -d mipr-core-0.1-jar-with-dependencies.jar LICENSE

  3. Copy the processed images back from HDFS to the local filesystem:

    $ hadoop fs -copyToLocal hdfs_output_folder local_output_folder

  4. Check that the images were converted correctly.
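
If the results look wrong, a quick first check is whether the job wrote any files at all; a standard HDFS listing (not MIPr-specific) shows the contents of the output folder:

    $ hadoop fs -ls hdfs_output_folder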

Creating your own Hadoop job

To process images in your own way, you need to create a single class. As an example, let's create a job that converts color images to grayscale using OpenCV. MIPr already contains this class at includes_OpenCV/src/main/java/experiments/Img2Gray_opencv.

  1. Create a public class that extends the Configured superclass and implements the Tool interface.

    public class Img2Gray_opencv extends Configured implements Tool{
        public static void main(String[] args) throws Exception {
            int res = ToolRunner.run(new Img2Gray_opencv(), args);
            System.exit(res);
        }
  2. Create a run method inside your class and fill it in according to the library you will use.

    public int run(String[] args) throws Exception {
        String input = args[0];
        String output = args[1];
    
        Job job = MiprMain.getOpenCVJobTemplate();
        job.setJarByClass(Img2Gray_opencv.class);
        job.setMapperClass(Img2Gray_opencvMapper.class);
        job.setInputFormatClass(MatImageInputFormat.class);
        job.setOutputFormatClass(MatImageOutputFormat.class);
        Path outputPath = new Path(output);
        FileInputFormat.setInputPaths(job, input);
        FileOutputFormat.setOutputPath(job, outputPath);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(MatImageWritable.class);
    
        return job.waitForCompletion(true) ? 0 : 1;
    }

    The most important configuration calls are:

    • job.setInputFormatClass([InputFormat].class)

      Where [InputFormat] is one of the following:

      • Java 2D

        BufferedImageInputFormat

      • OpenIMAJ

        MBFImageInputFormat

      • OpenCV

        MatImageInputFormat

        CombineMatImageInputFormat

    • job.setOutputFormatClass([OutputFormat].class)

      Where [OutputFormat] is named analogously to [InputFormat]

    • job.setMapperClass([MapperClass].class)

      Where [MapperClass] is your Mapper implementation containing the map method.

    • job.setOutputKeyClass(NullWritable.class)

      In most image processing cases an output key is not necessary. You can use the special Hadoop class NullWritable, which contains nothing.

    • job.setOutputValueClass([Value].class)

      [Value] depends on which library you are going to use.

      • Java 2D

        BufferedImageWritable

      • OpenIMAJ

        MBFImageWritable

      • OpenCV

        MatImageWritable

  3. Create a Mapper class. Your class should extend the OpenCVMapper superclass so that the OpenCV library can be used in parallel. The map method contains the image processing algorithm.

        public static class Img2Gray_opencvMapper extends OpenCVMapper<NullWritable, MatImageWritable, NullWritable, MatImageWritable>{
            @Override
            protected void map(NullWritable key, MatImageWritable value, Context context) throws IOException, InterruptedException {
                Mat image = value.getImage();
                Mat result = new Mat(image.height(), image.width(), CvType.CV_8UC3);

                // Convert 3-channel color images to grayscale; pass other types through unchanged.
                if (image.type() == CvType.CV_8UC3) {
                    Imgproc.cvtColor(image, result, Imgproc.COLOR_RGB2GRAY);
                } else {
                    result = image;
                }

                // Keep the original file name and format so the OutputFormat can write the result back.
                context.write(NullWritable.get(), new MatImageWritable(result, value.getFileName(), value.getFormat()));
            }
        }
  4. Return to the Running section and build the package including your own Hadoop job.
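
Once rebuilt, the new job is launched the same way as the test job in the Running section. The jar name depends on which package you built, so it appears as a placeholder below:

    $ hadoop jar [your_jar_with_dependencies].jar experiments.Img2Gray_opencv hdfs_image_folder hdfs_output_folder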
