Power-Grep

String matching with the Knuth-Morris-Pratt (KMP) algorithm for very large datasets, using MapReduce. For implementation details and analysis, refer to Presentation.pdf.
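
For readers unfamiliar with KMP, here is a minimal single-machine sketch of the algorithm in plain Java. It is not the code in this repository: the class name `KmpSketch` and the methods `buildFailureTable` and `search` are made up for illustration, and the real implementation in `com/ds/Distributed.java` may be organised differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of this repository.
public class KmpSketch {

    // lps[i] is the length of the longest proper prefix of
    // pattern[0..i] that is also a suffix of pattern[0..i].
    static int[] buildFailureTable(String pattern) {
        int[] lps = new int[pattern.length()];
        int len = 0;
        for (int i = 1; i < pattern.length(); i++) {
            // On a mismatch, fall back to the next shorter border.
            while (len > 0 && pattern.charAt(i) != pattern.charAt(len)) {
                len = lps[len - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(len)) {
                len++;
            }
            lps[i] = len;
        }
        return lps;
    }

    // Returns the start index of every occurrence of pattern in text,
    // including overlapping occurrences. The text is scanned once.
    static List<Integer> search(String text, String pattern) {
        List<Integer> matches = new ArrayList<>();
        if (pattern.isEmpty()) {
            return matches; // an empty pattern is treated as "no matches" here
        }
        int[] lps = buildFailureTable(pattern);
        int matched = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            while (matched > 0 && text.charAt(i) != pattern.charAt(matched)) {
                matched = lps[matched - 1];
            }
            if (text.charAt(i) == pattern.charAt(matched)) {
                matched++;
            }
            if (matched == pattern.length()) {
                matches.add(i - matched + 1);
                matched = lps[matched - 1]; // keep going to find overlaps
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(search("india and indiana", "india")); // prints [0, 10]
    }
}
```

The failure table lets the search resume after a mismatch without re-reading earlier text, so the running time is linear in the combined length of the text and the pattern.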

How to use

Refer to this link for Hadoop installation.
Clone this repository
Run the following commands:

cd Power-Grep/grep/src/main/java
<PATH TO HADOOP EXECUTABLE> com.sun.tools.javac.Main com/ds/Distributed.java
<PATH TO HADOOP EXECUTABLE> jar run.jar com.ds.Distributed <INPUT FILE CONTAINING STRING> <OUTPUT DIRECTORY> <SUBSTRING TO SEARCH>
cat <OUTPUT DIRECTORY>/part*

Alternatively, edit run.sh and run it with bash run.sh. By default, run.sh is configured to search for the string "india" in a 96 MB Wikipedia dump.
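
To give a rough idea of how a substring search can be phrased as a map step and a reduce step, here is a small plain-Java simulation that needs no Hadoop installation. It is only a sketch under assumptions: the class `MapReduceGrepSketch` and its methods are hypothetical, the input is assumed to be split by line, matches spanning two lines are not found, and `String.indexOf` stands in for KMP to keep the example short. See Presentation.pdf for how the actual job is designed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of this repository.
public class MapReduceGrepSketch {

    // "Map" step: count the occurrences of pattern in a single line.
    // Uses String.indexOf as a stand-in for KMP.
    static int mapLine(String line, String pattern) {
        if (pattern.isEmpty()) {
            return 0; // an empty pattern is treated as "no matches" here
        }
        int count = 0;
        int from = 0;
        while ((from = line.indexOf(pattern, from)) != -1) {
            count++;
            from++; // advance one character so overlapping matches count too
        }
        return count;
    }

    // "Reduce" step: sum the per-line counts into one total.
    static int reduce(List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    // Runs the map step over every line, then reduces the results.
    static int countOccurrences(List<String> lines, String pattern) {
        List<Integer> counts = new ArrayList<>();
        for (String line : lines) {
            counts.add(mapLine(line, pattern));
        }
        return reduce(counts);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("India is large", "india, india", "no match");
        System.out.println(countOccurrences(lines, "india")); // prints 2 (the search is case-sensitive)
    }
}
```

In a real Hadoop job, the map calls run in parallel on different splits of the input file, which is what makes the approach practical for inputs too large for one machine.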
