String matching algorithm (KMP) for very large datasets using MapReduce

h-sinha/Power-Grep

Power-Grep

Power-Grep is a string matching tool that runs the Knuth-Morris-Pratt (KMP) algorithm over very large datasets using MapReduce. For implementation details and analysis, refer to Presentation.pdf.
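For readers unfamiliar with KMP, the following is a minimal, single-machine sketch of the matching step in Java. It is not taken from this repository: the class name `KmpSketch` and the methods `buildFailure` and `search` are hypothetical, and the actual MapReduce implementation in `com/ds/Distributed.java` may be organized differently.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names here are assumptions, not the repository's API.
final class KmpSketch {

    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] buildFailure(String pattern) {
        int[] failure = new int[pattern.length()];
        int k = 0; // length of the currently matched prefix
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1]; // fall back to a shorter border
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    // Returns the start index of every (possibly overlapping) occurrence
    // of pattern in text, scanning the text once from left to right.
    static List<Integer> search(String text, String pattern) {
        List<Integer> matches = new ArrayList<>();
        if (pattern.isEmpty()) {
            return matches; // assumption: an empty pattern yields no matches
        }
        int[] failure = buildFailure(pattern);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = failure[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                matches.add(i - k + 1);
                k = failure[k - 1]; // keep going to find overlapping matches
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(search("india and indiana", "india")); // prints [0, 10]
    }
}
```

Because the text index never moves backwards, the search runs in time linear in the combined length of the text and the pattern, which is what makes KMP attractive when each worker has to scan a large input split.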

How to use

  • Refer to this link for Hadoop installation.
  • Clone this repository
  • Run the following commands:
cd Power-Grep/grep/src/main/java
<PATH TO HADOOP EXECUTABLE> com.sun.tools.javac.Main com/ds/Distributed.java
<PATH TO HADOOP EXECUTABLE> jar run.jar com.ds.Distributed <INPUT FILE CONTAINING STRING> <OUTPUT DIRECTORY> <SUBSTRING TO SEARCH>
cat <OUTPUT DIRECTORY>/part*

Alternatively, edit run.sh and run it with bash run.sh. By default, run.sh is configured to search for the string "india" in a 96 MB Wikipedia dump.
