ARKS

Content Based Search Add-on API Implemented for Hadoop Ecosystem 🔍

Introduction

Unstructured data such as DOC, PDF, and ePub files is slow to search and filter for specific information. Without content-based search, we have to go through every file manually to find what we need, which is time-consuming and frustrating. It doesn't need to be done this way if we can use high computing power to achieve much faster content retrieval.

We can use the features of a big data management system like Hadoop to organize unstructured data dynamically and return the desired information. Hadoop provides components such as MapReduce, HDFS, and HBase that can filter data according to user input. On top of these we can build a Hadoop add-on for content search and filtering on unstructured data. The add-on provides APIs for different kinds of search results and lets users download either a full file or only the parts of a file that actually relate to the searched topic. Other industries and government authorities can use this add-on to retrieve data from Hadoop according to their own requirements.

Current systems focus on search by title, author, and similar metadata. This is time-consuming, and finding the relevant content inside the matching documents is a tedious task. There is therefore a need for a system that finds the relevant content for the end user.

The objective here is to find the relevant content in the large number of PDF files stored on the Hadoop Distributed File System (HDFS).
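To make the idea of content-based search concrete, here is a minimal sketch of how a map step and a reduce step could rank documents by how often a query term appears in their text. It is an illustration only, not the ARKS API: it runs in plain Python without Hadoop, it assumes the text of each PDF page has already been extracted, and the names `map_page`, `reduce_hits`, `search`, and the sample corpus are all hypothetical.

```python
from collections import defaultdict


def map_page(doc_name, page_no, text, query):
    """Map step: emit ((doc_name, page_no), hits) if the page mentions the query."""
    # Naive whitespace tokenization; punctuation handling is left out of this sketch.
    hits = text.lower().split().count(query.lower())
    if hits:
        yield (doc_name, page_no), hits


def reduce_hits(pairs):
    """Reduce step: sum hits per document and collect the pages that matched."""
    totals = defaultdict(int)
    pages = defaultdict(list)
    for (doc_name, page_no), hits in pairs:
        totals[doc_name] += hits
        pages[doc_name].append(page_no)
    # Rank documents by total hits (highest first), then by name for stable output.
    return sorted(
        ((doc, totals[doc], sorted(pages[doc])) for doc in totals),
        key=lambda row: (-row[1], row[0]),
    )


def search(corpus, query):
    """Run the map step over every page, then reduce the emitted pairs."""
    pairs = []
    for doc_name, doc_pages in corpus.items():
        for page_no, text in enumerate(doc_pages, start=1):
            pairs.extend(map_page(doc_name, page_no, text, query))
    return reduce_hits(pairs)


# Hypothetical corpus: file name -> list of already extracted page texts.
corpus = {
    "hadoop-guide.pdf": ["HDFS stores blocks", "MapReduce jobs read HDFS blocks from HDFS"],
    "cooking.pdf": ["Boil water", "Add salt"],
    "hbase-notes.pdf": ["HBase runs on HDFS"],
}

results = search(corpus, "hdfs")
for doc, total, matched_pages in results:
    print(doc, total, matched_pages)
```

Because each result carries the matching page numbers, a caller could in principle return only the relevant part of a file instead of the whole document, which is the behaviour described above. On a real cluster the map step would run in parallel over files stored on HDFS rather than over an in-memory dictionary.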


Table of Contents

Introduction

Documentation

Architecture

Data Flow

Contribution

License
