
A Ranked Information Retrieval System using Tf-Idf score with text summarization and query expansion.


PranjalGupta2199/open-source-search

 
 


Open Source Search Engine

A text-based search engine.
This search engine crawls the README files of GitHub projects and stores them as documents for the retrieval system. Each document is then indexed and stored for future use. Given a user query, the engine returns the relevant documents, ordered by a ranking (to be included later).
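
The index-then-rank flow described above can be sketched roughly as follows. This is a minimal illustration that assumes a plain TF-IDF score (the weighting named in the project description) and whitespace tokenization. It is not the repository's actual code: the function names `tokenize`, `build_index`, and `rank`, and the sample documents, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real tokenization)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def rank(query, index, n_docs):
    """Return doc ids sorted by the summed tf * idf of the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical README snippets standing in for crawled documents.
docs = {
    "flask": "a micro web framework for python",
    "numpy": "array computing for python",
    "react": "a javascript library for building user interfaces",
}
index = build_index(docs)
results = rank("python web framework", index, len(docs))
print(results)
```

Here the "flask" document matches all three query terms and ranks first, "numpy" matches only "python", and "react" matches nothing and is left out. A fuller system would likely also normalize for document length and remove stopwords before indexing.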

Getting Started

  • Python (preferably version 3.7)
  • Pip and Pipenv
  • Git

Installation

  • Clone the repo using this command in your preferred directory
git clone https://www.github.com/PranjalGupta2199/open-source-search.git
  • Change your working directory to the repo's codebase
cd open-source-search
  • Create a virtual environment, install the dependencies, and activate the environment using Pipenv.
pipenv install
pipenv shell
  • Open a Python interpreter and download the required NLTK data
>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('stopwords')
>>> nltk.download('wordnet')
>>> exit()
  • Run the search.py file to make a query.
python search.py


