lilwon/ICS-Search-Engine


ICS Search Engine

A simple search engine of UCI's ICS web pages.

Table of Contents

  1. About The Project
  2. Getting Started
  3. Contributors

About The Project

This search engine was a group assignment for a class at UCI in Winter 2021. We were given a large corpus (roughly 56,000 web pages) to index and search.

Features

  • Uses an inverted index containing tf-idf scores
  • No databases! The inverted index is not loaded in memory. It is kept in a text file.
  • Query results are retrieved in under 300 ms
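
The sketch below illustrates how an on-disk inverted index with tf-idf scores can work. It is not the code from this repository: the function names (`build_index`, `write_index`, `lookup`), the tab-separated line format, the log-scaled tf-idf weighting, and the use of a small in-memory table of byte offsets are all assumptions made for illustration. The idea is that only the term-to-offset table is kept in memory, and each query term is answered by seeking directly to its line in the text file.

```python
import json
import math
import os
import tempfile
from collections import Counter


def build_index(docs):
    """Return {term: {doc_id: tf-idf score}} for a {doc_id: text} mapping."""
    # Term counts per document, using naive whitespace tokenization.
    term_counts = {doc_id: Counter(text.lower().split())
                   for doc_id, text in docs.items()}
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    n_docs = len(docs)
    index = {}
    for doc_id, counts in term_counts.items():
        for term, count in counts.items():
            tf = 1 + math.log10(count)                 # log-scaled term frequency
            idf = math.log10(n_docs / doc_freq[term])  # inverse document frequency
            index.setdefault(term, {})[doc_id] = tf * idf
    return index


def write_index(index, path):
    """Write one term per line and return {term: byte offset of its line}."""
    offsets = {}
    with open(path, "wb") as f:
        for term in sorted(index):
            offsets[term] = f.tell()
            line = term + "\t" + json.dumps(index[term]) + "\n"
            f.write(line.encode("utf-8"))
    return offsets


def lookup(term, path, offsets):
    """Read the postings for one term without loading the whole index file."""
    if term not in offsets:
        return {}
    with open(path, "rb") as f:
        f.seek(offsets[term])
        line = f.readline().decode("utf-8")
    _, postings = line.rstrip("\n").split("\t", 1)
    return json.loads(postings)


if __name__ == "__main__":
    # Hypothetical three-page corpus, for demonstration only.
    docs = {
        "d1": "uci ics search",
        "d2": "ics machine learning",
        "d3": "ics search engine",
    }
    path = os.path.join(tempfile.mkdtemp(), "index.txt")
    offsets = write_index(build_index(docs), path)
    print(lookup("search", path, offsets))
```

Because each lookup is a single seek plus one line read, query time depends on the length of the postings for the query terms rather than on the size of the whole index.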

Built With

Getting Started

To create a local copy and run the program, follow these steps on Windows.

Prerequisite

You first need to obtain the course's corpus file and extract it. There should be fewer than 56,000 files after extraction, totaling about 3 GB of disk space.
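
To check that the extraction matches the numbers above, a small script along these lines can count the files and add up their sizes. This is only a sketch: `corpus_stats` and the example path are hypothetical names, not part of this repository.

```python
import os


def corpus_stats(root):
    """Return (number of files, total size in bytes) under the directory root."""
    n_files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            n_files += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return n_files, total_bytes


if __name__ == "__main__":
    # Hypothetical location of the extracted corpus; adjust to your setup.
    root = "path/to/extracted/corpus"
    if os.path.isdir(root):
        n_files, total_bytes = corpus_stats(root)
        print(f"{n_files} files, {total_bytes / 1e9:.2f} GB")
```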

You may also need to install a few libraries if you have never used them before. Click the links under Built With and follow the instructions on how to install the libraries.

Installation

  1. Clone the repo
    git clone https://github.com/lilwon/ICS_Search_Engine.git
  2. Run the indexer in PowerShell
    py -3 inverted_index.py
  3. Wait for the indexer to finish creating the inverted index. This takes about 20 minutes.
  4. Run the search retrieval in PowerShell
    py -3 search_component.py
  5. (Optional) You can also use the search retrieval in a web browser
    py -3 webgui.py
  6. (Optional) While webgui.py is running, open a browser and paste the following URL into your address bar: http://127.0.0.1:5000/

Contributors

See the contributors section on the side of this GitHub page.

About

A search engine built from scratch using UCI's ICS web pages.