
DhavalTaunk08/Wiki-Search-Engine


Wiki-Search-Engine

I did this project for the course Information Retrieval and Extraction during my MS by Research at IIITH. It contains code for building a search engine from scratch in Python.

Libraries used include NLTK, PyStemmer, xml.sax, re, and math.
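Wikipedia dumps are very large single XML files. That is presumably why a streaming parser like xml.sax appears in the list: it fires callbacks element by element instead of loading the whole tree into memory. Below is a minimal sketch of a SAX handler that extracts each page's title and wikitext and passes them to a callback. It is not the repository's code. The class name `WikiPageHandler` and the `on_page` callback are hypothetical, and the sketch assumes the standard MediaWiki export layout (`<page>`, `<title>`, `<revision>`, `<text>`).

```python
import xml.sax


class WikiPageHandler(xml.sax.ContentHandler):
    """Streams <page> elements and calls on_page(title, text) for each one."""

    def __init__(self, on_page):
        super().__init__()
        self.on_page = on_page    # hypothetical callback, e.g. the indexer
        self._current_tag = None  # element whose character data we are reading
        self._title = []
        self._text = []

    def startElement(self, name, attrs):
        if name == "page":
            self._title, self._text = [], []
        self._current_tag = name

    def characters(self, content):
        # SAX may deliver one element's text in several chunks, so accumulate.
        if self._current_tag == "title":
            self._title.append(content)
        elif self._current_tag == "text":
            self._text.append(content)

    def endElement(self, name):
        if name == "page":
            self.on_page("".join(self._title), "".join(self._text))
        self._current_tag = None


SAMPLE = b"""<mediawiki>
  <page><title>Python</title><revision><text>Python is a language.</text></revision></page>
  <page><title>Hindi</title><revision><text>Hindi is spoken in India.</text></revision></page>
</mediawiki>"""

pages = []
xml.sax.parseString(SAMPLE, WikiPageHandler(lambda title, text: pages.append((title, text))))
print(pages)
# [('Python', 'Python is a language.'), ('Hindi', 'Hindi is spoken in India.')]
```

For a real dump, `xml.sax.parse(path_to_xml_dump, handler)` streams the file from disk in the same way.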

The search engine supports two languages, English and Hindi. Each language has its own indexer and search script.

Links to the data (Wikipedia XML dumps):

English dump, Hindi dump

To create the index for English, run:

python3 english_indexer.py path_to_xml_dump
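This README does not describe the indexer's internals. To give a rough idea of what building an inverted index involves, here is a minimal sketch. It tokenizes each page, drops stopwords, and records a posting list of (doc_id, term frequency) pairs for each term. Everything in it is an assumption for illustration: the function names, the tiny stopword list, and the posting format. Stemming with PyStemmer and writing the index to disk are left out so that the sketch runs with the standard library alone.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real indexer would use a
# fuller list such as NLTK's English stopwords.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(docs):
    """Map each term to a posting list of (doc_id, term_frequency) pairs.

    `docs` is an iterable of (doc_id, text). Doc ids are assumed to arrive in
    increasing order, so every posting list comes out sorted by doc_id.
    """
    index = defaultdict(list)
    for doc_id, text in docs:
        for term, tf in Counter(tokenize(text)).items():
            index[term].append((doc_id, tf))
    return dict(index)


docs = [
    (0, "Python is a programming language."),
    (1, "The Python language and the Hindi language."),
]
index = build_index(docs)
print(index["language"])  # [(0, 1), (1, 2)]
print(index["hindi"])     # [(1, 1)]
```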

For Hindi, run:

python3 hindi_indexer.py path_to_xml_dump
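Hindi text needs its own tokenization. One pitfall is that Python's `\w` may not match Devanagari vowel signs (matras) or the virama. A generic word regex can therefore split a word like हिन्दी into fragments. The sketch below is minimal and rests on two assumptions: a tokenizer built on the Devanagari character range, and a hypothetical stopword list. It is not necessarily what hindi_indexer.py does.

```python
import re

# The Devanagari block is U+0900 to U+097F. The danda (U+0964) and double
# danda (U+0965) are sentence punctuation, so they are excluded from tokens.
DEVANAGARI_TOKEN = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")

# Hypothetical, tiny Hindi stopword list for illustration only.
HINDI_STOPWORDS = {"है", "का", "की", "के", "में", "और"}


def tokenize_hindi(text):
    """Extract Devanagari tokens (keeping matras and virama) and drop stopwords."""
    return [t for t in DEVANAGARI_TOKEN.findall(text) if t not in HINDI_STOPWORDS]


print(tokenize_hindi("भारत एक देश है। हिन्दी भारत की भाषा है।"))
# ['भारत', 'एक', 'देश', 'हिन्दी', 'भारत', 'भाषा']
```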

To search the English index, run:

python3 english_search.py --filename queries.txt --num_results 15
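The README doesn't say how results are ranked. A common choice is TF-IDF scoring over the inverted index. The sketch below assumes that approach, with a hypothetical `search` function and an index of (doc_id, term frequency) postings. It is an illustration only and does not describe what english_search.py actually does.

```python
import heapq
import math
from collections import defaultdict


def search(query_terms, index, num_docs, num_results=10):
    """Rank documents by summed TF-IDF over the query terms.

    `index` maps term -> list of (doc_id, term_frequency); `num_docs` is the
    total number of indexed documents.
    """
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, [])
        if not postings:
            continue  # term not in the index contributes nothing
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            # Log-scaled term frequency dampens very repetitive pages.
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by smaller doc_id.
    return heapq.nlargest(num_results, scores.items(), key=lambda kv: (kv[1], -kv[0]))


index = {
    "python": [(0, 3), (1, 1)],
    "hindi": [(1, 2), (2, 1)],
    "language": [(0, 1), (1, 1), (2, 1), (3, 1)],
}
results = search(["python", "language"], index, num_docs=4, num_results=2)
print([doc_id for doc_id, _ in results])  # [0, 1]
```

`heapq.nlargest` avoids sorting every matching document when only the top `--num_results` hits are needed.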

Both --filename and --num_results are optional. By default, --num_results is 10. If --filename is not passed, the script prompts you to enter a query on the command line.

For Hindi, run:

python3 hindi_search.py --filename queries.txt --num_results 15

The queries file should contain one query per line.
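The command-line behavior described above maps naturally onto argparse. That behavior covers two optional flags, a default of 10 for --num_results, one query per line in the queries file, and a prompt when no file is given. Below is a minimal sketch. The function names are hypothetical. Prompting once rather than in a loop is also an assumption, since the README doesn't say which the scripts do.

```python
import argparse


def parse_args(argv=None):
    """Parse the two optional flags; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(description="Search the Wikipedia index.")
    parser.add_argument("--filename", help="file with one query per line")
    parser.add_argument("--num_results", type=int, default=10,
                        help="number of results per query (default: 10)")
    return parser.parse_args(argv)


def load_queries(args):
    """Read queries from --filename, or prompt once on the command line."""
    if args.filename:
        # UTF-8 so that Hindi (Devanagari) queries are read correctly.
        with open(args.filename, encoding="utf-8") as f:
            # One query per line; blank lines are skipped.
            return [line.strip() for line in f if line.strip()]
    return [input("Enter query: ").strip()]
```

For example, `parse_args(["--filename", "queries.txt", "--num_results", "15"])` mirrors the commands shown above.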

About

Contains code to build a search engine over Wikipedia data by creating an index and performing search on it.
