forhadsidhu/Wiki-Movies-Crawler

Wiki Movies Crawler

This project uses BeautifulSoup and Pandas to crawl data from the Wikipedia Movies List and serve it as an API response. Currently, the system takes about 1 minute to crawl the full movie data.

Background

For a machine learning or deep learning model, the dataset is crucial to achieving good accuracy, so a customized dataset is needed for almost every AI-related task. Wikipedia is a great source of data. This project acts as a data-preparation pipeline for machine learning models.

Installation

  • Python >= 3.6
  • Dependencies: pip install -r requirements.txt

Usage

For Parsing

 python application.py -i parse 
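
As a rough illustration of what a parsing step of this kind might do, the sketch below extracts rows from a small Wikipedia-style HTML table with BeautifulSoup and loads them into a pandas DataFrame. The sample HTML, the `wikitable` class lookup, the column names, and the function name `parse_movie_table` are assumptions for illustration, not the project's actual code.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup standing in for a fetched Wikipedia page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Film</th><th>Year</th></tr>
  <tr><td><a href="/wiki/Wings_(1927_film)">Wings</a></td><td>1927</td></tr>
  <tr><td><a href="/wiki/Parasite_(2019_film)">Parasite</a></td><td>2019</td></tr>
</table>
"""


def parse_movie_table(html: str) -> pd.DataFrame:
    """Turn the first 'wikitable' in the page into a DataFrame (illustrative)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # ignore rows that do not have both columns
        link = cells[0].find("a")
        rows.append({
            "title": cells[0].get_text(strip=True),
            "year": int(cells[1].get_text(strip=True)),
            "url": link["href"] if link else None,
        })
    return pd.DataFrame(rows, columns=["title", "year", "url"])


if __name__ == "__main__":
    print(parse_movie_table(SAMPLE_HTML))
```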

For Getting API response

python application.py -i serve

For a particular movie's details, paste this URL in a browser:

 http://localhost:8000/movie/123/ 
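
To call this endpoint from a script rather than a browser, something like the following could work. This is a minimal sketch using only the standard library. It assumes the server started with `python application.py -i serve` is running on localhost:8000 and that the response body is a JSON object, which this README does not state. `movie_url` and `fetch_movie` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # host and port taken from the URL above


def movie_url(movie_id: int, base_url: str = BASE_URL) -> str:
    """Build the detail URL for one movie, e.g. http://localhost:8000/movie/123/."""
    return f"{base_url}/movie/{movie_id}/"


def fetch_movie(movie_id: int) -> dict:
    """Fetch one movie. Assumes the server replies with a JSON body."""
    with urlopen(movie_url(movie_id), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the server to be running locally):
#     print(fetch_movie(123))
```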

For a list of 10 movies, paste this in a browser:

http://localhost:8000/movies/count=10/page_size=100/page_no=1

Here, page_size=100 means the database is chunked into pages of 100 movies, and page_no=1 with count=10 means 10 movies are returned from page 1.
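
One possible reading of these parameters can be sketched in plain Python. The function below is an assumption about the semantics, not the project's implementation: it splits the records into pages of `page_size` items, selects the 1-based page `page_no`, and returns at most `count` items from it. The name `paginate` and the placeholder records are hypothetical.

```python
def paginate(records: list, count: int, page_size: int, page_no: int) -> list:
    """Return up to `count` records from the 1-based page `page_no`."""
    if count < 0 or page_size < 1 or page_no < 1:
        raise ValueError("count must be >= 0; page_size and page_no must be >= 1")
    start = (page_no - 1) * page_size        # index of the first record on the page
    page = records[start:start + page_size]  # one page of at most page_size records
    return page[:count]                      # keep only the requested number


# 250 placeholder records standing in for the crawled movie data.
movies = [f"movie-{i}" for i in range(1, 251)]

# Mirrors count=10/page_size=100/page_no=1 from the URL above.
first_ten = paginate(movies, count=10, page_size=100, page_no=1)
```

Under this reading, a page number past the end of the data yields an empty list rather than an error.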