This repository has been archived by the owner on May 19, 2020. It is now read-only.

Hepsiburada review/comment and rating scraper. Turkish text dataset creator for data science and NLP projects. 📜


0x01h/hepsiburada-review-scraper

Hepsiburada Review Scraper

License: GPLv3

Hepsiburada review/comment and rating scraper. Turkish text dataset creator for data science and NLP projects. Nearly 30M reviews, along with category and product links, can be crawled and used for text classification, sentiment analysis, text mining, NLP models, and similar tasks. Written in Python with multithreading support.

Prerequisites

$ pip3 install -r requirements.txt

Installation

$ git clone https://github.com/0x01h/hepsiburada-review-scraper.git
$ cd hepsiburada-review-scraper
$ python3 hepsiburada.py

Usage

The program provides a human-friendly interactive shell.

Features

  • Shutdown computer after finishing: Optional; useful for deep, long-running scrapes.
  • Threads: Number of worker threads. Choose a value your machine and connection can handle. (Recommended value is 64.)
  • Timeout: A large value can cause long waits, while a small value can lead to connection failures. (Recommended range is 15-30 seconds.)
  • Pagination Depth: Maximum number of paginated review pages to fetch for each product.
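
The way the thread count, timeout, and pagination depth could interact can be sketched as follows. This is a minimal illustration, not the code in hepsiburada.py: the function names `scrape_reviews` and `fake_fetch_page` are hypothetical, and the fetcher is a stand-in that makes no network request, so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_reviews(product_urls, fetch_page, threads=64, timeout=20,
                   pagination_depth=5):
    """Collect reviews per product, reading at most pagination_depth pages.

    fetch_page(url, page, timeout) is a hypothetical callable that returns a
    list of review strings, or an empty list when no pages remain.
    """
    def scrape_product(url):
        reviews = []
        for page in range(1, pagination_depth + 1):
            batch = fetch_page(url, page, timeout)
            if not batch:
                break  # the product has no further review pages
            reviews.extend(batch)
        return url, reviews

    results = {}
    # Each product is handled by one worker; at most `threads` run at once.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(scrape_product, url) for url in product_urls]
        for future in as_completed(futures):
            url, reviews = future.result()
            results[url] = reviews
    return results


def fake_fetch_page(url, page, timeout):
    # Stand-in for a real HTTP request: pretends every product has 3 pages.
    return [f"{url} review p{page}"] if page <= 3 else []
```

With the stand-in fetcher, `scrape_reviews(["a", "b"], fake_fetch_page, pagination_depth=2)` stops after two pages per product even though three exist, which is the cap the Pagination Depth option describes.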

You can track progress via progress bars. categories.txt, products.txt, and hepsiburada.txt are saved to your current directory.

Do not scrape aggressively! Otherwise, you will be stopped by a CAPTCHA challenge!
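
One common way to scrape less aggressively is to pause between requests with a little random jitter. The sketch below is an assumption about how that could be done, not a feature of this program; `polite_delay` and `throttled` are hypothetical names, and the delay values are illustrative only.

```python
import random
import time


def polite_delay(base_seconds=1.0, jitter_seconds=0.5, rng=random.random):
    """Return a wait time of base_seconds plus up to jitter_seconds extra."""
    return base_seconds + jitter_seconds * rng()


def throttled(items, base_seconds=1.0, jitter_seconds=0.5, sleep=time.sleep):
    """Yield items one by one, sleeping before every item except the first."""
    for index, item in enumerate(items):
        if index > 0:
            sleep(polite_delay(base_seconds, jitter_seconds))
        yield item
```

Iterating over `throttled(urls)` instead of `urls` spaces out the requests made inside the loop. Using fewer threads has a similar effect on the overall request rate.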

Say thanks.

For educational purposes only.