tokawah/TripAdvisor-Crawling-Suite

TripAdvisor Crawling Suite

DISCLAIMER

THIS SOURCE CODE IS PROVIDED FOR GENERAL PYTHON PROGRAMMING LEARNING ONLY. YOUR USE OF ANY OF THE SOURCE CODE IS AT YOUR OWN RISK.

Update: June 2020

The suite no longer works because TripAdvisor has changed its website layout. However, most of the code still applies to the general procedure for crawling TripAdvisor. If you want to use the suite, feel free to adapt the code as needed. A separate repository provides a working solution for collecting restaurant information from TripAdvisor.

Instructions

See the TripAdvisor Crawling Suite User Guide for instructions on collecting and extracting hotel, review, and reviewer data from TripAdvisor.

Features

  • Flexible crawling speed control
  • Resumable crawling process with data corruption detection
  • Easy access to a wide range of data fields
  • SQLite database storage for collected data
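
The features above can be illustrated with a minimal sketch. It is not the suite's actual code: the names Throttle, open_store, save_page, is_intact and crawl, the pages table schema, and the use of a SHA-256 checksum for corruption detection are all assumptions made for illustration. The fetch argument is a caller-supplied function that returns a page body as a string.

```python
import hashlib
import sqlite3
import time


def checksum(body):
    """Return the SHA-256 hex digest of a page body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Throttle:
    """Hypothetical speed control: enforce a minimum delay between requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def open_store(path=":memory:"):
    """Open (or create) the SQLite store; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, body TEXT NOT NULL, sha256 TEXT NOT NULL)"
    )
    return conn


def save_page(conn, url, body):
    """Store a page together with its checksum."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body, sha256) VALUES (?, ?, ?)",
        (url, body, checksum(body)),
    )
    conn.commit()


def is_intact(conn, url):
    """True if the page is stored and its body still matches its checksum."""
    row = conn.execute(
        "SELECT body, sha256 FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return row is not None and checksum(row[0]) == row[1]


def crawl(conn, urls, fetch, throttle):
    """Fetch only missing or corrupted pages; return the URLs fetched."""
    fetched = []
    for url in urls:
        if is_intact(conn, url):
            continue  # already collected in an earlier run
        throttle.wait()
        save_page(conn, url, fetch(url))
        fetched.append(url)
    return fetched
```

Calling crawl again with the same URL list skips every page that is already stored and intact, so an interrupted run can be resumed by rerunning it. Raising min_interval slows the crawl down; lowering it speeds the crawl up.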

TODOs

  • General surveys on collected data
  • Incremental review updates
  • Photo crawling support