vlad1kudelko/scraping-01

Crawler of cooking sites

The customer needed a crawler that collects ratings and reviews from two types of cooking-site pages.

🚀 Project highlights

  • A bypass for Cloudflare protection is implemented;
  • The project uses the Selenium library together with undetected_chromedriver;
  • The project is dockerized;
  • Two types of site layout are crawled;
  • The project was completed in 2 days;
  • From each page, the overall rating and the number of comments are parsed; from each comment, the author's name, the date and the text are parsed;
  • The results are written in JSONL format (JSON Lines: one JSON object per line, which allows line-by-line writing).
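
The output format described above can be illustrated with a minimal sketch. The names `Comment`, `PageResult`, `to_jsonl_line` and `append_result`, as well as the field names, are assumptions made for illustration; the actual keys used by the project may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Comment:
    # Fields listed in the highlights: author, date and text of a comment.
    author: str
    date: str
    text: str


@dataclass
class PageResult:
    # Hypothetical record layout; the real key names are not documented.
    url: str
    rating: float
    comment_count: int
    comments: List[Comment] = field(default_factory=list)


def to_jsonl_line(result: PageResult) -> str:
    """Serialize one page result as a single JSON line (no trailing newline)."""
    # asdict() also converts the nested Comment objects to plain dicts.
    return json.dumps(asdict(result), ensure_ascii=False)


def append_result(path: str, result: PageResult) -> None:
    """Append one record to a JSONL file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_jsonl_line(result) + "\n")
```

Appending one line per page means a crash in the middle of a crawl leaves every previously written record intact and readable.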

🚀 Run

A single command is enough to run the project:

docker-compose up --build

The URLs to crawl are listed in the file main.py.
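
As a rough sketch of how such a URL list could be wired to the two page layouts, the snippet below maps each URL to a layout by hostname and loads the page with undetected_chromedriver. The URLs, the names `URLS`, `LAYOUT_BY_HOST`, `detect_layout` and `fetch_html`, and the idea that the layouts are distinguished by hostname are all assumptions; the actual contents of main.py are not shown here.

```python
from urllib.parse import urlparse

# Placeholder URLs (assumption); the real list lives in main.py.
URLS = [
    "https://layout-a.example.com/recipe/1",
    "https://layout-b.example.com/recipes/2",
]

# Hypothetical mapping from hostname to layout type.
LAYOUT_BY_HOST = {
    "layout-a.example.com": "layout_a",
    "layout-b.example.com": "layout_b",
}


def detect_layout(url: str) -> str:
    """Return the layout name registered for the URL's hostname."""
    host = urlparse(url).hostname or ""
    try:
        return LAYOUT_BY_HOST[host]
    except KeyError:
        raise ValueError(f"no parser registered for host: {host!r}") from None


def fetch_html(url: str) -> str:
    """Load a page in a Chrome instance started by undetected_chromedriver."""
    # Imported lazily so the helpers above work without a browser installed.
    import undetected_chromedriver as uc

    driver = uc.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

With this layout, adding a URL is a one-line change to the list, and an unsupported site fails early with a clear error instead of producing empty records.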

🚀 Screenshots

screenshot 1

screenshot 2

screenshot 3