
Web Crawling with Python

Cover image: generated using Grok

In this 'Web Crawling with Python' repo, we have covered the following scenario:

Unique links from the LambdaTest E-commerce Playground are crawled using Beautiful Soup. Content (i.e., product meta-data) from the crawled links is then scraped, also with Beautiful Soup. I have a separate detailed blog & repo on Web Scraping with Python.
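As a rough illustration of the crawling idea, the snippet below fetches a single page and collects the unique links on it with Beautiful Soup. It is a minimal sketch, assuming urllib3 (one of the project's dependencies) for the HTTP call and the public playground URL; it is not the repo's actual code.

```python
import urllib3
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Assumed public URL of the LambdaTest E-commerce Playground
BASE_URL = "https://ecommerce-playground.lambdatest.io/"

def unique_links(page_url):
    """Fetch a page and return the set of unique absolute links found on it."""
    http = urllib3.PoolManager()
    response = http.request("GET", page_url)
    soup = BeautifulSoup(response.data, "html.parser")
    # A set automatically de-duplicates hrefs that appear multiple times on the page
    return {urljoin(page_url, a["href"]) for a in soup.find_all("a", href=True)}

if __name__ == "__main__":
    print(f"{len(unique_links(BASE_URL))} unique links found on the home page")
```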

Pre-requisites for test execution

Step 1

Create a virtual environment by triggering the virtualenv venv command on the terminal

virtualenv venv

Step 2

Activate the newly created virtual environment by triggering the source venv/bin/activate command on the terminal

source venv/bin/activate

Follow step (3) below for installing the packages required for web crawling:

Step 3

Run the make install command on the terminal to install the desired packages (or dependencies) - Beautiful Soup, urllib3, etc.

make install

With this, all the dependencies and environment variables are set. We are all set for web crawling with Beautiful Soup (bs4).

Web Crawling using Beautiful Soup

Follow the below-mentioned steps for crawling the LambdaTest E-commerce Playground:

Step 1

Trigger the make clean command to remove the __pycache__ folder(s) and .pyc files


Step 2

Trigger the make crawl-ecommerce-playground command on the terminal to crawl the LambdaTest E-Commerce Playground


As seen above, the content from the LambdaTest E-commerce Playground was crawled successfully! Fifty-five unique product links are now available to be scraped in the exported JSON file (i.e., ecommerce_crawled_urls.json)
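Conceptually, the crawl step boils down to something like the sketch below: start from the home page, follow in-site links, keep only unique product links, and dump them into ecommerce_crawled_urls.json. The 'product_id' filter, function names, and page limit are assumptions for illustration; the repo's actual crawler may differ.

```python
import json
import urllib3
from bs4 import BeautifulSoup
from urllib.parse import urljoin

START_URL = "https://ecommerce-playground.lambdatest.io/"  # assumed playground URL

def crawl_product_links(start_url, max_pages=50):
    """Breadth-first crawl collecting unique product links (assumed to contain 'product_id')."""
    http = urllib3.PoolManager()
    to_visit, visited, product_links = [start_url], set(), set()
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)
        if url in visited:
            continue
        visited.add(url)
        soup = BeautifulSoup(http.request("GET", url).data, "html.parser")
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if "product_id" in link:
                product_links.add(link)      # product detail page
            elif link.startswith(start_url):
                to_visit.append(link)        # stay within the playground
    return product_links

links = crawl_product_links(START_URL)
with open("ecommerce_crawled_urls.json", "w") as fp:
    json.dump(sorted(links), fp, indent=2)
print(f"Exported {len(links)} unique product links")
```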

Step 3

Now that we have the crawled information, trigger the make scrap-ecommerce-playground command on the terminal to scrape the product information (i.e., product name, product price, product availability, etc.) from the links in the exported JSON file.


Also, all 55 links are scraped without any issues!
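For context, the scraping step could look roughly like the sketch below: load the crawled URLs from ecommerce_crawled_urls.json and pull the product name, price, and availability from each page. The tags/CSS classes used here are illustrative guesses, not the selectors actually used in the repo.

```python
import json
import urllib3
from bs4 import BeautifulSoup

def text_or_none(element):
    """Return the stripped text of a Beautiful Soup element, or None if it is missing."""
    return element.get_text(strip=True) if element else None

http = urllib3.PoolManager()

with open("ecommerce_crawled_urls.json") as fp:
    product_urls = json.load(fp)

products = []
for url in product_urls:
    soup = BeautifulSoup(http.request("GET", url).data, "html.parser")
    products.append({
        "url": url,
        # Illustrative selectors; the real page structure may use different tags/classes
        "name": text_or_none(soup.find("h1")),
        "price": text_or_none(soup.find(class_="price")),
        "availability": text_or_none(soup.find(class_="availability")),
    })

print(json.dumps(products[:3], indent=2))
```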

Have feedback or need assistance?

Feel free to fork the repo and contribute to make it better! Email himanshu[dot]sheth[at]gmail[dot]com for any queries or ping me on the following social media sites:

LinkedIn: @hjsblogger
Twitter: @hjsblogger
