
In this 'Web Crawling with Python' repo, we have covered the following scenario:
Unique links from the LambdaTest E-Commerce Playground are crawled using Beautiful Soup. Content (i.e., product metadata) from the crawled pages is then scraped with Beautiful Soup. I also have a detailed blog & repo on Web Scraping with Python, details below:
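The crawling part of this scenario can be sketched roughly as below. This is a minimal illustration, not the repo's actual code: the function names are made up for the example, and the entry-point URL and parsing details are assumptions.

```python
from urllib.parse import urljoin, urlparse

import urllib3
from bs4 import BeautifulSoup

# Assumed entry point for the LambdaTest E-Commerce Playground
BASE_URL = "https://ecommerce-playground.lambdatest.io/"

def extract_unique_links(html, base_url):
    """Collect unique same-domain links from one page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    links = set()
    for anchor in soup.find_all("a", href=True):
        url = urljoin(base_url, anchor["href"])
        # Keep only links that stay on the playground's domain, and drop
        # URL fragments so '#'-variants are not counted twice.
        if urlparse(url).netloc == urlparse(base_url).netloc:
            links.add(url.split("#")[0])
    return links

def crawl_unique_links(start_url=BASE_URL):
    """Fetch the start page and return its unique in-domain links."""
    http = urllib3.PoolManager()
    response = http.request("GET", start_url)
    return extract_unique_links(response.data.decode("utf-8"), start_url)
```

A real crawler would repeat this over newly discovered pages; the sketch only shows the per-page link extraction.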
Step 1
Create a virtual environment by triggering the virtualenv venv command on the terminal
virtualenv venv

Step 2
Activate the newly created virtual environment by triggering the source venv/bin/activate command on the terminal
source venv/bin/activate
Follow steps (3) and (4) for performing web scraping on LambdaTest Cloud Grid:
Step 3
Run the make install command on the terminal to install the desired packages (or dependencies) - Beautiful Soup, urllib3, etc.
make install

With this, all the dependencies and environment variables are set. We are all set for web crawling with Beautiful Soup (bs4).
Follow the steps below for crawling the LambdaTest E-Commerce Playground:
Step 1
Trigger the make clean command
to remove the __pycache__ folder(s) and .pyc files

Step 2
Trigger the make crawl-ecommerce-playground
command on the terminal to crawl the LambdaTest E-Commerce Playground


As seen above, the content from the LambdaTest E-Commerce Playground was crawled successfully! Fifty-five unique product links are now available to be scraped from the exported JSON file (i.e., ecommerce_crawled_urls.json)
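Before the scraping step, the exported file has to be read back in. A minimal sketch of loading it is shown below; the assumed shape (a flat JSON array of URL strings) is a guess, since the repo may export a different structure:

```python
import json

def load_crawled_urls(path="ecommerce_crawled_urls.json"):
    """Load the crawled product URLs exported by the crawl step.

    Assumed shape: a flat JSON array of URL strings. Adjust this if the
    repo exports a different structure (e.g., a dict keyed by page).
    """
    with open(path) as fp:
        data = json.load(fp)
    # Preserve order while dropping any accidental duplicates
    return list(dict.fromkeys(data))
```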
Step 3
Now that we have the crawled information, trigger the make scrap-ecommerce-playground
command on the terminal to scrape the product information (i.e., product name, product price, product availability, etc.) from the URLs in the exported JSON file.
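Per product page, the scraping boils down to pulling a few fields out of the HTML with Beautiful Soup. The sketch below is illustrative only: the CSS selectors are assumptions, and the actual markup of the LambdaTest E-Commerce Playground may differ.

```python
from bs4 import BeautifulSoup

def scrape_product(html):
    """Extract product metadata from a product-page HTML string.

    The selectors below ("h1", ".price", ".availability") are
    placeholder assumptions, not the playground's real markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    name = soup.select_one("h1")
    price = soup.select_one(".price")
    availability = soup.select_one(".availability")
    # Return None for any field missing from the page
    return {
        "name": name.get_text(strip=True) if name else None,
        "price": price.get_text(strip=True) if price else None,
        "availability": availability.get_text(strip=True) if availability else None,
    }
```

In the repo, this kind of function would be called once per URL loaded from ecommerce_crawled_urls.json.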


Also, all 55 links are scraped without any issues!
Feel free to fork the repo and contribute to make it better! Email himanshu[dot]sheth[at]gmail[dot]com for any queries or ping me on the following social media sites:
LinkedIn: @hjsblogger
Twitter: @hjsblogger