Amazon Bestseller Scraper

About the Project

This is a step-by-step project tutorial on how to scrape the Amazon Bestsellers Fashion page using Python , Beautifulsoup and Selenium. This project is a part of my portfolio to showcase my skills in web scraping and data extraction.

You can reach out to me via :

Prerequisites

Before we begin, make sure you have the following prerequisites:

Python installed on your system.
pip (Python package manager) installed.
Chrome browser and driver installed.

URL to be scraped is - https://www.amazon.com/gp/bestsellers/fashion/ref=zg-bs_fashion_dw_sml

Step 1: Install Selenium

To get started, we need to install the Selenium library. Open your terminal or command prompt and run the following command:

pip install selenium

Step 2: Import Necessary Libraries

Create a Python script or Jupyter Notebook for your project and import the necessary libraries at the beginning of your script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

These libraries will help us automate web scraping and parse HTML content.

Step 3: Set Up Chrome Driver

We will use Selenium with a Chrome driver. Set up the Chrome driver in headless mode (i.e., without opening a visible browser window):

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

This configuration allows us to run the scraping process silently.

Step 4: Define the URL

Now, let's define the URL of the Amazon Bestsellers Fashion page we want to scrape:

url = "https://www.amazon.com/gp/bestsellers/fashion/ref=zg-bs_fashion_dw_sml"

Step 5: Navigate to the URL

Navigate to the URL using the Chrome driver and wait for the page to load:

driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "p13n-desktop-grid")))

We are waiting for the element with the class name "p13n-desktop-grid" to ensure that the page has loaded before proceeding.

Step 6: Find the Product Column

Next, we find the product column on the page using its class name:

product_column = driver.find_element(By.CLASS_NAME, "p13n-desktop-grid")

Step 7: Parse HTML Content

Parse the HTML content of the product column using BeautifulSoup:

soup = BeautifulSoup(product_column.get_attribute('innerHTML'), 'html.parser')

Step 8: Extract Product Details

Now, let's find all the product items on the page and extract their details:

products = soup.find_all('div', class_='a-cardui _cDEzb_grid-cell_1uMOS expandableGrid p13n-grid-content')
for product in products:
    # Extract product name
    name = product.find('div', {'class': '_cDEzb_p13n-sc-css-line-clamp-3_g3dy1'}).text.strip()

    # Extract product review
    review = product.find('span', {'class':  'a-icon-alt'}).text.strip()

    # Extract product price
    price = product.find('span', {'class': '_cDEzb_p13n-sc-price_3mJ9Z'})
    if price is not None:
        price = price.text
    else:
        price = 'N/A'

    # Extract product link
    link = product.find('a', {'class': 'a-link-normal'})['href']

    print(f'Title: {name}')
    print(f'Review: {review}')
    print(f'Price: {price}')
    print(f'Product Link : https://www.amazon.com{link}')
    print("")

This code snippet extracts product names, reviews, prices, and links for each product and prints them to the console.

Step 9: Run the Script

Save your Python script and run it. You should see the scraped product details

That's it! Now you can scrape the Amazon Bestsellers Fashion page and extract product details using Python and Selenium. Remember to respect website scraping policies and terms of service when scraping any website.

Remember to be respectful of website scraping policies and terms of service when using this scraper.

Happy scraping!

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README.md		README.md
bestseller_scraper.ipynb		bestseller_scraper.ipynb
test_commit.py		test_commit.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

bestseller_scraper.ipynb

bestseller_scraper.ipynb

test_commit.py

test_commit.py

Repository files navigation

Amazon Bestseller Scraper

About the Project

Prerequisites

URL to be scraped is - https://www.amazon.com/gp/bestsellers/fashion/ref=zg-bs_fashion_dw_sml

Step 1: Install Selenium

Step 2: Import Necessary Libraries

Step 3: Set Up Chrome Driver

Step 4: Define the URL

Step 5: Navigate to the URL

Step 6: Find the Product Column

Step 7: Parse HTML Content

Step 8: Extract Product Details

Step 9: Run the Script

License

About

Releases

Languages

eaintkyawthmu/Amazon-Bestseller-Scraper

Folders and files

Latest commit

History

README.md

README.md

bestseller_scraper.ipynb

bestseller_scraper.ipynb

test_commit.py

test_commit.py

Repository files navigation

Amazon Bestseller Scraper

About the Project

Prerequisites

URL to be scraped is - https://www.amazon.com/gp/bestsellers/fashion/ref=zg-bs_fashion_dw_sml

Step 1: Install Selenium

Step 2: Import Necessary Libraries

Step 3: Set Up Chrome Driver

Step 4: Define the URL

Step 5: Navigate to the URL

Step 6: Find the Product Column

Step 7: Parse HTML Content

Step 8: Extract Product Details

Step 9: Run the Script

License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Languages