
The Amazon Bestseller Scraper is a Python-based web scraping tool that extracts product details (name, rating, price, and link) from the Amazon Bestsellers Fashion page. Whether you're a data enthusiast, an aspiring data scientist, or simply curious about the top fashion products on Amazon, this tool can help you gather insights from the bestseller list.

eaintkyawthmu/Amazon-Bestseller-Scraper


Amazon Bestseller Scraper


About the Project

This is a step-by-step tutorial on how to scrape the Amazon Bestsellers Fashion page using Python, Beautiful Soup, and Selenium. The project is part of my portfolio and showcases my skills in web scraping and data extraction.

You can reach out to me via Gmail, Medium, LinkedIn, or GitHub.

Prerequisites

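Before we begin, make sure you have the following prerequisites:

  1. Python 3 installed on your system.
  2. pip (Python package manager) installed.
  3. Google Chrome and a matching ChromeDriver installed. Selenium 4.6 and later can download a matching driver automatically through Selenium Manager.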

Step 1: Install Selenium and Beautiful Soup

To get started, we need to install the Selenium and Beautiful Soup libraries. Open your terminal or command prompt and run the following command:

pip install selenium beautifulsoup4

Step 2: Import Necessary Libraries

Create a Python script or Jupyter Notebook for your project and import the necessary libraries at the beginning of your script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

Selenium drives a real Chrome browser to load the page, and BeautifulSoup parses the resulting HTML so we can pull out the fields we need.

Step 3: Set Up Chrome Driver

We will use Selenium with a Chrome driver. Set up the Chrome driver in headless mode (i.e., without opening a visible browser window):

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

This configuration allows us to run the scraping process silently.

Step 4: Define the URL

Now, let's define the URL of the Amazon Bestsellers Fashion page we want to scrape:

url = "https://www.amazon.com/gp/bestsellers/fashion/ref=zg-bs_fashion_dw_sml"

Step 5: Navigate to the URL

Navigate to the URL using the Chrome driver and wait for the page to load:

driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "p13n-desktop-grid")))

We wait up to 10 seconds for the element with the class name "p13n-desktop-grid" to appear, so the product grid has loaded before we proceed. If it does not appear in time, Selenium raises a TimeoutException.
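WebDriverWait works by polling: it calls the condition repeatedly (every 0.5 seconds by default) until the condition returns a truthy value, which until() then returns, or it raises TimeoutException once the timeout passes. The stdlib-only sketch below illustrates that polling idea so it can run without a browser; wait_until and WaitTimeout are hypothetical names, and this is a simplified illustration, not Selenium's actual implementation.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() until it returns something truthy or time runs out.

    Simplified illustration of an explicit wait, not Selenium's code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Like WebDriverWait.until, hand back the truthy value itself
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the "grid element" only shows up on the third poll
calls = {"n": 0}


def fake_element_present():
    calls["n"] += 1
    return "grid" if calls["n"] >= 3 else None


result = wait_until(fake_element_present, timeout=2, poll_interval=0.01)
print(result)  # grid
```

Returning the condition's value, rather than just True, is what lets Selenium's until() hand you the located element directly.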

Step 6: Find the Product Column

Next, we find the product column on the page using its class name:

product_column = driver.find_element(By.CLASS_NAME, "p13n-desktop-grid")

Step 7: Parse HTML Content

Parse the HTML content of the product column using BeautifulSoup:

soup = BeautifulSoup(product_column.get_attribute('innerHTML'), 'html.parser')
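Because BeautifulSoup only works on an HTML string, you can try out the parsing step without launching Chrome. The sketch below uses a small, made-up snippet (not Amazon's real markup) to show how class_ matching behaves, which matters for the multi-class name used in Step 8: a string containing several classes is compared against the whole class attribute, while a CSS selector via select() matches regardless of class order.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the product grid's innerHTML.
# The real page uses different and more deeply nested markup.
SAMPLE_HTML = """
<div class="a-cardui p13n-grid-content">
  <span class="a-icon-alt">4.5 out of 5 stars</span>
</div>
<div class="p13n-grid-content a-cardui">
  <span class="a-icon-alt">4.2 out of 5 stars</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-class string must equal the whole class attribute value,
# so only the first div (classes in exactly this order) matches.
exact = soup.find_all("div", class_="a-cardui p13n-grid-content")

# A single class name matches any tag that has that class among others.
single = soup.find_all("div", class_="p13n-grid-content")

# A CSS selector requires both classes but ignores their order.
selected = soup.select("div.a-cardui.p13n-grid-content")

print(len(exact), len(single), len(selected))  # 1 2 2
```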

Step 8: Extract Product Details

Now, let's find all the product items on the page and extract their details:

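def text_or_na(tag):
    """Return a tag's stripped text, or 'N/A' if the tag was not found."""
    return tag.text.strip() if tag is not None else 'N/A'

# Match on the single class "p13n-grid-content": a multi-class string passed to
# class_ only matches tags whose class attribute is exactly that string.
products = soup.find_all('div', class_='p13n-grid-content')
for product in products:
    # Extract product name
    name = text_or_na(product.find('div', class_='_cDEzb_p13n-sc-css-line-clamp-3_g3dy1'))

    # Extract product review (the star-rating text)
    review = text_or_na(product.find('span', class_='a-icon-alt'))

    # Extract product price ('N/A' when no price is shown)
    price = text_or_na(product.find('span', class_='_cDEzb_p13n-sc-price_3mJ9Z'))

    # Extract product link; href=True skips anchors without an href attribute
    link_tag = product.find('a', class_='a-link-normal', href=True)
    link = f"https://www.amazon.com{link_tag['href']}" if link_tag else 'N/A'

    print(f'Title: {name}')
    print(f'Review: {review}')
    print(f'Price: {price}')
    print(f'Product Link: {link}')
    print()

# Close the browser once scraping is finished
driver.quit()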

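This code snippet extracts each product's name, star rating, price, and link, and prints them to the console. The class names beginning with _cDEzb_ look auto-generated and may change when Amazon updates the page; if the script stops finding products, inspect the page in your browser's developer tools and update them.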
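The loop builds each product link by prefixing "https://www.amazon.com" to the scraped href. That works for relative paths, but it would produce a broken URL if an href were already absolute. Python's standard urllib.parse.urljoin handles both cases; here is a minimal sketch, where absolute_product_url and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.amazon.com"


def absolute_product_url(href):
    """Resolve a scraped href against the Amazon base URL.

    Relative paths such as "/dp/..." gain the scheme and host; hrefs that
    are already absolute come back unchanged, so nothing gets doubled up.
    """
    return urljoin(BASE_URL, href)


# Hypothetical hrefs for illustration; real ones come from the <a> tags.
relative = absolute_product_url("/dp/EXAMPLE123")
already_absolute = absolute_product_url("https://www.amazon.com/dp/EXAMPLE123")
print(relative)          # https://www.amazon.com/dp/EXAMPLE123
print(already_absolute)  # https://www.amazon.com/dp/EXAMPLE123
```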

Step 9: Run the Script

Save your Python script and run it. You should see the scraped product details printed in the console.
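The printed prices and ratings are raw text. If you want to sort or compare products, a small helper can turn those strings into numbers. This is a sketch under assumptions: it assumes US-style prices such as "$19.99" or ranges such as "$12.99 - $24.99" (only the lower bound is kept) and rating text of the form "4.5 out of 5 stars"; parse_price and parse_rating are hypothetical helpers.

```python
import re

# A number with optional comma thousands separators and an optional decimal part
_NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def parse_price(text):
    """Return the first number in a price string as a float, or None.

    For a range such as "$12.99 - $24.99", only the lower bound is returned.
    Assumes US formatting (comma thousands separator, dot decimal).
    """
    match = _NUMBER.search(text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text):
    """Return the leading number of rating text like "4.5 out of 5 stars"."""
    match = _NUMBER.search(text or "")
    return float(match.group()) if match else None


print(parse_price("$12.99 - $24.99"))        # 12.99
print(parse_price("N/A"))                    # None
print(parse_rating("4.5 out of 5 stars"))    # 4.5
```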

That's it! Now you can scrape the Amazon Bestsellers Fashion page and extract product details using Python, Selenium, and BeautifulSoup. Remember to respect website scraping policies and terms of service when scraping any website.
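One concrete place to check a site's crawling rules is its robots.txt file, which Python's standard urllib.robotparser module can evaluate. The sketch below parses a made-up rule set so it runs offline; the rules, the user agent "my-scraper", and the example.com URLs are hypothetical and are not Amazon's actual rules. robots.txt covers only part of a site's policy, so its terms of service still apply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only; NOT Amazon's rules.
SAMPLE_RULES = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(SAMPLE_RULES)

# For a live site you would point the parser at the real file instead:
#   parser.set_url("https://www.amazon.com/robots.txt")
#   parser.read()

allowed = parser.can_fetch("my-scraper", "https://www.example.com/gp/bestsellers/fashion")
blocked = parser.can_fetch("my-scraper", "https://www.example.com/private/orders")
print(allowed, blocked)  # True False
```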


Remember to be respectful of website scraping policies and terms of service when using this scraper.

Happy scraping!

License

MIT © Eaint Kyawt Hmu
