A scrapy project designed to extract useful information inside websites

KevsterAmp/texts-website-scraper

texts-website-scraper

This is a Scrapy project that extracts the text, title, meta description, and image alt text from each site in a list of websites.

Installation

Scrapy requires Python 3. This project was written for Python 3.5 or higher, but recent Scrapy releases need a newer interpreter, so check the Scrapy installation guide for the current minimum version. Install Scrapy with pip:

      pip install scrapy

Usage

  1. Clone the repository to your local machine.

  2. Create a CSV file that lists the domain URLs to crawl in a column named domain_url.

  3. Navigate to the cloned repository in your command line interface.

  4. Run the following command:

      scrapy crawl shopify_spider -a csv_file=<path_to_csv_file> -o <output_csv_file>
    

Replace <path_to_csv_file> with the path to your input CSV file and <output_csv_file> with the name of the output CSV file.
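The input file from step 2 can be written by hand or generated. As a minimal sketch, assuming only that the spider expects a header row with a single domain_url column (the function name write_domain_csv and the example URLs are illustrative, not part of this repository):

```python
import csv


def write_domain_csv(path, urls):
    """Write the given URLs to `path` under a single `domain_url` header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["domain_url"])
        writer.writeheader()
        for url in urls:
            # One row per site to crawl.
            writer.writerow({"domain_url": url})
```

Calling write_domain_csv("domains.csv", ["https://example.com", "https://example.org"]) would produce a file that can be passed as csv_file=domains.csv.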

The spider crawls each URL in the CSV file and extracts the page text, title, meta description, and image alt text from each website. The extracted data is saved to the CSV file you named with -o <output_csv_file>.
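Scrapy passes each -a name=value option to the spider's constructor as a keyword argument, so csv_file arrives in the spider as a string path. How this repository's spider reads that file is not shown here. The helper below is a hypothetical sketch of one way to load the domain_url column, skipping blank cells (load_domain_urls is an assumed name):

```python
import csv


def load_domain_urls(csv_file):
    """Return the non-empty values of the `domain_url` column, in file order."""
    with open(csv_file, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early with a clear message if the expected header is missing.
        if not reader.fieldnames or "domain_url" not in reader.fieldnames:
            raise ValueError("input CSV must have a 'domain_url' column")
        urls = []
        for row in reader:
            url = (row.get("domain_url") or "").strip()
            if url:
                urls.append(url)
        return urls
```

A spider could call such a helper once at start-up and issue one request per returned URL.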

Customization

You can modify the spider to extract additional information from the websites, such as product names or prices. To do so, edit the parse method in the spider and add the new fields to the items it yields.
