texts-website-scraper

This is a Scrapy project that extracts the text, title, meta description, and image alt text from each website in a list.

Installation

Scrapy requires Python 3. Current Scrapy releases need Python 3.9 or newer, so check the requirements of the Scrapy version you install. Install Scrapy with pip by running the following command:

  pip install scrapy

Usage

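Clone the repository to your local machine.
Create a CSV file that lists the domain URLs to crawl in a column named domain_url.
In your command-line interface, navigate to the cloned repository. Scrapy commands such as scrapy crawl must be run from the directory that contains scrapy.cfg or from one of its subdirectories.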
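If you prefer to generate the input file from a script, the sketch below writes a CSV with the required domain_url header using Python's standard csv module. The file name domains.csv and the two example domains are hypothetical placeholders, not part of the project.

```python
import csv

# Hypothetical sample domains; replace them with the sites you want to crawl.
domains = ["https://example.com", "https://example.org"]

# The input file needs a header row with a column named domain_url.
with open("domains.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["domain_url"])
    writer.writeheader()
    for url in domains:
        writer.writerow({"domain_url": url})
```

The resulting file has a domain_url header line followed by one URL per line, and you can pass it to the spider with -a csv_file=domains.csv.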

Run the following command:

  scrapy crawl shopify_spider -a csv_file=<path_to_csv_file> -o <output_csv_file>

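Replace <path_to_csv_file> with the path to your CSV file and <output_csv_file> with the name of the output file, for example results.csv. Scrapy infers the output format from the file extension. Note that -o appends to an existing file; use -O instead to overwrite it.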
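Scrapy passes each -a name=value option to the spider's constructor, and the default Spider constructor stores it as an attribute, so the spider can read the path from self.csv_file. The helper below is a hypothetical sketch of how that file could be turned into start URLs. The function name read_domain_urls and the choice to prepend https:// to bare domains are assumptions, and the project's spider may read the file differently.

```python
import csv


def read_domain_urls(csv_file):
    """Return start URLs from the domain_url column of csv_file.

    Hypothetical helper: the project's spider may read the file differently.
    """
    urls = []
    with open(csv_file, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            url = (row.get("domain_url") or "").strip()
            if not url:
                continue  # skip rows with an empty domain_url cell
            # Requests need an absolute URL; assume HTTPS when no scheme is given.
            if "://" not in url:
                url = "https://" + url
            urls.append(url)
    return urls
```

Inside a spider, a start_requests method could loop over read_domain_urls(self.csv_file) and yield scrapy.Request(url, callback=self.parse) for each entry.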

The spider crawls each URL in the CSV file and extracts the text, title, meta description, and image alt text from each website. The extracted data is saved to the file you specified with -o.
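To illustrate what these four fields correspond to in a page's HTML, here is a standalone sketch built on Python's html.parser module. It is a stand-in for explanation only: the project's spider works on Scrapy responses, and the field names returned here (title, meta_description, alt_text, text) are hypothetical and may not match the columns in your output file.

```python
from html.parser import HTMLParser


class PageFieldParser(HTMLParser):
    """Collect the title, meta description, image alt text and visible text."""

    SKIP_TAGS = {"script", "style", "noscript"}  # content that is not visible text

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.alt_texts = []
        self.text_chunks = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = (attrs.get("content") or "").strip()
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"].strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_chunks.append(data.strip())


def extract_fields(html):
    """Return the four fields for one HTML document (hypothetical names)."""
    parser = PageFieldParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta_description": parser.meta_description,
        "alt_text": parser.alt_texts,
        "text": " ".join(parser.text_chunks),
    }
```

Script and style contents are skipped so that the text field holds only what a visitor would read on the page.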

Customization

You can modify the spider to extract additional information from the websites, such as product names or prices. To do so, edit the spider's parse method.
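As one possible starting point, many Shopify themes (though not all) expose product details in Open Graph meta tags such as og:title, og:price:amount and og:price:currency. The sketch below reads those tags with Python's html.parser so it can run on its own. The tag names are an assumption about the target pages, and extract_product is a hypothetical helper, not part of this project.

```python
from html.parser import HTMLParser


class ProductMetaParser(HTMLParser):
    """Collect Open Graph <meta property="og:..."> values from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and attrs.get("content") is not None:
            # Keep the first value seen for each property.
            self.og.setdefault(prop, attrs["content"].strip())


def extract_product(html):
    """Return product name, price and currency, or None where a tag is missing."""
    parser = ProductMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "product_name": parser.og.get("og:title"),
        "price": parser.og.get("og:price:amount"),
        "currency": parser.og.get("og:price:currency"),
    }
```

In the spider itself, the equivalent lookup would use a Scrapy selector, for example response.css('meta[property="og:price:amount"]::attr(content)').get(), and the result could be added to the item that parse yields.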

KevsterAmp/texts-website-scraper

Folders and files

Latest commit

History

Repository files navigation

texts-website-scraper

Installation

Usage

Customization

About

Topics

Resources

License

Stars

Watchers

Forks

Languages