bdmorin/gutenberg-botany-spider

Project Name: gutenberg-botany-spider

A Scrapy spider that scrapes Project Gutenberg for ebooks matching the search term "botany".

Setting Up

This project uses Python 3.7+. Follow these steps to set up your Python environment.

Creating a Virtual Environment

Python 3 comes with built-in support for virtual environments (via the venv module). You can create a virtual environment with the following command:

python3 -m venv env
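
For anyone who prefers to script this step, the same environment can be created from Python. This is a minimal sketch using the standard-library venv module; the helper name create_env is hypothetical and not part of this project.

```python
import venv
from pathlib import Path


def create_env(target="env", with_pip=True):
    """Create a virtual environment at `target` and return its path.

    `create_env` is a hypothetical helper, not part of this repository.
    """
    # EnvBuilder is the standard-library class behind `python3 -m venv`.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(target)
    return Path(target)
```

Calling create_env("env") should produce the same env directory as the command above.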

Activating the Virtual Environment

Before you can start installing or using packages in your virtual environment you’ll need to activate it. Activating a virtual environment will put the virtual environment-specific python and pip executables into your shell’s PATH.

On macOS and Linux:

source env/bin/activate

On Windows:

.\env\Scripts\activate

You’ll know your virtual environment is activated once its name shows up at the left of the terminal prompt (e.g. (env)).
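
If the prompt is customized and the (env) prefix is not visible, activation can also be checked from Python itself. A small sketch, assuming the environment was created with the venv module as described above; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix
```

Run it with the python executable you intend to use for the crawl; a False result suggests the environment is not active in that shell.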

Installing Dependencies

Once you've activated your virtual environment, you can install the project dependencies from the requirements.txt file:

pip install -r requirements.txt
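
To confirm the install worked, you can check that the main packages are importable. This is only a sketch: it assumes requirements.txt pulls in scrapy and icecream (the two packages this README mentions), and missing_modules is a hypothetical helper name.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]
```

For example, missing_modules(["scrapy", "icecream"]) should return an empty list once the dependencies are installed in the active environment.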

Running the Project

You can modify this command to suit your needs; this is what worked for me.

scrapy crawl gutenberg-botany-spider -o gutenberg-botany-spider.csv -L DEBUG

I'm using icecream for simple debugging.
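
Once the crawl finishes, the CSV written by the -o option can be inspected with the standard library. A minimal sketch: the column names depend on the fields the spider yields, which this README does not list, so the code makes no assumption about them. The helper name load_rows is hypothetical.

```python
import csv


def load_rows(path):
    """Read a CSV export into a list of dictionaries keyed by column name."""
    # newline="" is the csv module's recommended way to open CSV files.
    # UTF-8 is assumed here; adjust if the export encoding was changed.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, rows = load_rows("gutenberg-botany-spider.csv") gives one dictionary per scraped item, and len(rows) is the number of items exported.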

License

No separate license; whatever applies to Python and Scrapy.
