
Count Love Crawler

Installation

To isolate the crawler and its dependencies, we recommend installing them in a Python virtual environment.

The crawler has been tested with Python 3.9, but it should be compatible with a range of versions.
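For example, to create and activate a virtual environment (the directory name "venv" here is just a convention):

python3 -m venv venv
source venv/bin/activate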

Install dependencies

To install dependencies, run:

pip install -r requirements.txt

Set up the SQLite database

The SQLite3 database stores the source list, the crawler queue, and the content extracted from pages. To create the database, run:

sqlite3 data.db < schema.sql
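To confirm that the schema loaded, you can list the tables in the new database; the output should include at least the Sources table used below (the remaining table names depend on schema.sql):

sqlite3 data.db ".tables"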

Running the crawl

To start the crawl, run:

python crawler.py

While the crawl is running, details and diagnostic information are logged to "crawl.log". Because the Sources table is initially empty, running python crawler.py has no effect until a source is added. Here's an example of adding a source by interacting with the database directly:

sqlite3 data.db
INSERT INTO Sources VALUES (NULL, 'https://nytimes.com', 'New York, NY', 1, datetime('now'), NULL);
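The same row can also be added from a short Python script using the standard sqlite3 module. This is a minimal sketch: the column order mirrors the INSERT above, and the meaning of the columns beyond the URL and location is an assumption about the schema.

import sqlite3

# Open the same database file the crawler uses.
conn = sqlite3.connect("data.db")

# Column order mirrors the manual INSERT above; the remaining values
# (enabled flag and timestamps) are assumptions about the schema.
conn.execute(
    "INSERT INTO Sources VALUES (NULL, ?, ?, 1, datetime('now'), NULL)",
    ("https://nytimes.com", "New York, NY"),
)
conn.commit()
conn.close()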

Rerunning python crawler.py will now print a list of potential articles with protest keywords to the console.
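To follow the crawl's progress while it runs, you can also tail the log file mentioned above:

tail -f crawl.log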

About

An open source example of the Count Love crawler.
