A Python data collector Cookiecutter template. The scraper is designed to run in a ScraperWiki "box", but it can be deployed in virtually any Unix environment. For detailed documentation on creating and managing scrapers on ScraperWiki, please refer to the official documentation.
Generate a new collector:
cookiecutter https://github.com/reubano/cookiecutter-collector.git
Then:
- Edit `config.py`.
- Edit `app/utils.py`.
- Edit `app/models.py`.
- Run `manage setup` to create the db.
- Run `manage run` to populate the db.
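As a hypothetical illustration of the first step, `config.py` typically centralizes the settings the other modules read. All names and values below are made up for the example, not the template's actual settings:

```python
# config.py -- a hypothetical example; the names and values below are
# illustrative only, not the template's actual settings
BASE_URL = 'http://example.com/api/data'  # source endpoint (made up)
TABLE = 'collector_data'                  # destination table name
CHUNK_SIZE = 1000                         # rows written to the db per batch
```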
By default, ScraperWiki stores data in a SQLite database named `scraperwiki.sqlite` in the user's root directory. This enables a series of features such as an interactive SQL query tool, an HTML table view with filters, and API endpoints for making remote SQL queries.
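On ScraperWiki itself this convention is usually handled by the `scraperwiki` Python library, but the idea can be sketched with the standard library alone. The `save_rows` helper and its schema below are illustrative (only the `scraperwiki.sqlite` filename and the conventional `swdata` table name come from the platform):

```python
import sqlite3

def save_rows(rows, db_path='scraperwiki.sqlite', table='swdata'):
    """Create the table if needed, then upsert rows keyed by 'id'.

    A simplified stand-in for what the scraperwiki library does when
    saving data; the two-column schema here is purely illustrative.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        'CREATE TABLE IF NOT EXISTS %s (id TEXT PRIMARY KEY, value TEXT)'
        % table)
    con.executemany(
        'INSERT OR REPLACE INTO %s (id, value) VALUES (:id, :value)' % table,
        rows)
    con.commit()
    con.close()

save_rows([{'id': '1', 'value': 'first'}, {'id': '2', 'value': 'second'}])
```

Because the data lands in a single well-known SQLite file, the platform's SQL viewer and API endpoints can pick it up without any extra wiring.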
The folder structure is as follows:
collector-skeleton
+---LICENSE
+---Makefile
+---README.md
+---app
| +---__init__.py
| +---models.py
| +---utils.py
+---bin
| +---check-stage
| +---upload
| +---setup
+---config.py
+---dev-requirements.txt
+---http
| +---index.html
+---manage.py
+---requirements.txt
+---setup.cfg
+---setup.py
+---tests
| +---__init__.py
| +---standard.rc
| +---test.sh
- `manage.py` contains the main script commands.
- `config.py` contains the configuration settings.
- `http` generally contains an `index.html` file with the summary of the scraping task and any other files that are intended to be available through an API endpoint, such as a `log.txt` file.
- `app` contains the collector model and initialization.
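To make the role of `manage.py` concrete, here is a hypothetical sketch of how its `setup` and `run` commands could be wired up with `argparse`; the template's actual script may use a CLI framework instead, and the function bodies are placeholders:

```python
# A hypothetical manage.py sketch; command names match the README,
# but the wiring and function bodies are illustrative placeholders.
import argparse

def setup():
    """Create the database tables (placeholder)."""
    return 'db created'

def run():
    """Fetch the source data and populate the database (placeholder)."""
    return 'db populated'

def main(argv=None):
    parser = argparse.ArgumentParser(description='Collector management script')
    sub = parser.add_subparsers(dest='command', required=True)
    sub.add_parser('setup', help='create the db')
    sub.add_parser('run', help='populate the db')
    args = parser.parse_args(argv)
    return {'setup': setup, 'run': run}[args.command]()

if __name__ == '__main__':
    main()
```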
- reubano/hdxscraper-acled: Armed Conflict Location & Event Data Project (ACLED) Realtime Data collector.
- reubano/hdxscraper-fao: Food and Agriculture Organization (FAO) Data collector.
- reubano/hdxscraper-fts: UN Financial Tracking Service (FTS) API collector.
I will gladly accept pull requests that improve the collector development experience.