Skip to content

shrikant-joshi/scraperwiki_local

 
 

Repository files navigation

Local ScraperWiki Python Library

This library aims to be a drop-in replacement for the Python scraperwiki library for use locally. That is, functions will work the same way, and data will go into a local SQLite database; a targeted bombing of ScraperWiki's servers will not stop this local library from working, unless you happen to be running it on one of ScraperWiki's servers.

Installing

This will soon be in PyPI, but for now you can just install from the git repository.

Documentation

Read the standard ScraperWiki Python library's documentation, then look below for some quirks about the local version.

Quirks

The local library aims to be a drop-in replacement. In reality, the local version sometimes works better, though not all of the features have been implemented.

Differences

Datastore differences

The local scraperwiki.sqlite is powered by DumpTruck, so some things work a bit differently.

Data is stored to a local sqlite database named scraperwiki.sqlite.

Bizarre table and column names are supported.

scraperwiki.sqlite.execute will return an empty list of keys on an empty select statement result.

scraperwiki.sqlite.attach downloads the whole datastore from ScraperWiki, the first time it runs; it then uses the cached database

Other Differences

Status of implementation

In general, features that have not been implemented raise a NotImplementedError.

Datastore

scraperwiki.sqlite is missing the following features.

  • All of the verbose keyword arguments (These control what is printed on the ScraperWiki code editor)

Geo

The UK geocoding helpers (scraperwiki.geo) documented on scraperwiki.com have been implemented. They partially depend on scraperwiki.com being available.

Utils

scraperwiki.utils is implemented, as well as the following functions.

  • scraperwiki.log
  • scraperwiki.scrape
  • scraperwiki.pdftoxml
  • scraperwiki.swimport

Deprecated

These submodules are deprecated and thus will not be implemented.

  • scraperwiki.apiwrapper
  • scraperwiki.datastore
  • scraperwiki.jsqlite
  • scraperwiki.metadata
  • scraperwiki.newsql

Development

Run tests with ./runtests; this small wrapper cleans up after itself.

Specs

Here are some ScraperWiki scrapers that demonstrate the non-local library's quirks.

https://scraperwiki.com/scrapers/scraperwiki_local/ https://scraperwiki.com/scrapers/cast/ https://scraperwiki.com/scrapers/things_happen_when_you_do_not_commit/ https://scraperwiki.com/scrapers/what_does_show_tables_return/ https://scraperwiki.com/scrapers/on_conflict/ https://scraperwiki.com/scrapers/spaces_in_table_names/ https://scraperwiki.com/scrapers/spaces_in_table_names_1/

About

Offline version of scraperlibs

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 99.9%
  • Shell 0.1%