patcon/toronto-lobbyist-registry-tools

Toronto Lobbyist Registry Tools

A tool for processing Toronto's lobbyist data.

The City of Toronto Lobbyist Registrar conveniently makes its registry data available in XML format on the City's open data portal.

This code repository consists of:

  1. A command-line tool for:
    • converting the raw XML into CSV.
    • uploading this CSV to a Google Spreadsheet. 📝 CSV
    • updating an online visualization of the relationships. 🌐 Graph
  2. A configuration to run the tool nightly in the cloud. 📜 Logs
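
As a rough illustration of the first step in the list above (XML to CSV), here is a minimal sketch using only the Python standard library. It is not the code in cli.py. The element names (`Record`, `Name`, `Client`), the sample document, and the `xml_to_csv` helper are hypothetical stand-ins, since the registry's actual XML schema is not described here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real registry export has its own schema.
SAMPLE_XML = """<Registry>
  <Record><Name>Jane Doe</Name><Client>Acme Corp</Client></Record>
  <Record><Name>John Roe</Name><Client>Widgets Inc</Client></Record>
</Registry>"""


def xml_to_csv(xml_text, record_tag, fields):
    """Flatten every <record_tag> element into one CSV row.

    Each name in `fields` is looked up as a direct child element of the
    record; missing children become empty cells.
    """
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header row
    for record in root.iter(record_tag):
        writer.writerow([(record.findtext(f) or "").strip() for f in fields])
    return out.getvalue()


if __name__ == "__main__":
    print(xml_to_csv(SAMPLE_XML, "Record", ["Name", "Client"]), end="")
```

Building the CSV in memory keeps the sketch short; for a large export, `ET.iterparse` and writing rows directly to a file would likely be a better fit.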

Usage

To prepare your local development environment for the first time:

# Install Python package dependencies
$ pipenv install

# If you would like to set some configuration from environment variables,
# use this scaffold file.
$ cp sample.env .env
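
Pipenv loads a `.env` file from the project directory automatically when you use `pipenv run` or `pipenv shell`, so values placed there show up as ordinary environment variables. The sketch below shows one way a script could read such a value with a fallback. The variable name `GRAPHCOMMONS_API_KEY` and the `read_setting` helper are hypothetical; check sample.env for the names this project actually uses.

```python
import os


def read_setting(name, default=None, environ=os.environ):
    """Return the named environment variable, or `default` if unset or blank."""
    value = environ.get(name, "").strip()
    return value or default


if __name__ == "__main__":
    # Hypothetical variable name, for illustration only.
    api_key = read_setting("GRAPHCOMMONS_API_KEY")
    print("API key configured" if api_key else "API key not set")
```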

Command: parse-xml

Processes the XML file into spreadsheet format, written either to the local filesystem or to a Google Spreadsheet.

$ pipenv run python cli.py parse-xml --help

Usage: cli.py parse-xml [OPTIONS] XML_FILE

  Process XML file of Toronto lobbyist registry data into a CSV file.

  TODO: Document how to generate Google service account credentials.

Options:
  -o, --output-file <file.csv>  If provided, will write local CSV file.
                                Default: print to screen
  --output-gsheet <url/key>     If provided with writable Google spreadsheet
                                URL, CSV will be uploaded.
  --google-creds <file>         JSON keyfile with Google service account
                                credentials. Default: service-key.json
  -h, --help                    Show this message and exit.
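
The `--output-gsheet` option accepts either a spreadsheet URL or a bare key. One plausible way to normalize the two forms is sketched below; it assumes the usual `https://docs.google.com/spreadsheets/d/<key>/...` URL shape. The `spreadsheet_key` helper is hypothetical and is not taken from cli.py.

```python
import re

# Matches the key segment of a Google Sheets URL: .../spreadsheets/d/<key>/...
_SHEET_URL = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def spreadsheet_key(url_or_key):
    """Return the spreadsheet key, whether given a full URL or the key itself."""
    match = _SHEET_URL.search(url_or_key)
    return match.group(1) if match else url_or_key.strip()


if __name__ == "__main__":
    print(spreadsheet_key("https://docs.google.com/spreadsheets/d/abc123_XY-z/edit#gid=0"))
    print(spreadsheet_key("abc123_XY-z"))
```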

Screenshot of running the command

Command: update-graphcommons

Updates a Graph Commons visualization from the XML.

$ pipenv run python cli.py update-graphcommons --help
Usage: cli.py update-graphcommons [OPTIONS] XML_FILE

Options:
  --graph-id <string>  Graph Commons graph ID (find in graph url)
  --api-key <string>   Graph Commons API key  [required]
  -d, --delete         Delete all data from the graph before processing
  --noop               Skip API calls that change/destroy data
  -h, --help           Show this message and exit.
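
To illustrate how `--delete` and `--noop` might interact, here is a small dry-run sketch. `GraphClient` is a hypothetical stand-in that records calls in a list rather than contacting any API; it is not the Graph Commons client library and not the code in cli.py. The idea is that every mutating call passes through one guard, so `--noop` can log what would happen without changing anything.

```python
class GraphClient:
    """Hypothetical client: records mutations instead of calling a real API."""

    def __init__(self, noop=False):
        self.noop = noop
        self.sent = []  # mutations that would really have been sent

    def _mutate(self, action, payload):
        # Single choke point for anything that changes or destroys data.
        if self.noop:
            print(f"[noop] would {action}: {payload}")
            return False
        self.sent.append((action, payload))
        return True

    def clear_graph(self):
        return self._mutate("clear", None)

    def add_node(self, name):
        return self._mutate("add_node", name)


def update_graph(client, names, delete=False):
    """Optionally clear the graph, then add one node per name."""
    if delete:
        client.clear_graph()
    for name in names:
        client.add_node(name)


if __name__ == "__main__":
    update_graph(GraphClient(noop=True), ["Jane Doe", "Acme Corp"], delete=True)
```

Routing all writes through a single method keeps the dry-run check in one place, so adding a new mutating call does not require adding another `if noop` branch.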

Screenshot of graph visualization

Technologies Used

  • Python. A programming language common in scripting.
  • Click. A Python library for writing simple command-line tools.
  • CircleCI. A script-running service that runs scheduled tasks for us in the cloud.

🙌 Acknowledgements

This tool was initially created at the request of Bernard Rudny, in a barter deal that saw the exchange of one month's accommodation for a web scraper. Thanks Bernard!
