Skip to content

open-austin/data-portal-analysis

Repository files navigation

Data Portal Analysis

There's a lot of content on the City of Austin's open data portal. This project is about studying that content so we can make the portal better.

Status

Build Status

We're currently developing the second release of the Portal Analyzer; previous releases can be found on this page.

Current project goals

Write code that grabs specific pieces of information from Austin's public data portal and rearranges it into a format that's useful for analysis.

Next goals include automated publishing to the City's data portal, so everyone can access and analyze this data.

Why we're doing this

There are many ways to explore data quality. Improving data quality is a job that's never done.

Current business needs/issues to explore include:

Identifiers... How often are departments using unique identifiers for City assets? What is the nature of those identifiers? Where might we benefit from using common identifiers?

Redundancy... How often are departments publishing the same information within their datasets? Are there any departments publishing about the same topics who might want to collaborate?

Accessibility... Are we using multiple resources to publish the same information repeatedly for different time periods? (Not ideal for API consumers.) What column labels and descriptions don't match up with their values, and could perhaps use some tuning? How often are schemas changing? Are these changes good or bad for data consumers?

Table grain... How often are we publishing aggregate information (subtotals and totals) when we could be publishing atomic data? This one is huge!

Quick Start Guide

Installation and Use

Run the following commands from a terminal:

git clone https://github.com/open-austin/data-portal-analysis.git
cd data-portal-analysis

Optional steps:

  • If you will be usng virtualenv, create an environment and activate it before continuing.
  • To run the most recent stable release, see the note about branches below.

This command will install dependencies:

pip install -r requirements.txt

After pip is finished, run the test suite with:

nosetests -v

Finally, use the folowing command to run the analyzer in online mode; you can replace results.csv with a filename of your choice:

./PortalAnalyzer.py results.csv

Note: PortalAnalyzer.py also creates a file called portal_analyzer.log that can be used for troubleshooting. Passing either -v or --verbose on the command line will result in a more detailed logfile. Use --help for a complete list of options.

Regarding branches

The master branch always contains stable code that passes the same tests as the most recent release, but it may have patches that were not included in that release. The default branch, develop, contains code that is still being tested and should not be used "in production."

The following command can be used to track and checkout master:

git checkout -b master origin/master

To switch back to the development branch, use git checkout develop.

Documentation

Documentation Status

How to contribute

The easiest way for Python developers to contribute is by fixing problems detected by QuantifiedCode, because the "learn to fix" link provides guidelines for resolving each issue. Click on the badge below to get started.

Code Issues

Developers can also help by creating enhancements and new features; visit the project board on waffle.io to get an overview of development status.

Stories in Ready

If you'd like to contribute but you're not sure how to start, comment on the meta-issue for the current release and one of the project maintainers will be happy to help.

Contributing terms

When you contribute to this project, you are sharing and/or creating content. Please do not contribute content unless you agree with the terms here.

Credits

Coming soon

History

A detailed record of significant changes can be found in the changelog

License

Unlicense