Data Portal Analysis

There's a lot of content on the City of Austin's open data portal. This project is about studying that content so we can make the portal better.

Status

We're currently developing the second release of the Portal Analyzer; previous releases can be found on this page.

Current project goals

Write code that grabs specific pieces of information from Austin's public data portal and rearranges it into a format that's useful for analysis.
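As a rough illustration of that goal, the sketch below flattens nested dataset metadata into one row per column and writes the rows as CSV. The metadata shape, the sample values, the output column names, and the helpers `flatten_views` and `write_csv` are all assumptions made for this example; they are not taken from the Portal Analyzer source or from the portal's actual API responses.

```python
import csv
import io

# Hypothetical sample of dataset metadata; the real portal response may differ.
SAMPLE_VIEWS = [
    {
        "id": "abcd-1234",
        "name": "Street Trees",
        "department": "Public Works",
        "columns": [
            {"fieldName": "tree_id", "dataTypeName": "number"},
            {"fieldName": "species", "dataTypeName": "text"},
        ],
    },
    {
        "id": "efgh-5678",
        "name": "Park Benches",
        "department": "Parks and Recreation",
        "columns": [{"fieldName": "bench_id", "dataTypeName": "number"}],
    },
]

# Output columns chosen for this sketch only.
FIELDNAMES = ["dataset_id", "dataset_name", "department", "column_name", "column_type"]


def flatten_views(views):
    """Yield one flat dict per (dataset, column) pair."""
    for view in views:
        for column in view.get("columns", []):
            yield {
                "dataset_id": view.get("id", ""),
                "dataset_name": view.get("name", ""),
                "department": view.get("department", ""),
                "column_name": column.get("fieldName", ""),
                "column_type": column.get("dataTypeName", ""),
            }


def write_csv(rows, handle):
    """Write the flattened rows to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(flatten_views(SAMPLE_VIEWS), buffer)
    print(buffer.getvalue())
```

One row per column keeps the output easy to filter and group in a spreadsheet or analysis tool, which is the kind of rearranging the goal describes.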

Next goals include automated publishing to the City's data portal, so everyone can access and analyze this data.

Why we're doing this

There are many ways to explore data quality. Improving data quality is a job that's never done.

Current business needs/issues to explore include:

Identifiers... How often are departments using unique identifiers for City assets? What is the nature of those identifiers? Where might we benefit from using common identifiers?

Redundancy... How often are departments publishing the same information within their datasets? Are there any departments publishing about the same topics who might want to collaborate?

Accessibility... Are we using multiple resources to publish the same information repeatedly for different time periods? (Not ideal for API consumers.) What column labels and descriptions don't match up with their values, and could perhaps use some tuning? How often are schemas changing? Are these changes good or bad for data consumers?

Table grain... How often are we publishing aggregate information (subtotals and totals) when we could be publishing atomic data? This one is huge!
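To make one of these questions concrete, here is a minimal sketch of how the redundancy and identifier questions might be approached: it looks for column names that more than one department publishes. It assumes the analysis data has one row per dataset column with `department`, `dataset_name`, and `column_name` fields; that layout, the sample rows, and the helpers `normalize` and `shared_columns` are hypothetical and not confirmed by the project.

```python
from collections import defaultdict

# Hypothetical rows, one per published column; real analyzer output may differ.
ROWS = [
    {"department": "Public Works", "dataset_name": "Street Trees",
     "column_name": "Council District"},
    {"department": "Parks and Recreation", "dataset_name": "Park Benches",
     "column_name": "council_district"},
    {"department": "Public Works", "dataset_name": "Street Trees",
     "column_name": "tree_id"},
    {"department": "Austin Water", "dataset_name": "Water Meters",
     "column_name": "Meter ID"},
]


def normalize(name):
    """Lowercase a column label and join its words with underscores."""
    return "_".join(name.strip().lower().split())


def shared_columns(rows):
    """Map each normalized column name to the departments that publish it,
    keeping only names used by more than one department."""
    departments_by_column = defaultdict(set)
    for row in rows:
        departments_by_column[normalize(row["column_name"])].add(row["department"])
    return {
        column: sorted(departments)
        for column, departments in departments_by_column.items()
        if len(departments) > 1
    }


if __name__ == "__main__":
    for column, departments in shared_columns(ROWS).items():
        print(column, "->", ", ".join(departments))
```

Matching on normalized labels is a deliberately crude heuristic. It only surfaces candidates for a closer look, since two departments can use the same label for different things.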

Quick Start Guide

Installation and Use

Run the following commands from a terminal:

git clone https://github.com/open-austin/data-portal-analysis.git
cd data-portal-analysis

Optional steps:

If you will be using virtualenv, create an environment and activate it before continuing.
To run the most recent stable release, see the note about branches below.

This command will install dependencies:

pip install -r requirements.txt

After pip is finished, run the test suite with:

nosetests -v

Finally, use the following command to run the analyzer in online mode; you can replace results.csv with a filename of your choice:

./PortalAnalyzer.py results.csv

Note: PortalAnalyzer.py also creates a file called portal_analyzer.log that can be used for troubleshooting. Passing either -v or --verbose on the command line will result in a more detailed logfile. Use --help for a complete list of options.
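For readers curious how a verbose flag usually maps to log detail, the sketch below wires an output filename and a -v/--verbose option to Python's standard argparse and logging modules. It is not the PortalAnalyzer.py implementation: the functions `build_parser`, `log_level`, and `main`, and the choice of INFO versus DEBUG levels, are assumptions for illustration. Only the flag names and the log filename come from the note above.

```python
import argparse
import logging

LOG_FILE = "portal_analyzer.log"  # log filename mentioned in the note above


def build_parser():
    """Build a parser with an output filename and a verbose switch."""
    parser = argparse.ArgumentParser(
        description="Illustrative stand-in for an analyzer command line."
    )
    parser.add_argument("outfile", help="name of the CSV file to write")
    parser.add_argument(
        "-v", "--verbose", action="store_true",
        help="write a more detailed logfile",
    )
    return parser


def log_level(verbose):
    """Choose DEBUG when verbose is requested, INFO otherwise (an assumption)."""
    return logging.DEBUG if verbose else logging.INFO


def main(argv=None):
    """Parse arguments, configure file logging, and return the chosen level."""
    args = build_parser().parse_args(argv)
    level = log_level(args.verbose)
    logging.basicConfig(filename=LOG_FILE, level=level)
    logging.info("Writing results to %s", args.outfile)
    logging.debug("Verbose logging is enabled")
    return level
```

Calling main(["results.csv", "--verbose"]) would configure DEBUG-level logging to portal_analyzer.log in the current directory. argparse adds the --help option automatically.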

Regarding branches

The master branch always contains stable code that passes the same tests as the most recent release, but it may have patches that were not included in that release. The default branch, develop, contains code that is still being tested and should not be used "in production."

The following command can be used to track and checkout master:

git checkout -b master origin/master

To switch back to the development branch, use git checkout develop.

Documentation

How to contribute

The easiest way for Python developers to contribute is by fixing problems detected by QuantifiedCode, because the "learn to fix" link provides guidelines for resolving each issue. Click on the badge below to get started.

Developers can also help by creating enhancements and new features; visit the project board on waffle.io to get an overview of development status.

If you'd like to contribute but you're not sure how to start, comment on the meta-issue for the current release and one of the project maintainers will be happy to help.

Contributing terms

When you contribute to this project, you are sharing and/or creating content. Please do not contribute content unless you agree with the terms here.

Credits

Coming soon

History

A detailed record of significant changes can be found in the changelog.

License

Unlicense
