H4LA Support

This repository is a collection of notebooks, Python modules, and data addressing data literacy issues in the Access the Data project at H4LA. It aims to offer ideas on analytic techniques and some guidance to the UX/UI team.

Introduction - Annotated Workflow

Let's start with a fairly generic data science workflow and some details for each step:



The initial questions, and data, support the 311-data project at H4LA. The 311-data site pulls from the live API, but I will be using a snapshot with similar data. I will use the personas identified in the 311 analysis to design the questions.

My analysis is straightforward. I use standard Python libraries for dataframes (and geodataframes) to map from file-based inputs to analytics-ready data. I like to get a sense of what's in the data by understanding the semantics of the columns, the dtypes, and the missing values. For this 311 data set I need to convert the CSV to a geodataframe for display and spatial analytics. These are all standard idioms for this data type. (Note: what kinds of questions, data fusion, etc. does the geo enable?) Important libraries used in the analysis include pandas, geopandas, and numpy.
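A minimal sketch of that CSV-to-geodataframe idiom. The file path and the latitude/longitude column names are assumptions, not the actual schema of this repository's data:

```python
import pandas as pd
import geopandas as gpd

# Read the snapshot; the path and column names below are assumptions.
df = pd.read_csv("data/311_requests.csv")

# Quick sense of the data: column dtypes and missing values.
print(df.dtypes)
print(df.isna().sum())

# Build point geometries from the coordinate columns; EPSG:4326 is the
# usual CRS for raw lat/lon data.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)
```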

I'm developing/demonstrating with JupyterLab, which provides a rich environment for exploration, iteration, and visualization. I make use of multiple widget packages to analyze and visualize the data; the key packages are ipywidgets, ipyleaflet, and (soon) bqplot. Good dashboard design is something of a black art, so I'm going to take some inspiration from the CfAtl ideas (well, maybe GATech).
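As a small illustration of that widget stack, here is one way to put the calls on an ipyleaflet map, assuming the `gdf` from the sketch above; the center coordinates are just downtown LA and the `request_type` column is hypothetical:

```python
from ipyleaflet import Map, Marker
import ipywidgets as widgets

# Base map centered on LA (coordinates are illustrative, not from the repo).
m = Map(center=(34.05, -118.25), zoom=11)

# Drop a marker for each of the first few requests.
for _, row in gdf.head(25).iterrows():
    m.add_layer(Marker(location=(row.geometry.y, row.geometry.x)))

# Pair the map with a simple widget, e.g. a dropdown for a hypothetical
# request_type column, as a starting point for interactive filtering.
selector = widgets.Dropdown(options=sorted(gdf["request_type"].unique()))
widgets.VBox([selector, m])
```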

As any analysis unfolds, new questions are uncovered. These drive us to new and different sources of information for answers. Part of our research is uncovering information sources to address those new questions; part of our understanding comes from applying different analytic techniques such as spatial analysis, time series, and crossfiltering. Our motivation is always to add more structure to the data we have! Once I'm beyond the basics, I propose to look at the following (see the sketch after this list):

1. NC census data to "sort"/"select" NCs.
2. Land use data from LA County to determine the relative percentage of residential vs. commercial.
3. Street network data to indicate road network complexity.

These are all techniques for using contextual data about the NCs as ordering/search criteria for the 311 calls.
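One fusion step common to all three ideas is attaching NC context to each call with a spatial join. A sketch, again assuming the `gdf` from above; the boundary file name and its `nc_name` column are assumptions:

```python
import geopandas as gpd

# NC boundaries, reprojected to match the calls (file name is assumed).
ncs = gpd.read_file("data/nc_boundaries.geojson").to_crs(gdf.crs)

# Each call inherits the attributes of the NC polygon that contains it.
calls_with_nc = gpd.sjoin(gdf, ncs[["nc_name", "geometry"]], predicate="within")

# Contextual attributes (census counts, land-use mix, ...) can then be
# merged on nc_name and used to order or select NCs.
counts = calls_with_nc.groupby("nc_name").size().sort_values(ascending=False)
```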

As I develop analytic notebooks, I like to keep my eye out for opportunities to develop packages/modules that can be shared. There is one simple example in the src directory for reading and transforming the 311 shapefile. As you look at the notebooks you should see examples of repetitive hacks; these should be converted to code! Techniques to link, fuse, and share are very important.
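A hypothetical shape for such a shared helper; the function name and transformation steps below stand in for whatever the src module actually does:

```python
import geopandas as gpd

def load_311(path: str, crs: str = "EPSG:4326") -> gpd.GeoDataFrame:
    """Read a 311 shapefile and return an analysis-ready GeoDataFrame."""
    gdf = gpd.read_file(path)
    gdf = gdf.to_crs(crs)                           # normalize the projection
    gdf.columns = [c.lower() for c in gdf.columns]  # consistent column names
    return gdf
```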

The workflow allows me to package, and document, the processes used in the analysis. Notebooks (like this one) can be published and shared. It provides a structured approach to uncovering details in the data, building software interfaces for various services, integrating context for understanding, and creating new data sets where they add value.

Contents of the Repository

This is just the starting point. There is much more to do!

Note: (01/18/2022) I'm creating a branch to use parquet files. Code related to other file types is still included, but commented out.
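For reference, the parquet round trip with geopandas (which preserves geometries, via pyarrow) looks like this; the file name is an assumption:

```python
import geopandas as gpd

# Write the GeoDataFrame from above to parquet, then read it back.
gdf.to_parquet("data/311_requests.parquet")
gdf = gpd.read_parquet("data/311_requests.parquet")
```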
