H4LA Support

This repository is a collection of notebooks, Python modules, and data addressing data literacy issues in the Access the Data project at H4LA. It aims to offer ideas on analytic techniques and some guidance to the UX/UI team.

Introduction - Annotated Workflow

Let's start with a fairly generic data science workflow and some details for each step:



The initial questions, and data, support the 311-data project at H4LA. The 311-data site pulls from the live API, but I will be using a snapshot with similar data. I will use the personas identified in the 311 analysis to design the questions.

My analysis is straightforward. I use standard Python libraries for dataframes (and geodataframes) to map from file-based inputs to analytics-ready data. I like to get a sense of what's in the data by understanding the semantics of the columns, the dtypes, and the missing values. For this 311 data set I need to convert the CSV to a geodataframe for display and spatial analytics. These are all standard idioms for this data type. (Note: what kinds of questions, data fusion, etc. does the geo enable?) Important libraries used in the analysis include pandas, geopandas, and numpy.
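A minimal sketch of that CSV-to-geodataframe idiom. The file path and the latitude/longitude column names are assumptions, not the actual schema of this repository's data:

```python
import pandas as pd
import geopandas as gpd

# Read the snapshot; the path and column names below are assumptions.
df = pd.read_csv("data/311_requests.csv")

# Quick sense of the data: column dtypes and missing values.
print(df.dtypes)
print(df.isna().sum())

# Build point geometries from the coordinate columns; EPSG:4326 is the
# usual CRS for raw lat/lon data.
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)
```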

I'm developing/demonstrating with JupyterLab, which provides a rich environment for exploration, iteration, and visualization. I make use of multiple widget packages to analyze and visualize the data; the key packages are ipywidgets, ipyleaflet, and (soon) bqplot. Good dashboard design is something of a black art, so I'm going to take some inspiration from the CfAtl ideas (well, maybe GATech).
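As a small illustration of that widget stack, here is one way to put the calls on an ipyleaflet map, assuming the `gdf` from the sketch above; the center coordinates are just downtown LA and the `request_type` column is hypothetical:

```python
from ipyleaflet import Map, Marker
import ipywidgets as widgets

# Base map centered on LA (coordinates are illustrative, not from the repo).
m = Map(center=(34.05, -118.25), zoom=11)

# Drop a marker for each of the first few requests.
for _, row in gdf.head(25).iterrows():
    m.add_layer(Marker(location=(row.geometry.y, row.geometry.x)))

# Pair the map with a simple widget, e.g. a dropdown for a hypothetical
# request_type column, as a starting point for interactive filtering.
selector = widgets.Dropdown(options=sorted(gdf["request_type"].unique()))
widgets.VBox([selector, m])
```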

As any analysis unfolds, new questions are uncovered. These drive us to new and different sources of information for answers. Part of our research is uncovering information sources to address those new questions; part of our understanding comes from applying different analytic techniques such as spatial analysis, time series, and crossfiltering. Our motivation is always to add more structure to the data we have! Once I'm beyond the basics, I propose to look at the following (see the sketch after this list):

1. NC census data to "sort"/"select" NCs.
2. Land use data from LA County to determine the relative percentage of residential vs. commercial.
3. Street network data to indicate road network complexity.

These are all techniques for using contextual data about the NCs as ordering/search criteria for the 311 calls.
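One fusion step common to all three ideas is attaching NC context to each call with a spatial join. A sketch, again assuming the `gdf` from above; the boundary file name and its `nc_name` column are assumptions:

```python
import geopandas as gpd

# NC boundaries, reprojected to match the calls (file name is assumed).
ncs = gpd.read_file("data/nc_boundaries.geojson").to_crs(gdf.crs)

# Each call inherits the attributes of the NC polygon that contains it.
calls_with_nc = gpd.sjoin(gdf, ncs[["nc_name", "geometry"]], predicate="within")

# Contextual attributes (census counts, land-use mix, ...) can then be
# merged on nc_name and used to order or select NCs.
counts = calls_with_nc.groupby("nc_name").size().sort_values(ascending=False)
```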

As I develop analytic notebooks, I like to keep my eye out for opportunities to develop packages/modules that can be shared. There is one simple example in the src directory for reading and transforming the 311 shapefile. As you look at the notebooks you should see examples of repetitive hacks; these should be converted to code! Techniques to link, fuse, and share are very important.
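A hypothetical shape for such a shared helper; the function name and transformation steps below stand in for whatever the src module actually does:

```python
import geopandas as gpd

def load_311(path: str, crs: str = "EPSG:4326") -> gpd.GeoDataFrame:
    """Read a 311 shapefile and return an analysis-ready GeoDataFrame."""
    gdf = gpd.read_file(path)
    gdf = gdf.to_crs(crs)                           # normalize the projection
    gdf.columns = [c.lower() for c in gdf.columns]  # consistent column names
    return gdf
```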

The workflow allows me to package, and document, the processes used in the analysis. Notebooks (like this one) can be published and shared. It provides a structured approach to uncovering details in the data, building software interfaces for various services, integrating context for understanding, and creating new data sets where they add value.

Contents of the Repository

This is just the starting point. There is much more to do!

Note: (01/18/2022) I'm creating a branch to use parquet files. Code related to other file types is still included, but commented out.
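For reference, the parquet round trip with geopandas (which preserves geometries, via pyarrow) looks like this; the file name is an assumption:

```python
import geopandas as gpd

# Write the GeoDataFrame from above to parquet, then read it back.
gdf.to_parquet("data/311_requests.parquet")
gdf = gpd.read_parquet("data/311_requests.parquet")
```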
