GitHub - tsdataclinic/Vera: A consolidated dataset of 911 call for response data for 5 US cities

911 calls for service

In 2019 the Vera Institute of Justice (Vera) partnered with Two Sigma Data Clinic to produce a consolidated dataset of 911 data from 5 US cities. Each of these cities publishes its data on its own open data portal; however, the schema for each dataset, the units used for location and time, and the categories for each variable vary widely from city to city. This repo contains code that downloads, standardizes, and consolidates data from these different sources. Once the data was standardized, we attached demographic information from the 2017 US Census American Community Survey (ACS) to provide additional context for each call.

In addition to the scripts used to standardize the data, we include code for descriptive statistics and visualizations of the data. See the end of the readme file for how to use this code.

To read about the process of creating this project, check out the blog series on Data Clinic's Medium page:

The cities

The cities we have focused on for this project are:

New Orleans
Seattle
Dallas
Detroit
Charleston

These were selected because their 911 call data has the largest coverage of the variables of interest. The selection of specific variables (listed below) to focus on was driven by the research interests of Vera, as well as different questions we hoped to ask of the data.

Call for Action (CFA) code (the reason for the call)
Disposition (the ultimate outcome of the call from an enforcement activity standpoint)
Response Time (how long it took to respond to each call)
Call Type (whether the call initiated from a 911 call, a police officer, or otherwise).
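As a rough illustration of what a standardized record could hold, the sketch below models these four variables as a Python dataclass and derives the response time from two timestamps. The class name, field names, and example values are hypothetical, and treating response time as arrival time minus call time is an assumption; the columns and method actually used in this repo may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StandardizedCall:
    """Hypothetical standardized 911 record; field names are illustrative only."""
    cfa_code: str                     # reason for the call
    disposition: str                  # outcome of the call
    call_type: str                    # e.g. "911 call" or "officer initiated"
    call_time: datetime               # when the call was received
    arrival_time: Optional[datetime]  # when a unit arrived, if recorded

    def response_time_minutes(self) -> Optional[float]:
        """Minutes between the call and arrival, or None if arrival is missing."""
        if self.arrival_time is None:
            return None
        return (self.arrival_time - self.call_time).total_seconds() / 60.0


# Made-up example values.
call = StandardizedCall(
    cfa_code="TRAFFIC STOP",
    disposition="REPORT TAKEN",
    call_type="officer initiated",
    call_time=datetime(2018, 3, 1, 14, 0),
    arrival_time=datetime(2018, 3, 1, 14, 12, 30),
)
print(call.response_time_minutes())
```

Keeping the raw timestamps and deriving the response time on demand makes it easy to handle calls where no arrival was recorded.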

In addition to these variables, we attached the following socio-demographic variables from the 2017 ACS. These variables are assigned based on the census tract in which the call was reported to originate.

total_pop : B01003_001
median_age : B01002_001
white_pop : B03002_003
black_pop : B03002_004
amerindian_pop : B03002_005
asian_pop : B03002_006
other_race_pop : B03002_008
hispanic_pop : B03002_012
married_households : B11001_003
in_school : B14001_002
high_school_diploma : B15003_017
high_school_including_ged : B07009_003
poverty : B17001_002
median_income : B19013_001
gini_index : B19083_001
housing_units : B25002_001
vacant_housing_units : B25002_003
occupied_housing_units : B25003_001
median_rent : B25058_001
percent_income_spent_on_rent : B25071_001
pop_in_labor_force : B23025_002
employed_pop : B23025_004
unemployed_pop : B23025_005
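The name-to-code pairs above lend themselves to a simple lookup table. The sketch below uses a subset of them to rename raw ACS codes to readable names and to derive an unemployment rate. The helper names and tract values are hypothetical, and it assumes the raw data is keyed by the codes exactly as listed; it is not taken from the repo's own code.

```python
# Subset of the name -> ACS variable code mapping listed above.
ACS_VARIABLES = {
    "total_pop": "B01003_001",
    "poverty": "B17001_002",
    "pop_in_labor_force": "B23025_002",
    "unemployed_pop": "B23025_005",
}


def rename_acs_columns(row):
    """Return a copy of `row` keyed by readable names instead of ACS codes."""
    code_to_name = {code: name for name, code in ACS_VARIABLES.items()}
    return {code_to_name.get(key, key): value for key, value in row.items()}


def unemployment_rate(row):
    """Unemployed share of the labor force, or None when the labor force is 0."""
    labor_force = row["pop_in_labor_force"]
    if not labor_force:
        return None
    return row["unemployed_pop"] / labor_force


# Made-up tract values, for illustration only.
raw_tract = {
    "B01003_001": 4000,
    "B17001_002": 800,
    "B23025_002": 2000,
    "B23025_005": 100,
}
tract = rename_acs_columns(raw_tract)
print(tract["total_pop"], unemployment_rate(tract))
```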

Accessing the data

The data can be downloaded directly from the following links. It comes in the following forms:

A csv of all cities combined with demographic data attached.
A csv for each individual city with demographic data attached.
- New Orleans
- Dallas
- Detroit
- Charleston
- Seattle
A csv for each individual city with no demographic data.
- New Orleans
- Dallas
- Detroit
- Charleston
- Seattle

Building the data

If you want to build the data from scratch, the easiest way is to use the docker container within this project. To do so run the following commands:

With Docker on Linux

docker build -t vera .
docker run -it --rm -v $(pwd):/data vera /bin/bash
python generate_dataset.py

With Docker on Windows

Install Docker Desktop for Windows. Then, share your c:// drive with Docker via your installed Docker's settings.

Then, from git bash enter the following:

docker build -t vera .
docker run -it -v /$(pwd):/data vera bash
python generate_dataset.py

💡 Note: If you get an error like "input device is not a TTY", try the same docker run command with "winpty" prepended.

This will download the datasets from the various open data portals, apply the standardization procedure, and output the results. Depending on your hardware / internet connection the process might take a few hours.

Once the script has run, you can find the data in data/processed. There should be one feather file and one csv file for each city.
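To check that the build produced the expected outputs, a small script along these lines can list what is in data/processed. It is a minimal sketch using only the standard library; the function name is hypothetical, and since the exact file names are not given here, it simply groups whatever it finds by extension.

```python
from pathlib import Path


def list_processed_files(processed_dir="data/processed"):
    """Group files in the processed directory by extension (csv / feather)."""
    root = Path(processed_dir)
    found = {"csv": [], "feather": []}
    if not root.is_dir():
        return found  # nothing has been generated yet
    for path in sorted(root.iterdir()):
        ext = path.suffix.lstrip(".").lower()
        if ext in found:
            found[ext].append(path.name)
    return found


if __name__ == "__main__":
    for ext, names in list_processed_files().items():
        print(ext, names)
```

If either list comes back empty after a full run, the generation script most likely stopped before writing its outputs.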

Analyzing the data

Once the data has been generated, you can use the included classes to easily summarize, visualize, and analyse the data. There is a class per city that can be accessed as follows:

from src.cities.new_orleans import NewOrleans

new_orleans = NewOrleans()
data = new_orleans.clean_data()

In addition to simply accessing the data, you can use the following functions on each city to produce summaries of the data. For each function, if a filter such as year or call type is not specified, all values of that variable are included.

disposition_by_tract(call_type, year, norm_by) : Make a summary of the call outcome (disposition) by census tract.
self_initiated_by_call_type(year) : Make a summary of the number of calls that are self-initiated (officer-initiated) vs. not, by the type of call.
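To make the "unspecified filter means everything" behavior concrete, here is a standalone sketch of a disposition-by-tract style summary over plain dictionaries. It is not the repo's implementation: the function name, the record fields, and the sample records are hypothetical, and the norm_by normalization is left out.

```python
from collections import Counter, defaultdict


def disposition_counts_by_tract(calls, call_type=None, year=None):
    """Count dispositions per tract; a filter left as None keeps every value."""
    summary = defaultdict(Counter)
    for call in calls:
        if call_type is not None and call["call_type"] != call_type:
            continue
        if year is not None and call["year"] != year:
            continue
        summary[call["tract"]][call["disposition"]] += 1
    return {tract: dict(counts) for tract, counts in summary.items()}


# Made-up records with hypothetical field names.
calls = [
    {"tract": "001", "year": 2017, "call_type": "911", "disposition": "report"},
    {"tract": "001", "year": 2018, "call_type": "911", "disposition": "arrest"},
    {"tract": "002", "year": 2018, "call_type": "officer", "disposition": "report"},
]
print(disposition_counts_by_tract(calls))             # all years, all call types
print(disposition_counts_by_tract(calls, year=2018))  # 2018 only
```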

Visualizing the data

A number of methods for visualizing the data can be found in the src.visualization module. Each of these takes a city object and some additional parameters as arguments and returns a matplotlib plot. For example:

from src.cities.new_orleans import NewOrleans
import src.visualization.visualize as vis

new_orleans = NewOrleans()
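# The city object is passed in as described above; its position in the signature is an assumption.
vis.plot_self_initiated_by_call_type(new_orleans, year=1995)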

An easy way to do this is to start a jupyter lab session in the provided docker container:

docker build -t vera .
./start_notebook_docker.sh

then navigate to http://localhost:8888.

Contributing to the project

If you find a bug in the data or the processing code, please feel free to open an issue on this repo describing the problem.

If you want to add a new city to the analysis, start by opening an issue on the repo declaring that you would like to do so. Then take a look at how cities are specified by opening one of the existing city config files in src/cities. This should give you an idea of what needs to be specified for each city and how to override parts of the processing pipeline where necessary.

If you would like to add a new feature to existing cities, take a look at the code in src/features.
