Predicting Chicago Business Closures

Aya Liu, Ben Fogarty, Parth Khare
12 June 2019

CAPP 30254: Machine Learning for Public Policy
Harris School of Public Policy

Project overview & requirements

This project's folder contains the following files:

pipeline_library.py: general functions for a machine learning pipeline (reading data, preprocessing data, generating features, building models, etc.)
predict_closures.py: specific functions for applying pipeline_library to the Chicago Business Licenses data
load_data.py: downloads and links all the necessary datasets for this analysis
tokens.json: a JSON file holding the Chicago Open Data Portal and US Census Bureau API tokens needed to access these data sources; for security reasons, the provided file shows the expected format but contains no actual tokens
data_exploration.ipynb: contains code demonstrating basic data exploration
get_pickle.sh: downloads a pickled version of the dataset from 10 June 2019 (Linux-specific)
get_pickle_mac.sh: downloads a pickled version of the dataset from 10 June 2019 (Mac-specific)
getfiles_mac.sh: downloads CSV files for the datasets that could not be obtained through an API
mlproject-env.yml: Anaconda environment configuration file for running the project
configs/: contains json files specifying preprocessing, feature generation, and model specifications to be passed to the predict_closures program
ethics_aq/: contains files related to the bias and fairness report
ethics_aq/ethics_aequitas.py: code for producing the bias and fairness report
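The configs/ files are described only as JSON specifications for preprocessing, feature generation, and models. As a rough illustration of how a models config might drive a grid search, the sketch below assumes a hypothetical schema (each classifier name mapped to lists of hyperparameter values) and expands it into one (model name, parameters) pair per combination. EXAMPLE_MODELS_CONFIG and expand_grid are illustrative names, and the schema actually read by predict_closures.py may differ.

```python
import itertools
import json
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical models config: each key names a classifier and each value maps a
# hyperparameter to the list of values to try. The real files in configs/ may
# use a different schema.
EXAMPLE_MODELS_CONFIG = """
{
    "DecisionTreeClassifier": {"max_depth": [5, 10], "criterion": ["gini", "entropy"]},
    "LogisticRegression": {"C": [0.1, 1.0]}
}
"""


def expand_grid(
    config: Dict[str, Dict[str, List[Any]]]
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield one (model_name, params) pair per combination of hyperparameter values.

    A model with an empty grid yields a single pair with empty params,
    i.e. the model with its default settings.
    """
    for model_name, grid in config.items():
        keys = sorted(grid)  # fixed key order so the output is deterministic
        for values in itertools.product(*(grid[k] for k in keys)):
            yield model_name, dict(zip(keys, values))


if __name__ == "__main__":
    for name, params in expand_grid(json.loads(EXAMPLE_MODELS_CONFIG)):
        print(name, params)
```

The example config expands to six combinations (four decision trees and two logistic regressions). scikit-learn's ParameterGrid performs the same expansion for a single estimator's grid if one prefers to rely on the library.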

The project was developed using Python 3.7.3 on macOS Mojave 10.14.4. Results were obtained by running the project on a compute node of the University of Chicago's Research Computing Center and on c4.8xlarge, c5.9xlarge, and c5.18xlarge AWS EC2 virtual machines. It requires the following libraries and their dependencies:

Package	Version
certifi	2019.3.9
geopandas	0.5.0
graphviz	0.10.1
matplotlib	3.0.3
numpy	1.16.2
pandas	0.24.2
scikit-learn	0.20.3
seaborn	0.9.0
shapely	1.6.4
sodapy	1.5.2
urllib3	1.25.3

Alternatively, a conda environment including all of the necessary libraries is available on Anaconda Cloud at fogarty-ben/mlproject and is also defined in the mlproject-env.yml file in the root of the repository.
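To confirm that an environment matches the pinned versions in the table, a small check along these lines could be used. This is a sketch, not part of the repository: it relies on importlib.metadata, which is in the standard library only from Python 3.8 (on Python 3.7 the importlib_metadata backport provides the same functions), and REQUIRED and find_mismatches are hypothetical names. The version getter is a parameter so the check can be exercised without the packages installed.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Callable, Dict, Optional, Tuple

# Pinned versions copied from the requirements table above.
REQUIRED: Dict[str, str] = {
    "certifi": "2019.3.9",
    "geopandas": "0.5.0",
    "graphviz": "0.10.1",
    "matplotlib": "3.0.3",
    "numpy": "1.16.2",
    "pandas": "0.24.2",
    "scikit-learn": "0.20.3",
    "seaborn": "0.9.0",
    "shapely": "1.6.4",
    "sodapy": "1.5.2",
    "urllib3": "1.25.3",
}


def _installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    required: Dict[str, str],
    get_version: Callable[[str], Optional[str]] = _installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to
    (expected, installed); installed is None when the package is missing."""
    mismatches = {}
    for name, expected in required.items():
        installed = get_version(name)
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in sorted(find_mismatches(REQUIRED).items()):
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at exactly the pinned version; newer versions are reported as mismatches even if they might still work.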

Helpful documentation and references are cited throughout the docstrings of the code.

To run the program, use the following command:

python3 predict_closures.py -f <features config> -m <models config> [-p <preprocessing config>] [-d <pickled dataset>] [-s <random seed>] [--savefigs] [--savepreds] [--saveeval]

Options:
-f: path to the features config JSON file
-m: path to the models config JSON file
-p: path to an optional preprocessing config file
-d: path to a pickled dataset (optional)
-s: random seed (optional)
--savefigs: save figures instead of displaying them
--savepreds: save the predictions from each testing set
--saveeval: save the evaluation tables
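The usage line maps naturally onto Python's argparse module. The parser below is a hypothetical reconstruction for illustration only, not the code in predict_closures.py: the destination names (features, models, preprocessing, dataset, seed) and the assumption that the seed is an integer are guesses, while the flags themselves follow the usage line.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the predict_closures.py usage line."""
    parser = argparse.ArgumentParser(
        description="Predict Chicago business closures (sketch of the CLI)."
    )
    # Shown without brackets in the usage line, so treated as required here.
    parser.add_argument("-f", dest="features", required=True,
                        help="path to features config JSON file")
    parser.add_argument("-m", dest="models", required=True,
                        help="path to models config JSON file")
    # Optional paths default to None when omitted.
    parser.add_argument("-p", dest="preprocessing", default=None,
                        help="path to optional preprocessing config file")
    parser.add_argument("-d", dest="dataset", default=None,
                        help="path to a pickled dataset")
    # Assumption: the seed is an integer.
    parser.add_argument("-s", dest="seed", type=int, default=None,
                        help="random seed")
    # Boolean switches: False unless the flag is present.
    parser.add_argument("--savefigs", action="store_true",
                        help="save figures instead of displaying them")
    parser.add_argument("--savepreds", action="store_true",
                        help="save predictions from each testing set")
    parser.add_argument("--saveeval", action="store_true",
                        help="save evaluation tables")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

For example, parse(["-f", "features.json", "-m", "models.json", "--savefigs"]) returns a Namespace with savefigs=True, savepreds and saveeval False, and the omitted optional paths and seed set to None; omitting -f or -m makes argparse print an error and exit.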

Repository: aya-liu/predict-business-closure