Binary classification of residential utility problems in NYC; Capstone project for the IBM Certificate in Data Science

Sara1583/IBM-Data-Science-project

IBM-Data-Science-project

Capstone project for the IBM Certificate in Data Science on EdX from 2019

In 2019 I took the IBM-sponsored Data Science certificate course on the EdX platform. The notebooks here are from the capstone project, which involved a utilities complaints dataset from New York City. Each notebook addresses a different question from the project and includes loading, cleaning, visualizing, and subsetting the dataset. The final portion is a classification exercise to predict whether a complaint was about heat and hot water or something else, based on the characteristics of the complaint.

The first notebook answers the question of which type of complaint the department should focus on. Since there was no additional guidance on decision-making, I took that to mean the most common type of complaint. The question also called for subsetting the data to complaints before January 1, 2019.
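As a rough illustration of that first step, the sketch below filters a toy table by date and then finds the most frequent complaint type with pandas. It is not the notebook's code: the column names `created_date` and `complaint_type` and all of the rows are made up for the example.

```python
import pandas as pd

# Toy stand-in for the complaints data. Column names and rows are
# hypothetical and do not come from the real dataset.
complaints = pd.DataFrame({
    "created_date": pd.to_datetime([
        "2018-01-05", "2018-03-10", "2018-11-20", "2019-02-01", "2018-07-04",
    ]),
    "complaint_type": [
        "HEAT/HOT WATER", "PLUMBING", "HEAT/HOT WATER", "PLUMBING", "HEAT/HOT WATER",
    ],
})

# Keep only complaints created before January 1, 2019.
cutoff = pd.Timestamp("2019-01-01")
before_2019 = complaints[complaints["created_date"] < cutoff]

# Count complaints per type and pick the most frequent one.
type_counts = before_2019["complaint_type"].value_counts()
most_common_type = type_counts.idxmax()

print(type_counts)
print("Most common type:", most_common_type)
```

Using the raw count as the decision rule mirrors the "most common type" reading described above; a different rule (for example, weighting by time to resolution) would need a different aggregation.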

The second notebook focuses on the question of which borough should be the focus, given the complaint type identified in the first section.
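One way to approach the borough question is to filter to the chosen complaint type and count rows per borough. The sketch below does this on invented data; the column names `borough` and `complaint_type`, the value of `focus_type`, and every row are assumptions for illustration only.

```python
import pandas as pd

# Toy data with hypothetical column names and values.
complaints = pd.DataFrame({
    "borough": ["BRONX", "BROOKLYN", "BRONX", "QUEENS", "BRONX", "BROOKLYN"],
    "complaint_type": [
        "HEAT/HOT WATER", "HEAT/HOT WATER", "HEAT/HOT WATER",
        "PLUMBING", "HEAT/HOT WATER", "HEAT/HOT WATER",
    ],
})

# Assumed focus type; in practice this would come from the first notebook.
focus_type = "HEAT/HOT WATER"

# Restrict to the focus type, then count complaints in each borough.
focus_rows = complaints[complaints["complaint_type"] == focus_type]
by_borough = focus_rows.groupby("borough").size().sort_values(ascending=False)
top_borough = by_borough.idxmax()

print(by_borough)
print("Borough with the most complaints of this type:", top_borough)
```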

The third notebook focuses on whether there are any associations between complaint characteristics. This notebook includes correlation visualizations, along with further data cleaning prompted by issues identified in the second notebook.
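A minimal sketch of checking for associations is to compute a pairwise correlation matrix with pandas, which can then be passed to a heatmap for visualization. The feature names below (`is_heat`, `building_age`, `num_floors`) and their values are hypothetical and are not taken from the dataset or the notebook.

```python
import pandas as pd

# Toy numeric features; names and values are invented for illustration.
features = pd.DataFrame({
    "is_heat": [1, 0, 1, 1, 0, 0],          # 1 if the complaint is heat related
    "building_age": [90, 20, 85, 70, 30, 25],
    "num_floors": [6, 2, 5, 6, 3, 2],
})

# Pairwise Pearson correlations between all numeric columns.
corr = features.corr()

print(corr.round(2))
```

Pearson correlation only captures linear relationships between numeric columns, so categorical characteristics would need to be encoded (or compared with a different measure) before a matrix like this is meaningful.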

The final notebook compares five different classifiers on the task of classifying complaints as heat and hot water related versus everything else.
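A comparison like this could be set up with scikit-learn's cross-validation helpers, as sketched below. The README does not name the five classifiers, so the ones here are stand-ins chosen for illustration, and the data is synthetic (generated with `make_classification`) rather than the NYC complaints.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the complaint features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Five stand-in classifiers; the project's actual choices may differ.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive_bayes": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Accuracy is used here only for simplicity. If heat and hot water complaints are much more or less common than the rest, a metric such as F1 or ROC AUC (via the `scoring` parameter) may give a fairer comparison.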
