
Regression and Statistical analysis of crimes in Chicago from 2015 until 2020.


leo-cavalcante/crimes-in-chicago



"Crimes in Chicago" statistically significant insights and Linear Regression

Leonardo Cavalcante Araújo, Vinamrata Yadav, Natalia Calderón

Data Analytics Full-Time, FEB2021 cohort, Paris, March 12th, 2021

Content

Chicago crimes

Project Description

Group project developed by a team of three over a weekend and 2 weekdays (4 days in total).

Objective

The project had 2 distinct objectives:

  1. Derive statistically significant insights from a database.
  2. Model a regression analysis for a variable (in this project, we chose linear regression to predict the probability of a crime happening on a given date under given circumstances).
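Objective 1 relies on hypothesis tests, but the README does not say which tests were run. As a minimal sketch, assuming the question is whether two proportions differ (for example, arrest rates in two groups of crimes), a two-sided two-proportion z-test can be written with the Python standard library alone. The function name `two_proportion_z_test` and the counts are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test of H0: p1 == p2, using the pooled proportion.

    x1, x2 are "successes" (e.g. arrests) and n1, n2 are group sizes.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts for illustration only, NOT the project's results.
z, p = two_proportion_z_test(x1=300, n1=1000, x2=250, n2=1000)
alpha = 0.05
print(f"z = {z:.3f}, p = {p:.4f}, reject H0 at alpha={alpha}: {p < alpha}")
```

The normal approximation behind this test works well for large groups, which is the typical situation with a dataset of around 1.5 million rows; for small counts an exact test would be the safer choice.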

Workflow

  1. Database search and download: we settled on an open-source dataset from the Chicago Data Portal, Crimes from 2001 to Present. It covers 20 years of observations, totalling 7.5 million rows.
  2. Data cleaning and filtering to the past 5 years (2015-2020), resulting in a dataset of around 1.5 million observations.
  3. Data analysis & visualisations: using Python, Matplotlib and Seaborn.
  4. Hypothesis testing: to check whether the observed differences are statistically significant.
  5. Linear regression using OLS (Ordinary Least Squares): to predict crimes happening on a given date under known circumstances.
  6. Assumption testing: verification of the assumptions of the OLS model.
  7. Presentation: building the slides and giving an oral presentation to our Ironhack cohort.
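Step 2 (cleaning and filtering to 2015-2020) is only described in prose. A minimal standard-library sketch, assuming the CSV export has "Date", "Primary Type" and "Year" columns (the names used by the Chicago Data Portal download; treat them as assumptions and adjust if yours differ), streams the rows and keeps only complete records inside the year range. `filter_years` is a hypothetical helper, not the project's actual cleaning code.

```python
import csv
import io
from typing import Iterable, Iterator

# ASSUMPTION: column names follow the Chicago Data Portal "Crimes - 2001 to
# Present" CSV export; rename them if your download differs.
YEAR_COLUMN = "Year"
REQUIRED_COLUMNS = ("Date", "Primary Type", "Year")


def filter_years(rows: Iterable[dict], first: int = 2015, last: int = 2020) -> Iterator[dict]:
    """Yield rows with all required fields present and first <= Year <= last."""
    for row in rows:
        # Cleaning step: drop rows with a missing or blank required value.
        if any(not (row.get(col) or "").strip() for col in REQUIRED_COLUMNS):
            continue
        try:
            year = int(row[YEAR_COLUMN])
        except ValueError:
            continue  # skip rows whose year is not an integer
        if first <= year <= last:
            yield row


# Tiny hypothetical sample standing in for the real export.
sample = io.StringIO(
    "ID,Date,Primary Type,Year\n"
    "1,01/01/2014 10:00:00 AM,THEFT,2014\n"
    "2,06/15/2016 11:30:00 PM,BATTERY,2016\n"
    "3,03/03/2019 08:00:00 AM,,2019\n"
    "4,12/31/2020 11:59:00 PM,ASSAULT,2020\n"
)
kept = list(filter_years(csv.DictReader(sample)))
print([row["ID"] for row in kept])  # ['2', '4']
```

Because `filter_years` is a generator, the 7.5-million-row file never has to be held in memory at once. In a pandas-based workflow, `pandas.read_csv` with `usecols` followed by a boolean filter on the year column would be the more common equivalent.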
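Steps 5 and 6 fit an OLS regression and then check its assumptions. The repository's model is not reproduced here; as a minimal sketch, assuming a single numeric predictor, OLS has a closed form (slope b = Sxy / Sxx, intercept a = mean(y) - b * mean(x)), and a few residual diagnostics can be computed by hand. The helpers `ols_fit` and `residual_checks` and the toy data are hypothetical.

```python
from statistics import fmean


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x. Returns (intercept a, slope b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx    # slope minimising the sum of squared residuals
    a = my - b * mx  # the fitted line passes through (mean x, mean y)
    return a, b


def residual_checks(x: list[float], y: list[float], a: float, b: float) -> dict[str, float]:
    """Simple checks on the residuals e_i = y_i - (a + b*x_i)."""
    e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    my = fmean(y)
    ss_res = sum(ei ** 2 for ei in e)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    # Durbin-Watson statistic (always in [0, 4]): values near 2 suggest no
    # first-order autocorrelation of the errors, relevant for date-ordered data.
    dw = (sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e))) / ss_res
          if ss_res > 0 else float("nan"))
    return {
        "residual_mean": fmean(e),  # ~0 by construction when an intercept is fitted
        "r_squared": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "durbin_watson": dw,
    }


# Hypothetical toy data (e.g. x = a numeric feature, y = a daily crime count);
# NOT the project's data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
a, b = ols_fit(x, y)
checks = residual_checks(x, y, a, b)
print(f"y = {a:.3f} + {b:.3f}*x", checks)
```

With several predictors (date features, location, crime type), a library such as statsmodels is the usual tool: `statsmodels.api.OLS(y, sm.add_constant(X)).fit()` fits the model, and its `summary()` reports R² and the Durbin-Watson statistic alongside normality tests of the residuals.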

Organization

Group members responsibilities

  • Leonardo: all of the data cleaning, some data visualisations, 1 hypothesis test, the entire linear regression (using OLS), and a large part of the Google Slides presentation.
  • Vina: some data analysis, some data visualisations, 2 hypothesis tests and some slides in the Google Slides presentation.
  • Natalia: database research and some interesting insights, some data analysis and a few slides of the final presentation.

Links

Here you may find the relevant links for the main documents produced during this project:

Chicago Crimes - Google Slides Final Presentation

GitHub Repository: crimes-in-chicago

Crimes in Chicago - Cleaning

Crimes in Chicago - Geographical Analysis

Crimes in Chicago - Typology of Crimes and Arrests

Crimes in Chicago - Crimes per Communities

Crimes in Chicago - Time Analysis

PS: only the main files are listed in this section; the repository also contains other auxiliary files.