
Exploratory Data Analysis on a traffic accident dataset from the Contiguous United States, focusing on the weather/environmental factors


lodi-m/US-Traffic-Accidents-Analysis


US Traffic Accidents Analysis

This is an exploratory data analysis conducted on a dataset from the Contiguous United States. The dataset is continuously being collected and currently contains information on 3 million traffic accidents. The data was collected using multiple traffic APIs, such as MapQuest and Bing, and several other data providers.

Although this dataset can be used for a variety of applications, I will be focusing on how precipitation and other environmental factors impacted the accidents.

If you would like to learn more about the dataset itself, please visit: https://smoosavi.org/datasets/us_accidents


High-level Findings

  • The largest share of accidents, 31%, lasted one hour and took place between 1 AM and 3 AM. I assume this is because drivers are tired late at night, which decreases alertness, slows reaction time, and impairs judgment.
  • Accidents last the longest in Wyoming and Oregon, at 6 and 4 hours respectively.
  • There is a sharp increase in accidents near the end of the year. As seen in the graph below, the number of accidents starts to climb steeply beginning in October. I assume this is due to the holiday season, as more people go out to purchase gifts and prepare for the holidays.

  • Most accidents take place when the temperature is between 50°F and 60°F, with 34% of those accidents having a Severity of 2 (Severity in the dataset was on a scale from 1 to 5).

  • Out of 92 different weather conditions in the dataset, most accidents occurred during Cloudy and Fair weather. As with temperature, most of these accidents had a Severity of 2.

  • Severity is most correlated with the wind speed at the time of the accident and least correlated with precipitation and humidity, as seen in the correlation matrix below.
    • NOTE: The closer the value in a square is to zero, the less correlated the two variables are. The further the value gets from zero, the more correlated they are.

  • More accidents happen when there is NO precipitation. Out of all accidents, ~88% occurred with NO precipitation and ~11% occurred WITH precipitation. In both cases, most of the accidents had a Severity of 2.
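
The correlation and precipitation findings above could be reproduced along these lines. This is only a minimal sketch, not the code from the notebook: the column names (`Severity`, `Wind_Speed(mph)`, `Humidity(%)`, `Precipitation(in)`) are assumed to match the public US Accidents CSV, the helper function names are hypothetical, and the six rows are made-up sample data rather than real accidents.

```python
import pandas as pd

# Made-up sample rows; column names are ASSUMED to match the US Accidents CSV.
df = pd.DataFrame({
    "Severity": [2, 2, 3, 2, 4, 2],
    "Wind_Speed(mph)": [5.0, 7.0, 12.0, 3.0, 20.0, 6.0],
    "Humidity(%)": [80.0, 65.0, 70.0, 90.0, 55.0, 60.0],
    "Precipitation(in)": [0.0, 0.0, 0.1, 0.0, 0.0, None],
})


def severity_correlations(frame):
    """Pearson correlation of every other column with Severity,
    ordered from strongest to weakest by absolute value."""
    corr = frame.corr()["Severity"].drop("Severity")
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


def precipitation_share(frame, col="Precipitation(in)"):
    """Share of accidents with and without precipitation.
    Rows where the precipitation value is missing are ignored."""
    known = frame[col].dropna()
    labels = (known > 0).map({True: "with", False: "without"})
    return labels.value_counts(normalize=True)


correlations = severity_correlations(df)
shares = precipitation_share(df)
print(correlations)
print(shares)
```

On the real dataset, `df` would instead come from `pd.read_csv(...)` on the downloaded file, and the full matrix from `df.corr()` could be passed to Seaborn's `heatmap` to draw a correlation matrix like the one referenced above. Dropping rows with missing precipitation is a choice made for this sketch; treating them as "no precipitation" would shift the percentages.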


Python libraries used

  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Folium
