
General Assembly Data Science Immersive (GA-DSI) Group Project - A machine learning model to predict the likelihood of a California wildfire based on historical weather and wildfire data.



osyounis/cali_wildfire_likelihood_predictor


Predicting California Wildfire Likelihood Utilizing Historical Weather and Fire Data

In Collaboration With


Problem Statement

To help the California Department of Forestry and Fire Protection allocate resources, can we predict the likelihood of a wildfire in a given county and month using historical weather and wildfire data?


Executive Summary

Machine learning algorithms are among the most powerful tools in tech today, and their applications are broad: from predicting stock prices to estimating how likely someone is to have cancer, they can help us anticipate the future. In this project, our team sought to predict whether a wildfire would occur in a given California county and month, using historical fire and weather data from 2008 to 2020.

Weather variables are strongly correlated (collinear): temperature, humidity, and precipitation tend to change together rather than independently. Fire likelihood, in turn, depends heavily on weather: heat tends to dry out vegetation, and dry vegetation ignites more easily, which raises fire risk. Because of these relationships, we wanted to see whether we could use historical data to predict whether a fire is likely to occur somewhere.
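The collinearity among weather variables described above can be checked directly before modeling. The sketch below is a minimal illustration on synthetic data (the generated values and the 0.8 cutoff are assumptions, not the project's data or settings); it flags every feature pair whose absolute Pearson correlation exceeds the cutoff.

```python
import numpy as np
import pandas as pd

# Synthetic monthly weather (hypothetical values, NOT the project's data),
# generated so that the temperature columns move together.
rng = np.random.default_rng(0)
n = 500
maxtempF = rng.normal(80, 10, n)
mintempF = maxtempF - rng.normal(25, 3, n)
avgtempF = (maxtempF + mintempF) / 2 + rng.normal(0, 1, n)
humid = 90 - 0.6 * avgtempF + rng.normal(0, 5, n)  # humidity falls as temperature rises
wind = rng.normal(8, 2, n)  # generated independently of the other columns

weather = pd.DataFrame({
    "maxtempF": maxtempF, "mintempF": mintempF, "avgtempF": avgtempF,
    "humid": humid, "wind": wind,
})


def collinear_pairs(df: pd.DataFrame, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (feature, feature, r) for pairs with |Pearson r| above threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, no self-pairs
            r = float(corr.iloc[i, j])
            if abs(r) > threshold:
                pairs.append((cols[i], cols[j], round(r, 3)))
    return pairs


for a, b, r in collinear_pairs(weather):
    print(f"{a} ~ {b}: r = {r}")
```

Strongly correlated pairs like these make Logistic Regression coefficients unstable, while tree-based models such as Random Forest are less sensitive to them in their predictions.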

1. Sample details:

  • 10,988 total entries covering every month from 2008 to 2020
  • Multi-indexing to handle multiple fires at different locations in the same month and year
  • 4,279 occurrences of fire within our combined dataset
  • 6,709 entries without record of a fire starting
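The counts above fix the class balance, and with it the accuracy a model would get for free by always predicting "no fire". The pandas part is a minimal sketch of multi-indexing by date and county; the rows and the `fire` column are hypothetical, not taken from the project's dataset.

```python
import pandas as pd

# Counts taken from the sample details above.
total = 10_988
fires = 4_279
no_fire = 6_709
assert fires + no_fire == total  # the two classes account for every entry

fire_share = fires / total
# Accuracy of a trivial model that always predicts "no fire":
majority_baseline = no_fire / total
print(f"fire share:              {fire_share:.1%}")
print(f"majority-class accuracy: {majority_baseline:.1%}")

# Hypothetical rows: two fires in different counties in the same month.
rows = pd.DataFrame({
    "date": ["2015-08", "2015-08", "2015-09"],
    "county": ["Butte", "Shasta", "Butte"],
    "fire": [1, 1, 0],
})
# A (date, county) multi-index keeps both August 2015 records distinct.
indexed = rows.set_index(["date", "county"])
print(indexed.loc["2015-08"])  # both records for that month, keyed by county
```

Roughly 61% accuracy is therefore the floor any useful model has to beat, which is worth keeping in mind when reading the accuracy scores under Model Performance.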

2. Sources:

3. Data Details:

| Feature | Type | Description |
|---|---|---|
| date | object | The month and year of the record (for fire entries, when the fire took place). |
| county | object | The county of the record (for fire entries, the county the fire started in). |
| maxtempF | float | The average maximum temperature for that month, in °F. |
| mintempF | float | The average minimum temperature for that month, in °F. |
| avgtempF | float | The monthly average of the daily average temperature, in °F. |
| totalSnow | float | The total snowfall for that month. |
| humid | float | The average humidity for that month. |
| wind | float | The average wind speed for that month. |
| precip | float | The total precipitation for that month. |
| q_avgtempF | float | The quarterly average temperature, in °F. |
| q_avghumid | float | The quarterly average humidity. |
| q_sumprecip | float | The quarterly total precipitation. |
| sunHour | float | The average hours of sun for that month. |
| FIRE_NAME | object | The name of the fire. |
| CAUSE | float | The cause of the fire, stored as a numeric code. |
| lat | float | The latitude of the center of the county where the fire was located. |
| long | float | The longitude of the center of the county where the fire was located. |
| GIS_ACRES | float | The total number of acres burned. |
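The quarterly features (q_avgtempF, q_avghumid, q_sumprecip) are aggregates of the monthly columns. The sketch below shows one way to derive them with pandas, assuming "quarterly" means the calendar quarter within each county; the source does not say which window was used, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical monthly rows for one county (NOT the project's data).
monthly = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01"]),
    "county": ["Butte"] * 4,
    "avgtempF": [45.0, 48.0, 54.0, 60.0],
    "humid": [80.0, 75.0, 70.0, 60.0],
    "precip": [5.0, 4.0, 2.0, 1.0],
})

# ASSUMPTION: "quarterly" = calendar quarter, computed separately per county.
quarter = monthly["date"].dt.to_period("Q")
grouped = monthly.groupby([monthly["county"], quarter])

# transform() broadcasts each quarter's aggregate back onto its monthly rows.
monthly["q_avgtempF"] = grouped["avgtempF"].transform("mean")
monthly["q_avghumid"] = grouped["humid"].transform("mean")
monthly["q_sumprecip"] = grouped["precip"].transform("sum")

print(monthly[["date", "q_avgtempF", "q_avghumid", "q_sumprecip"]])
```

A calendar-quarter aggregate includes later months of the same quarter; if these features must be known in advance of the month being predicted, a trailing three-month window per county (for example with `rolling(3)`) would avoid that look-ahead.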

4. Target:

The target is whether a fire occurred (a binary label). Four classification models were used: Logistic Regression, a KNN Classifier, a Random Forest Classifier, and a Voting Classifier combining KNN, Random Forest, and AdaBoost classifiers.
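A Voting Classifier of this kind can be sketched with scikit-learn. The data are synthetic (with roughly the dataset's 39% fire share), and the hyperparameters and the hard-voting choice are assumptions for illustration, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weather features (NOT the project's data),
# with about 39% positives to mirror the fire / no-fire split.
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.61, 0.39], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42,
)

# ASSUMED hyperparameters, chosen for illustration only.
voter = VotingClassifier(
    estimators=[
        # KNN is distance-based, so the features are scaled first.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("ada", AdaBoostClassifier(n_estimators=100, random_state=42)),
    ],
    voting="hard",  # majority vote over the three predicted labels
)
voter.fit(X_train, y_train)
print(f"test accuracy: {voter.score(X_test, y_test):.3f}")
```

With three voters and two classes, hard voting never ties. Setting `voting="soft"` would instead average the three models' predicted probabilities, which all three estimators support through `predict_proba`.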

5. Model Performance:

A total of four models were fit across many different hyperparameter settings. Listed below are the two best metrics for each reported model; additional metrics are used for evaluation in the modeling notebook.

| Model Type | Metric | Score |
|---|---|---|
| Logistic Regression | Accuracy | 76% |
| Logistic Regression | Precision | 67% |
| Random Forest Classifier | Accuracy | 88% |
| Random Forest Classifier | Precision | 84% |
| Voting Classifier | Accuracy | 87% |
| Voting Classifier | Recall | 86% |
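The three metrics in the table measure different things. A small worked example with hypothetical labels (not the project's predictions) shows how each is computed from the confusion matrix.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical labels for ten county-months (1 = fire, 0 = no fire).
# Illustrative only, NOT the project's predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels [0, 1], ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / all = 7 / 10
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5: flagged months that had a fire
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3 / 4: fires that were flagged

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Here precision (0.60) is lower than recall (0.75) because the model raised two false alarms but missed only one fire. Weighing false alarms against missed fires is exactly the choice the recommendations below turn on.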

Conclusion & Recommendations

Our models were able to predict whether a fire would occur with a surprising level of accuracy. Our best overall model, the Random Forest, achieved 88% accuracy, while our second best, the Voting Classifier, was close behind with a strong recall score of 86%. Given the danger of wildfires growing out of control and the looming threat of climate change, we recommend the Random Forest model, as it flags more potential fires. However, it also produces more false positives, meaning a higher chance of predicting fires that don't actually happen. This is where you, the California Department of Forestry and Fire Protection, must decide what matters most to you and whether you have enough resources and manpower to act on that model's predictions.

If resources are particularly thin, the Voting Classifier may be the better choice: it produced fewer false positives, which would minimize wasted time and resources at the cost of possibly missing fires elsewhere.
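The same trade-off between false positives and missed fires also exists within a single model. A classifier that outputs probabilities can be made to flag more or fewer fires by moving its decision threshold. The sketch below uses synthetic data, arbitrary thresholds, and a Random Forest purely as an example; it is not the project's tuned model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the project's), about 39% positive class.
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.61, 0.39], random_state=7,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7,
)

model = RandomForestClassifier(n_estimators=200, random_state=7).fit(X_train, y_train)
fire_prob = model.predict_proba(X_test)[:, 1]  # column 1 = probability of class 1 ("fire")

# Illustrative thresholds: a lower threshold flags more months as "fire".
results = {}
for threshold in (0.3, 0.5, 0.7):
    flagged = (fire_prob >= threshold).astype(int)
    precision = precision_score(y_test, flagged, zero_division=0)
    recall = recall_score(y_test, flagged)
    results[threshold] = (precision, recall)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Recall can only stay the same or rise as the threshold drops, since every month flagged at a higher threshold is also flagged at a lower one; precision usually falls as more false positives slip in, though that is not guaranteed.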

