A web application for aggregating wildfire data and predicting wildfire causes with the help of ML.


Wildfire-cover

Wildfires

Developed by M.Cihat Unal

Overview

The API provides a user interface for SQL aggregations and an XGBoost model that predicts the cause of wildfires from the given inputs. The 1.88 Million US Wildfires dataset was used for training. This dataset includes several tables, but only the "Fires" table is used for both model training and the SQL aggregations.

Model Training Details

| identifier | learning rate | tree method | database |
| --- | --- | --- | --- |
| XGBoostClassifier | 0.5 | hist | 1.88 Million US Wildfires (Preprocessed) |

Data Installation and Preparation

First, create "data" and "logs" folders in the project directory. Then download the dataset and put it under the "data" folder. The steps I followed while preparing the data for training are listed below. Open a terminal in the project's directory first, then go into the "operation" folder.

  • As mentioned above, 1.88 Million US Wildfires is in SQL format and includes many tables. We are going to extract only the Fires table, convert it to a CSV file, and save it. For this:
python extract_db_to_csv.py

It will save the DataFrame as "1.88_Million_US_Wildfires.csv", keeping only the relevant columns. You can examine the extracted CSV file.
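The extraction step can be sketched with pandas and the standard sqlite3 module. This is a minimal, self-contained illustration rather than the actual script: the in-memory table and its columns are hypothetical stand-ins for the real FPA_FOD database.

```python
import sqlite3

import pandas as pd

# Tiny in-memory stand-in for the FPA_FOD SQLite database (hypothetical rows).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Fires (STATE TEXT, FIRE_SIZE REAL, STAT_CAUSE_DESCR TEXT)")
con.executemany(
    "INSERT INTO Fires VALUES (?, ?, ?)",
    [("CA", 9.0, "Lightning"), ("NM", 0.1, "Campfire")],
)

# Read only the columns of interest from the Fires table and save them as CSV.
df = pd.read_sql_query("SELECT STATE, FIRE_SIZE, STAT_CAUSE_DESCR FROM Fires", con)
df.to_csv("1.88_Million_US_Wildfires.csv", index=False)
```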

Before training the model, we should extract the useful information from the dataset and remove everything unnecessary so the model performs well.

  • To prepare the data for training, we need to:
    • Convert columns to a numerical format (if they are not already).
    • Drop unnecessary columns.
    • Drop duplicates.
    • Convert "DISCOVERY_DATE", which is in Julian date format, to an ordinary calendar date and save it in a "DATE" column.
    • Split the "DATE" column into "MONTH" and "DAY_OF_WEEK" to increase the number of features.
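In pandas, the Julian-date conversion and the two derived date features can be sketched as below. The two sample values are made up; `origin="julian"` together with `unit="D"` performs the conversion:

```python
import pandas as pd

# Two made-up Julian day numbers standing in for the real discovery-date column.
df = pd.DataFrame({"DISCOVERY_DATE": [2453137.5, 2453403.5]})

# Julian day number -> ordinary calendar date.
df["DATE"] = pd.to_datetime(df["DISCOVERY_DATE"], unit="D", origin="julian")

# Derive the two extra features from the date.
df["MONTH"] = df["DATE"].dt.month
df["DAY_OF_WEEK"] = df["DATE"].dt.dayofweek
```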

To perform the aforementioned steps:

python data_preprocessing.py

Lastly, the final DataFrame is ready for training and is exported to "wildfire_cleansed.csv". It is also saved to "wildfires.sqlite" for use in the UI's aggregations. Both datasets can be found in the data folder.

Running the API

via Docker

Build the image inside the Dockerfile's directory

docker build -t wildfire .

Then run the image on the host network:

docker run --network host --name wildfire-cont wildfire

Finally, you can reach the API from your browser by entering:

http://localhost:5000/

via Python in Terminal

Open the terminal in the project's directory. Install the requirements first.

pip install -r requirements.txt

Then, run the main.py file

python main.py

User Interface

You will see this page when the API runs successfully.

image

Example Usage

Wildfire Cause Prediction

After entering the inputs, click the submit button to see the predicted wildfire cause. image

SQL Aggregation

You can run SQL queries against the "Fires" table only; you can see its columns with: Select * From Fires.
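For instance, a simple aggregation counting fires per state could look like the query below. The snippet runs it against a tiny in-memory stand-in table, purely to illustrate the kind of SQL the query box accepts:

```python
import sqlite3

# In-memory stand-in for the "Fires" table in wildfires.sqlite (hypothetical rows).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Fires (STATE TEXT, FIRE_SIZE REAL)")
con.executemany("INSERT INTO Fires VALUES (?, ?)",
                [("CA", 9.0), ("CA", 1.5), ("NM", 0.2)])

# An aggregation you could type into the UI's query box.
rows = con.execute(
    "SELECT STATE, COUNT(*) AS fire_count FROM Fires "
    "GROUP BY STATE ORDER BY fire_count DESC"
).fetchall()
print(rows)  # [('CA', 2), ('NM', 1)]
```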

Enter the SQL Query in the textbox. image

Then click the search button. You will see a page like this: image

You can examine the results through the pages.

In the end, you can return to the home page by clicking the "Go Back" button.

Train Model

Training can be run with different parameters via command-line arguments:

python train.py --learning_rate 0.3 --train_size 0.7 --tree_method hist --model_name wildfire.pkl
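These flags map onto a command-line parser roughly as follows. This is a hypothetical re-creation of train.py's argument handling, not the actual code; the defaults are guesses based on the table above:

```python
import argparse

# Hypothetical re-creation of train.py's CLI (the real script may differ).
parser = argparse.ArgumentParser(description="Train the wildfire-cause XGBoost model")
parser.add_argument("--learning_rate", type=float, default=0.5)
parser.add_argument("--train_size", type=float, default=0.7)
parser.add_argument("--tree_method", default="hist")
parser.add_argument("--model_name", default="wildfire.pkl")

args = parser.parse_args(["--learning_rate", "0.3", "--tree_method", "hist"])
print(args.learning_rate, args.model_name)  # 0.3 wildfire.pkl
```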

Inference

You can also use the model for inference by providing the inputs (all of them are required):

python inference.py --state NM --date 22.07.2008 --latitude 40.8213 --longitude -121.5397 --fire_size 9.0

Examine my work further

You can browse my work in the Jupyter notebooks in the notebooks folder. The notebooks cover:

  • Examining data in detail
  • EDA (Exploratory Data Analysis)
  • Data Cleaning
  • Correlation Matrix
  • Random Forest and Decision Tree training
  • Hyperparameter optimization.

Improvement Suggestions

  • The SQL database is slow to load, so MongoDB could be a more latency-efficient alternative.
  • The current model has 56.42% accuracy. There are 12 labels in total, which may be too many to predict correctly, and the data is also imbalanced across labels. Hence, the label count can be lowered by defining new labels and distributing the existing labels among them. For example:
    • natural = ['Lightning']
    • accidental = ['Structure','Fireworks','Powerline','Railroad','Smoking','Children','Campfire','Equipment Use','Debris Burning']
    • malicious = ['Arson']
    • other = ['Missing/Undefined','Miscellaneous']
  • MLflow can be used to track ML operations.
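The regrouping suggested above can be sketched as a lookup table; the group names mirror the bullet list and could be applied to the label column before training:

```python
# Proposed regrouping of the original causes into 4 coarser labels.
groups = {
    "natural": ["Lightning"],
    "accidental": ["Structure", "Fireworks", "Powerline", "Railroad", "Smoking",
                   "Children", "Campfire", "Equipment Use", "Debris Burning"],
    "malicious": ["Arson"],
    "other": ["Missing/Undefined", "Miscellaneous"],
}

# Invert to a cause -> group lookup, e.g. for relabelling the target column.
cause_to_group = {cause: group for group, causes in groups.items() for cause in causes}
print(cause_to_group["Campfire"])  # accidental
```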

Citation

  • Short, Karen C. 2017. Spatial wildfire occurrence data for the United States, 1992-2015 [FPA_FOD_20170508]. 4th Edition. Fort Collins, CO: Forest Service Research Data Archive. https://doi.org/10.2737/RDS-2013-0009.4
