Disaster Response Pipeline Project

Table of Contents

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Instructions
  5. Results
  6. Licensing, Authors, and Acknowledgements

Installation

  1. Clone the repository.
  2. Create a virtual environment:
$ virtualenv --python=python3 ds-project2 --no-site-packages
$ source ds-project2/bin/activate
  3. Go to the project folder (datascience_project2) and run the following command to install all the dependencies:
$ pip install -r requirements.txt

Project Motivation

This project uses disaster message data from Figure Eight to build a model for an API that classifies disaster messages. It let me put ETL skills and the creation of ML pipelines into practice. The application can help people and organizations during a disaster event: by categorizing the messages people send, they can put a mitigation plan together faster.

File Descriptions

  1. app: Folder with the HTML templates and the code to run the web app/API.
  2. data: Folder with the script that preprocesses the data (sketched below, after the file structure).
  3. data_analysis: Folder with the Jupyter notebooks used for the initial exploration of the data and models.
  4. img: Folder with images of the results.
  5. models: Folder with the script that builds, trains, and evaluates the model.
  6. README.md: File with repository information.
  7. requirements.txt: File with the project requirements.

File structure

- app
| - template
| |- master.html # main page of web app
| |- go.html # classification result page of web app
|- run.py # Flask file that runs app
- data
|- disaster_categories.csv # data to process
|- disaster_messages.csv # data to process
|- process_data.py
|- DisasterResponse.db # database to save the clean data to
- data_analysis
|- ETL Pipeline Preparation.ipynb # first processing of the data
|- ML Pipeline Preparation.ipynb # exploration of models
- img
|- models_accuracy.png
- models
|- train_classifier.py
|- classifier.pkl # saved model
README.md
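
For readers who want a sense of what that preprocessing does, here is a minimal sketch of an ETL step like data/process_data.py: merge the two CSVs, expand the categories column into one binary column per category, and write the clean table to SQLite. The function name and table name are illustrative assumptions, not the repository's exact code.

# Illustrative ETL sketch (not the exact contents of data/process_data.py)
import sys
import pandas as pd
from sqlalchemy import create_engine

def run_etl(messages_path, categories_path, database_path):
    # Load and merge the two CSV files on their shared id column
    messages = pd.read_csv(messages_path)
    categories = pd.read_csv(categories_path)
    df = messages.merge(categories, on="id")

    # Expand the single "categories" string into one 0/1 column per category
    expanded = df["categories"].str.split(";", expand=True)
    expanded.columns = [value.split("-")[0] for value in expanded.iloc[0]]
    for column in expanded:
        expanded[column] = expanded[column].str[-1].astype(int)

    # Replace the raw column, drop duplicates, and save to SQLite
    df = pd.concat([df.drop(columns=["categories"]), expanded], axis=1)
    df = df.drop_duplicates()
    engine = create_engine("sqlite:///" + database_path)
    df.to_sql("DisasterMessages", engine, index=False, if_exists="replace")  # table name assumed

if __name__ == "__main__":
    run_etl(*sys.argv[1:4])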

Instructions

  1. Run the following commands in the project's root directory to set up the database and model.

    • To run the ETL pipeline that cleans the data and stores it in the database:
      python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
    • To run the ML pipeline that trains the classifier and saves it:
      python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl

  2. Run the following command in the app's directory to start the web app (a sketch of what run.py does follows this list):
      python run.py

  3. Go to http://0.0.0.0:3001/
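
As a reference for steps 2 and 3, here is a minimal sketch of what a Flask app like app/run.py does: load the cleaned table and the pickled model, then render the overview page and the classification result. The table name, file paths, and column slice are assumptions for illustration, not the repository's exact code.

# Illustrative sketch of app/run.py (paths and table name assumed)
import pickle
import pandas as pd
from flask import Flask, render_template, request
from sqlalchemy import create_engine

app = Flask(__name__)

# Load the SQLite table produced by process_data.py and the pickled model
engine = create_engine("sqlite:///../data/DisasterResponse.db")
df = pd.read_sql_table("DisasterMessages", engine)
with open("../models/classifier.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/")
def index():
    # master.html shows overview plots of the training data
    return render_template("master.html")

@app.route("/go")
def go():
    # Classify the message typed into the form and show one label per category
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    results = dict(zip(df.columns[4:], labels))  # assumes first 4 columns are id, message, original, genre
    return render_template("go.html", query=query, classification_result=results)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001, debug=True)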

Results

While building the model I tried several classifiers, which gave the following accuracy scores:

  • RandomForest score: 81%
  • DecisionTrees score: 79%
  • KNeighborsClassifier score: 26%

(Model accuracy comparison chart: img/models_accuracy.png)

You can find more detailed information in the Jupyter notebooks.

I chose the RandomForest model because it had the highest score, and I tried to optimize it with a grid search (GridSearchCV). The search took more than 10 hours to train and the score dropped to 21%, so I kept the model from before the grid search. I couldn't upload the trained model because the file was around 1 GB.
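
For reference, that comparison and grid search roughly correspond to a setup like the sketch below; the vectorizer settings, parameter grid, and table name are assumptions for illustration, not the notebook's exact configuration.

# Illustrative sketch of the pipeline and grid search (parameters assumed)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sqlalchemy import create_engine

# Load the cleaned data produced by the ETL step (table name assumed)
engine = create_engine("sqlite:///data/DisasterResponse.db")
df = pd.read_sql_table("DisasterMessages", engine)
X = df["message"]
Y = df.iloc[:, 4:]  # assumes the first four columns are id, message, original, genre
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", MultiOutputClassifier(RandomForestClassifier())),
])

# Even a small grid multiplies the already long RandomForest training time,
# which is why the full search took several hours.
parameters = {
    "clf__estimator__n_estimators": [50, 100],
    "clf__estimator__min_samples_split": [2, 4],
}

cv = GridSearchCV(pipeline, param_grid=parameters, cv=3, n_jobs=-1, verbose=2)
cv.fit(X_train, Y_train)
print(cv.best_params_)
print(cv.score(X_test, Y_test))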

Some target classes had only one value (all examples were 0), which may have limited how well the model could generalize.

Licensing, Authors, and Acknowledgements

Credit must go to Figure Eight for the data. You can find the licensing for the data and other descriptive information here. Otherwise, feel free to use the code here as you would like!

About

Second project of the Udacity Data Scientist Nanodegree.
