Flask Salary Predictor

In this project, we are going to use a random forest algorithm (or any other preferred algorithm) from scikit-learn library to help predict the salary based on your years of experience. We will use Flask as it is a very light web framework to handle the POST requests.

Project description video on YouTube

The dataset is from Kaggle Years of experience and Salary dataset

Architecture

The above diagram is the cloud architecture of our salary prediction system. Inside the cloud diagram we have our google cloud services listed: Storage Bucket, Compute Instance, Cloud Run, and Cloud Build. The storage bucket stores the kaggle salary dataset. The compute instances are where code files lie within the cloud platform. We update our github repo by merging feature branch into master branch or change part of the code of the website layout. This will automatically trigger Cloud Build to deploy the code updates into the production flask container.

Model

model.py trains and saves the model to disk. model.pkl is the model compressed in pickle format.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pickle
import requests
import json

# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)

# Train the model

# random forest model (or any other preferred algorithm)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=20, random_state=0)

regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

# Saving model using pickle
pickle.dump(regressor, open('model.pkl','wb'))

# Loading model to compare the results
model = pickle.load( open('model.pkl','rb'))
print(model.predict([[1.8]]))

App

main.py has the main function and contains all the required functions for the flask app. In the code, we have created the instance of the Flask() and loaded the model. model.predict() method takes input from the json request and converts it into 2D numpy array. The results are stored and returned into the variable named output. Finally, we used port 8080 and have set debug=True to enable debugging when necessary.

import numpy as np
from flask import Flask, request, jsonify, render_template
import pickle

app = Flask(__name__)
# Load model from model.pkl
model = pickle.load(open('model.pkl', 'rb'))

# Homepage route
@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict',methods=['POST'])
def predict():
    '''
    For rendering results on HTML GUI
    '''
    int_features = [int(x) for x in request.form.values()]
    final_features = [np.array(int_features)]
    prediction = model.predict(final_features)

    output = round(prediction[0], 2)

    return render_template('index.html', prediction_text='Salary is {}'.format(output))

if __name__ == "__main__":
    app.run(host='127.0.0.1', port=8080, debug=True)

How to Run the App

Clone the repo

git clone https://github.com/YisongZou/IDS721-Final-Project.git

Setup - Install the required packages

make all

Train the model

python3 model.py

Run the application

python3 main.py

Set up Google Cloud Project

Step 1: Create new GCP project

Step 2: Check to see if the console is pointing to the correct project

gcloud projects describe $GOOGLE_CLOUD_PROJECT

Step 3: Set working project if not correct

gcloud config set project $GOOGLE_CLOUD_PROJECT

Step 4: Follow Step 1-4 to set up github repo and test Flask application

Step 5: In the root project of the folder, replace PROJECT-ID below with the correct GCP project-id, and build the google cloud containerized flask application

gcloud builds submit --tag gcr.io/<PROJECT-ID>/app

Step 6: In the root folder of the project, replace PROJECT-ID below with the correct GCP project-id, and run the flask application

gcloud run deploy --image gcr.io/<PROJECT-ID>/app --platform managed

Step 7: Paste the URL link provided on the console, in a preferred browser to run the application

Set up Continuous Deployment (CD)

Create a new build trigger
Specify github repository
Deployment specifications already available in: cloudbuild.yaml file
Push a simple change; Triggered on Master branch
View progress in build triggers page

Load Testing

Website link with Continuous Delivery enabled. https://final-project-311720.uc.r.appspot.com/

Loadtest code repo: https://github.com/YisongZou/IDS721-Finalproject-Locust-load-test

Photo shows the project scaling to 1K+ requests per second

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
.theia		.theia
static/css		static/css
templates		templates
.DS_Store		.DS_Store
.gcloudignore		.gcloudignore
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
Salary_Data.csv		Salary_Data.csv
Screen Shot 2021-04-22 at 1.42.12 AM.png		Screen Shot 2021-04-22 at 1.42.12 AM.png
US_Gold_Medals.csv		US_Gold_Medals.csv
app.yaml		app.yaml
cloudbuild.yaml		cloudbuild.yaml
cloudbuild.yaml~		cloudbuild.yaml~
main.py		main.py
model.pkl		model.pkl
model.py		model.py
requirements.txt		requirements.txt

YisongZou/Flask-Salary-Predictor-with-Random-Forest-Algorithm

Folders and files

Latest commit

History

Repository files navigation

Flask Salary Predictor

Architecture

Model

App

How to Run the App

Set up Google Cloud Project

Set up Continuous Deployment (CD)

Load Testing

About

Topics

Resources

Stars

Watchers

Forks

Languages