Skip to content

sughoshdeshpande7/Flight_Price_Prediction

Repository files navigation

Flight Price Predictor: Project Overview

ML Project

  • Built a model to predict flight prices depending on various user inputs and deployed it on flask
  • Trained 10 different models to get the best performing model and optimized it for even better results
  • R2 score of the final trained model is 80.97%

Model Web App

Table of Contents

Code and Resources Used

Python Badge GitHub Notebook Article
Libraries Badge

To install the required packages and libraries for this project, run this command in the project directory after cloning the repository:

pip install -r requirements.txt

Dataset: Download the entire dataset from Data.xls

Directory Tree

├── Dataset and Images 
│   ├── Data_Train.xlsx
│   ├── model.png
│   ├── plane.jpeg
├── Templates
│   ├── final_trained_model.pkl
│   ├── home.html
│   ├── plane.jpeg
├── Flight Price Prediction.ipynb
├── README.md
├── app.py
├── final_trained_model.pkl
├── requirements.txt

Data Preprocessing

Following changes were made to the data to make it usable for a model:

  • Column with Null Values was removed.
  • Data values for 'Delhi' and 'New Delhi' were combined.
  • Date and Duration which was present as string values was converted into timestamp format.

EDA

Following analysis were made related to dataset:

  • What time of the day most flights take off
  • If the duration of flights affect its price
  • If total number of stops affect the flight price
  • Ticket Fare Distribution by Airline
  • Median ticket fare by Airline

Feature Engineering

  • One-Hot Encoding and Label Encoding was done to convert categorical dataset into vector
  • Target Guided Encoding was done to avoid curse of dimensionality during feature encoding
  • A one hot encoder library was used whereas a label encoder was made manually

Data Cleaning

  • The unwanted columns for the model were removed
  • Outlier range and the outliers were detected using IQR method
  • The outliers were replaced with the median of the remaining data values

Feature Selection

  • Mutual Information Regression was used to identify dependency between the variables to select the best features for the model
  • As all the features showed a good dependency with the target variable no specific feature was selected

Model Building

The data was split into 75% training and 25 % test set. An automated ML model was made so that mutiple models can be evaluated in a single code

10 different models were tried and evaluated based on their metrics:

  • Random Forest Regression : R2 score = 79.69%
  • Decision Tree Regressor : R2 score = 64.87%
  • Linear Regression : R2 score = 60.77%
  • Ridge Method : R2 score = 60.77%
  • Lasso Method : R2 score = 60.77%
  • ElasticNet Method : R2 score = 57.27%
  • Support Regression : R2 score = 2.62%
  • K-NN : R2 score = 64.77%
  • MLP Regressor : R2 score = 56.71%
  • Huber Regressor : R2 score = 59.4%

Hyper-parameter Tuning

Clearly Random Forest outperforms the other methods but its performance can be still improved. RandomizedSearchCV was used to find the hyper-parameters and optimize the model upto an R2 score of 80.97%

Deployment

A Final Trained model was built on Random Forest regression and deployed on flask as a web app. The Final model can be downloaded from final_trained_model.pkl

About

Repo for Flight Price Predictor Model and Web App

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages