This end-to-end machine learning project focuses on predicting medical insurance prices using regression. The data was collected from Kaggle, cleaned, and examined through Exploratory Data Analysis using statistical analysis and feature engineering. A custom transformer was also built for feature engineering, and transformation pipelines were implemented so that existing and new data are preprocessed the same way. Linear Regression, Polynomial Regression, Decision Tree Regression, Support Vector Regression and Random Forest Regression models were trained, and cross-validation was used to evaluate their performance. Hyperparameter tuning with GridSearchCV was applied to improve the performance of some models. The Random Forest Regression model was deployed using Flask on Render Cloud Hosting.
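The preprocessing and tuning workflow described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The column names assume the layout of the Kaggle insurance dataset (`age`, `sex`, `bmi`, `children`, `smoker`, `region`). The `SmokerBmiInteraction` transformer and its `smoker_bmi` feature are hypothetical stand-ins for the project's custom transformer. The parameter grid is made up, and the data is randomly generated.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class SmokerBmiInteraction(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: appends a smoker x BMI interaction column."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        X = X.copy()
        X["smoker_bmi"] = (X["smoker"] == "yes").astype(float) * X["bmi"]
        return X


def build_pipeline():
    numeric = ["age", "bmi", "children", "smoker_bmi"]
    categorical = ["sex", "smoker", "region"]
    preprocess = ColumnTransformer([
        # Scaling matters for linear models and SVR; trees are insensitive to it.
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])
    return Pipeline([
        ("features", SmokerBmiInteraction()),
        ("preprocess", preprocess),
        ("model", RandomForestRegressor(random_state=42)),
    ])


# Synthetic stand-in for the Kaggle data (assumed column names, random values).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "sex": rng.choice(["male", "female"], n),
    "bmi": rng.normal(30, 6, n),
    "children": rng.integers(0, 5, n),
    "smoker": rng.choice(["yes", "no"], n),
    "region": rng.choice(["northeast", "northwest", "southeast", "southwest"], n),
})
y = 3000.0 + 250.0 * X["age"] + 20000.0 * (X["smoker"] == "yes") + rng.normal(0, 2000, n)

# 5-fold cross-validated R^2 of the untuned pipeline.
scores = cross_val_score(build_pipeline(), X, y, cv=5, scoring="r2")
print("CV R^2 per fold:", np.round(scores, 3))

# Grid search over hypothetical Random Forest hyperparameters.
# "model__" routes each parameter to the pipeline step named "model".
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [4, None],
}
search = GridSearchCV(build_pipeline(), param_grid, cv=5, scoring="r2")
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV R^2:", round(search.best_score_, 3))
```

Because the custom transformer sits inside the pipeline, the same feature engineering is applied automatically when the fitted `search.best_estimator_` predicts on new data.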
- Primary Objective
- Results
- Installation
- Usage
- Contributing
- Credits
- License
- Contact
To develop regression models that predict an individual's medical insurance cost from personal information (age, sex, BMI, number of children and region) and lifestyle habits such as smoking, using the dataset available on Kaggle. The R^2 score will be used to evaluate model performance, and the best-performing model will be deployed for educational purposes.
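For reference, the R^2 score (coefficient of determination) is 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the true values. Below is a minimal pure-Python sketch of that formula. In practice, scikit-learn's `sklearn.metrics.r2_score` computes the same quantity.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # spread around the mean
    return 1.0 - ss_res / ss_tot


print(round(r2_score([3, 5, 7, 9], [2.5, 5.0, 7.5, 9.0]), 3))  # prints 0.975
```

A score of 1 means perfect predictions. A score of 0 means the model does no better than always predicting the mean of the targets.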
Model | Train set R^2 | Test set R^2 | Inference |
---|---|---|---|
poly_reg | 0.8419 | 0.8665 | Good performance |
tree_reg | 0.8670 | 0.8641 | Good performance |
forest_reg | 0.8739 | 0.8737 | Good performance |
After hyperparameter tuning, the Random Forest Regression model achieved the highest R^2 score on the test set (0.8737). Its train/test gap is only 0.0002, so it shows no sign of overfitting the training data. For these reasons, it was chosen for deployment.
Prerequisites:
- Anaconda Python Distribution
- python 3.9.13
Note: The installation steps below use the `requirements.txt` file. This file contains only the packages needed to deploy the Flask app, so it does not include every package used during development of the project.
- Install Conda: If you do not have Conda installed on your system, download and install the appropriate version for your operating system from the official Conda website (https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).
- Clone the repository: To clone this repository to your local machine, open Terminal (or Git Bash on Windows), navigate to the folder where you want to clone the repository, type `git clone https://github.com/shre-db/Medical-Insurance-Price-Prediction.git` and press Enter. The repository will be cloned to your local machine.
- Create an environment: To avoid conflicts between packages, create a new environment with `conda create -n ENVNAME python=3.9.16`. Replace `ENVNAME` with a name of your choice, for example `medi-dep` or `medi-dev`.
- Activate the environment: Once you have created the environment, activate it before using it with `conda activate ENVNAME`.
- Install packages: Install the required packages in the environment with either `conda install --yes --file requirements.txt` or `conda install --file requirements.txt`. The former automatically answers "yes" to all prompts during installation, while the latter asks you to confirm the installation manually. On Windows, this command may fail because of the gunicorn package. Since gunicorn is not needed to run the app locally, I recommend removing it from the `requirements.txt` file before running the command.
- Deactivate the environment: Once you are done working with the environment, deactivate it with `conda deactivate` and then close the prompt with `exit`.

That's it! You have now installed the packages using Conda.
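The installation steps suggest removing gunicorn from `requirements.txt` on Windows. Instead of editing the tracked file, you could write a filtered copy with a small standard-library Python sketch like the one below. The helper `write_local_requirements` and the output name `requirements-local.txt` are hypothetical. The name parsing assumes simple one-package-per-line entries such as `gunicorn==20.1.0` or `gunicorn=20.1.0`.

```python
import re
from pathlib import Path


def write_local_requirements(src="requirements.txt",
                             dst="requirements-local.txt",
                             exclude=("gunicorn",)):
    """Copy src to dst, dropping requirements whose package name is in exclude.

    Returns the dropped lines; the original file is left untouched.
    """
    excluded = {name.lower() for name in exclude}
    kept, dropped = [], []
    for line in Path(src).read_text().splitlines():
        # Package name = text before the first version/extra/marker character,
        # e.g. "gunicorn==20.1.0" or "gunicorn=20.1.0" -> "gunicorn".
        name = re.split(r"[<>=!~\[; ]", line.strip(), maxsplit=1)[0].lower()
        (dropped if name in excluded else kept).append(line)
    Path(dst).write_text("\n".join(kept) + "\n")
    return dropped
```

From the project folder, call `write_local_requirements()` in a Python session, then install with `conda install --yes --file requirements-local.txt`.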
You can access the deployed project by following the link: https://medical-insurance-price-prediction.onrender.com/. Alternatively, after installation you can run the project locally by following the steps below:
- Open Anaconda prompt.
- Navigate to the project folder.
- Run this command: `python main.py`
- Copy the URL (http://localhost:5000 or similar) generated in the prompt.
- Open a web browser and paste the URL to access the web application.
Thank you for your interest in this project! At this time we are not accepting contributions from external collaborators. If you have any feedback or suggestions, please feel free to create an issue or contact us directly.
- Data for this project was collected from Kaggle.
- Cover Image in this project was Designed by Vilmosvarga / Freepik.
- Flowcharts in this project were made with the help of https://app.diagrams.net/.
This project is licensed under the MIT License - see the LICENSE.txt file for details.
- Name: Shreyas
- Email: shreyasdb99@gmail.com
- GitHub: shre-db
- Instagram: shryzium