coolmunzi/end_to_end_ml-fitbit_calorie_counter: An end-to-end machine learning project built on Flask APIs, covering model training and prediction pipelines for estimating calories burnt from Fitbit band data.

About the project

This is an end-to-end ML project that predicts calories burnt from Fitbit health data. It covers both the training and the prediction pipelines, and uses a regression model to predict calories from the activity indicators in the training data.

Dataset

The data comes from the Fitbit dataset on Kaggle. It contains the following features:

1. Id: The customer ID.
2. ActivityDate: The date on which the activity was tracked.
3. TotalSteps: Total steps taken on that day.
4. TotalDistance: Total distance covered.
5. TrackerDistance: Distance as recorded by the tracker.
6. LoggedActivitiesDistance: Distance covered in logged activities.
7. VeryActiveDistance: Distance covered while the user was very active.
8. ModeratelyActiveDistance: Distance covered while the user was moderately active.
9. LightActiveDistance: Distance covered while the user was lightly active.
10. SedentaryActiveDistance: Distance covered while the user was almost inactive.
11. VeryActiveMinutes: Number of minutes spent very active.
12. FairlyActiveMinutes: Number of minutes spent moderately active.
13. LightlyActiveMinutes: Number of minutes spent lightly active.
14. SedentaryMinutes: Number of minutes with almost no activity.
15. Calories (target): Calories burnt.
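The column list above maps directly onto a CSV header. Below is a minimal sketch, assuming the batch files are comma-separated and use exactly these column names, that reads rows with the standard csv module and splits the Calories target from the features. The helper name load_features_and_target and the sample row are hypothetical; the values are made up and not taken from the Kaggle data.

```python
import csv
import io
from typing import Dict, List, Tuple

# Column names as listed in the dataset description; Calories is the target.
FEATURE_COLUMNS = [
    "Id", "ActivityDate", "TotalSteps", "TotalDistance", "TrackerDistance",
    "LoggedActivitiesDistance", "VeryActiveDistance", "ModeratelyActiveDistance",
    "LightActiveDistance", "SedentaryActiveDistance", "VeryActiveMinutes",
    "FairlyActiveMinutes", "LightlyActiveMinutes", "SedentaryMinutes",
]
TARGET_COLUMN = "Calories"


def load_features_and_target(text: str) -> Tuple[List[Dict[str, str]], List[float]]:
    """Split each CSV row into a dict of raw feature strings and a float target."""
    features, target = [], []
    for row in csv.DictReader(io.StringIO(text)):
        target.append(float(row.pop(TARGET_COLUMN)))
        features.append({col: row[col] for col in FEATURE_COLUMNS})
    return features, target


# Illustrative, made-up row (hypothetical values, not from the dataset).
sample = (
    ",".join(FEATURE_COLUMNS + [TARGET_COLUMN]) + "\n"
    + "1,4/12/2016,10000,7.2,7.2,0,2.0,1.0,4.2,0,30,15,200,900,2100\n"
)
X, y = load_features_and_target(sample)
print(len(X), y)  # 1 [2100.0]
```

Feature values stay as strings here on purpose: type conversion and cleaning belong to the later validation and pre-processing steps.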

High Level Project Workflow

Training Pipeline Workflow

Data Capture: Data is captured from the files inside the Training_Batch_Files directory.
Data Validation: rawValidation.py inside Training_Raw_data_validation validates the captured data against the schema defined in schema_training.json. Data that satisfies the schema is saved in Training_Raw_files_validated/Good_Raw, and data that violates it is saved in the Training_Raw_files_validated/Bad_Raw directory.
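A minimal sketch of the Good_Raw/Bad_Raw split, under stated assumptions: the schema is assumed to hold an ordered mapping of column names under a hypothetical ColName key, and only the header row is checked. The real schema_training.json may use different keys, and rawValidation.py may check more than this (file names, column counts, missing values).

```python
import csv
import json
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List


def validate_batch(batch_dir: Path, schema: dict, out_dir: Path) -> Dict[str, List[str]]:
    """Copy each CSV to Good_Raw if its header matches the schema, else to Bad_Raw."""
    good, bad = out_dir / "Good_Raw", out_dir / "Bad_Raw"
    good.mkdir(parents=True, exist_ok=True)
    bad.mkdir(parents=True, exist_ok=True)
    # Hypothetical schema key: an ordered {column name: SQL type} mapping.
    expected = list(schema["ColName"])
    report = {"good": [], "bad": []}  # type: Dict[str, List[str]]
    for path in sorted(batch_dir.glob("*.csv")):
        with path.open(newline="") as fh:
            header = next(csv.reader(fh), [])
        key = "good" if header == expected else "bad"
        shutil.copy(str(path), str(good if key == "good" else bad))
        report[key].append(path.name)
    return report


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    batch = root / "Training_Batch_Files"
    batch.mkdir()
    schema = json.loads('{"ColName": {"Id": "Integer", "TotalSteps": "Integer", "Calories": "Integer"}}')
    (batch / "ok.csv").write_text("Id,TotalSteps,Calories\n1,10000,2100\n")
    (batch / "wrong.csv").write_text("Id,Steps\n1,10000\n")
    result = validate_batch(batch, schema, root / "Training_Raw_files_validated")
    good_files = sorted(p.name for p in (root / "Training_Raw_files_validated" / "Good_Raw").iterdir())

print(result)  # {'good': ['ok.csv'], 'bad': ['wrong.csv']}
```

Copying rather than moving keeps the original batch untouched, so a rejected file can be inspected and resubmitted.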
Data Transformation: DataTransform.py inside DataTransform_Training transforms the data in Training_Raw_files_validated/Good_Raw, for example by adding double quotes around string values in columns.
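The README does not say why string values are quoted; one plausible reason, assumed here, is that the values are later spliced into SQL statements. A minimal sketch of such a transformation (the function name quote_string_values is hypothetical): numeric values pass through, everything else is wrapped in double quotes with embedded quotes doubled, which is the SQL escaping convention.

```python
from typing import List


def quote_string_values(row: List[str]) -> List[str]:
    """Wrap non-numeric values in double quotes, doubling any embedded quotes."""
    quoted = []
    for value in row:
        try:
            float(value)  # numeric values are left as they are
            quoted.append(value)
        except ValueError:
            quoted.append('"' + value.replace('"', '""') + '"')
    return quoted


print(quote_string_values(["1", "4/12/2016", "10000", "7.2"]))
# ['1', '"4/12/2016"', '10000', '7.2']
```

If the data is inserted with sqlite3 parameter placeholders (?), manual quoting is unnecessary, since the driver handles escaping.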
Data insertion into database: DataTypeValidation.py inside the DataTypeValidation_Insertion_Training directory saves the transformed data to Training.db inside Training_Database.
Export data from DB to CSV format: DataTypeValidation.py reads the data from Training.db and creates InputFile.csv inside Training_FileFromDB, which is later used for model training.
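The two database steps above can be sketched with Python's built-in sqlite3 module (SQLite is the database named in the technology stack). This is an illustration, not the project's DataTypeValidation.py: the table name Good_Raw_Data and the column-to-type mapping are hypothetical, and an in-memory database stands in for Training.db.

```python
import csv
import io
import sqlite3
from typing import Dict, List


def insert_rows(conn: sqlite3.Connection, table: str,
                columns: Dict[str, str], rows: List[list]) -> None:
    """Create the table from a {column: SQL type} mapping and insert rows via placeholders."""
    col_defs = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()


def export_to_csv(conn: sqlite3.Connection, table: str) -> str:
    """Dump the whole table, header first, as CSV text (the role InputFile.csv plays)."""
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([d[0] for d in cursor.description])
    writer.writerows(cursor.fetchall())
    return out.getvalue()


conn = sqlite3.connect(":memory:")  # the pipeline uses a file such as Training.db
columns = {"Id": "INTEGER", "ActivityDate": "TEXT", "TotalSteps": "INTEGER", "Calories": "INTEGER"}
insert_rows(conn, "Good_Raw_Data", columns,
            [[1, "4/12/2016", 10000, 2100], [2, "4/13/2016", 5000, 1800]])
print(export_to_csv(conn, "Good_Raw_Data"))
```

Declaring column types in the CREATE TABLE statement is also a cheap place to enforce the data types the schema expects before training.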
Data Pre-processing: preprocessing.py inside data_preprocessing performs the necessary pre-processing steps, such as removing unnecessary columns, separating the label column, replacing null values using KNN Imputer and encoding categorical values.
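The null-value step relies on KNN imputation. With scikit-learn that is typically KNNImputer(n_neighbors=k).fit_transform(X) on an array with NaN for missing values. To show what the step does without the library, here is a hand-rolled sketch (not the project's code): each missing value becomes the mean of that column over the k nearest rows, using a NaN-aware Euclidean distance similar in spirit to scikit-learn's nan_euclidean metric. The sample rows are made up.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over coordinates present in both rows,
    rescaled by total/present coordinates."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Replace each None with the mean of that column over the k nearest donor rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows that actually observed column j.
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            if not donors:
                raise ValueError("column %d has no observed values" % j)
            donors.sort(key=lambda r: nan_euclidean(row, r))
            nearest = donors[:k]
            new_row[j] = sum(r[j] for r in nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Columns: TotalSteps, TotalDistance, VeryActiveMinutes (made-up values, one gap).
data = [
    [10000.0, 7.0, 30.0],
    [10500.0, 7.3, None],
    [11000.0, 7.6, 40.0],
    [2000.0, 1.5, 5.0],
]
print(knn_impute(data, k=2)[1])  # [10500.0, 7.3, 35.0]
```

The distance here is unscaled, so large-valued columns such as TotalSteps dominate it; scaling features before imputation is a common choice.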
Data Clustering: The project uses a customized ML approach in which clusters are created from the data using the KNN algorithm. A separate ML model is later trained on the data in each cluster, with the aim of preventing overfitting.
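The README names KNN for this step, but k-nearest neighbours is a neighbour-lookup method and does not itself produce clusters. The sketch below swaps in k-means, a standard clustering algorithm, purely to illustrate the cluster-then-train-per-cluster idea; it is not necessarily what the project uses. It uses a deterministic farthest-point initialisation for simplicity (scikit-learn's KMeans defaults to k-means++), and the points are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []  # type: List[int]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def split_by_cluster(rows, labels):
    """Group rows by cluster label so a separate regressor can be fit per cluster."""
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(lab, []).append(row)
    return groups


# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The centroids are worth keeping after training: at prediction time they decide which cluster's model a new record is sent to.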
Model Selection & Hyperparameter Tuning: tuner.py inside best_model_finder runs GridSearchCV to optimize the hyperparameters of an XGBoost Regressor and a Random Forest Regressor, selects the better model with its best hyperparameters, and saves it in the models directory.
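To show the selection logic without the XGBoost and scikit-learn dependencies, the sketch below runs an exhaustive grid search with k-fold cross-validation over two stand-in model families (a k-nearest-neighbour regressor and a mean baseline) and keeps the combination with the lowest mean squared error. All helper names are hypothetical. In scikit-learn the per-family search would be GridSearchCV(RandomForestRegressor(), param_grid, cv=5).fit(X, y), reading best_params_ and best_score_ afterwards; since GridSearchCV maximises its score, matching this sketch would need scoring="neg_mean_squared_error".

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], float]


def knn_regressor(X: List[Vector], y: List[float], n_neighbors: int = 1) -> Predictor:
    """Stand-in model: predict the mean target of the n nearest training rows."""
    def predict(x: Vector) -> float:
        order = sorted(range(len(X)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        nearest = order[:n_neighbors]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def mean_regressor(X: List[Vector], y: List[float]) -> Predictor:
    """Baseline model with no hyperparameters: always predict the training mean."""
    mean = sum(y) / len(y)
    return lambda x: mean


def cv_mse(trainer, params, X, y, folds=3) -> float:
    """Mean squared error over all held-out rows of a contiguous k-fold split."""
    n = len(X)
    squared_errors = []
    for f in range(folds):
        test_idx = range(f * n // folds, (f + 1) * n // folds)
        held_out = set(test_idx)
        X_train = [X[i] for i in range(n) if i not in held_out]
        y_train = [y[i] for i in range(n) if i not in held_out]
        model = trainer(X_train, y_train, **params)
        squared_errors.extend((model(X[i]) - y[i]) ** 2 for i in test_idx)
    return sum(squared_errors) / len(squared_errors)


def grid_search(candidates, X, y, folds=3) -> Tuple[str, Dict[str, object], float]:
    """Try every parameter combination of every model family; keep the lowest CV error."""
    best = None
    for name, (trainer, grid) in candidates.items():
        keys = sorted(grid)
        for values in itertools.product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = cv_mse(trainer, params, X, y, folds)
            if best is None or score < best[2]:
                best = (name, params, score)
    return best


X = [[float(i)] for i in range(9)]
y = [2.0 * i for i in range(9)]
candidates = {
    "knn": (knn_regressor, {"n_neighbors": [1, 3]}),
    "mean": (mean_regressor, {}),
}
best_name, best_params, best_mse = grid_search(candidates, X, y)
print(best_name, best_params)  # knn {'n_neighbors': 1}
```

Comparing families on the same folds and the same error metric is what makes the final "best model" choice fair.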

Prediction Pipeline Workflow

Data Capture: Data is captured from the files inside the Prediction_Batch_files directory.
Data Validation: rawValidation.py inside Prediction_Raw_Data_Validation validates the captured data against the schema defined in schema_prediction.json. Data that satisfies the schema is saved in Prediction_Raw_files_validated/Good_Raw, and data that violates it is saved in the Prediction_Raw_files_validated/Bad_Raw directory.
Data Transformation: DataTransform.py inside DataTransformation_Prediction transforms the data in Prediction_Raw_files_validated/Good_Raw, for example by adding double quotes around string values in columns.
Data insertion into database: DataTypeValidation.py inside the DataTypeValidation_Insertion_Prediction directory saves the transformed data to Prediction.db inside Prediction_Database.
Export data from DB to CSV format: DataTypeValidation.py reads the data from Prediction.db and creates InputFile.csv inside Prediction_FileFromDB, which is later used for prediction.
Data Pre-processing: preprocessing.py inside data_preprocessing performs the necessary pre-processing steps, such as removing unnecessary columns, separating the label column, replacing null values using KNN Imputer and encoding categorical values.
Data Cluster identification: The prediction pipeline in predictFromModel.py determines which cluster each input record belongs to.
Model Prediction: The prediction pipeline in predictFromModel.py then predicts the calorie value using the model trained for that cluster.
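A sketch of the routing logic in the two steps above, assuming each cluster is represented by a centroid and the per-cluster models are stored in a dict keyed by cluster number. Both are assumptions; the README does not say how predictFromModel.py stores them. The centroids and the linear "models" in the usage lines are made-up placeholders, not trained models.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_calories(rows: List[Vector], centroids: List[Vector],
                     models: Dict[int, Predictor]) -> List[float]:
    """Send each row to its nearest cluster centroid, then to that cluster's model."""
    predictions = []
    for row in rows:
        cluster = min(range(len(centroids)), key=lambda c: squared_distance(row, centroids[c]))
        predictions.append(models[cluster](row))
    return predictions


# Hypothetical one-feature (TotalSteps) centroids and toy per-cluster models.
centroids = [[5000.0], [15000.0]]
models = {
    0: lambda r: 1000 + r[0] / 10,
    1: lambda r: 1200 + r[0] / 20,
}
print(predict_calories([[4000.0], [16000.0]], centroids, models))  # [1400.0, 2000.0]
```

The prediction data must go through the same pre-processing as the training data, otherwise the distances to the training centroids are meaningless.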

Technology stack used

Flask - Web framework to develop APIs
Scikit-learn - To create Machine Learning models for KNN and Random Forest algorithms
XGBoost - To create XGBoost based model for calorie prediction
SQLite - Database to store the validated Raw data and data submitted for prediction.
Python 3.6 - As a programming language

Prerequisites

Create a Python 3.6 environment, activate it, and install the dependencies listed in requirements.txt:

 $ conda create -n fitbit_calorie_counter python=3.6
 $ conda activate fitbit_calorie_counter
 $ pip install -r requirements.txt

Installation & Usage

Clone the repository using the following commands:

 $ git clone https://github.com/coolmunzi/end_to_end_ml-fitbit_calorie_counter.git
 $ cd end_to_end_ml-fitbit_calorie_counter

Run the Flask app by executing main.py:

 $ python main.py
To train the models, open an API testing tool such as Postman and create a POST request to '127.0.0.1:5000/train' with the JSON body {"folderPath" : "Training_Batch_Files"}.
Once the model is trained, you can run batch prediction from a web browser: open 'http://127.0.0.1:5000/' and paste the absolute path of the Prediction_Batch_files folder (located inside the project directory).
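The training request can also be sent with Python's standard library instead of Postman. A minimal sketch, assuming the app runs locally on the default port used above and that /train accepts the JSON body with a Content-Type of application/json (the README only specifies the JSON body). The function name build_train_request is hypothetical.

```python
import json
import urllib.request


def build_train_request(folder_path: str = "Training_Batch_Files",
                        host: str = "http://127.0.0.1:5000") -> urllib.request.Request:
    """Build the POST /train request with the {"folderPath": ...} JSON body."""
    body = json.dumps({"folderPath": folder_path}).encode("utf-8")
    return urllib.request.Request(
        host + "/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_train_request()
# With the Flask app from main.py running, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```

Building the request separately from sending it keeps the function testable without a running server.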

Folders and files

DataTransform_Training/
DataTransformation_Prediction/
DataTypeValidation_Insertion_Prediction/
DataTypeValidation_Insertion_Training/
EDA/
EncoderPickle/
PredictionArchivedBadData/
Prediction_Batch_files/
Prediction_Database/
Prediction_FileFromDB/
Prediction_Logs/
Prediction_Output_File/
Prediction_Raw_Data_Validation/
TrainingArchiveBadData/
Training_Batch_Files/
Training_Database/
Training_FileFromDB/
Training_Logs/
Training_Raw_data_validation/
application_logging/
best_model_finder/
data_ingestion/
data_preprocessing/
demo_images/
file_operations/
models/
preprocessing_data/
templates/
Procfile
README.md
flask_monitoringdashboard.db
main.py
manifest.yml
predictFromModel.py
prediction_Validation_Insertion.py
requirements.txt
runtime.txt
schema_prediction.json
schema_training.json
trainingModel.py
training_Validation_Insertion.py
