author: Sai Chimata
date: December 12, 2021
The goal of this project is to build an automated model training, scoring and monitoring system.
- ingestion.py - To ingest multiple datasets into a dataframe.
- training.py - To train a logistic regression model on the final ingested data.
- scoring.py - To generate an F1 score for the trained logistic regression model against testdata.csv in the testdata folder.
- deployment.py - To record the production model pkl file and the production model metrics, including model scores and the confusion matrix plot.
- diagnostics.py - To generate feature statistics and a prediction summary, check for missing values, and calculate the execution time of running processes.
- reporting.py - To report model performance by generating a confusion matrix.
- app.py and apicalls.py - To create an API and to verify its real-time responses.
- fullprocess.py - To handle future model maintenance and monitoring: processing new data and checking for model drift.
- Run ingestion.py
- Run training.py
- Run scoring.py
- Run deployment.py
- Run fullprocess.py
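
The run steps above could be chained with a small helper. This is a minimal sketch, not part of the project: `run_pipeline`, `run_script`, and the `runner` parameter are hypothetical names, and it assumes the scripts sit in the current working directory and take no arguments.

```python
import subprocess
import sys

# Script names are taken from the run steps above, in the same order.
PIPELINE = [
    "ingestion.py",
    "training.py",
    "scoring.py",
    "deployment.py",
    "fullprocess.py",
]


def run_script(script):
    """Run one script with the current interpreter; raise if it exits non-zero."""
    subprocess.run([sys.executable, script], check=True)


def run_pipeline(scripts=PIPELINE, runner=run_script):
    """Run each script in order and stop at the first failure.

    `runner` is a hypothetical hook so the ordering can be exercised
    without actually launching the scripts.
    """
    completed = []
    for script in scripts:
        runner(script)  # an exception here aborts the remaining steps
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the project root would then execute the five scripts one after another and raise `subprocess.CalledProcessError` if any of them fails.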
The project is set up for periodic model retraining through a cron job that runs every 10 minutes. If model drift is detected on new data, retraining, redeployment, diagnostics, and reporting run accordingly.
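
As an illustration of the drift check, the sketch below computes an F1 score in plain Python and compares it with the score recorded at deployment. The rule used here (drift when the new score falls below the deployed score) is an assumption, since the actual criterion in fullprocess.py is not described, and `f1_score` and `has_model_drift` are hypothetical helper names. The crontab line in the comment uses the standard `*/10` syntax for a 10-minute interval; its path is a placeholder.

```python
# Example crontab entry for a 10-minute schedule (path is a placeholder):
#   */10 * * * * python /path/to/fullprocess.py


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def has_model_drift(deployed_f1, new_f1):
    """Assumed rule: drift means the score on new data is lower than
    the score recorded for the deployed model."""
    return new_f1 < deployed_f1


# Usage: score the deployed model's predictions on newly ingested data.
new_score = f1_score([1, 0, 1, 1], [1, 0, 0, 1])
if has_model_drift(deployed_f1=0.9, new_f1=new_score):
    print("Drift detected: retrain, redeploy, run diagnostics and reporting")
```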