🏥Continuous Training and Deployment Pipeline for Medical MNIST Image Classification with Airflow & MLflow

💡Description:

This project implements a continuous training and deployment pipeline that follows MLOps practices. Through a Streamlit frontend, users upload Medical MNIST images and receive real-time predictions from a model loaded from the MLflow model registry. An error correction mechanism lets users fix misclassifications, and the corrected examples are saved for later retraining. An Airflow DAG runs weekly: it retrieves new data from AWS S3, retrains the model, and uploads the retrained model to the MLflow model registry. The project is deployed on AWS EC2 instances.

⭐Pipeline Architecture:

User Interaction:
- The pipeline begins with a Streamlit frontend where users upload Medical MNIST images for classification.
Model Inference:
- When an image is uploaded, the application fetches the latest trained model from the MLflow model registry and uses it to predict the class of the image.
Error Correction Mechanism:
- If the prediction is incorrect, users can provide the correct class label. The image and its corrected label are then stored in AWS S3 for future use.
Airflow DAG:
- An Airflow Directed Acyclic Graph (DAG), scheduled to run weekly, fetches new data from AWS S3 and retrieves the latest model from the MLflow model registry.
Model Retraining:
- Once the new data and model are fetched, the pipeline retrains the model on the updated dataset so that it stays up to date with the data collected since the last run.
Model Registry Update:
- After successful retraining, the new model is uploaded to the MLflow model registry, replacing the previous version, so that the Streamlit frontend uses it for subsequent predictions.
Continuous Improvement:
- Because this loop repeats every week, the model keeps adapting to new data, which is intended to improve its performance over time.
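
The weekly loop described above could be wired together roughly as follows. This is a minimal sketch, assuming Airflow 2.4 or later; the DAG id, the task names, and the three placeholder callables are hypothetical and are not taken from this repository's `dags` folder.

```python
from datetime import datetime


def fetch_new_data(**context):
    """Placeholder: download newly corrected images from S3 (hypothetical step)."""
    return "fetched"


def retrain_model(**context):
    """Placeholder: retrain the latest registry model on the new data."""
    return "retrained"


def register_model(**context):
    """Placeholder: upload the retrained model to the MLflow model registry."""
    return "registered"


# Ordered (task_id, callable) pairs; each task runs after the previous one.
PIPELINE_STEPS = [
    ("fetch_new_data", fetch_new_data),
    ("retrain_model", retrain_model),
    ("register_model", register_model),
]


def build_dag():
    """Build the weekly DAG. Airflow is imported inside the function so this
    module can still be imported on machines without Airflow installed."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="medical_mnist_weekly_retrain",  # hypothetical name
        schedule="@weekly",
        start_date=datetime(2024, 1, 1),
        catchup=False,  # do not backfill runs for past weeks
    ) as dag:
        previous = None
        for task_id, fn in PIPELINE_STEPS:
            task = PythonOperator(task_id=task_id, python_callable=fn)
            if previous is not None:
                previous >> task  # chain tasks in list order
            previous = task
    return dag
```

In an actual DAG file the result would be assigned at module level, for example `dag = build_dag()`, so that the Airflow scheduler can discover it.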

🔮Features:

Streamlit Frontend: Users can upload Medical MNIST images for classification.
MLflow Integration: Models are fetched from the MLflow model registry to make predictions.
Correction Mechanism: If the prediction is incorrect, users can provide the correct class label, and the image is uploaded to AWS S3 together with that label.
Airflow DAG: Scheduled to run weekly, the Airflow DAG fetches new data from AWS S3 and the latest model from the MLflow model registry, retrains the model on the new data, and uploads the new model to the MLflow model registry.
AWS S3 Integration: Images with corrected labels are stored in AWS S3 for future reference and analysis.
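
The correction mechanism could store each relabeled image under a key that encodes its corrected class. The sketch below is illustrative only: the key layout, the `corrections` prefix, and both function names are hypothetical, and the class list assumes the six standard Medical MNIST folders. The `s3_client` argument is expected to behave like `boto3.client("s3")`.

```python
import uuid

# Assumption: the six Medical MNIST class names; adjust to the labels your model uses.
CLASS_NAMES = ["AbdomenCT", "BreastMRI", "CXR", "ChestCT", "Hand", "HeadCT"]


def build_s3_key(label, filename, prefix="corrections"):
    """Return a key of the form <prefix>/<label>/<random hex>_<filename>."""
    if label not in CLASS_NAMES:
        raise ValueError(f"Unknown label: {label!r}")
    # A random component keeps two uploads with the same filename from colliding.
    return f"{prefix}/{label}/{uuid.uuid4().hex}_{filename}"


def upload_correction(s3_client, bucket, image_bytes, label, filename):
    """Upload the image under its corrected label and return the key used."""
    key = build_s3_key(label, filename)
    s3_client.put_object(Bucket=bucket, Key=key, Body=image_bytes)
    return key
```

Grouping objects by label in the key means a retraining job can list one prefix per class, although the repository may organize its bucket differently.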

🔨Components:

Streamlit Frontend: Handles user interaction and image uploads.
MLflow Model Registry: Stores trained models and facilitates model deployment.
Airflow DAG: Automates data fetching, model retraining, and model deployment.
AWS S3: Stores images with corrected labels for future use.
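
As an illustration of how the frontend might pull a model from the registry, the sketch below builds a `models:/<name>/latest` URI and loads it through MLflow's generic `pyfunc` flavor. The helper names and whatever registered model name you pass in are assumptions; the project may load its models with a framework-specific flavor instead.

```python
def latest_model_uri(model_name):
    """Build a registry URI that resolves to the newest registered version."""
    return f"models:/{model_name}/latest"


def load_latest_model(model_name):
    """Load the newest registered version as a generic pyfunc model.

    MLflow is imported inside the function; it reads MLFLOW_TRACKING_URI
    and the related credentials from the environment.
    """
    import mlflow.pyfunc

    return mlflow.pyfunc.load_model(latest_model_uri(model_name))
```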

🧪MLflow Experiment Tracking

🪄MLflow Model Registry

🪭Airflow

🪄Streamlit Frontend

🪣AWS

🔨Setup Instructions:

Clone the repository from GitHub.
Install the necessary dependencies listed in the requirements.txt file.
Configure AWS credentials for S3 access.
Create an env file to store the MLflow tracking URI and DagsHub credentials.
Configure Airflow with appropriate connections and DAG configurations.
Run the Streamlit application.
Ensure the Airflow scheduler is running so that the DAG runs are triggered.

🚀Configuration Instructions

To run the project, please follow these configuration steps:

Create a config.py File: Create a config.py file in the root directory of the project with the following content:
```
config = {
    "aws_key": "YOUR_AWS_KEY",
    "aws_secret": "YOUR_AWS_SECRET",
}
```

Set Environment Variables: Set the following environment variables:

MLFLOW_TRACKING_URI = https://dagshub.com/JatinSingh28/Medical-Image-Classification.mlflow
MLFLOW_TRACKING_USERNAME = JatinSingh28
MLFLOW_TRACKING_PASSWORD = YOUR_MLFLOW_PASSWORD
DAGSHUB_USER_TOKEN = YOUR_DAGSHUB_USER_TOKEN

Replace YOUR_AWS_KEY, YOUR_AWS_SECRET, YOUR_MLFLOW_PASSWORD, and YOUR_DAGSHUB_USER_TOKEN with your actual credentials.

The project needs these settings to access AWS S3 and the MLflow tracking server hosted on DagsHub.
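
A small startup check can catch a missing variable before the app or the DAG fails partway through. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the variable list simply mirrors the four names given above.

```python
import os

REQUIRED_ENV_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
    "DAGSHUB_USER_TOKEN",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def check_environment(environ=None):
    """Raise a descriptive error if any required variable is missing."""
    missing = missing_env_vars(environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_environment()` at the top of the Streamlit script or the DAG file would surface a misconfiguration immediately.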

🧑‍🔬Usage:

Access the Streamlit frontend through the provided URL.
Upload medical MNIST images for classification.
View predictions and correct any inaccuracies if necessary.
The Airflow DAG will automatically fetch new data and retrain the model weekly.
Corrected images are stored in AWS S3 for future analysis.
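
The upload, predict, and correct flow above might look like this in Streamlit. This is a hedged sketch rather than the contents of `streamlit.py`: `predict` and `store_correction` are hypothetical callables supplied by the caller, and the widget labels are placeholders.

```python
def correction_to_store(predicted_label, user_label):
    """Return the label to store, or None when the user agrees with the model."""
    if user_label and user_label != predicted_label:
        return user_label
    return None


def main(predict, store_correction, class_names):
    """Render the page.

    predict(image_bytes) -> label and store_correction(image_bytes, label)
    are supplied by the caller (both hypothetical).
    """
    import streamlit as st  # imported here; launch the script with `streamlit run`

    uploaded = st.file_uploader(
        "Upload a Medical MNIST image", type=["jpeg", "jpg", "png"]
    )
    if uploaded is None:
        return  # nothing to classify yet

    st.image(uploaded)
    image_bytes = uploaded.getvalue()
    predicted = predict(image_bytes)
    st.write(f"Prediction: {predicted}")

    # Let the user pick the correct class if the prediction is wrong.
    user_label = st.selectbox("Correct label", class_names)
    if st.button("Submit correction"):
        label = correction_to_store(predicted, user_label)
        if label is None:
            st.info("Label matches the prediction; nothing stored.")
        else:
            store_correction(image_bytes, label)
            st.success(f"Stored image with label {label}.")
```

Keeping the decision in `correction_to_store` separates it from the widgets, so it can be tested without starting a Streamlit session.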

💌Authors:

This data pipeline is brought to you by Jatin Singh Sagoi. If you have questions, suggestions, or feedback, please don't hesitate to reach out at contact.sagoisinghjatin9951@gmail.com.
