Model Deployment

We will visit the different steps involved in an MLOps pipeline.

Machine Learning Model Operationalization Management (MLOps), as an extension of DevOps, establishes effective practices and processes around designing, building, and deploying ML models into production.

In the paper titled Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology, the authors introduce a process model for the development of ML applications: the CRoss-Industry Standard Process model for the development of Machine Learning applications with Quality assurance methodology (CRISP-ML(Q)). CRISP-ML(Q) offers the ML community a standard process to streamline ML and data science projects and make their results reproducible. It is designed for the development of ML applications where the ML model is deployed and maintained as part of a product or service.

Figure: CRISP-ML(Q) model (Source)

The CRISP-ML(Q) process model consists of six phases:

  1. Business & Data Understanding
  2. Data Preparation
  3. Modelling
  4. Evaluation
  5. Deployment
  6. Monitoring and Maintenance

For each phase, the flow chart below explains the quality assurance approach in CRISP-ML(Q). First, clear objectives for the current phase are defined; next, the task is initiated; then the risks that might negatively impact the efficiency and success of the ML application are identified (e.g., bias, overfitting, lack of reproducibility); finally, quality assurance methods are applied to mitigate those risks when they need to be diminished (e.g., cross-validation, documenting process and results).

Figure: CRISP-ML(Q) approach to quality assurance for each of the six phases (Source)

Model Deployment

ML model deployment includes the following tasks:

  • Define inference hardware and optimize the ML model for the target hardware
  • Evaluate the model under production conditions
  • Assure user acceptance and usability
  • Minimize the risk of unforeseen errors
  • Choose a deployment strategy

A wise person on the Internet once said: deploying is easy if you ignore all the hard parts. If you want to deploy a model for your friends to play with, all you have to do is to create an endpoint to your prediction function, push your model to AWS, create an app with Streamlit or Dash. The hard parts include making your model available to millions of users with a latency of milliseconds and 99% uptime, setting up the infrastructure so that the right person can be immediately notified when something went wrong, figuring out what went wrong, and seamlessly deploying the updates to fix what’s wrong. (Source: Chip Huyen)

Model Serving and Deployment Patterns

Source: https://ml-ops.org/content/three-levels-of-ml-software

Model serving is a way to integrate the ML model into a software system. There are two aspects to deploying an ML system in a production environment: first, deploying a pipeline for automated retraining, and second, providing an endpoint that ingests input data and returns predictions from the ML model.

There are five popular model serving patterns for putting an ML model into production (a minimal sketch of the Model-as-Service pattern follows the list below):

  1. Model-as-Service

Figure: Model-as-Service (Source)

  2. Model-as-Dependency

Figure: Model-as-Dependency (Source)

  3. Precompute

Figure: Precompute-Serving (Source)

  4. Model-on-Demand

Figure: Model-on-Demand (Source)

  5. Hybrid-Serving

Figure: Federated Learning (Source)
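
To make the Model-as-Service pattern concrete, here is a minimal sketch of a prediction service: a sentiment model wrapped behind an HTTP endpoint with FastAPI. The route, names and model choice are illustrative assumptions, not taken from the exercises below.

```python
# Minimal Model-as-Service sketch: the model runs behind its own web service
# and other applications call it over HTTP.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

# Loaded once at startup and reused for every request.
classifier = pipeline("sentiment-analysis")

app = FastAPI(title="sentiment-model-as-service")

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(request: PredictRequest) -> dict:
    # pipeline(...) returns e.g. [{"label": "POSITIVE", "score": 0.99}]
    return classifier(request.text)[0]
```

Running `uvicorn app:app` (assuming the file is named `app.py`) exposes the model at `POST /predict`.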

There are two popular deployment strategies (a sketch of the serverless style follows the list below):

  1. Deploying ML models as Docker Containers

Figure: Docker Containers to Cloud Instances (Source)

  2. Deploying ML Models as Serverless Functions

Figure: Serverless Functions (Source)
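
As a rough illustration of the serverless-functions strategy, the sketch below shows an AWS Lambda-style handler wrapping the same kind of sentiment model. The handler name and event shape are assumptions, not taken from the exercises.

```python
# Minimal serverless-function sketch (AWS Lambda-style handler).
import json

from transformers import pipeline

# Loaded once per container ("cold start") and reused across warm invocations.
classifier = pipeline("sentiment-analysis")

def handler(event, context):
    # Assumes an API Gateway-style event with a JSON body like {"text": "..."}.
    text = json.loads(event["body"])["text"]
    result = classifier(text)[0]
    return {"statusCode": 200, "body": json.dumps(result)}
```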

Recommended Readings

Model Serving

In this project, our focus will be on the different approaches we can use to serve an ML model. MLOps.toys provides a comprehensive survey of the different frameworks that exist for model serving. The goal of this project is to explore these 10+ frameworks, and many more, along with cloud services for serving and testing the endpoint of the deployed ML model.

We will start with a simple exercise on how to make use of GitHub Actions for CI/CD. As we go down the list, we will integrate various technologies such as GitHub Actions, Docker, Pytest and linting while testing different ML model serving frameworks and visiting best practices along the way.

  1. Makefile: In this exercise, we will automate the tasks of installing packages, linting, formatting and testing using a Makefile.

    Technologies: Pytest, Make

  2. GitHub Actions Makefile: In this exercise, we will automate the tasks of installing packages, linting, formatting and testing using GitHub Actions.

    Technologies: Pytest, Make, GitHub Actions

  3. GitHub Actions Docker: In this exercise, we will implement the following:

    • Containerize a GitHub project by integrating a Dockerfile and automatically registering new containers to a Container Registry.

    • Create a simple load test for the application using a load testing framework such as Locust or Loader.io, and automatically run this test when changes are pushed to a staging branch (see the locustfile sketch below).

    Technologies: Docker, GitHub Actions, Locust
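
    A locustfile for such a load test could look roughly like this; the `/predict` route and payload are assumptions about the application under test.

    ```python
    # locustfile.py — minimal Locust load test sketch for a prediction endpoint.
    from locust import HttpUser, task, between

    class PredictionUser(HttpUser):
        # Each simulated user waits 1-3 seconds between requests.
        wait_time = between(1, 3)

        @task
        def predict(self):
            # Hypothetical endpoint and payload; adjust to the actual application.
            self.client.post("/predict", json={"text": "I love this product"})
    ```

    Run it with `locust -f locustfile.py --host http://localhost:8000` (the host value is an example).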

  4. FastAPI Azure: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on Azure using Azure App Services and Azure DevOps Pipelines.

    Technologies: Docker, FastAPI, Continuous Delivery using Azure App Services, Azure DevOps Pipelines

  5. FastAPI GCP: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on GCP using Cloud Run and Cloud Build.

    Technologies: Docker, FastAPI, Continuous Delivery using GCP Cloud Run and Cloud Build

  6. FastAPI AWS: In this exercise, we will build a FastAPI ML application and deploy it with continuous delivery on AWS using Elastic Beanstalk and CodePipeline (a minimal sketch of such an app follows this item).

    Technologies: Docker, FastAPI, sklearn, Continuous Delivery using Elastic Beanstalk and CodePipeline
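
    A minimal sketch of the kind of FastAPI + sklearn application built in exercises 4-6: a toy model is trained and persisted, then loaded by the serving app. The file names, the iris model and the route are illustrative assumptions, not the exact apps used in the exercises.

    ```python
    # train.py — train a toy sklearn model and persist it for the serving app.
    import joblib
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    iris = load_iris()
    model = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)
    joblib.dump(model, "model.joblib")

    # app.py — FastAPI app that loads the persisted model and serves predictions.
    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    model = joblib.load("model.joblib")
    app = FastAPI()

    class Features(BaseModel):
        features: list[float]  # four iris measurements

    @app.post("/predict")
    def predict(payload: Features) -> dict:
        return {"prediction": int(model.predict([payload.features])[0])}
    ```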

  7. AWS Terraform Deploy: To be implemented

  8. FastAPI GKE: In this exercise, we will deploy a sentiment analysis model using FastAPI on GKE (GCP).

    • Containerizing the different components of the project

    • Writing tests and testing individual modules using pytest (see the test sketch below)

    • Using trunk for automatic code checking, formatting and linting

    • Deploying the application on GKE

    Technologies: Docker, FastAPI, HuggingFace Transformer model, Pytest, Trunk, GKE
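
    A minimal pytest sketch for testing such an endpoint with FastAPI's TestClient; the `app` module, the `/predict` route and the label values are assumptions.

    ```python
    # test_api.py — minimal endpoint test sketch using FastAPI's TestClient.
    from fastapi.testclient import TestClient

    from app import app  # hypothetical module exposing the FastAPI instance

    client = TestClient(app)

    def test_predict_returns_a_sentiment_label():
        response = client.post("/predict", json={"text": "what a great movie"})
        assert response.status_code == 200
        assert response.json()["label"] in {"POSITIVE", "NEGATIVE"}
    ```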

  9. FastAPI Kubernetes Monitoring: In this exercise, we will introduce Kubernetes. We will deploy the FastAPI application on Kubernetes and monitor it using Prometheus and Grafana, following best practices for writing tests and triggering a CI workflow using GitHub Actions (a metrics instrumentation sketch follows below).

    Technologies: Docker, Docker Compose, Pytest, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Kubernetes, Prometheus, Grafana
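
    A rough sketch of how the FastAPI app can expose metrics for Prometheus to scrape (and Grafana to visualize), using the prometheus_client library; the metric names and placeholder prediction are illustrative.

    ```python
    # Minimal Prometheus instrumentation sketch for a FastAPI app.
    import time

    from fastapi import FastAPI, Response
    from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST

    app = FastAPI()

    REQUEST_COUNT = Counter("predict_requests_total", "Total prediction requests")
    REQUEST_LATENCY = Histogram("predict_latency_seconds", "Prediction latency in seconds")

    @app.post("/predict")
    def predict(payload: dict) -> dict:
        REQUEST_COUNT.inc()
        start = time.perf_counter()
        result = {"label": "POSITIVE"}  # placeholder for the real model call
        REQUEST_LATENCY.observe(time.perf_counter() - start)
        return result

    @app.get("/metrics")
    def metrics() -> Response:
        # Prometheus scrapes this endpoint; Grafana visualizes the stored series.
        return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
    ```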

  10. BentoML Deploy: In this exercise, we will use the BentoML library to deploy the sentiment classification model from Hugging Face 🤗 on the services listed below (a minimal BentoML service sketch follows this item).

    Technologies: Docker, Pytest, FastAPI, HuggingFace Transformer model, AWS Lambda, Azure Functions, Kubernetes, BentoML
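
    A minimal BentoML service sketch, assuming the BentoML 1.x Service/API decorator style; the service name and API shape are illustrative and may differ from the exercise.

    ```python
    # service.py — minimal BentoML service sketch (BentoML 1.x style API assumed).
    import bentoml
    from bentoml.io import JSON, Text
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")

    svc = bentoml.Service("sentiment_classifier")

    @svc.api(input=Text(), output=JSON())
    def classify(text: str) -> dict:
        # Returns e.g. {"label": "POSITIVE", "score": 0.99}.
        return classifier(text)[0]
    ```

    Locally this can be served with `bentoml serve service.py:svc` before building a Bento and containerizing it for Lambda, Azure Functions or Kubernetes.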

  11. Cortex Deploy: In this exercise, the transformers sentiment classifier FastAPI application is deployed using Cortex via two different APIs.

    Technologies: Docker, Cortex, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Trunk.io linter

  12. Serverless Deploy: In this exercise, the Hugging Face transformers sentiment classifier FastAPI application is deployed using the Serverless Framework.

    Technologies: Docker, Serverless Framework, FastAPI, HuggingFace Transformer model, Continuous Integration using GitHub Actions, Trunk.io linter

  13. Bodywork Train and Deploy: This exercise contains a Bodywork project that demonstrates how to run an ML pipeline on Kubernetes with Bodywork. The example ML pipeline has two stages:

    • Run a batch job to train a model.

    • Deploy the trained model as a service with a REST API.

    Technologies: Bodywork, Sklearn, Flask, Kubernetes, Cronjob

  14. KServe Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face transformer model. Since out-of-the-box support for PyTorch or Transformer models is not provided, we will write a custom inference runtime to deploy this model and test the endpoints (a minimal custom predictor sketch follows this item).

    Technologies: Docker, KServe, HuggingFace Transformer model, Pytest, Kubernetes, Istio, Knative, Kind, TorchServe

    TorchServe: Deploying the Hugging Face transformer model using TorchServe.
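
    A rough sketch of a custom predictor written with the kserve Python SDK; the class name, payload shape (`instances`) and model name are illustrative assumptions.

    ```python
    # predictor.py — minimal KServe custom inference runtime sketch.
    from kserve import Model, ModelServer
    from transformers import pipeline

    class SentimentModel(Model):
        def __init__(self, name: str):
            super().__init__(name)
            self.classifier = None
            self.load()

        def load(self):
            self.classifier = pipeline("sentiment-analysis")
            self.ready = True

        def predict(self, payload: dict, headers=None) -> dict:
            # Expects a payload like {"instances": ["some text", ...]}.
            return {"predictions": self.classifier(payload["instances"])}

    if __name__ == "__main__":
        ModelServer().start([SentimentModel("sentiment")])
    ```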

  15. MLServer Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face transformer model. Since MLServer does not provide out-of-the-box support for PyTorch or Transformer models, we will write a custom inference runtime to deploy this model and test the endpoints.

    Technologies: Docker, MLServer, HuggingFace Transformer model

  16. Ray Serve Deploy: In this exercise, we will deploy the sentiment analysis Hugging Face transformer model using Ray Serve so it can be scaled up and queried over HTTP, using two approaches:

    • Ray Serve default approach

    • Ray Serve with FastAPI (sketched below)

    Technologies: Docker, Ray Serve, FastAPI, HuggingFace Transformer model
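
    A minimal sketch of the Ray Serve + FastAPI approach, assuming the Ray Serve 2.x deployment API; the names and route are illustrative.

    ```python
    # serve_app.py — Ray Serve with a FastAPI ingress (Ray Serve 2.x API assumed).
    from fastapi import FastAPI
    from ray import serve
    from transformers import pipeline

    app = FastAPI()

    @serve.deployment
    @serve.ingress(app)
    class SentimentDeployment:
        def __init__(self):
            self.classifier = pipeline("sentiment-analysis")

        @app.get("/predict")
        def predict(self, text: str) -> dict:
            return self.classifier(text)[0]

    # Bind the deployment; `serve run serve_app:sentiment_app` starts the HTTP endpoint.
    sentiment_app = SentimentDeployment.bind()
    ```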

  17. Seldon Core Deploy: In this exercise, we will deploy a simple sklearn iris model using Seldon Core. We will deploy it using two approaches and test the endpoints.

    • Seldon core default approach

    • V2 Inference protocol

    Technologies: Docker, Seldon Core, Sklearn model, Kubernetes, Istio, Helm, Kind

  18. NVIDIA Triton Deploy: Coming soon
