
Devfest MLOps

Summary

Welcome to this lab on MLOps challenges.

In this lab we will manipulate data about wine.

We will train a model that tries to predict whether a wine will be good or not. More importantly, we will integrate our data science pipeline into an Ops pipeline that lets us train, deploy, and monitor a model in production, so we can concentrate on the important data science task: improving the performance of the model.

So follow along while we show you how to get MLOps tasks out of the way!!

Objectives

The objective of this lab is to run a training pipeline in Vertex AI, create an endpoint, deploy a model, and deploy a monitoring job. We use TensorFlow Extended (TFX) as the framework to achieve this goal.

During this lab, you will design components for Vertex AI that integrate custom TensorFlow code. This will later allow you to handle training, deploying, and monitoring models on your own, on a scalable and highly available platform.

Prerequisites

  • Familiarity with TensorFlow code
  • Understanding of Kubernetes principles
  • Basic knowledge of Kubeflow
  • Basic knowledge of Vertex AI
  • Basic knowledge of GCP (GCS, BigQuery, Cloud Shell)
  • Basic knowledge of Docker
  • Understanding of ML problems
  • Knowledge of MLOps practices
  • Understanding of data project management

Technical details

  • Duration: 2 hours
  • Requirements: a computer, an internet connection, and a Gmail account

Workshop

The starter notebook wine_quality_notebook.ipynb is located at the root.

Two tables, one for white wines and one for red wines, are located in the par-devfest-sfeir.wine_quality BigQuery dataset.
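
Optionally, you can take a quick look at the data from Cloud Shell with the bq CLI before starting. This is just a sanity check; the white_wine table is the one the pipeline uses later, so adjust the table name if you want to look at the red wines instead:

bq query --use_legacy_sql=false 'SELECT * FROM `par-devfest-sfeir.wine_quality.white_wine` LIMIT 5'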

Setting things up

The gcp platform

  • Go to the GCP console in a browser
  • Access the devfest-sfeir project
  • Start a Cloud Shell session
  • Create a GCS bucket: gcloud storage buckets create gs://<USERNAME>_mlops_lab --location='europe-west1'

The dev environment

For this lab, we strongly suggest that you use a basic Cloud Shell environment, as tools like Python, pip, and the gcloud CLI are already present there. Later on, you can always explore the repo on your own with a locally built environment.

  • Clone the following repo: git clone *repo_address*
  • Run the following installs:
    pip install --upgrade pip
    pip install --upgrade "tfx[kfp]<2"
    

The pipeline image

/!\ : Due to time constraints, the Docker image part is purely informative. You have the Dockerfile and the commands to build the image, but the image we are going to use for this lab has already been built.

To start :

  • cd devfest_mlops
  • PROJECT_ID=par-devfest-sfeir
  • Build a custom Docker image: docker build -f docker/dockerfile_monitoring.dev . -t europe-west1-docker.pkg.dev/$PROJECT_ID/devfest-2022/tfx_augm:1.9.1
  • Push it to Artifact Registry: docker push europe-west1-docker.pkg.dev/$PROJECT_ID/devfest-2022/tfx_augm:1.9.1

Getting started

The repository you've just cloned should contain the following files:

|_ docker
|   dockerfile_monitoring.dev
|_ src
| |_codelab
| | |  create_pipeline.py
| | |  main.py
| | |  trainer.py
| | |  transformer.py
| | |_ monitorer_component
| |    __init__.py
| |    component.py
| |    executor.py
| |_solutions
|   |  create_pipeline.py
|   |  main.py
|   |  trainer.py
|   |  transformer.py
|   |_ monitorer_component
|       __init__.py
|       component.py
|       executor.py
|
|_ wine_quality_notebook.ipynb

Structure of the lab

This lab is divided into codelab and solutions. solutions contains a complete version of the code that can be executed. codelab contains a couple of TODOs, with links to the documentation you need to complete them ;-)

Place yourself in the devfest_mlops/src/codelab directory before starting: cd devfest_mlops/src/codelab

The TODOs can be found in:

  • create_pipeline.py

    By completing these TODOs you will understand how to link components to each other and how the flow of the pipeline is built (a generic linking sketch follows this list).

  • monitorer_component/component.py

    We have mostly used predefined components, but monitorer_component is a custom-defined component. You need to define its interface by completing the MonitorerComponentSpec class in component.py (see the ComponentSpec sketch below). The Executor inside the component and the component class have already been defined. This part matters less because our custom Docker image already includes solutions.monitorer_component, so you will have a fully running pipeline even if you don't finish this part.
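
To give you an idea of what linking components looks like before you tackle the TODOs, here is a minimal, generic sketch (not the lab solution; the component choices and arguments are illustrative) of how TFX components are chained by passing one component's output channel as another component's input:

from tfx import v1 as tfx

def build_components(query, transform_module, trainer_module):
  # Read training data from BigQuery (extension component).
  example_gen = tfx.extensions.google_cloud_big_query.BigQueryExampleGen(query=query)

  # Compute statistics and infer a schema from the generated examples.
  statistics_gen = tfx.components.StatisticsGen(
      examples=example_gen.outputs['examples'])
  schema_gen = tfx.components.SchemaGen(
      statistics=statistics_gen.outputs['statistics'])

  # Transform consumes the raw examples and the schema produced upstream.
  transform = tfx.components.Transform(
      examples=example_gen.outputs['examples'],
      schema=schema_gen.outputs['schema'],
      module_file=transform_module)

  # Trainer consumes the transformed examples and the transform graph.
  trainer = tfx.components.Trainer(
      module_file=trainer_module,
      examples=transform.outputs['transformed_examples'],
      transform_graph=transform.outputs['transform_graph'],
      schema=schema_gen.outputs['schema'],
      train_args=tfx.proto.TrainArgs(num_steps=100),
      eval_args=tfx.proto.EvalArgs(num_steps=10))

  # The pipeline's flow is defined by these channel connections.
  return [example_gen, statistics_gen, schema_gen, transform, trainer]

The execution order is derived from these input/output dependencies, not from the order of the returned list.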

Nevertheless, don't be discouraged from exploring the rest of the code as well!
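
For reference, a custom component's interface is declared with a ComponentSpec listing its PARAMETERS, INPUTS, and OUTPUTS. Below is a generic, hypothetical sketch; the field names are illustrative and are not the actual MonitorerComponentSpec fields you will complete in the lab:

from tfx.types import standard_artifacts
from tfx.types.component_spec import ChannelParameter, ComponentSpec, ExecutionParameter

class ExampleMonitorerComponentSpec(ComponentSpec):
  """Illustrative interface declaration for a custom TFX component."""
  PARAMETERS = {
      # Plain Python values fixed at pipeline construction time (names are hypothetical).
      'endpoint_name': ExecutionParameter(type=str),
      'project_id': ExecutionParameter(type=str),
  }
  INPUTS = {
      # Artifacts produced by upstream components.
      'schema': ChannelParameter(type=standard_artifacts.Schema),
  }
  OUTPUTS = {
      # Artifacts this component produces for downstream components.
      'monitoring_config': ChannelParameter(type=standard_artifacts.String),
  }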

Running things

At any point you can launch the pipeline with the following command:

python main.py --google_cloud_project=par-devfest-sfeir --google_cloud_region=europe-west1 --dataset_id=wine_quality --wine_table=white_wine --gcs_bucket=<USERNAME>_mlops_lab --username=<USERNAME>

You can then inspect the logs and see your progress and errors.

At the end of the logs in your terminal, the command prints a link to the GCP console, where you can follow your pipeline run interactively.

You can then inspect the objects, further explore the logs and see your progress and errors.

/!\ : Once the monitoring job has been created, re-launching the pipeline might throw an error at the last step.

Endpoint Predictions

It's possible to send a prediction request to a Vertex AI endpoint either with a client library or with a curl command.

The tests can be performed either from Cloud Shell or from a Colab Notebook.

The default serving signature requires a TFRecord, so we will use a Python client to send the request.

First, install the necessary packages:

pip install --upgrade pip
pip install --upgrade "tfx[kfp]<2"

Querying from Cloud Shell

If we want to specify a signature with the request, we need to use curl; specifying a signature is currently not supported in the Python client library.

First write the request data to a JSON file, manually or programmatically, for example with the Python interpreter:

import json
data = {
  "instances" : [{
      "alcohol": [13.8],
      "chlorides": [0.036],
      "citric_acid": [0.0],
      "density": [0.98981],
      "fixed_acidity": [4.7],
      "free_sulfur_dioxide": [23.0],
      "ph": [3.53],
      "residual_sugar": [3.4],
      "sulphates": [0.92],
      "total_sulfur_dioxide": [134.0],
      "volatile_acidity": [0.785]
    }],
  "signature_name": "serving_raw"
}
with open("input.json", "w") as f:
  f.write(json.dumps(data))

And send a POST request from the Cloud Shell:

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://europe-west1-aiplatform.googleapis.com/v1/projects/par-devfest-sfeir/locations/europe-west1/endpoints/<TO DEFINE>:rawPredict \
-d "@input.json"

Querying with Python

If you request a prediction from a Colab notebook, you need to authorize the connection:

import sys
if 'google.colab' in sys.modules:
  from google.colab import auth
  auth.authenticate_user()

Then set the endpoint parameters:

ENDPOINT_ID='<TO DEFINE>'
GOOGLE_CLOUD_PROJECT = 'par-devfest-sfeir'
GOOGLE_CLOUD_REGION = 'europe-west1'

Define the client and the endpoint:

from google.cloud import aiplatform
import base64
import tensorflow as tf

client_options = {
    'api_endpoint': GOOGLE_CLOUD_REGION + '-aiplatform.googleapis.com'
    }

client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)

endpoint = client.endpoint_path(
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_REGION,
    endpoint=ENDPOINT_ID,
)

Define utility functions that convert the request data into a format the endpoint can consume:

def _float_feature(value):
  """Returns a float_list from a float / double."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def serialize_example(alcohol, chlorides, citric_acid, density, fixed_acidity, free_sulfur_dioxide,
                      ph, residual_sugar, sulphates, total_sulfur_dioxide, volatile_acidity):
  """Creates a tf.train.Example message.  """

  # create a dictionary mapping the feature name to the tf.train.Example-compatible
  # data type
  feature = {
      'alcohol': _float_feature(alcohol),
      'chlorides': _float_feature(chlorides),
      'citric_acid': _float_feature(citric_acid),
      'density': _float_feature(density),
      'fixed_acidity': _float_feature(fixed_acidity),
      'free_sulfur_dioxide': _float_feature(free_sulfur_dioxide),
      'ph': _float_feature(ph),
      'residual_sugar': _float_feature(residual_sugar),
      'sulphates': _float_feature(sulphates),
      'total_sulfur_dioxide': _float_feature(total_sulfur_dioxide),
      'volatile_acidity': _float_feature(volatile_acidity),
  }

  # create a features message using tf.train.Example
  example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
  return example_proto.SerializeToString()

Finally, send a request:

instances = [{
      "examples":{'b64':
      base64.b64encode(serialize_example(13.8, 0.036, 0.0, 0.98981, 4.7,
                                         23.0, 3.53, 3.4, 0.92, 134.0, 0.785)).decode()}
      }]
response = client.predict(endpoint=endpoint, instances=instances)
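
The returned object is a PredictResponse; its predictions field holds the model outputs, which you can inspect directly, for example:

for prediction in response.predictions:
  print(prediction)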

Conclusions

Congratulations!!

You have now finished this lab. We encourage you to explore the objects generated by your pipelines:

  • Click on the boxes and use the expand artifacts option to get a sense of the steps, objects, and logs available after the run
  • The artifacts can be found on GCS
  • The models and endpoints can be seen in the corresponding Vertex AI pages
  • In the Pipelines page you can see your runs and the other participants' runs
  • You can also explore BigQuery to see the dataset and the monitoring dataset you have created, where you should be able to find the request sent previously (if you have any doubts about which dataset is yours, the ID of the monitoring job can be found by exploring your monitoring job from the endpoint page in the Vertex AI section of GCP)

Regards!

License

This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

  1. Title: Wine Quality

  2. Sources Created by: Paulo Cortez (Univ. Minho), António Cerdeira, Fernando Almeida, Telmo Matos and José Reis (CVRVV) @ 2009

  3. Past Usage:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

You can find the original dataset at : https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/
