
MLOps-on-kubernetes

Why Kubernetes in MLOps?

Kubernetes provides flexible control over containers through orchestration features such as scheduling, load balancing, and scaling.
It is therefore well suited to systematically building and operating every stage of an ML project, including data collection, preprocessing, feature extraction, data validation, monitoring, and deployment.
In particular, containers packaged with training code can be scheduled onto nodes with GPUs, while containers packaged with data-preprocessing code can run on nodes with plenty of memory.
Also, because everything runs as Kubernetes containers, the Dockerfile guarantees the same environment for every engineer.
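As a sketch of that scheduling idea, a training pod can be pinned to GPU nodes with a nodeSelector and resource requests. The node label value, image name, and resource amounts below are illustrative assumptions, not taken from this repo:

```yaml
# Hypothetical training pod: label value, image, and resource sizes are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: train-job
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4  # GKE GPU node pool label (assumed value)
  containers:
    - name: train
      image: gcr.io/my-project/train:latest            # hypothetical training image
      resources:
        requests:
          memory: "8Gi"                                # memory-hungry steps go to big-memory nodes
        limits:
          nvidia.com/gpu: 1                            # request one GPU for training
```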

Architecture


  • Kubernetes: Deploy a Kubernetes cluster with Google Kubernetes Engine.
  • ML PIPELINE: Build an ML pipeline with Kubeflow that trains and deploys the model based only on input parameters.
  • Data Storage: Manage your data with Google Cloud Storage.
  • Experiment Tracking & AutoML: Use Weights & Biases to track experiments and find the optimal hyperparameters.
  • Model Versioning: Manage and save models by version with MLflow.
  • Model Serving: Serve the model to users over an API with BentoML.
  • Monitoring: Monitor the cluster's resources with Prometheus & Grafana.

ML PIPELINE


The pipeline has two branches depending on the user's parameter input.

- Condition 1: Hyperparameter tuning
- Condition 2: Train
Condition 1. Hyperparameter tuning 🔍

A. Hyperparameter tuning with Weights & Biases



Under "Condition: Hyperparameter tuning", the model is not trained; only tuning is performed.
The number of tuning runs can be controlled through input variables, and the tuning process can be inspected in the Weights & Biases project.
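A minimal sketch of how such a sweep could be configured with the Weights & Biases Python client. The metric name, parameter ranges, and project name are my assumptions; the actual launch calls are commented out because they require a logged-in W&B account:

```python
# Hypothetical sweep configuration for wandb; metric name and ranges are illustrative.
sweep_config = {
    "method": "bayes",                              # Bayesian hyperparameter search
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "C": {"min": 0.01, "max": 10.0},            # SVM regularization strength (assumed)
        "kernel": {"values": ["rbf", "linear"]},    # SVM kernel choices (assumed)
    },
}

def train():
    import wandb  # imported lazily; requires `pip install wandb` and `wandb login`
    run = wandb.init()
    # ... train the model using run.config.C / run.config.kernel ...
    run.log({"val_accuracy": 0.0})                  # placeholder metric

# Launching the sweep; `count` controls the number of tuning runs:
# import wandb
# sweep_id = wandb.sweep(sweep_config, project="mlops-on-kubernetes")
# wandb.agent(sweep_id, function=train, count=20)
```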


Condition 2. Train 🛠

A. Model Versioning with MLflow


Under "Condition: Train", the model is trained according to the input parameters; if no parameters are entered, the model is trained with default values.
The trained model is compared to the model registered in MLflow, and if it achieves better accuracy, it is uploaded to MLflow and versioned.
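The comparison step could look like the sketch below. Only the pure decision logic is runnable here; the metric and registered model name are assumptions, and the MLflow registry calls are commented out because they need a tracking server:

```python
from typing import Optional

def should_register(candidate_accuracy: float,
                    registered_accuracy: Optional[float]) -> bool:
    """Register the newly trained model only if it beats the current best.

    A missing registered model (None) means this is the first version."""
    if registered_accuracy is None:
        return True
    return candidate_accuracy > registered_accuracy

# Hypothetical MLflow usage around that decision (model name "eeg-svm" is assumed):
# import mlflow
# from mlflow.tracking import MlflowClient
# client = MlflowClient()
# if should_register(new_acc, best_registered_acc):
#     mlflow.sklearn.log_model(model, "model", registered_model_name="eeg-svm")
```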

B. Model Serving with BentoML



Once the model is stored in MLflow, it is pushed to BentoML.
Pushed models are deployed with the resources the user specifies (e.g. CPU, GPU, memory).
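As an illustration of per-deployment resource control, Yatai exposes a BentoDeployment custom resource. The apiVersion, field names, and values below are assumptions from memory, so check the Yatai documentation for the exact schema of your version:

```yaml
# Hypothetical BentoDeployment: apiVersion and field names may differ by Yatai version.
apiVersion: serving.yatai.ai/v2alpha1
kind: BentoDeployment
metadata:
  name: eeg-classifier
spec:
  bento: eeg_classifier:latest      # assumed bento tag
  resources:
    requests:
      cpu: "500m"                   # user-chosen CPU request
      memory: "1Gi"
    limits:
      cpu: "2"
      memory: "4Gi"
```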


Cluster Monitoring


Customize Prometheus & Grafana to monitor clusters.
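If Prometheus is installed via the Prometheus Operator (an assumption about this setup), scraping a pipeline service can be declared with a ServiceMonitor. The label selector and port name below are illustrative:

```yaml
# Hypothetical ServiceMonitor: label selector and port name are illustrative.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ml-pipeline-metrics
spec:
  selector:
    matchLabels:
      app: ml-pipeline          # assumed service label
  endpoints:
    - port: metrics             # assumed metrics port name
      interval: 30s
```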

Model Inference

# payload shape: N x 22 x 750
$ curl \
    -X POST \
    -H "content-type: application/json" \
    --data "[[[1.1, 2.2, 3.3, 4.4],
               ...
              [5.5, 6.6, 7.7, 8.8]]]" \
    https://demo-default-yatai-127-0-0-1.apps.yatai.dev/classify

>>> "left"
Currently, BentoML 1.0 has several problems in the GKE environment, so they have been registered as an issue and this section is written provisionally.

-> Issue: bentoml/Yatai#322. Once this issue is resolved, I will apply BentoML again.


Machine Learning Task



Overview

In this task, I aim to build a motor imagery (MI) task, a common problem in Brain-Computer Interfaces, into MLOps
(i.e. MI task: brain waves generated while imagining a movement are fed into the model to derive a result).

Data

BCI Competition IV 2a Dataset (Classification of EEG signals affected by eye movement artifacts)

Task

  1. Preprocessing
    • Band-pass filter (8-30 Hz)
    • Segment the data into trainable shapes
  2. Feature Extraction
    • Common Spatial Pattern
  3. Modeling
    • Support Vector Machine
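The band-pass step of the preprocessing stage could be sketched as follows. The sampling rate of 250 Hz matches the BCI Competition IV 2a recordings; the function name and filter order are my choices:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_8_30(data: np.ndarray, fs: float = 250.0, order: int = 4) -> np.ndarray:
    """Zero-phase 8-30 Hz band-pass filter applied along the last (time) axis."""
    nyq = fs / 2.0
    b, a = butter(order, [8.0 / nyq, 30.0 / nyq], btype="band")
    return filtfilt(b, a, data, axis=-1)   # filtfilt avoids phase distortion

# Demo on a synthetic signal: a 12 Hz component passes, a 50 Hz component is suppressed.
t = np.arange(0, 4, 1 / 250.0)
signal = np.sin(2 * np.pi * 12 * t) + np.sin(2 * np.pi * 50 * t)
filtered = bandpass_8_30(signal)
```

The 8-30 Hz band covers the mu and beta rhythms that carry most motor-imagery information, which is why it is the standard choice before CSP.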

Author

Github: @jerife
Website: jerife.github.io
Email: jerife@naver.com

Copyright © 2022 jerife.
This project is Apache-2.0 licensed.