Movie-Recommendation-System-MLOps

Overall architecture

The following diagram shows the architecture of our deployed movie recommendation system.

⚙️ Software & Tools

Continuous Integration

1. A pipeline for movie recommendation

Data storage
- Apache Kafka is a distributed event store and stream-processing platform
- Collect Kafka log data
  - Data (movies watched by user) --> for (re)training model and for online evaluation
  - Rate (rating by user) --> for (re)training model and for online evaluation
  - Request --> for online evaluation
- Once started, this pipeline runs continuously until it is intentionally stopped.
- After online evaluation, expired data is automatically deleted.
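
The three event types above could be separated with a small parser like the one below. This is only a sketch: the log line layout (`<timestamp>,<user_id>,<payload>`), the payload prefixes, and the names `Event` and `parse_log_line` are assumptions made for illustration, not the format this project actually consumes.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    kind: str                       # "data", "rate", or "request"
    user_id: str
    movie_id: Optional[str] = None
    rating: Optional[int] = None


def parse_log_line(line: str) -> Optional[Event]:
    """Classify one log line (hypothetical format); return None if unrecognized."""
    parts = line.strip().split(",", 2)
    if len(parts) != 3:
        return None
    _timestamp, user_id, payload = parts

    # Watch event: which movie the user is streaming.
    if payload.startswith("GET /data/m/"):
        movie_id = payload[len("GET /data/m/"):].split("/", 1)[0]
        return Event("data", user_id, movie_id)

    # Rating event: "<movie_id>=<rating>".
    if payload.startswith("GET /rate/"):
        movie_id, _, rating = payload[len("GET /rate/"):].partition("=")
        if rating.isdigit():
            return Event("rate", user_id, movie_id, int(rating))
        return None

    # Recommendation request served by the model.
    if payload.startswith("recommendation request"):
        return Event("request", user_id)

    return None
```

In a running pipeline, each message read from the Kafka topic would be passed through a function like this before being stored.
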
Data preprocessing
- Preprocess the stored raw data
- Generate a compressed sparse row (CSR) matrix
- Split it into train/validation sets
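
To make the CSR layout concrete, the sketch below builds the three CSR arrays (`data`, `indices`, `indptr`) by hand from `(user, movie, rating)` triples and performs a random train/validation split. The function names, the 80/20 split, and splitting the triples rather than the finished matrix are assumptions for illustration. In practice a library class such as `scipy.sparse.csr_matrix` would normally be used instead of hand-built lists.

```python
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user_index, movie_index, rating)


def to_csr(ratings: List[Rating], n_users: int):
    """Return CSR arrays (data, indices, indptr) with one row per user."""
    ordered = sorted(ratings, key=lambda r: (r[0], r[1]))
    data = [r[2] for r in ordered]       # non-zero values, row by row
    indices = [r[1] for r in ordered]    # column (movie) of each value
    indptr = [0] * (n_users + 1)
    for user, _, _ in ordered:           # count entries per row
        indptr[user + 1] += 1
    for u in range(n_users):             # prefix sum gives row boundaries
        indptr[u + 1] += indptr[u]
    return data, indices, indptr


def train_val_split(ratings: List[Rating], val_fraction: float = 0.2, seed: int = 0):
    """Shuffle the ratings and hold out a fraction for validation."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

Row `u` of the matrix is stored in `data[indptr[u]:indptr[u + 1]]`, so a user with no ratings simply has two equal consecutive `indptr` entries.
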
Model (re)training

Matrix Factorization (MF)
- SVD
- SVD++
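
For reference, the prediction rules of the two models can be sketched as follows. This uses the common biased formulation (global mean plus user and item biases plus a latent-factor dot product). Whether this project uses bias terms is not stated here, so treat the exact form and all function names as assumptions.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def predict_svd(mu: float, b_u: float, b_i: float,
                p_u: Sequence[float], q_i: Sequence[float]) -> float:
    """SVD-style MF: r_hat = mu + b_u + b_i + q_i . p_u"""
    return mu + b_u + b_i + dot(q_i, p_u)


def predict_svdpp(mu: float, b_u: float, b_i: float,
                  p_u: Sequence[float], q_i: Sequence[float],
                  y_factors: List[Sequence[float]]) -> float:
    """SVD++: the user vector is augmented with implicit-feedback factors.

    y_factors holds one factor vector y_j per item the user interacted with;
    their sum is scaled by 1 / sqrt(number of such items).
    """
    if y_factors:
        norm = 1.0 / math.sqrt(len(y_factors))
        implicit = [norm * sum(y[k] for y in y_factors) for k in range(len(p_u))]
    else:
        implicit = [0.0] * len(p_u)
    user_vec = [p + z for p, z in zip(p_u, implicit)]
    return mu + b_u + b_i + dot(q_i, user_vec)
```

With an empty `y_factors` list, the SVD++ prediction reduces to the plain SVD prediction.
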
Offline evaluation
- RMSE (root mean squared error) as the metric for offline evaluation
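
As a minimal sketch of the metric, RMSE over a validation set can be computed as below. The function name and the plain-list inputs are illustrative choices, not taken from this project's code.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))
```

A lower value means the predicted ratings are closer to the held-out ratings, so the retrained model with the lower validation RMSE would be preferred.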

2. Code integrity checks with unit tests

The unit tests are integrated into the Jenkins pipeline and run automatically.
The results are published as a coverage report on Jenkins.
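
A unit test in such a pipeline might look like the sketch below. The function under test (`clip_rating`) and the 1 to 5 rating range are hypothetical and only serve to show the shape of a test case, since the project's actual tests are not reproduced here.

```python
import unittest


def clip_rating(value: float, low: float = 1.0, high: float = 5.0) -> float:
    """Hypothetical helper: keep a predicted rating inside the valid range."""
    return max(low, min(high, value))


class ClipRatingTest(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clip_rating(3.2), 3.2)

    def test_value_above_range_is_clipped(self):
        self.assertEqual(clip_rating(7.0), 5.0)

    def test_value_below_range_is_clipped(self):
        self.assertEqual(clip_rating(-1.0), 1.0)


def run_suite() -> unittest.TestResult:
    """Run the test case programmatically, as a CI step could."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClipRatingTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A CI stage would typically fail the build when `run_suite().wasSuccessful()` is false. A coverage tool such as coverage.py can wrap the same test run to produce the report.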

3. Automatic integration pipeline with Jenkins

Continuous integration
- Jenkins
  - Unit test 1 to 5 --> model management & offline evaluation (model) --> online evaluation
- Using Blue Ocean plugin
  - A more visual pipeline dashboard than the classic Jenkins UI
  - A commit to the master branch on GitHub --> the entire pipeline runs automatically
  - Save after building the pipeline --> the Jenkinsfile for the pipeline is committed to the master branch on GitHub
- Using a freestyle project
  - Runs automatically at a fixed interval
  - Configured with the "Build periodically" option

Continuous Deployment

1. Containerization with Rancher

Rancher
- A complete container management platform that includes everything necessary for container management during the production process
Deployment components
- Our system manages two recommendation models as separate deployments in one cluster
- Each deployment consists of two pods, one a replica of the other, which share and process the incoming tasks

2. Automatic Continuous Deployment with Jenkins

Automatic Continuous Deployment with Jenkins
- We extend our integration pipeline to cover model deployment
- We use Jenkins to send the deployment signal to Rancher
- Whenever a commit is pushed to GitHub, the pipeline is executed:
  - Continuous Integration : Data fetching, Data preprocessing, Model retraining
  - Continuous Deployment : Build Docker images, Push images to the Docker repo
  - Model deployment : Pull the Docker images for the retrained models and redeploy them through Rancher

Zero downtime for model redeployment

    - The new deployment also has 2 pods, one a replica of the other
    - After one new pod is deployed, one existing pod is terminated
    - After the second new pod is deployed, the remaining existing pod is also terminated --> ZERO DOWNTIME in the process of deploying the retrained models

All of these processes are managed reliably by the Rancher platform.
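
The rolling replacement described above can be illustrated with a small simulation. It is a toy model of the ordering only (start a new pod, then terminate an old one) and does not call Rancher or any cluster API. The pod names and the function name are made up.

```python
from typing import List


def rolling_update(old_pods: List[str], new_pods: List[str]) -> List[List[str]]:
    """Return a snapshot of the serving pods after every step of the rollout."""
    serving = list(old_pods)
    history = [list(serving)]
    for old, new in zip(old_pods, new_pods):
        serving.append(new)            # the new pod comes up first
        history.append(list(serving))
        serving.remove(old)            # only then is one old pod terminated
        history.append(list(serving))
    return history


if __name__ == "__main__":
    for step in rolling_update(["old-1", "old-2"], ["new-1", "new-2"]):
        print(step)
```

Because a new pod is always added before an old one is removed, the number of serving pods never drops below two at any step, which is what keeps the service available during redeployment.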

3. Monitoring

Monitoring infrastructure
- Prometheus, Grafana and Node Exporter to monitor our infrastructure
  - Memory usage
  - CPU usage
  - Request latency in Flask
  - Model quality

- Alerts are sent to our Slack #alert channel
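
To show how a request-latency metric can be collected, here is a standard-library sketch that times a handler and renders the totals in the Prometheus text exposition style (`_sum` and `_count` samples of a summary). A real service would more likely use a Prometheus client library. The class, the metric name, and the handler are hypothetical.

```python
import time
from functools import wraps


class LatencySummary:
    """Accumulates call count and total duration for one metric."""

    def __init__(self, name: str):
        self.name = name
        self.count = 0
        self.total_seconds = 0.0

    def observe(self, seconds: float) -> None:
        self.count += 1
        self.total_seconds += seconds

    def time(self, func):
        """Decorator that records how long each call to func takes."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.observe(time.perf_counter() - start)
        return wrapper

    def render(self) -> str:
        """Text a /metrics endpoint could return for scraping."""
        return (
            f"# TYPE {self.name} summary\n"
            f"{self.name}_sum {self.total_seconds}\n"
            f"{self.name}_count {self.count}\n"
        )


recommend_latency = LatencySummary("recommend_request_latency_seconds")


@recommend_latency.time
def recommend(user_id: str) -> list:
    # Placeholder for the model inference call.
    return []
```

Dividing the rate of `_sum` by the rate of `_count` in the monitoring dashboard gives the average latency over a time window, which is a natural quantity to alert on.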

4. Versioning and tracking provenance

Provenance
- DVC
  - An open-source version control system for data and models
  - DVC stores metadata about the dataset and the model in .dvc files
- Process
  - Track modifications --> add changes to Git --> push a Git tag
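
The process above maps onto a short sequence of `dvc` and `git` commands. The sketch below only assembles that sequence. The artifact path, the tag name, the commit messages, and the function names are illustrative assumptions rather than the project's actual script.

```python
import subprocess
from typing import List


def provenance_commands(artifact: str, tag: str, remote: str = "origin") -> List[List[str]]:
    """Build the commands for: track with DVC, record in Git, push a tag."""
    return [
        ["dvc", "add", artifact],                  # writes <artifact>.dvc
        ["git", "add", f"{artifact}.dvc"],         # version the small metadata file
        ["git", "commit", "-m", f"Track {artifact} ({tag})"],
        ["git", "tag", "-a", tag, "-m", f"Version {tag}"],
        ["git", "push", remote, tag],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute each command, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(provenance_commands("data/ratings.csv", "v1.0"))` inside a repository with DVC initialized would perform the steps. If a DVC remote is configured, `dvc push` can additionally upload the data itself.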

Conclusion

Collect data from the Kafka stream and preprocess it for movie recommendation model training
Deploy and measure a model inference service
Build and operate infrastructure
- A continuous integration infrastructure for evaluating a model in production
- A monitoring infrastructure for system health and model quality
- A continuous deployment infrastructure for automatic periodic retraining and versioning
Design and implement a monitoring strategy to detect possible issues in ML systems
