I hope you enjoy it.
This demo is inspired by and based on the following resources:
Part I
- Python Guide: Structuring Your Project
- Cookiecutter Data Science
- DVC: Get Started
- DVC: Tutorial - Data and Model Versioning
- Iterative: Example Versioning with DVC
- Machine Learning Template code
- Keras: Building powerful image classification models using very little data (outdated)
Part II
- DVC: Data Pipelines
- Reproducibility in Machine Learning blog series
- TensorFlow Determinism
- CML on GitHub
- DVC: SSH setup with `dvc remote modify`
- MLflow Tracking
- DVC: DVCLive Usage Guide
Further Reading on Self-Hosting CML and GitHub Actions
- How to use your own server to run GitHub Actions: GitHub Actions - Self-hosting workflow runners
- How to use your own (cloud) server to run CML: CML self-hosted runner
- C. Olah and S. Carter: Research Debt
- D. Sculley, et al.: Hidden Technical Debt in Machine Learning Systems
- D. Sato, A. Wider, C. Windheuser: Continuous Delivery for Machine Learning
- Chip Huyen: Machine Learning Systems Design
- Martin Zinkevich: Best Practices for ML Engineering
- A. Lavin, et al.: Technology Readiness Levels for Machine Learning Systems
- E. Breck, et al.: The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
- Neil Lawrence: Data Readiness Levels