Dhaval Salwala edited this page Dec 12, 2023 · 17 revisions

Welcome to the SAIL wiki!

  1. What is SAIL?
  2. Difference with River and other existing incremental machine learning libraries.
  3. Spark vs Ray for incremental models.
  4. SAIL Architecture and Model Framework
  5. Contributing to SAIL code
  6. Developer's Guide
  7. Installation
  8. Introduction to SAIL Models
  9. A guide to SAIL Pipeline
  10. Getting started with SAIL AutoML
  11. Step by step guide to AutoML training
  12. Pipeline Strategy with SAIL AutoML
  13. Examples and Notebooks

What is SAIL?

SAIL is a Python library for experimenting with stream processing engines (SPEs) and incremental machine learning (IML) models. The main features of SAIL are:

  • A common interface for the incremental models available in libraries such as scikit-learn, PyTorch, Keras and River.
  • Distributed computing for tasks such as model selection and ensembling.
  • Hyperparameter optimization for incremental models.
  • Interface and pipelines that implement incremental models for both offline and online learning.
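
To illustrate what a common interface over different libraries can look like, here is a minimal sketch using the adapter pattern. It is not SAIL's actual API: every class and function name below is hypothetical, and the two toy models only stand in for library models. The sketch relies on one known difference between the backends: scikit-learn's incremental estimators learn from mini-batches through `partial_fit`, while River models learn one sample at a time through `learn_one` and `predict_one`.

```python
from typing import List, Protocol, Sequence


class IncrementalModel(Protocol):
    """Hypothetical common interface: learn from a mini-batch, then predict."""

    def partial_fit(self, X: Sequence[float], y: Sequence[float]) -> "IncrementalModel": ...

    def predict(self, X: Sequence[float]) -> List[float]: ...


class RunningMeanRegressor:
    """Toy batch-style model: predicts the running mean of all targets seen."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def partial_fit(self, X, y):
        for target in y:
            self.n += 1
            self.mean += (target - self.mean) / self.n  # incremental mean update
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class LastValueLearner:
    """Toy per-sample model: predicts the most recent target it has seen."""

    def __init__(self) -> None:
        self.last = 0.0

    def learn_one(self, x, y):
        self.last = y

    def predict_one(self, x):
        return self.last


class PerSampleAdapter:
    """Wraps a per-sample learner so it exposes the batch-style interface."""

    def __init__(self, model) -> None:
        self.model = model

    def partial_fit(self, X, y):
        for x_i, y_i in zip(X, y):
            self.model.learn_one(x_i, y_i)
        return self

    def predict(self, X):
        return [self.model.predict_one(x_i) for x_i in X]


def train_on_stream(model: IncrementalModel, batches) -> IncrementalModel:
    """Training loop that only depends on the common interface."""
    for X, y in batches:
        model.partial_fit(X, y)
    return model


batches = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.0])]
mean_model = train_on_stream(RunningMeanRegressor(), batches)
last_model = train_on_stream(PerSampleAdapter(LastValueLearner()), batches)
```

Because `train_on_stream` only calls `partial_fit`, the same loop works for both models, which is the point of putting a common interface in front of different backends.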

Difference with River and other existing incremental machine learning libraries.

SAIL builds on existing machine learning libraries such as River and scikit-learn and exposes a common set of APIs for running their models as backends. River provides only minimal utilities for deep learning and does not focus on models developed with PyTorch or Keras, whereas SAIL's common interface covers them as well. In addition, models in SAIL are parallelized using Ray. The parallelization brings three major advantages that are particularly important for incremental models on high-volume, high-velocity data:

  • Faster computation for ensemble models.
  • Faster computation for ensembles of forecasts.
  • A clean interface for developing AutoML algorithms for incremental models.
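
The reason ensembles benefit from parallelization is that each member can be updated independently on the same mini-batch. The sketch below shows that structure only. It uses Python's standard `concurrent.futures` thread pool as a stand-in for Ray so that it runs without extra dependencies, and the model class and helper functions are hypothetical, not part of SAIL. Threads in CPython do not speed up pure-Python work, so read it as an outline of the pattern rather than a benchmark. With Ray, the per-member update would typically be a `@ray.remote` task whose results are collected with `ray.get`.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


class DecayedMean:
    """Toy incremental forecaster: exponentially weighted mean of the targets."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.value = None  # no observation seen yet

    def partial_fit(self, y_batch):
        for target in y_batch:
            if self.value is None:
                self.value = target
            else:
                self.value = self.alpha * target + (1 - self.alpha) * self.value
        return self

    def predict(self) -> float:
        return self.value


def update_member(model, y_batch):
    """Update one ensemble member; independent of all other members."""
    return model.partial_fit(y_batch)


def ensemble_step(models, y_batch, pool):
    """Update every member concurrently, then average their forecasts."""
    updated = list(pool.map(update_member, models, [y_batch] * len(models)))
    return updated, fmean(m.predict() for m in updated)


members = [DecayedMean(alpha) for alpha in (0.0, 0.5, 1.0)]
with ThreadPoolExecutor(max_workers=3) as pool:
    for batch in ([1.0, 3.0], [5.0]):
        members, forecast = ensemble_step(members, batch, pool)
```

After the two batches, `forecast` holds the average of the three members' predictions.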

Spark vs Ray for incremental models.

SAIL could also have been parallelized using Spark. However, to keep the stream processing engine and the machine learning tasks independent, Ray was preferred, since the data can then be handled efficiently with Pandas, NumPy, etc. This flexibility also allows other SPEs such as Flink or Storm to be used without changing the parallelization framework for IML models.
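
One way to picture this separation is a thin batching layer between the stream source and the model. The sketch below is an assumption about how such a boundary could look, not SAIL code: `fake_stream` is a hypothetical stand-in for any SPE consumer, and `mini_batches` is a hypothetical helper that turns any iterable of records into plain Python lists, which could then be converted to Pandas or NumPy structures before training.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, float]  # (feature, target)


def mini_batches(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Group records from any iterable source into lists of at most `size`."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # source exhausted
        yield batch


def fake_stream() -> Iterator[Record]:
    """Hypothetical stand-in for a stream consumer; yields five records."""
    for i in range(5):
        yield (float(i), 2.0 * i)


# The consumer of the batches never needs to know which engine produced them.
batch_sizes = [len(batch) for batch in mini_batches(fake_stream(), size=2)]
```

Swapping the stream engine then only means replacing `fake_stream` with another iterable; the batching and model code stay unchanged.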

SAIL Architecture

SAIL Pipeline

Model Framework