This repository has been archived by the owner on Jun 13, 2023. It is now read-only.

simplybusiness/code-first-pipelines

Code-First Pipelines

A framework built on top of Ploomber that allows code-first definition of pipelines. No YAML needed!

Installation

To install the minimal package needed to define and run pipelines, get it from PyPI:

pip install code-first-pipelines

Usage

Pipelines

import pandas as pd
from sklearn import datasets
from cf_pipelines import Pipeline

iris_pipeline = Pipeline("My Cool Pipeline")

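# Register the function below as a pipeline step named "Data ingestion"
@iris_pipeline.step("Data ingestion")
def data_ingestion():
    # Load the iris dataset and build a DataFrame with named feature columns
    d = datasets.load_iris()
    df = pd.DataFrame(d["data"])
    df.columns = d["feature_names"]
    df["target"] = d["target"]
    # The step's output: the DataFrame, keyed by the name "raw_data.csv"
    return {"raw_data.csv": df}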

iris_pipeline.run()

See the tutorial notebook for a more comprehensive example.
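
To illustrate the general idea behind registering steps with a decorator, here is a minimal, self-contained sketch. It does not reproduce what cf_pipelines or Ploomber do internally: the `ToyPipeline` class, its `_steps` list, and the way `run()` merges outputs are all hypothetical and exist only for this illustration.

```python
from typing import Any, Callable, Dict, List, Tuple


class ToyPipeline:
    """Hypothetical stand-in; NOT the cf_pipelines implementation."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Registered steps, kept in the order they were declared
        self._steps: List[Tuple[str, Callable[[], Dict[str, Any]]]] = []

    def step(self, step_name: str):
        # Return a decorator that records the function under step_name
        def register(func: Callable[[], Dict[str, Any]]):
            self._steps.append((step_name, func))
            return func  # the function stays callable on its own

        return register

    def run(self) -> Dict[str, Any]:
        # Call each step in registration order and collect its outputs
        products: Dict[str, Any] = {}
        for _step_name, func in self._steps:
            products.update(func())
        return products


toy = ToyPipeline("My Cool Pipeline")


@toy.step("Data ingestion")
def data_ingestion():
    # A single made-up row stands in for a real dataset
    return {"raw_data.csv": [[5.1, 3.5, 1.4, 0.2, 0]]}


products = toy.run()
print(sorted(products))
```

The sketch only covers registration and sequential execution. It makes no attempt to model how the real framework handles dependencies between steps or where outputs are stored.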

ML Pipelines

import pandas as pd
from sklearn import datasets
from cf_pipelines.ml import MLPipeline

iris_pipeline = MLPipeline("My Cool Pipeline")

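# Register the function below as the pipeline's data ingestion stage
@iris_pipeline.data_ingestion
def data_ingestion():
    # Load the iris dataset and build a DataFrame with named feature columns
    d = datasets.load_iris()
    df = pd.DataFrame(d["data"])
    df.columns = d["feature_names"]
    df["target"] = d["target"]
    # The stage's output: the DataFrame, keyed by the name "raw_data.csv"
    return {"raw_data.csv": df}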

iris_pipeline.run()

See the tutorial notebook for a more comprehensive example.

Getting started with a template

Once installed, you can create a new pipeline template by running:

pipelines new [pipeline name]