Gcav66/learn-dagster-mlflow

dagster_mlflow_project

This is an example ML pipeline built with MLflow, Dagster, and GitHub Actions.

The goal is to solve for these challenges:

  • How to track experiments --> MLflow
  • How to manage the various steps in an ML pipeline --> Dagster
  • How to confirm code changes don't break the project --> GitHub Actions

Standard Dagster Setup

Getting started

First, install your Dagster repository as a Python package. The -e flag (short for --editable) tells pip to install the repository in "editable mode", so local code changes take effect as you develop without reinstalling.

pip install -e ".[dev]"

Then, start the Dagit web server:

dagit

Open http://localhost:3000 with your browser to see the project.

You can start writing assets in dagster_mlflow_project/assets/. The assets are automatically loaded into the Dagster repository as you define them.

Development

Adding new Python dependencies

You can specify new Python dependencies in setup.py.

Unit testing

Tests live in the dagster_mlflow_project_tests directory. Run them with pytest:

pytest dagster_mlflow_project_tests

Schedules and sensors

If you want to enable Dagster Schedules or Sensors for your jobs, start the Dagster Daemon process in the same folder as your workspace.yaml file, but in a different shell or terminal.

The $DAGSTER_HOME environment variable must be set to a directory for the daemon to work. Note: using directories within /tmp may cause issues. See Dagster Instance default local behavior for more details.

dagster-daemon run
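Before starting the daemon, it can help to sanity-check the DAGSTER_HOME value. The helper below is a hypothetical pre-flight script, not part of Dagster or of this project, and uses only the standard library. It assumes Dagster expects an absolute path to an existing directory, and it flags /tmp locations because of the note above.

```python
import os
from pathlib import Path
from typing import List, Optional


def check_dagster_home(value: Optional[str]) -> List[str]:
    """Return a list of human-readable problems with a DAGSTER_HOME value.

    An empty list means no problems were found by these (assumed) checks.
    """
    if not value:
        return ["DAGSTER_HOME is not set"]

    problems = []
    path = Path(value)
    if not path.is_absolute():
        problems.append("DAGSTER_HOME should be an absolute path")
    if not path.is_dir():
        problems.append("DAGSTER_HOME should point to an existing directory")
    # Directories under /tmp may be cleaned up by the OS (POSIX-style check).
    if path.parts[:2] == (os.sep, "tmp"):
        problems.append("DAGSTER_HOME is under /tmp, which may cause issues")
    return problems


if __name__ == "__main__":
    for problem in check_dagster_home(os.environ.get("DAGSTER_HOME")):
        print(problem)
```

Running the script prints one line per problem and prints nothing when the value passes these checks.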

Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.

Deploy on Dagster Cloud

The easiest way to deploy your Dagster project is to use Dagster Cloud.

Check out the Dagster Cloud Documentation to learn more.