
[Feature Request]: Bit-shift operator (>>) for function dependencies in workflow deployment #3305

npsables opened this issue Mar 20, 2023 · 0 comments

npsables commented Mar 20, 2023

Feature Type

  • Adding new functionality to MLRun

Problem Description

I currently have training working with MLRun in HCI. I know you've made everything as atomic as possible, which provides a lot of control, but it makes the CI/CD setup very redundant. For example:

First we have to define the functions, each in its own file:

import mlrun

def create_hadoop_table(context: mlrun.MLClientCtx):
    ...

def etl_something(context: mlrun.MLClientCtx):
    ...

def store_1_to_fs(context: mlrun.MLClientCtx):
    ...

def store_2_to_fs(context: mlrun.MLClientCtx):
    ...

# 4 functions for 4 files

then create the workflow in another file, where the function names don't even refer to any of the functions above!

import mlrun
from kfp import dsl

@dsl.pipeline(name="process-vector")
def pipeline():
    create_hadoop_tables_function = mlrun.run_function(
        function="create_missing_hadoop_tables"
    )

    etl_vector = mlrun.run_function(
        function="etl_from_hadoop_file"
    ).after(create_hadoop_tables_function)

    mlrun.run_function(
        function="store_1_to_fs"
    ).after(etl_vector)

    mlrun.run_function(
        function="store_2_to_fs"
    ).after(etl_vector)

then make a series of set_function and set_workflow calls whose names refer back to the pipeline above:

project = mlrun.get_or_create_project(name="common")

project.set_function(
    name="create_missing_hadoop_tables",
    kind="job",
    func="<path-to-create_hadoop_table-file>",  # ---> this is the reference
    handler="create_hadoop_table",
    image=MLRUN_IMAGE
)

project.set_function(
    name="etl_from_hadoop_file",
    kind="job",
    func="<path-to-etl_something-file>",
    handler="etl_something",
    image=MLRUN_IMAGE
)

project.set_function(
    name="store_1_to_fs",
    kind="job",
    func="<path-to-store_1_to_fs-file>",
    handler="store_1_to_fs",
    image=MLRUN_IMAGE
)

project.set_function(
    name="store_2_to_fs",
    kind="job",
    func="<path-to-store_2_to_fs-file>",
    handler="store_2_to_fs",
    image=MLRUN_IMAGE
)

# then declare the workflow
project.set_workflow(
    name='vector-processing-workflow',
    workflow_path="<path-to-pipeline-file>",
    handler="pipeline"
)

project.save()

As you can see, everything keeps repeating. Referencing by name (a string) gives no clue as to whether the functions in the pipeline are connected to each other or not.

Feature Description

We could build a DAG tree (yes, like Apache Airflow). It's elegant and more readable. I believe things would get easier this way:

First 4 files for 4 functions:

import mlrun

def create_hadoop_table(context: mlrun.MLClientCtx):
    ...

def etl_something(context: mlrun.MLClientCtx):
    ...

def store_1_to_fs(context: mlrun.MLClientCtx):
    ...

def store_2_to_fs(context: mlrun.MLClientCtx):
    ...

# 4 functions for 4 files

then one file to deploy:

project = mlrun.get_or_create_project(name="common")

f1 = project.set_function(
    name="create_missing_hadoop_tables",
    kind="job",
    func="<path-to-create_hadoop_table-file>",  # ---> this is the reference
    handler="create_hadoop_table",
    image=MLRUN_IMAGE
)

f2 = project.set_function(
    name="etl_from_hadoop_file",
    kind="job",
    func="<path-to-etl_something-file>",
    handler="etl_something",
    image=MLRUN_IMAGE
)

f3 = project.set_function(
    name="store_1_to_fs",
    kind="job",
    func="<path-to-store_1_to_fs-file>",
    handler="store_1_to_fs",
    image=MLRUN_IMAGE
)

f4 = project.set_function(
    name="store_2_to_fs",
    kind="job",
    func="<path-to-store_2_to_fs-file>",
    handler="store_2_to_fs",
    image=MLRUN_IMAGE
)

# then declare workflow 
f1 >> f2 >> [f3, f4]  
# this will render the workflow

project.save()
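
For reference, here is a rough sketch of how the operator could be implemented. DagNode and render() are hypothetical names I made up for illustration; nothing below exists in MLRun today:

# Hypothetical sketch -- DagNode and render() do not exist in MLRun.
# The idea: set_function returns (or is wrapped in) a node that records
# edges via >>, and render() turns the recorded graph into the
# run_function(...).after(...) calls the pipeline already uses.
import mlrun

class DagNode:
    def __init__(self, name):
        self.name = name
        self.downstream = []  # nodes that must run after this one

    def __rshift__(self, other):
        # supports both `a >> b` and `a >> [b, c]`
        targets = other if isinstance(other, list) else [other]
        self.downstream.extend(targets)
        return other  # returning `other` lets `a >> b >> c` chain

def render(root):
    # meant to be called inside a @dsl.pipeline function, so the
    # steps it creates become the workflow
    nodes, stack = set(), [root]
    while stack:  # collect every node reachable from the root
        node = stack.pop()
        if node not in nodes:
            nodes.add(node)
            stack.extend(node.downstream)

    # one pipeline step per node, then wire the recorded edges
    steps = {node: mlrun.run_function(function=node.name) for node in nodes}
    for node in nodes:
        for child in node.downstream:
            steps[child].after(steps[node])

Chaining onward from a list (e.g. [f3, f4] >> f5) would additionally need __rrshift__, which is how Airflow handles the same case.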

Alternative Solutions

The Apache Airflow repo should be a good example.
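
For comparison, this is how the same graph reads in Airflow (assuming a recent 2.x release; EmptyOperator needs 2.3+ and the schedule argument 2.4+):

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# the same four steps as above, wired with >> instead of .after()
with DAG(dag_id="process_vector", start_date=datetime(2023, 3, 20), schedule=None) as dag:
    create_tables = EmptyOperator(task_id="create_missing_hadoop_tables")
    etl = EmptyOperator(task_id="etl_from_hadoop_file")
    store_1 = EmptyOperator(task_id="store_1_to_fs")
    store_2 = EmptyOperator(task_id="store_2_to_fs")

    create_tables >> etl >> [store_1, store_2]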

Additional Context

Do you think this is a good idea? I can open a PR.
