Dynamic Task/ParallelTask/Pipeline #101

leo-schick · 2023-04-11T15:58:53Z

Currently the data pipeline DAG is fixed at compilation time and offers only limited dynamic behavior, e.g. the task ParallelReadFile reads a set of files whose number is unknown at compilation time.

I would like to have similar dynamics in other areas as well:

Dynamic nodes

The following dynamic nodes could be implemented:

Dynamic tasks

An option to pass the Task a Python function that is executed at pipeline runtime and returns a list of commands to execute in order.
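A minimal sketch of how such a task could look. The `Task` class below is a self-contained stand-in written for illustration, not the library's class; only the idea of a callable `commands` argument and the `is_dynamic_commands` property are taken from the commits referenced in this issue. `resolve_commands`, `export_commands` and the table names are hypothetical.

```python
from typing import Callable, List, Union

Command = str  # stand-in: a command is just a descriptive string here
CommandSource = Union[List[Command], Callable[[], List[Command]]]


class Task:
    """Hypothetical stand-in for a pipeline task, not the library's class."""

    def __init__(self, id: str, commands: CommandSource):
        self.id = id
        self._commands = commands

    @property
    def is_dynamic_commands(self) -> bool:
        # True when the commands are produced by a function at run time.
        return callable(self._commands)

    def resolve_commands(self) -> List[Command]:
        # Meant to be called by the executor only, never while the DAG is defined.
        if self.is_dynamic_commands:
            return list(self._commands())
        return list(self._commands)


def export_commands() -> List[Command]:
    # Assumed lookup; a real function would query the database catalog here.
    tables = ["customers", "orders"]
    return [f"export {table}" for table in tables]


task = Task(id="export_tables", commands=export_commands)
```

Defining `task` does not call `export_commands`; the list of commands only exists once `task.resolve_commands()` is invoked.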

Dynamic parallel tasks

An option to pass the ParallelTask a Python function that is executed at pipeline runtime and returns a list of commands / command chains to be executed in parallel.
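The idea can be sketched as follows, assuming each command chain is a list of commands that run in order while the chains themselves run in parallel. A thread pool is used here only to keep the example self-contained; how the pipeline actually schedules parallel work is not specified in this issue. `run_chain`, `run_dynamic_parallel` and `chains_per_table` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

CommandChain = List[str]  # the commands of one chain run in order


def run_chain(chain: CommandChain) -> List[str]:
    # Stand-in for executing commands; it only records what would be run.
    return [f"ran: {command}" for command in chain]


def run_dynamic_parallel(
    chain_function: Callable[[], List[CommandChain]], max_workers: int = 4
) -> List[List[str]]:
    chains = chain_function()  # evaluated at pipeline run time
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of the input chains.
        return list(pool.map(run_chain, chains))


def chains_per_table() -> List[CommandChain]:
    tables = ["customers", "orders"]  # assumed; would be discovered at run time
    return [[f"export {table}", f"verify {table}"] for table in tables]


results = run_dynamic_parallel(chains_per_table)
```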

Dynamic pipeline

An option to define a DynamicPipeline whose nodes are defined within a Python function that is executed at pipeline runtime.

Implement UI awareness

The dynamic node objects (Task/ParallelTask/Pipeline) must be defined so that the Python function which produces the actual commands/tasks/nodes is not run when interacting with the UI.
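One way to read this requirement, sketched with a hypothetical `DynamicPipeline` class (the name comes from the proposal above; the methods `ui_summary` and `resolve_nodes` and the `calls` counter are assumptions made for this example): the UI only uses metadata that is available without calling the function, and the function is evaluated solely on the execution path.

```python
from typing import Callable, List


class DynamicPipeline:
    """Hypothetical sketch, not an existing class of the library."""

    def __init__(self, id: str, node_function: Callable[[], List[str]]):
        self.id = id
        self._node_function = node_function
        self.calls = 0  # counts how often the node function was evaluated

    def ui_summary(self) -> str:
        # Safe for the UI: describes the node without calling the function.
        return f"{self.id} (dynamic, nodes resolved at run time)"

    def resolve_nodes(self) -> List[str]:
        # Only the executor calls this, at pipeline run time.
        self.calls += 1
        return self._node_function()


def nodes_from_lake() -> List[str]:
    # Assumed discovery result; a real function would list tables on disk.
    return ["connect_customers", "connect_orders"]


pipeline = DynamicPipeline("connect_lake_tables", nodes_from_lake)
summary = pipeline.ui_summary()  # does not trigger nodes_from_lake()
```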

Implement node cost handling

These dynamic nodes should be defined so that they expose sub-nodes of the dynamic node object. The pipeline execution should then retrieve the node cost from the database for any sub-node that has been executed in the past. E.g. a sub-node could represent the export of one database table. With the sub-nodes defined, the pipeline execution can run the nodes with the highest node cost first to reduce the overall execution time.
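The ordering step can be illustrated with a small sketch. It assumes the past costs have already been loaded from the database into a dictionary keyed by sub-node id; `order_by_cost`, the ids and the cost values are made up for this example, and sub-nodes without a recorded cost fall back to a default.

```python
from typing import Dict, List


def order_by_cost(
    sub_nodes: List[str], past_costs: Dict[str, float], default_cost: float = 0.0
) -> List[str]:
    # Most expensive first; sub-nodes never seen before use default_cost.
    return sorted(
        sub_nodes,
        key=lambda node_id: past_costs.get(node_id, default_cost),
        reverse=True,
    )


# Illustrative values only, e.g. run times in seconds from earlier runs.
past_costs = {"export_orders": 120.0, "export_customers": 15.0}
sub_nodes = ["export_customers", "export_new_table", "export_orders"]

ordered = order_by_cost(sub_nodes, past_costs)
```

Whether unknown sub-nodes should run last (as here, with a default of 0) or first is a design choice the issue leaves open; passing a large `default_cost` would schedule them first.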

Example use cases

Performing actions against the tables of a database (e.g. exporting each table to a data lake). We don't know at compilation time which tables exist in the database.
Performing actions per table against a data lake / lakehouse on disk (e.g. connecting each table to our database engine). We don't know at compilation time which tables exist in the data lake / lakehouse.



leo-schick added the feat label Apr 11, 2023

leo-schick added a commit that referenced this issue Nov 22, 2023

add callable for Task arg. commands #101

7260baf

leo-schick mentioned this issue Nov 22, 2023

add dynamic Task #106

Merged

leo-schick added a commit that referenced this issue Nov 24, 2023

add dynamic Task (#106)

c6db3b2

* add callable for Task arg. commands #101
* code review suggestions
* make is_dynamic_commands a getter-property
