Can radical.entk add support for conditional scheduling/execution of tasks/pipelines #632

GKNB · 2023-01-17T12:36:39Z

A common workflow pattern in ML involves multiple works, each consisting of three stages: data generation, training, and data analysis. It is natural to submit each work as a separate pipeline so that they can run asynchronously. However, in adaptive learning, works can depend on one another. For example, the data generation stage of work_2 might depend on the data generation or training stage of work_1, while work_1 and work_2 are not completely serial (i.e., the data generation stage of work_2 does not depend on the data analysis stage of work_1).

My current solution is to first create n pipelines, where n is the number of works. Then, for every pipeline except the first, I insert a "monitoring stage" containing a single "monitoring task" at the beginning. This task monitors whether the required stages in the previous pipeline have finished. Once they have, the task exits so that the pipeline can continue with its actual work.
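To make the monitoring task concrete, here is a minimal sketch of the polling logic in Python. The actual task in this issue is a bash script; the function name `wait_for_signal`, the timeout, and the polling interval are assumptions made for illustration, and nothing here uses the entk API.

```python
import os
import time


def wait_for_signal(path, poll_interval=0.05, timeout=5.0):
    """Block until the signal file at `path` exists.

    Returns True if the file appeared before the timeout, False otherwise.
    Hypothetical helper: the real monitoring task is a bash script.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        # The task keeps its core while sleeping, which is the waste
        # discussed in this issue.
        time.sleep(poll_interval)
    # One last check in case the file appeared during the final sleep.
    return os.path.exists(path)
```

The loop does no useful work, but the task still holds its core for as long as it is polling.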

However, this implementation wastes resources. It needs n-1 monitoring tasks; the one in pipeline_k is a simple bash script that waits for a "signal file", which pipeline_{k-1} creates once it has generated all data required by pipeline_k. Like any other task, each monitoring task consumes resources, here one CPU core, so the cost grows with the number of pipelines. A more serious problem is that it can introduce deadlock, because entk does not guarantee the order in which different pipelines are launched. In the extreme case where the number of pipelines exceeds the number of cores, all cores may be occupied by the monitoring tasks of pipeline_2 to pipeline_n, so that the first real task in pipeline_1 can never start: a deadlock.
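The deadlock scenario can be reproduced with a small toy model. This is only a sketch under simplifying assumptions: every task needs exactly one core, real work finishes instantly once it has a core, and pipeline k depends only on pipeline k-1. The `simulate` function and its arguments are hypothetical and do not model entk's actual scheduler.

```python
def simulate(n_cores, launch_order):
    """Return the set of pipelines whose real work finished.

    Pipeline 1 starts with real work. Pipeline k > 1 starts with a monitor
    that holds one core until pipeline k-1 has finished its work.
    """
    free = n_cores
    waiting = list(launch_order)  # first tasks not yet placed on a core
    monitoring = set()            # pipelines whose monitor holds a core
    done = set()                  # pipelines whose real work has finished

    while True:
        progressed = False
        # Place first tasks on free cores, in the given launch order.
        while waiting and free > 0:
            k = waiting.pop(0)
            free -= 1
            progressed = True
            if k == 1 or (k - 1) in done:
                done.add(k)       # nothing to wait for: run, release core
                free += 1
            else:
                monitoring.add(k)
        # Monitors whose predecessor is done exit; their work then runs.
        for k in sorted(monitoring):
            if (k - 1) in done:
                monitoring.remove(k)
                done.add(k)
                free += 1
                progressed = True
        if not progressed:
            return done
```

With two cores and the launch order `[2, 3, 1]`, both cores go to monitors and no pipeline ever finishes. The same order with three cores, or the order `[1, 2, 3]` with two cores, completes all three.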

If conditional scheduling/execution of tasks/pipelines were supported, the monitoring tasks would no longer be needed. For example, something similar to CUDA streams and events would solve this issue: task.post_exec_create_event(event_name) would create an event when a task finishes, and pipeline.wait_event(event_name) would make a pipeline wait for that event to show up before continuing to the next state.
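The intended semantics of the proposed calls can be sketched with plain Python threads. This is not entk code: `EventRegistry`, `create_event`, and `run_demo` are hypothetical names that stand in for the proposed task.post_exec_create_event and pipeline.wait_event, and each pipeline is modelled as a thread. The point is that a waiting pipeline blocks on an event rather than occupying a core with a polling task.

```python
import threading


class EventRegistry:
    """Named events shared between pipelines (hypothetical, not entk API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = {}

    def _get(self, name):
        with self._lock:
            if name not in self._events:
                self._events[name] = threading.Event()
            return self._events[name]

    def create_event(self, name):
        """Signal that the named event has occurred."""
        self._get(name).set()

    def wait_event(self, name, timeout=None):
        """Block until the event occurs; return False on timeout."""
        return self._get(name).wait(timeout)


def run_demo():
    registry = EventRegistry()
    log = []

    def pipeline_1():
        log.append("work_1: data generation")
        log.append("work_1: training")
        registry.create_event("work_1.training.done")
        log.append("work_1: data analysis")

    def pipeline_2():
        registry.wait_event("work_1.training.done")
        log.append("work_2: data generation")

    threads = [threading.Thread(target=pipeline_2),
               threading.Thread(target=pipeline_1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In `run_demo`, pipeline_2 is started first but its data generation is always logged after the training of work_1. Its order relative to the data analysis of work_1 is left open, which matches the "not completely serial" requirement.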

andre-merzky · 2023-01-19T10:16:56Z

I agree that the described approach (waiting tasks which block the pipeline until some synchronization event occurs) is inelegant, inefficient and wasteful. But RE currently does not provide any mechanism to express the required functionality.

Implementing this functionality in RE is, however, quite possible but also quite difficult. At the moment RE has exactly one point of dependency resolution: all tasks of stage_n of a specific pipeline have to be completed before any task of stage_{n+1} of the same pipeline will be eligible to run. Expanding that to stage dependencies across pipelines would require significant changes to:

- the API (how to express that dependency)
- internal data structures (how to communicate that information to the wf_processor component)
- RE wf orchestrator (to enact the whole thing)
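As an illustration of what the extended dependency resolution would have to decide, the following sketch combines the existing rule (stage n+1 waits for stage n of the same pipeline) with optional cross-pipeline dependencies. It is a standalone toy, not RE code: `execution_waves`, the `(pipeline, stage_index)` tuples, and the `cross_deps` mapping are all assumptions chosen for this example.

```python
def execution_waves(n_stages, cross_deps):
    """Group stages into waves that become eligible together.

    n_stages:   dict mapping pipeline name -> number of stages
    cross_deps: dict mapping (pipeline, stage) -> set of (pipeline, stage)
                that must be complete first (hypothetical format)
    """
    remaining = {(p, s) for p, n in n_stages.items() for s in range(n)}
    done = set()
    waves = []
    while remaining:
        ready = sorted(
            st for st in remaining
            # Existing rule: the previous stage of the same pipeline is done.
            if (st[1] == 0 or (st[0], st[1] - 1) in done)
            # Added rule: all cross-pipeline requirements are done.
            and cross_deps.get(st, set()) <= done
        )
        if not ready:
            raise ValueError("no eligible stage: cyclic or unsatisfiable dependency")
        waves.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return waves
```

For two pipelines of three stages each, with the first stage of work_2 depending on the training stage (index 1) of work_1, the data generation of work_2 becomes eligible in the same wave as the data analysis of work_1, so the two works overlap instead of running serially.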

Is this missing functionality blocking any use case at the moment?
If not, is there a use case upcoming in the near or far future which relies on this functionality?

mtitov · 2023-01-20T19:59:05Z

As we discussed during today's sprint: another option is to extend the stage.post_exec method so that it can trigger certain actions (including starting pipelines).

mturilli · 2023-01-30T22:14:31Z

@GKNB was this discussion good enough? Can we close this ticket, and if you need to use the proposed method with stage.post_exec, you will open a new ticket?

GKNB added layer:entk type:feature labels Jan 17, 2023

andre-merzky added topic:api topic:communication type:question labels Jan 17, 2023

mturilli assigned okilic1, mtitov, andre-merzky, mturilli and AymenFJA Jan 18, 2023

mturilli changed the title ~~Can radical.entk adds support for conditional scheduling/execution of tasks/pipelines~~ Can radical.entk add support for conditional scheduling/execution of tasks/pipelines Jan 23, 2023
