
Development

Ioannis Paraskevakos edited this page Dec 7, 2020 · 4 revisions

Features

Full async workflow execution

Description

1. Tasks and task dependencies can be added during execution.
2. Control is returned to the application during execution.
3. Multiple independent workflows can be executed concurrently.
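
The three properties above can be illustrated with a minimal asyncio sketch. This is not the EnTK API: the `Workflow` class and its `add_task` and `wait` methods are hypothetical, and `asyncio.sleep` stands in for real task execution. Tasks wait on the events of their dependencies, a task can be added while the workflow is running, the application keeps control between steps, and two workflows run concurrently in the same event loop.

```python
import asyncio


class Workflow:
    """Hypothetical async workflow, not the EnTK API."""

    def __init__(self, name):
        self.name = name
        self.done = {}       # task name -> asyncio.Event set on completion
        self.results = []    # task names in completion order
        self._tasks = []     # asyncio.Task objects created so far

    def add_task(self, name, duration, deps=()):
        # Must be called from code running inside the event loop. Tasks and
        # their dependencies can be added while the workflow is executing.
        for key in (name, *deps):
            self.done.setdefault(key, asyncio.Event())
        coro = self._run_task(name, duration, deps)
        self._tasks.append(asyncio.get_running_loop().create_task(coro))

    async def _run_task(self, name, duration, deps):
        for dep in deps:                 # wait until every dependency is done
            await self.done[dep].wait()
        await asyncio.sleep(duration)    # stands in for real execution
        self.results.append(name)
        self.done[name].set()

    async def wait(self):
        # Loop so that tasks added while we were waiting are also awaited.
        while True:
            pending = [t for t in self._tasks if not t.done()]
            if not pending:
                return
            await asyncio.gather(*pending)


async def main():
    wf_a, wf_b = Workflow("a"), Workflow("b")
    wf_a.add_task("a1", 0.02)
    wf_a.add_task("a2", 0.01, deps=["a1"])
    wf_b.add_task("b1", 0.01)
    # Control is back in the application while both workflows run.
    await asyncio.sleep(0.005)
    wf_a.add_task("a3", 0.0, deps=["a2"])   # added during execution
    await asyncio.gather(wf_a.wait(), wf_b.wait())
    return wf_a.results, wf_b.results


print(asyncio.run(main()))
```

A task whose dependency is never added would wait forever in this sketch; a real implementation would need to detect that case.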

Requirements

Handle failures programmatically

Description

There is a request for EnTK to handle different types of workflow failures programmatically. Currently, EnTK handles task failure only as a global behavior: based on a single boolean value, EnTK either resubmits every failed task for execution or resubmits none. The goal is to replace this global behavior with more fine-grained failure handling. EnTK does not currently handle any other type of failure.
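
One possible shape for fine-grained failure handling is a per-task policy instead of one global switch. The sketch below is hypothetical: the `OnFailure` enum, the `on_failure` and `max_retries` task attributes, and the returned action strings are illustrative names, not existing EnTK parameters.

```python
from dataclasses import dataclass
from enum import Enum


class OnFailure(Enum):
    """Hypothetical per-task failure policies."""
    RESUBMIT = "resubmit"
    IGNORE = "ignore"
    ABORT = "abort"


@dataclass
class Task:
    name: str
    on_failure: OnFailure = OnFailure.RESUBMIT
    max_retries: int = 1
    retries: int = 0


def handle_failure(task):
    """Return the action to take for a failed task under its own policy."""
    if task.on_failure is OnFailure.RESUBMIT and task.retries < task.max_retries:
        task.retries += 1
        return "resubmit"
    if task.on_failure is OnFailure.ABORT:
        return "abort_workflow"
    # IGNORE, or RESUBMIT with its retries exhausted.
    return "mark_failed"


sim = Task("sim", max_retries=2)
print([handle_failure(sim) for _ in range(3)])
print(handle_failure(Task("analysis", on_failure=OnFailure.ABORT)))
```

Keeping the policy on each task lets a workflow mix cheap tasks that are simply retried with critical tasks whose failure should stop the whole workflow.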

Types of failure that can happen

Task manager process gets lost or killed.

The task manager is responsible for communicating the tasks that are ready for execution to the runtime system. At startup, EnTK creates and starts a process that executes the main functionality of the task manager. When RADICAL-Pilot is used, the task manager creates a thread in which a unit manager is created. This unit manager then connects to the active pilots and executes the tasks on them. Either the task manager process or this thread can be lost or killed; a thread failure also covers a failure of the unit manager.

Proposed Approach: EnTK launches a new process or thread. Then, all unfinished tasks that the old task manager was handling are submitted again for execution.
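
The proposed recovery can be sketched as a supervisor that checks whether the worker thread is alive and, if not, relaunches it and requeues the tasks it had taken but not finished. This is a minimal sketch under assumptions: `TaskManager`, `recover_if_dead`, and the `crash_on` argument (used only to simulate a thread dying mid-task) are hypothetical, and only the thread case is shown; a process-based task manager could be supervised the same way with `multiprocessing.Process.is_alive()`.

```python
import queue
import threading


class TaskManager:
    """Hypothetical stand-in for the task manager, not EnTK's real class."""

    def __init__(self, tasks):
        self.todo = queue.Queue()
        for t in tasks:
            self.todo.put(t)
        self.in_flight = set()   # taken by the worker but not yet finished
        self.finished = []
        self.lock = threading.Lock()
        self.worker = None

    def _work(self, crash_on=None):
        while True:
            try:
                task = self.todo.get_nowait()
            except queue.Empty:
                return
            with self.lock:
                self.in_flight.add(task)
            if task == crash_on:
                return  # simulate the thread dying while handling a task
            with self.lock:
                self.in_flight.discard(task)
                self.finished.append(task)

    def start(self, crash_on=None):
        self.worker = threading.Thread(target=self._work, args=(crash_on,))
        self.worker.start()

    def recover_if_dead(self):
        """Relaunch the worker and resubmit every task it left unfinished."""
        if self.worker is not None and self.worker.is_alive():
            return False
        with self.lock:
            lost = sorted(self.in_flight)
            self.in_flight.clear()
        for task in lost:
            self.todo.put(task)
        self.start()
        return True


tm = TaskManager(["t1", "t2", "t3"])
tm.start(crash_on="t2")    # the worker dies while handling t2
tm.worker.join()
tm.recover_if_dead()       # new worker; t2 is submitted again
tm.worker.join()
print(tm.finished)
```

Tracking `in_flight` separately from the queue is what makes the resubmission possible: tasks already removed from the queue would otherwise be lost with the dead thread.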

Resource manager / Resource failure.

The AppMan knows the Resource Manager. Currently, the AppMan checks the state of the resource manager to determine whether it is alive. When the resource manager reports a final state, with or without failure, the application manager terminates the execution of the workflow.

Proposed Approach: Create a new resource request and reset every task that is not already in a final state to the initial state. As soon as the new resource is active, workflow execution continues.
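
The state reset in this approach can be sketched as a small helper. The state names are assumed here (loosely modeled on EnTK's task states), the tasks are a plain dict from name to state, and `request_resource` is a hypothetical callback standing in for whatever submits the new resource request.

```python
INITIAL = "INITIAL"
FINAL_STATES = {"DONE", "FAILED", "CANCELED"}   # assumed final states


def recover_from_resource_failure(tasks, request_resource):
    """Reset unfinished tasks and ask for a new resource (hypothetical helper).

    tasks: dict mapping task name -> current state string (modified in place).
    request_resource: callable that submits a new resource request.
    Returns the names of the tasks that were reset.
    """
    reset = []
    for name, state in tasks.items():
        if state not in FINAL_STATES:
            tasks[name] = INITIAL
            reset.append(name)
    request_resource()
    return reset


states = {"t1": "DONE", "t2": "SCHEDULED", "t3": "SUBMITTED", "t4": "FAILED"}
requests = []
print(recover_from_resource_failure(states, lambda: requests.append("new pilot")))
print(states)
```

Tasks in a final state keep their state, so completed work is not repeated once the new resource becomes active.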