Rather than provisioning larger machines, would it be possible to aggregate already-provisioned resources/Workers so they act as a single, larger virtual Worker? High-resource tasks could then be allocated to this virtual Worker. The idea is to save costs on cloud resources while increasing the efficiency of the compute cluster.
The motivation behind this post comes from working with large amounts of Earth observation (EO) data, specifically writing large rasters to remote cloud storage. Having more, smaller Workers lets developers process more tasks at a given time, but some parts of the workflow (such as writing large rasters) demand much more RAM and often kill Workers.
I could imagine the API looking something like this:
```python
agg_worker_client = dask.aggregate_workers(N)  # Groups N Workers into a single logical compute unit
agg_worker_client.run(some_delayed_function, func_arg1, func_arg2, ...)  # Submit tasks to the aggregated Worker
agg_worker_client.release()  # Releases the group back to separate Workers
```
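To make the proposal concrete, here is a toy model of the semantics in plain Python. Everything in it (the `AggregatedWorker` class, the `memory` bookkeeping, the `run`/`release` methods) is hypothetical and is not real Dask API; it only illustrates how a group of small Workers could advertise pooled resources as one virtual Worker.

```python
class AggregatedWorker:
    """Toy model of the proposed API: pool N Workers' resources
    into one logical Worker. Hypothetical, not real Dask code."""

    def __init__(self, workers):
        # Each worker is modelled as a dict with a "memory" field (GB).
        self.workers = workers

    @property
    def memory(self):
        # The virtual Worker advertises the combined memory of its members,
        # so the scheduler could route a high-RAM task to it.
        return sum(w["memory"] for w in self.workers)

    def run(self, func, *args):
        # A high-memory task runs as a single unit against pooled resources.
        return func(*args)

    def release(self):
        # Hand the members back to the cluster as individual Workers.
        members, self.workers = self.workers, []
        return members


# Usage: three 4 GB Workers act as one 12 GB virtual Worker.
agg = AggregatedWorker([{"memory": 4}, {"memory": 4}, {"memory": 4}])
print(agg.memory)               # 12
print(agg.run(sum, [1, 2, 3]))  # 6
print(len(agg.release()))       # 3
```

A related workaround that exists today is Dask's worker resources (e.g. starting some workers with `--resources "MEM=1"` and annotating the raster-writing tasks to require that resource), which routes heavy tasks to bigger Workers but does not pool several small ones.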
Thoughts?