mpi4py futures example workflow #347
-
Hi @dalcinl I want to execute a series of Python tasks, each taking between 1 and 10 minutes. Whenever two tasks are completed, I want to combine them and use the combined result as input for the next calculation. The goal is to get the maximum throughput on a given number of cores in a given time. Here is a first prototype:

```python
from time import sleep

from mpi4py import MPI
from mpi4py.futures import MPIPoolExecutor

# parameters
n = 10

# functions
def test(x):
    sleep(2 * n - x)

def status(lst):
    print("calc status")
    return [t.done() for t in lst]

# execution
if __name__ == '__main__':
    with MPIPoolExecutor() as executor:
        if executor is not None:
            # initial calculations
            futures_lst = [executor.submit(test, i) for i in range(n)]
            status_lst = status(lst=futures_lst)
            count_finished = 0
            # continuous submission
            while not min(status_lst):
                print("waiting:", status_lst)
                sleep(1)
                status_lst = status(lst=futures_lst)
                sum_finished = sum(status_lst)
                newly_finished = sum_finished - count_finished
                if newly_finished >= 2:
                    count_finished = sum_finished
                    futures_lst += [
                        executor.submit(test, len(futures_lst) + i)
                        for i in range(newly_finished)
                    ]
                if sum_finished >= n:
                    break
```

Execute with:

As managing the status of the different future objects is a bit tedious and I am new to the futures interface, I was wondering if there are any built-in helpers in either the standard Python library or
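For context on the kind of built-in helper being asked about: the standard library's `concurrent.futures.wait` with `return_when=FIRST_COMPLETED` can replace the manual `done()`/`sleep` polling, and since `mpi4py.futures` follows the `concurrent.futures` API, the same pattern should carry over to `MPIPoolExecutor`. A minimal sketch of the "combine two finished results and resubmit" workflow, using the stdlib `ThreadPoolExecutor` as a stand-in and hypothetical `work`/`combine` functions:

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def work(x):
    # stand-in for the real long-running task
    return x * x

def combine(a, b):
    # stand-in for combining two finished results
    return a + b

def pairwise_reduce(executor, n=8):
    pending = {executor.submit(work, i) for i in range(n)}
    ready = []  # finished results waiting for a partner
    while pending:
        # block until at least one future finishes; no sleep/polling needed
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        ready.extend(f.result() for f in done)
        # whenever two results are available, submit their combination
        while len(ready) >= 2:
            a, b = ready.pop(), ready.pop()
            pending.add(executor.submit(combine, a, b))
    return ready  # one fully combined result remains

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=4) as executor:
        print(pairwise_reduce(executor))  # [140] == sum of squares of 0..7
```

Because `wait` blocks until something completes, the `sleep(1)` busy loop and the separate `status()` bookkeeping both disappear.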
Replies: 2 comments
-
@jan-janssen I converted your issue to a discussion. You are not reporting any trouble with mpi4py, but rather asking a question that requires discussion.
So you are trying to introduce task dependencies/graphs, and that is something `concurrent.futures` (and thus `mpi4py.futures`) was not designed to provide by itself. Any task dependency management has to be built on your side, just as you are doing; I'm not aware of any built-in tool to help with this. That said, I believe your code can be simplified. However, your prototype example looks too artificial to me: it is not clear why there is this magic number of two tasks, and the results of the two completed tasks are not actually used as input when submitting the next two tasks. In short, the actual task graph is not clear, so I'm a bit short of material to provide good advice. Strictly looking at your code, it looks to me that you could use the …
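The built-in call referred to here is presumably `concurrent.futures.wait` or `concurrent.futures.as_completed` (an assumption; the reply is truncated at this point). `as_completed` yields each future the moment it finishes, which removes the need for a `status()` polling loop. A sketch with the stdlib `ThreadPoolExecutor` standing in for `MPIPoolExecutor` and a trivial placeholder task:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(x):
    # placeholder for a long-running computation
    return 2 * x

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(task, i) for i in range(10)]
    results = []
    # as_completed blocks and yields futures in completion order,
    # replacing the sleep(1)/done() polling loop entirely
    for fut in as_completed(futures):
        results.append(fut.result())

print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Results arrive in completion order, not submission order, hence the `sorted` call for a deterministic printout.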
Yes, that's correct. By design the master process is not a worker. Looking at your code, it seems the master process is only doing minor work handling the task submissions. If you have, let's say, only 4 cores, I would try executing with 5 MPI processes. That is oversubscription, which could be bad if all MPI processes are doing intensive computations, but in your particular case (one of the processes just does lightweight task management) you may get better performance.
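Concretely, the suggested oversubscription can be launched with the `mpi4py.futures` command-line runner, where rank 0 becomes the master and the remaining ranks become workers (assuming the prototype above is saved as `script.py`, a hypothetical filename):

```shell
# 4 cores, 5 MPI processes: one lightweight master plus 4 workers
mpiexec -n 5 python -m mpi4py.futures script.py
```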
-
@dalcinl Thanks a lot, that already helps.