I think it is doable to natively support training of ensemble networks. My idea would be the following:
- Create a new parameter `parallel_execution`, which defaults to 1.
- If the parameter is anything but 1, the `Parameters()` object will, upon initialization, try to initialize an MPI library and query the number of ranks and the rank of the current process.
- Should this not succeed, serial execution is assumed.
- If it succeeds, the same script is executed `nr_ranks` times in parallel, without communication between the ranks.
- Saving of parameters, network weights, etc. will be made parallel-safe, e.g. by including "_rank" in the file name (see the sketch below).
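A minimal sketch of how this could look, assuming `mpi4py` as the MPI backend; the class layout and the `_setup_parallel_execution` / `rank_safe_filename` helpers are purely illustrative and not existing MALA API:

```python
import os


class Parameters:
    """Illustrative stand-in for the MALA Parameters object."""

    def __init__(self):
        self.parallel_execution = 1  # default: ordinary serial run
        self._rank = 0
        self._size = 1

    def _setup_parallel_execution(self):
        # Only attempt MPI setup if more than one ensemble member is requested.
        if self.parallel_execution == 1:
            return
        try:
            from mpi4py import MPI  # assumed MPI backend
            comm = MPI.COMM_WORLD
            self._rank = comm.Get_rank()
            self._size = comm.Get_size()
        except ImportError:
            # MPI library not available: fall back to serial execution.
            self._rank, self._size = 0, 1

    def rank_safe_filename(self, filename):
        # Make saved parameter/network files parallel-safe via a rank suffix.
        if self._size > 1:
            root, ext = os.path.splitext(filename)
            return f"{root}_rank{self._rank}{ext}"
        return filename
```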
In the end, the user only has to write ONE script but can use it to train an ensemble of networks by simply requesting the resources from Slurm and doing:
mpirun -np nr_ranks training.py
and editing training.py to include something like
params.parallel_execution=5
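Put together, a hypothetical training.py could then look like the following, assuming the usual `mala.Parameters()` entry point; the rest of the workflow is only indicated in comments:

```python
# Hypothetical training.py for ensemble training via MPI.
import mala

params = mala.Parameters()
params.parallel_execution = 5  # request an ensemble of 5 networks

# ... set up data handler, network and trainer as usual ...
# Each of the 5 MPI ranks runs this same script independently; anything
# saved to disk would carry a "_rank" suffix so ensemble members do not
# overwrite each other.
```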
I am aware that the same can be done with bash scripts, but I would argue this way is more user-friendly. Also, I'd like to offer MALA-native solutions to problems where possible, and I believe that is possible here.