Randall O'Reilly edited this page Nov 19, 2020 · 3 revisions

MPI: Message Passing Interface

MPI is a widely-supported high-speed communication protocol for distributed memory parallel processing.

N completely separate instances of the same simulation program run in parallel, typically communicating weight changes and trial-level log data among themselves. Each proc thus trains on a subset of the total set of training patterns per epoch, so dividing the patterns across procs is the most difficult aspect of making this work. The mechanics of synchronizing the weight changes and etable data are just a few simple method calls.

Speedups approach linear because the synchronization is relatively infrequent, especially for larger networks, which do more computation per trial. The biggest cost in MPI is latency: sending one huge list of weight changes infrequently is much faster overall than sending smaller amounts of data more frequently.

General tips for MPI usage

  • MOST IMPORTANT: all procs must remain completely synchronized in terms of when they call MPI functions -- these functions block until all procs have called the same function. The default behavior of setting a saved random number seed on all procs should ensure this, but you also need to make sure that the same random permutation of item lists, etc., takes place across all nodes. The empi.FixedTable environment does this.

  • Instead of aggregating epoch-level stats directly on the Sim (as the basic ra25 example does), record trial-level data in an etable (TrnTrlLog), synchronize that table across all procs at the end of the epoch, and run the aggregation stats on the combined data.

Links to other MPI impls

Here's a set of Go bindings I found:

There is even a from-scratch implementation:

Also a few libraries for using InfiniBand directly:

Some discussion: https://groups.google.com/forum/#!topic/golang-nuts/t7Vjpfu0sjQ