Please refer to our paper for a detailed description of this distributed machine learning framework.
MALT-2 is a distributed data-parallel machine learning system for Torch. It is a parallelization framework that can parallelize any existing ML application, and is designed to be simple to use and easy to extend while maintaining efficiency and state-of-the-art accuracy.
- General-purpose interface that is easy to add to existing code: only the optimization type needs to change to dstsgd (distributed SGD).
- Support for multi-machine, multi-GPU training with CUDA implementations for distributed parameter averaging.
- C++ and Lua interfaces to extend existing code. Supports Torch and NEC MiLDE.
- Easily extend your existing Torch code with minimal changes.
- Example distributed GPU applications, including ResNets and large language models.
- Various optimizations, such as sparse-reduce and NOTIFY_ACK, to accelerate distributed model training.
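The "only change the optimization type" claim can be sketched as below. This is a hedged illustration, not verified API: the `dstoptim.dstsgd` signature and config fields are assumed to mirror Torch's `optim.sgd(opfunc, x, config)` convention; check the dstoptim sources for the real interface.

```lua
-- Hypothetical sketch: dstoptim.dstsgd and its arguments are assumptions
-- modeled on optim.sgd; feval/params/batches stand in for your own code.
require "malt2"                 -- initializes the distributed runtime
local dstoptim = require "dstoptim"

for epoch = 1, nEpochs do
   for _, batch in ipairs(batches) do
      -- Single-node code would call: optim.sgd(feval, params, config)
      -- Distributed variant: swap in dstsgd, which additionally averages
      -- parameters across ranks after the local gradient step.
      dstoptim.dstsgd(feval, params, {learningRate = 0.01})
   end
end
```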
- Torch
- MPI (OpenMPI or MPICH) built with CUDA support. If you are using Ubuntu 16.04, you can use these packages.
- Boost (1.54 or higher)
Follow the Torch, CUDA, and Boost websites to install the respective packages. For OpenMPI, follow the instructions below to build MPI with CUDA support.
wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.2.tar.bz2
tar xfj openmpi-2.1.2.tar.bz2
cd openmpi-2.1.2; mkdir build; cd build
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --enable-mpi-thread-multiple --enable-mpi-ext=affinity,cuda --with-cuda=/usr/local/cuda
make -j 8 all
make install
Note: Similar instructions apply to openmpi-3.0.0.tar.bz2, but --enable-mpi-thread-multiple must then be removed.
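To confirm that the resulting Open MPI build is CUDA-aware, you can query ompi_info for the build-time CUDA flag (a general Open MPI check; the install prefix below assumes the $HOME/usr prefix used above):

```shell
# Check whether the installed Open MPI is CUDA-aware. ompi_info reports
# the build-time parameter mpi_built_with_cuda_support; a line ending in
# ":value:true" means the CUDA-aware transport is compiled in.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
else
    echo "ompi_info not on PATH (source your MPI environment first)"
fi
```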
- Check out the latest version of MALT-2 from GitHub:
git clone https://github.com/malt2/malt2.git --recursive
On some machines, you may need something like the following (MKL is optional):
source [torch-dir]/install/bin/torch-activate
source /opt/intel/mkl/bin/intel64/mklvars.sh intel64
If using modules, you can try:
module load icc cuda80 luajit
make
To build component-wise (not required if using make above):
cd dstorm
./mkit.sh GPU test
You should get SUCCESS as the output. Check the log files to ensure the build is successful.
The general format is:
./mkit.sh <type>
where <type> is SHM (liborm), MPI (liborm + mpi), or GPU (liborm + mpi + gpu). A side effect is to create the ../dstorm-env.{mk|cmake} environment files, so Lua capabilities can match the libdstorm compile options.
cd orm
./mkorm.sh GPU
Building the Torch packages: with the Torch environment set up, install the malt-2 and dstoptim (distributed optimization) packages:
cd dstorm/src/torch
rm -rf build && VERBOSE=7 luarocks make malt-2-scm-1.rockspec >& mk.log && echo YAY #build and install the malt-2 package
cd dstoptim
rm -rf build && VERBOSE=7 luarocks make dstoptim-scm-1.rockspec >&mk.log && echo YAY # build the dstoptim package
- A very basic test is to run th and then, by hand, try:
require "malt2"
- With MPI, you'll need to run via mpirun, perhaps something like:
mpirun -np 2 `which th` `pwd -P`/test.lua mpi 2>&1 | tee test-mpi.log
- If built with GPU support:
mpirun -np 2 `which th` `pwd -P`/test.lua gpu 2>&1 | tee test-GPU-gpu.log
- NEW: a WITH_GPU compile can also run with the MPI transport:
mpirun -np 2 `which th` `pwd -P`/test.lua mpi 2>&1 | tee test-GPU-mpi.log
- The default transport is the "highest" one built into libdstorm2 (GPU > MPI > SHM):
mpirun -np 2 `which th` `pwd -P`/test.lua 2>&1 | tee test-best.log
- MPI only sees the hostname. By default, on every host, MPI jobs enumerate the GPUs and start running processes on them. The only way to change this and run on other GPUs in a round-robin fashion is to change this enumeration for every rank using CUDA_VISIBLE_DEVICES. An example script is the redirect.sh file in the top-level directory.
- To run:
mpirun -np 2 ./redirect.sh `which th` `pwd`/test.lua
This script assigns available GPUs in a round-robin fashion. Since MPI requires visibility of all other GPUs to correctly access shared memory, this script only changes the enumeration order and does not restrict visibility.
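The round-robin reordering described above can be sketched as a small wrapper script. This is an illustrative sketch, not the actual redirect.sh shipped in the repo; the GPU-count detection via nvidia-smi and the use of Open MPI's OMPI_COMM_WORLD_LOCAL_RANK variable are assumptions.

```shell
#!/bin/sh
# Sketch of a redirect.sh-style wrapper: reorder GPU enumeration per MPI
# rank so ranks start on different GPUs, without hiding any device.

# Print a device order starting at (rank mod ngpus),
# e.g. gpu_order 2 4 prints "2,0,1,3".
gpu_order() {
    rank=$1; ngpus=$2
    first=$((rank % ngpus))
    order=$first
    i=0
    while [ "$i" -lt "$ngpus" ]; do
        [ "$i" -ne "$first" ] && order="$order,$i"
        i=$((i + 1))
    done
    printf '%s\n' "$order"
}

# Count local GPUs (assumption: one line per GPU from nvidia-smi -L).
NGPUS=${NGPUS:-$(nvidia-smi -L 2>/dev/null | wc -l)}
[ "$NGPUS" -gt 0 ] || NGPUS=1

# OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI for each local process.
CUDA_VISIBLE_DEVICES=$(gpu_order "${OMPI_COMM_WORLD_LOCAL_RANK:-0}" "$NGPUS")
export CUDA_VISIBLE_DEVICES

# Launch the real command, e.g. `which th` .../test.lua, with the new order.
exec "$@"
```

Because every device index still appears in the list, peer GPUs remain visible for shared-memory access; only the enumeration order (and hence the default device) changes per rank.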
Clone the tutorials repo:
git clone https://github.com/malt2/malt2.tutorials
Run individual tutorials as per the README in each sub-directory. The mpitest.sh script is the general launch script. An additional script, redirect.sh, is provided to distribute MPI processes over different GPUs.