
MALT-2: Distributed Data-Parallel Learning for Torch

Please refer to our paper, which describes this distributed data-parallel learning framework.

About

MALT-2 is a distributed, data-parallel machine learning system for Torch: a general-purpose parallelization framework for existing ML applications. The system is designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art accuracy.

  • Easy to add to existing code through a general-purpose interface: distributed training requires only changing the optimization type to dstsgd (distributed SGD).
  • Supports multi-machine, multi-GPU training, with CUDA implementations for distributed parameter averaging.
  • Includes C++ and Lua interfaces for extending existing code, with support for Torch and NEC MiLDE.
  • Extends existing Torch code with minimal changes.
  • Ships existing distributed GPU applications, including ResNets and large language models.
  • Provides optimizations such as sparse-reduce and NOTIFY_ACK to accelerate distributed model training.

Building MALT with Torch

Requirements

Torch, MPI, Boost, and CUDA (if using a GPU).

Setup

Install Torch, Boost, and CUDA by following the instructions on their respective websites. For Open MPI, follow the instructions below to build MPI with CUDA support.

wget https://www.open-mpi.org/software/ompi/v2.1/downloads/openmpi-2.1.2.tar.bz2
tar xfj openmpi-2.1.2.tar.bz2
cd openmpi-2.1.2; mkdir build; cd build
../configure --prefix=$HOME/usr --enable-mpi-cxx --enable-shared --enable-mpi-thread-multiple --enable-mpi-ext=affinity,cuda --with-cuda=/usr/local/cuda
make -j 8 all
make install

Note: The same instructions work with openmpi-3.0.0.tar.bz2, except that --enable-mpi-thread-multiple must be removed.
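
Before continuing, you may want to verify that the resulting Open MPI build is CUDA-aware. A minimal check, assuming the $HOME/usr prefix used above:

# A CUDA-aware build reports "true" for this parameter.
$HOME/usr/bin/ompi_info --parsable --all | grep mpi_built_with_cuda_support:value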

  • Check out the latest version of MALT-2 from GitHub:
git clone https://github.com/malt2/malt2.git --recursive
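
If you cloned without --recursive, the sub-modules can still be fetched afterwards with standard git:

# Fetch all required sub-modules in an existing clone.
cd malt2 && git submodule update --init --recursive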

Set up the environment variables

Source your Torch/CUDA/MKL environment. On some machines, you may need something like the following (MKL is optional):

source [torch-dir]/install/bin/torch-activate
source /opt/intel/mkl/bin/intel64/mklvars.sh intel64

If using environment modules, you can try:

module load icc cuda80 luajit

To build everything, including dstorm, orm, and the Torch packages, just type from the top-level directory:

make

Component-wise build

To build component-wise (not required if using make above):

To build the dstorm directory, run:

cd dstorm
./mkit.sh GPU test

You should see SUCCESS as the output. Check the log files to ensure the build succeeded.

The general format is:

./mkit.sh <TYPE>

where TYPE is SHM (liborm), MPI (liborm + mpi), or GPU (liborm + mpi + gpu). A side effect is the creation of the ../dstorm-env.{mk|cmake} environment files, so that lua capabilities can match the libdstorm compile options.

To build the orm, run:

cd orm
./mkorm.sh GPU

Build the Torch packages. With the Torch environment set up, install the malt-2 and dstoptim (distributed optimization) packages:

cd dstorm/src/torch
rm -rf build && VERBOSE=7 luarocks make malt-2-scm-1.rockspec >& mk.log && echo YAY # build and install the malt-2 package
cd dstoptim
rm -rf build && VERBOSE=7 luarocks make dstoptim-scm-1.rockspec >& mk.log && echo YAY # build the dstoptim package
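
A quick way to confirm that both rocks were installed into your Torch tree:

# Should list the installed malt-2 and dstoptim rocks.
luarocks list | grep -Ei 'malt|dstoptim'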

Test

  • A very basic test is to run th and then try, by hand:
require "malt2"

Run a quick test:

  • With MPI, you'll need to run via mpirun, perhaps something like:
mpirun -np 2 `which th` `pwd -P`/test.lua mpi 2>&1 | tee test-mpi.log
  • If built for GPU:
mpirun -np 2 `which th` `pwd -P`/test.lua gpu 2>&1 | tee test-GPU-gpu.log
  • NEW: a WITH_GPU build can also run with the MPI transport:
mpirun -np 2 `which th` `pwd -P`/test.lua mpi 2>&1 | tee test-GPU-mpi.log

The default transport is set to the "highest" one built into libdstorm2 (GPU > MPI > SHM):

mpirun -np 2 `which th` `pwd -P`/test.lua 2>&1 | tee test-best.log
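
For multi-machine training, the same test can be launched across hosts with a standard Open MPI hostfile; node0 and node1 below are placeholder hostnames, and th and test.lua are assumed to resolve to the same paths on every host (e.g. on a shared filesystem):

# hosts lists the machines, one per line, e.g.:
#   node0 slots=2
#   node1 slots=2
mpirun -np 4 --hostfile hosts `which th` `pwd -P`/test.lua mpi 2>&1 | tee test-multi.log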

Running over multiple GPUs

  • MPI only sees the hostname. By default, every MPI process on a host enumerates the GPUs in the same order and starts running there. The only way to spread ranks over the GPUs in a round-robin fashion is to change this enumeration for every rank using CUDA_VISIBLE_DEVICES. An example script, redirect.sh, is in the top-level directory.

  • To run:

mpirun -np 2 ./redirect.sh `which th` `pwd`/test.lua

This script assigns available GPUs in a round-robin fashion. Since MPI requires visibility of all other GPUs to correctly access shared memory, this script only changes the enumeration order and does not restrict visibility.
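
For reference, a wrapper in the spirit of redirect.sh might look like the following sketch (the actual script ships in the repository). It assumes Open MPI, which exports OMPI_COMM_WORLD_LOCAL_RANK to each rank:

#!/bin/bash
# Rotate, rather than restrict, the GPU enumeration for each local MPI rank,
# so every rank starts on a different device but still sees all GPUs.
NGPUS=$(nvidia-smi -L | wc -l)
FIRST=$(( OMPI_COMM_WORLD_LOCAL_RANK % NGPUS ))
ORDER=$FIRST
for (( i = 1; i < NGPUS; i++ )); do
  ORDER="$ORDER,$(( (FIRST + i) % NGPUS ))"
done
CUDA_VISIBLE_DEVICES=$ORDER exec "$@"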

Applications

Now we can run simple Torch demos such as distributed linear regression or ImageNet.

Clone the tutorials repo:

git clone https://github.com/malt2/malt2.tutorials

Run individual tutorials as per the README in each sub-directory. mpitest.sh is the general launch script; an additional script, redirect.sh, is provided to distribute MPI processes over different GPUs.
