Horovod Installation

afiaka87 edited this page Apr 13, 2021 · 8 revisions

Install MPI

wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz
tar -xf openmpi-4.1.0.tar.gz
cd openmpi-4.1.0
./configure --prefix=/usr/local
# <...lots of output...>
# Installing into /usr/local may require root privileges (e.g. sudo make all install)
make all install
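
Before moving on, it can help to confirm that the shell actually resolves the `mpirun` that was just built, since some systems already ship a different MPI. The following is a minimal sketch, assuming the `--prefix=/usr/local` used above; `mpirun_under_prefix` is a hypothetical helper name, not part of Open MPI.

```shell
# Check which mpirun the shell resolves and whether it lives under the install prefix.
# mpirun_under_prefix is a hypothetical helper; /usr/local matches the --prefix used above.
mpirun_under_prefix() {
    prefix=${1:-/usr/local}
    found=$(command -v mpirun 2>/dev/null) || { echo "mpirun not found on PATH"; return 1; }
    case "$found" in
        "$prefix"/bin/*) echo "OK: $found" ;;
        *) echo "mpirun resolves to $found, not $prefix/bin"; return 1 ;;
    esac
}

mpirun_under_prefix /usr/local || echo "Check PATH before installing Horovod"
```

If the check passes, `mpirun --version` should report the Open MPI version that was built.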

If the installation succeeded, you should now be able to install Horovod:

pip install horovod
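
To see whether the install picked up MPI and your deep learning framework, `horovodrun --check-build` prints the frameworks, controllers, and tensor operations Horovod was compiled with. The wrapper below is only a sketch; `check_horovod` is a hypothetical helper name that avoids an error when Horovod is missing.

```shell
# Report whether horovodrun is available and, if so, show what it was built with.
# check_horovod is a hypothetical helper name, not part of Horovod.
check_horovod() {
    if command -v horovodrun >/dev/null 2>&1; then
        horovodrun --check-build
    else
        echo "horovodrun not found on PATH"
        return 1
    fi
}

check_horovod || echo "Horovod does not appear to be installed in this environment"
```

In the output, MPI should be marked as an available controller if the Open MPI build above was detected.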

Usage

  1. Run on a single machine with 4 GPUs:
$ horovodrun -np 4 python train_dalle.py --image_text_folder=/path/to/your/dataset --distributed_backend horovod
  2. Run on 4 machines with 4 GPUs each:
$ horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train_dalle.py --image_text_folder=/path/to/your/dataset --distributed_backend horovod
  3. Run with Horovod autotuning enabled:
$ mpirun -x HOROVOD_AUTOTUNE=1 -x HOROVOD_AUTOTUNE_LOG=/tmp/autotune_log.csv ... train_dalle.py --image_text_folder=/path/to/your/dataset --distributed_backend horovod
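
In the multi-machine example, the value passed to `-np` is the sum of the per-host slot counts in the `-H` list (4 + 4 + 4 + 4 = 16). A small sketch of deriving it from the host string follows; `total_slots` is a hypothetical helper, and it assumes every entry has the form `host:slots`.

```shell
# Sum the slot counts in a Horovod-style host list ("host:slots,host:slots,...").
# total_slots is a hypothetical helper, not part of Horovod.
# Assumes every entry carries an explicit ":slots" suffix.
total_slots() {
    total=0
    old_ifs=$IFS
    IFS=,
    for entry in $1; do
        slots=${entry##*:}          # keep the text after the last colon
        total=$((total + slots))
    done
    IFS=$old_ifs
    echo "$total"
}

HOSTS="server1:4,server2:4,server3:4,server4:4"
NP=$(total_slots "$HOSTS")

# Print the resulting command instead of running it.
echo "horovodrun -np $NP -H $HOSTS python train_dalle.py --image_text_folder=/path/to/your/dataset --distributed_backend horovod"
```

Keeping the host list in one variable avoids `-np` and `-H` drifting apart when machines are added or removed.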

Docker (doesn't affect vast.ai)

If you are running inside a Docker container, check whether you have a docker0 network interface. If you do, you will need to follow specific instructions to ensure that this interface is ignored. See https://horovod.readthedocs.io/en/stable/mpi.html for further details.
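
A rough sketch of that check on Linux is shown below. `has_iface` is a hypothetical helper that looks under `/sys/class/net`, and the printed flags are the exclusion options as I recall them from the linked Horovod page, so verify them there before relying on them.

```shell
# Detect a docker0 interface and, if present, print extra mpirun flags that
# exclude it (together with loopback) from MPI and NCCL traffic.
# has_iface is a hypothetical helper; its optional second argument only
# overrides the sysfs root (Linux lists interfaces under /sys/class/net).
has_iface() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

if has_iface docker0; then
    echo "docker0 found; consider adding to mpirun:"
    echo "  -x NCCL_SOCKET_IFNAME=^lo,docker0 -mca btl_tcp_if_exclude lo,docker0"
else
    echo "no docker0 interface found"
fi
```

The snippet only prints a suggestion and does not change any network or MPI configuration.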