Running speaker embedding training on multiple GPUs on a single node #13

ahilan83 · 2020-12-25T04:45:25Z

Hello,
Thanks for sharing the PyTorch code for embedding training.
If we look at pytorch_xvectors/pytorch_run.sh:
    CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 \
        train_xent.py exp/xvector_nnet_1a/egs/
From the line above, it seems that you are training the DNN on a single GPU. Is it possible to train on multiple GPUs?

Further, if we look at the train_utils.py script:
    def prepareModel(args):
        ...
        elif args.trainingMode == 'init':
            net.to(device)
            net = torch.nn.parallel.DistributedDataParallel(net,
                                                            device_ids=[0],
                                                            output_device=0)
            if torch.cuda.device_count() > 1:
                print("Using ", torch.cuda.device_count(), "GPUs!")
                net = nn.DataParallel(net)

Why are both torch.nn.parallel.DistributedDataParallel and nn.DataParallel used?
When I tried to train, it used only a single GPU. How does the code need to be modified to train on multiple GPUs?

I look forward to hearing from you.

Thanks.

K. Ahilan

manojpamk · 2020-12-28T03:51:00Z

Hello,

I think the code can be run on multiple GPUs using DataParallel, but I haven't figured out how to do it, since I did not have access to a node with multiple GPUs on my university cluster.
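
For readers following along, here is a minimal sketch of what the DataParallel route could look like. It is not taken from the repository: `wrap_for_data_parallel` and `toy_net` are hypothetical names, and the `nn.Linear` layer is only a stand-in for the x-vector network. The sketch applies `nn.DataParallel` directly to the bare model, and it falls back to the unwrapped model when fewer than two GPUs are visible.

```python
import torch
import torch.nn as nn


def wrap_for_data_parallel(net: nn.Module) -> nn.Module:
    """Wrap `net` in nn.DataParallel when more than one GPU is visible.

    Hypothetical helper for illustration; not part of pytorch_xvectors.
    """
    if torch.cuda.is_available():
        # DataParallel expects the parameters to live on the first GPU.
        net = net.cuda()
        if torch.cuda.device_count() > 1:
            # Each forward pass splits the batch along dim 0 across the GPUs.
            net = nn.DataParallel(net)
    return net


# Stand-in model (assumption): 30 input features, 8 outputs.
toy_net = wrap_for_data_parallel(nn.Linear(30, 8))

# Put the batch on the same device as the model parameters.
device = next(toy_net.parameters()).device
batch = torch.randn(16, 30, device=device)
out = toy_net(batch)
```

On a CPU-only or single-GPU machine the model is returned unwrapped, so the same script runs everywhere. Whether this gives a useful speed-up for the x-vector training loop has not been tested in this thread.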

I use DistributedDataParallel because it lets me spawn multiple processes on a single GPU, which greatly reduces training time. This was particularly useful since I had access to a single V100 node, and each process used ~4GB.

I included the if statement that checks for multiple GPUs as a debug option in case I ever got access to a multi-GPU node, but that never happened 😄

I'll leave this issue open in case someone figures out how to do this.
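
As a starting point for anyone picking this up, the launcher side of a multi-GPU DistributedDataParallel run can be sketched as below. This is an untested assumption, not the author's method: `build_launch_command` is a hypothetical helper that only builds the environment and argument list, exposing every GPU and asking `torch.distributed.launch` for one process per GPU. By default the launcher passes a `--local_rank` argument to each process, so the training script would presumably also need to use that rank in place of the hard-coded `device_ids=[0]`; this thread does not confirm whether train_xent.py already does.

```python
import os
from typing import Dict, List, Tuple


def build_launch_command(num_gpus: int, egs_dir: str) -> Tuple[Dict[str, str], List[str]]:
    """Build the environment and argv for one training process per GPU.

    Hypothetical helper for illustration; not part of pytorch_xvectors.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    # Expose GPUs 0..num_gpus-1 to the launcher and its child processes.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in range(num_gpus))

    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",  # one worker process per GPU
        "train_xent.py", egs_dir,
    ]
    return env, cmd


env, cmd = build_launch_command(4, "exp/xvector_nnet_1a/egs/")
print(" ".join(cmd))
```

The returned pair can be handed to `subprocess.run(cmd, env=env)` from the directory that contains train_xent.py.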
