
Running speaker embedding training on multiple GPUs on a single node #13

Open
ahilan83 opened this issue Dec 25, 2020 · 1 comment

@ahilan83

Hello,
Thanks for sharing the PyTorch code for embedding training.
If we look at pytorch_xvectors/pytorch_run.sh, it contains:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 train_xent.py exp/xvector_nnet_1a/egs/

From this line, it seems like you are training the DNN on a single GPU. Is it possible to train using multiple GPUs?
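
For example, would simply launching more processes be enough, something like the following (just my guess, assuming two GPUs are visible):

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train_xent.py exp/xvector_nnet_1a/egs/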

Further, if we look at the train_utils.py script:
def prepareModel(args):
    ...
    elif args.trainingMode == 'init':
        ...
        net.to(device)
        net = torch.nn.parallel.DistributedDataParallel(net,
                                                        device_ids=[0],
                                                        output_device=0)
        if torch.cuda.device_count() > 1:
            print("Using ", torch.cuda.device_count(), "GPUs!")
            net = nn.DataParallel(net)

Why are we using both torch.nn.parallel.DistributedDataParallel and nn.DataParallel?
When I tried to train, it only used a single GPU. How does the code need to be modified to train on multiple GPUs?

I look forward to hearing from you.

Thanks.

K. Ahilan

@manojpamk
Owner

Hello,

I think the code can be run on multiple GPUs using DataParallel, but I haven't figured out how to do so since I did not have access to a node with multiple GPUs in my university cluster.

I use DistributedDataParallel since it spawns multiple processes on a single GPU, which greatly improves training time. This was particularly useful since I had access to a single V100 node and each process used only ~4 GB of memory.

I included the if statement that checks for multiple GPUs as a debug option, in case I ever got access to a multi-GPU node, but that never happened 😄
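
My rough, untested guess is that each process would need to bind to its own GPU via the --local_rank argument that torch.distributed.launch passes to every process it spawns, along these lines (the Linear layer here is just a stand-in for the actual x-vector model):

import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument('--local_rank', type=int, default=0)  # filled in by torch.distributed.launch
args, _ = parser.parse_known_args()

dist.init_process_group(backend='nccl')    # launch sets MASTER_ADDR/PORT, RANK, WORLD_SIZE
torch.cuda.set_device(args.local_rank)     # bind this process to its own GPU
device = torch.device('cuda', args.local_rank)

net = torch.nn.Linear(512, 512)            # stand-in for the actual x-vector network
net.to(device)
net = torch.nn.parallel.DistributedDataParallel(
    net,
    device_ids=[args.local_rank],          # one GPU per process instead of the hard-coded 0
    output_device=args.local_rank)

Launching with --nproc_per_node set to the number of visible GPUs should then place one process on each GPU, but I haven't been able to verify this.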

I'll leave this issue open in case someone figures out how to do this.
