GPU-Util is low when using multiple GPUs #26

LTlitong · 2020-05-13T08:46:44Z

Hello,

I want to train on multiple GPUs, and I have tried 8, 4 and 2 GPUs. But the GPU-Util of some GPUs is low, almost 0%. One training epoch on 8 GPUs takes almost 20 minutes longer than on a single GPU.

Your code sets the default number of GPUs to 4. But when I try 4 cards, one card's GPU-Util is always 0%. With two cards no GPU sits at 0%, but the GPU-Util of one of them is still only 20%.
This is the GPU usage when training on 4 cards:

[Screenshot: GPU usage on 4 cards]
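To record per-GPU utilization as numbers rather than a screenshot, one option is to poll `nvidia-smi` with its CSV query flags and flag near-idle devices. A minimal Python sketch, assuming `nvidia-smi` is on the PATH; the helper names and the 5% idle threshold are illustrative choices, not part of the project:

```python
import subprocess

# Standard nvidia-smi query flags: one CSV row per GPU, no header, no units.
QUERY = ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"]


def parse_utilization(csv_text):
    """Map GPU index -> utilization percent from nvidia-smi CSV output."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        usage[int(index)] = int(util)
    return usage


def idle_gpus(usage, threshold=5):
    """Return indices of GPUs whose utilization is below `threshold` percent."""
    return sorted(i for i, u in usage.items() if u < threshold)


def sample_utilization():
    """Run nvidia-smi once and parse its output (requires a machine with GPUs)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_utilization(out)


if __name__ == "__main__":
    # Hypothetical sample output for a 4-GPU machine, not real measurements.
    sample = "0, 78\n1, 52\n2, 20\n3, 0\n"
    usage = parse_utilization(sample)
    print(usage)             # {0: 78, 1: 52, 2: 20, 3: 0}
    print(idle_gpus(usage))  # [3]
```

Calling `sample_utilization()` periodically during training gives a rough utilization trace per GPU; `nvidia-smi -l <seconds>` offers similar repeated sampling from the shell.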

I don't fully understand how the sharding works. Do I need to modify the code to train on multiple GPUs and speed up training?

Looking forward to your reply!

ehsk · 2020-05-14T01:18:43Z

You mentioned you ran the code with 1 or 2 GPUs. Did you have this problem in those runs too? I suggest turning on log_device in the config file and comparing the single-GPU run with the 4- and 8-GPU runs.
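Comparing device mappings between runs can be partly automated. The sketch below assumes that `log_device` enables TensorFlow's device-placement logging, whose lines end with a device string such as `/job:localhost/replica:0/task:0/device:GPU:0` (or the older `/gpu:0` form); the exact log format, the sample op names, and the helper names are all assumptions. It counts placements per GPU and lists GPUs that receive nothing:

```python
import re
from collections import Counter

# Matches "/device:GPU:3" and the older "/gpu:3" suffixes (assumed log formats).
GPU_PATTERN = re.compile(r"/(?:device:)?gpu:(\d+)", re.IGNORECASE)


def ops_per_gpu(log_text):
    """Count how many placement entries mention each GPU index."""
    return Counter(int(m.group(1)) for m in GPU_PATTERN.finditer(log_text))


def unused_gpus(log_text, num_gpus):
    """Return GPU indices in range(num_gpus) that never appear in the log."""
    counts = ops_per_gpu(log_text)
    return [i for i in range(num_gpus) if counts[i] == 0]


if __name__ == "__main__":
    # Hypothetical log excerpt; op names are made up for illustration.
    log = """\
tower_0/MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
tower_1/MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:1
tower_2/MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:2
"""
    print(ops_per_gpu(log))     # Counter({0: 1, 1: 1, 2: 1})
    print(unused_gpus(log, 4))  # [3]
```

A GPU that shows up in `unused_gpus` for the 4- or 8-GPU run, but would be expected to host a share of the model, points to a device-assignment problem rather than a slow input pipeline.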

I haven't had this problem before, although GPU-util was around 50-60% for all GPUs.

LTlitong · 2020-05-14T13:16:23Z

Thanks for your reply!

The GPU-Util was 70-80% when running on 1 GPU, and 50% and 20% respectively when running on 2 GPUs. But there is always one GPU whose GPU-Util stays at 0% the whole time. I turned on log_device to get the device mapping and have sent it to you by email.
I'd also like to ask: are the experimental results in your paper averaged over the 3 datasets (3/4/5-turn Reddit)? I ran all the epochs, but my results differ from those in the paper. Could you please provide your results on each dataset?

ehsk · 2020-05-21T16:49:06Z

Sorry for the late reply.

Have you set CUDA_VISIBLE_DEVICES? Based on the log you sent, no tensor was assigned to one of the GPUs.
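`CUDA_VISIBLE_DEVICES` controls which physical GPUs a process can see, and it only takes effect if it is set before the framework initializes CUDA. A minimal Python sketch; the helper name `select_gpus` is hypothetical, and the GPU list should presumably match the number of GPUs the config expects:

```python
import os


def select_gpus(gpu_ids):
    """Expose only `gpu_ids` to CUDA in this process.

    Must run before the deep-learning framework initializes CUDA
    (in practice: before importing it), otherwise it has no effect.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    select_gpus([0, 1, 2, 3])
    print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0,1,2,3
    # Import the framework (e.g. TensorFlow) only after this point.
```

The shell equivalent is `CUDA_VISIBLE_DEVICES=0,1,2,3 python <training script>`. Inside the process the visible GPUs are renumbered from 0, so `GPU:0` in the logs refers to the first listed device, not necessarily physical GPU 0.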
All the results in the paper are reported based on the 3-turn dataset.
