Can we run data parallelism together with pipeline parallelism on GPU? #272

StevenShi-23 · 2022-01-12T03:46:10Z

Hi lingvo contributors,

Thanks for the prompt response to my previous ticket on the Docker version.

I want to run GPipe together with data parallelism (DP) on an 8-GPU server. Searching around, I found that num_splits_per_client at program.py:80 seems to determine the degree of DP for the TPU trainer:

    self.data_parallelism = p.num_splits_per_client

I set it to 2, expecting it to run two DP workers, each with 4 GPipe pipeline stages. With 1 GPU per stage, that is 8 GPUs in total (2 DP x 4 PP). However, I observed that only the first 4 GPUs were active while the last 4 sat idle, and the throughput was similar to that of the 4-GPU non-DP baseline. I suspect that this parameter is not taking effect. A screenshot of the GPU trace is attached below.
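To make the intended layout concrete, here is a minimal sketch in plain Python. It only illustrates the 2 DP x 4 PP arithmetic described above; the function name and the device strings are hypothetical and are not lingvo API, nor a claim about how lingvo assigns devices.

```python
def intended_gpu_layout(num_replicas, stages_per_replica):
    """Return, per DP replica, the GPU assigned to each pipeline stage.

    Hypothetical helper for illustration only: replica r is expected to own
    the contiguous GPU range [r * stages_per_replica, (r + 1) * stages_per_replica).
    """
    layout = []
    for replica in range(num_replicas):
        first_gpu = replica * stages_per_replica
        layout.append(
            [f"/gpu:{first_gpu + stage}" for stage in range(stages_per_replica)]
        )
    return layout


# Expected setup: 2 DP replicas, each with 4 pipeline stages, 1 GPU per stage.
layout = intended_gpu_layout(num_replicas=2, stages_per_replica=4)
for replica, devices in enumerate(layout):
    print(f"DP replica {replica}: {devices}")
```

In the observed run, only the GPUs in the first replica's range (0 to 3) showed activity.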

What is the correct way to configure DP in lingvo? And if possible, could you please provide an example of using DP together with pipeline parallelism, perhaps in the run_distributed.py format from /docker?

Thank you!

StevenShi-23 · 2022-01-12T03:58:17Z

GPU trace:
