Why does multi-GPU dp seem slower? #1005

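You should double your batch size. With dp, each batch is split across the GPUs, so if you keep the single-GPU batch size, every GPU gets less work per step.
dp also has communication overhead, so it won't scale linearly.

Also try ddp.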
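To make the batch-size point concrete, here is a rough sketch of the arithmetic in plain Python. It assumes two GPUs (the "double" above suggests that, but the thread doesn't state the count) and a single-GPU batch size of 32, which is made up for illustration. The helper names `per_device_batch_sizes` and `scaled_global_batch` are hypothetical and not part of any library; the split only approximates how a dp-style scatter along the batch dimension behaves.

```python
from typing import List


def per_device_batch_sizes(global_batch_size: int, num_devices: int) -> List[int]:
    """Approximate how a dp-style scatter divides one batch across devices.

    Hypothetical helper for illustration: chunks are ceil(batch / devices)
    in size, and the last chunk takes whatever remains.
    """
    if global_batch_size <= 0 or num_devices <= 0:
        raise ValueError("batch size and device count must be positive")
    chunk = -(-global_batch_size // num_devices)  # ceiling division
    sizes = []
    remaining = global_batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def scaled_global_batch(single_device_batch: int, num_devices: int) -> int:
    """Global batch size that keeps the per-device batch unchanged."""
    return single_device_batch * num_devices


if __name__ == "__main__":
    single_gpu_batch = 32  # assumed value, not from the thread
    num_gpus = 2           # assumed value, not from the thread

    # Same batch size as on one GPU: each GPU only sees half of it.
    print(per_device_batch_sizes(single_gpu_batch, num_gpus))

    # Doubled batch size: each GPU sees the original per-GPU batch again.
    doubled = scaled_global_batch(single_gpu_batch, num_gpus)
    print(per_device_batch_sizes(doubled, num_gpus))
```

With the unchanged batch size, each GPU processes a smaller chunk per step while still paying the scatter and gather cost, which is one plausible reason dp can look slower than a single GPU.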

Answer selected by Borda

This discussion was converted from issue #1005 on December 23, 2020 19:12.