This repository has been archived by the owner on Aug 30, 2023. It is now read-only.

Unable to train this on multiple GPU #24

Open
fliptrail opened this issue May 3, 2020 · 2 comments

@fliptrail

Hello,
As the title suggests, I am unable to train this model on a multi-GPU configuration. I am trying to train it on 4x RTX 2080 Ti.
It loads the model only onto the 1st GPU, using around 10.5 GB of its 11 GB of memory.
Each of the remaining GPUs uses only about 155 MB of its 11 GB.
Also, the training speed is independent of the number of GPUs I select using CUDA_VISIBLE_DEVICES, so apparently it is only using the 1st GPU.
I tried diving into the code to inspect the relevant function, multi_gpu_model, but everything seemed fine to me.
Can you confirm this, or tell me how to train this implementation over multiple GPUs?

@fliptrail
Author

I am encountering this exact issue on TensorFlow 2.0.0:
tensorflow/tensorflow#30321
A possible solution is given there.
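
The workaround commonly recommended for this under TF 2.x is to replace the deprecated `multi_gpu_model` utility with `tf.distribute.MirroredStrategy`, which mirrors the model's variables onto every visible GPU (respecting `CUDA_VISIBLE_DEVICES`) and splits each batch across them. A minimal sketch, with an invented toy model and random data purely for illustration; it falls back to a single CPU replica when no GPU is visible:

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy picks up all visible GPUs automatically and
# does synchronous data-parallel training across them.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

# Model construction and compile() must happen inside the strategy
# scope so the variables are created as mirrored variables.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),          # toy input shape, for illustration
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# fit() transparently shards each batch across the replicas.
X = np.random.normal(size=(64, 4)).astype("float32")
y = np.random.normal(size=(64, 1)).astype("float32")
model.fit(X, y, epochs=1, batch_size=16, verbose=0)
```

Unlike `multi_gpu_model`, no extra wrapper model is involved; the same `model` object is used for training and inference.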

@ParikhKadam
Owner

Yes, the possible solution is in the link mentioned above. Read more about "model parallelism vs. data parallelism".
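
To unpack that distinction: data parallelism replicates the full model on every GPU and splits each batch across the replicas, averaging the per-replica gradients; model parallelism instead splits the model itself (different layers or shards on different GPUs). A toy NumPy sketch, with a made-up linear model purely for illustration, of why averaging per-shard gradients reproduces the single-device update:

```python
import numpy as np

# Toy illustration of data parallelism: each "device" holds a full copy
# of the model, computes the gradient on its own shard of the batch,
# and the per-device gradients are averaged before the weight update.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))   # one global batch of 32 examples
y = rng.normal(size=(32,))
w = rng.normal(size=(4,))      # weights of a linear model y_hat = X @ w

def grad(X_shard, y_shard, w):
    # Gradient of the mean squared error 0.5*mean((X@w - y)**2) w.r.t. w
    err = X_shard @ w - y_shard
    return X_shard.T @ err / len(y_shard)

# Single-device gradient over the whole batch
g_full = grad(X, y, w)

# Simulate 4 GPUs: split the batch into 4 equal shards, average their gradients
shards = zip(np.split(X, 4), np.split(y, 4))
g_avg = np.mean([grad(Xs, ys, w) for Xs, ys in shards], axis=0)

# The averaged shard gradients equal the full-batch gradient, which is why
# data parallelism yields the same update as one large device would.
assert np.allclose(g_full, g_avg)
```

Model parallelism, by contrast, only helps when a single copy of the model does not fit in one GPU's memory, and requires manually placing parts of the graph on different devices.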
