
Parallel GPU training #6

Open · james-daily opened this issue Jul 24, 2020 · 1 comment
Labels: enhancement, help wanted


james-daily commented Jul 24, 2020

Is parallel GPU training support feasible? We would like to try this with a fairly large (multi-GB) dataset, but to keep training time reasonable it would need to run in parallel. Single-node parallelism with DataParallel() would probably work for our use case, although the PyTorch documentation suggests that DistributedDataParallel() is preferred even on a single node.

Part of the motivation for this is that a large dataset needs a lot of memory, which in a cloud environment means a large, multi-GPU instance. It is very expensive to run such a large instance for weeks with all but one of the GPUs idle.
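For concreteness, this is roughly the kind of change we have in mind, sketched with a toy model and fake data (none of the names below are from this repo):

```python
import torch
import torch.nn as nn

# A toy model stands in for the real one; the DataParallel wrapping
# is the only part that matters here.
model = nn.Linear(128, 10)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits each batch across all visible GPUs
model = model.to("cuda")

optimizer = torch.optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

# One step with fake data, just to show the training loop is unchanged:
# DataParallel scatters the batch across GPUs and gathers outputs back.
x = torch.randn(64, 128, device="cuda")
y = torch.randint(0, 10, (64,), device="cuda")

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```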

vinid (Contributor) commented Jul 27, 2020

Hi!

Currently we do not support parallel GPU training, I'm sorry. DataParallel shouldn't be too difficult to set up, but I need to look into it further since I don't have much experience writing multi-GPU PyTorch programs.

I'll give it a try next week, and see if (and how) it works.
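If I end up going the DistributedDataParallel route the docs recommend, I expect the skeleton would look roughly like this (a sketch with a toy model and fake data, not our actual training loop):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # One process per GPU; NCCL is the usual backend for CUDA tensors.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(128, 10).to(rank)
    model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters())
    criterion = nn.CrossEntropyLoss()

    # Fake data; in real training each rank would read a distinct
    # shard of the dataset, e.g. via DistributedSampler.
    x = torch.randn(64, 128, device=rank)
    y = torch.randint(0, 10, (64,), device=rank)

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()  # gradients are all-reduced across ranks here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```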

vinid added the enhancement and help wanted labels on Aug 10, 2020