This repository has been archived by the owner on Dec 29, 2022. It is now read-only.

Data parallelism across multiple GPUs #121

Open

dennybritz wants to merge 7 commits into master
Conversation

dennybritz (Contributor)

Allow the user to replicate the model on multiple GPUs. Still WIP and untested.

@dennybritz (Contributor, Author)

This code works, but it is currently very slow. Need to verify the op placement on different GPUs to figure out why it is slow.
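
Not part of this PR, but for reference: a quick way to check where ops actually land is to turn on device-placement logging in the session config. A minimal sketch, assuming a plain TF 1.x session:

```python
# Minimal sketch (not this PR's code): log the device each op is assigned to,
# so ops unexpectedly pinned to one GPU or to the CPU show up in the output.
import tensorflow as tf

config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())  # placements are printed at run time
```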

@dennybritz (Contributor, Author)

Ref #44

@dennybritz (Contributor, Author)

A few things:

  • Variables need to be placed on CPU
  • Optimizer ops and preprocessing need to be on CPU

It's probably cleaner to put this into the Estimator class. For example, subclass Estimator and add support for model replicas. A sketch of the placement scheme follows below.
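
Not the code in this branch, but a minimal sketch of the placement scheme described above, assuming TF 1.x in-graph replication; `model_fn`, the split input tensors, and the optimizer are placeholder names, not identifiers from this repo:

```python
# Hypothetical sketch (not this PR's implementation): variables stay on the
# CPU, each replica's forward/backward pass runs on its own GPU, and the
# averaged gradients are applied on the CPU.
import tensorflow as tf

def assign_to_device(worker, ps_device="/cpu:0"):
    """Device function: variable ops go to ps_device, everything else to worker."""
    var_ops = ("Variable", "VariableV2", "VarHandleOp")
    def _assign(op):
        node_def = op if isinstance(op, tf.NodeDef) else op.node_def
        return ps_device if node_def.op in var_ops else worker
    return _assign

def build_towers(model_fn, feature_splits, label_splits, num_gpus):
    """Build one model replica per GPU and average gradients on the CPU."""
    optimizer = tf.train.GradientDescentOptimizer(0.1)  # placeholder optimizer
    tower_grads = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(num_gpus):
            with tf.device(assign_to_device("/gpu:%d" % i)):
                loss = model_fn(feature_splits[i], label_splits[i])
                tower_grads.append(optimizer.compute_gradients(loss))
                tf.get_variable_scope().reuse_variables()  # share variables across towers
    # Average gradients across replicas and apply a single update on the CPU.
    with tf.device("/cpu:0"):
        averaged = []
        for grads_and_vars in zip(*tower_grads):
            grads = [g for g, _ in grads_and_vars if g is not None]
            averaged.append((tf.add_n(grads) / float(len(grads)),
                             grads_and_vars[0][1]))
        train_op = optimizer.apply_gradients(averaged)
    return train_op
```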

@bhack commented Apr 11, 2017

See also tensorflow/tensorflow#2126

@SvensBigData

I get the following error on this branch:

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'save/ShardedFilename_1': Could not satisfy explicit device specification '/device:GPU:1' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Identity: CPU
ShardedFilename: CPU
[[Node: save/ShardedFilename_1 = ShardedFilename[_device="/device:GPU:1"](save/StringJoin, save/ShardedFilename_1/shard, save/num_shards)]]
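
For reference (not a change from this branch): the colocation info above shows ShardedFilename only has a CPU kernel, so the saver op cannot honor an explicit /device:GPU:1 assignment. A minimal workaround sketch, assuming a plain TF 1.x session, is to enable soft placement so such ops fall back to the CPU:

```python
# Sketch of an assumed workaround (not this branch's fix): allow ops without
# a GPU kernel, such as save/ShardedFilename, to fall back to the CPU.
import tensorflow as tf

config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
```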

@hrishikeshvganu

@dennybritz: I wanted to know if the branch is usable now. If there are specific TODOs, I can help with the implementation.


4 participants