can it use multi gpus to train ? #8

Open
wangwang110 opened this issue Dec 18, 2019 · 3 comments
Comments

@wangwang110

No description provided.

@Serenade-J

Yes.
I used tf.contrib.distribute with tensorflow-gpu 1.13.
I ran into some problems with this approach and spent several days on it, but eventually trained PIE successfully on multiple GPUs. Feel free to use another method if you find one more convenient.
Below is part of my code in word_edit_model.py. Copying it as-is may introduce bugs, because it is not the complete set of changes I made to PIE.

import tensorflow as tf
from tensorflow.python.estimator.run_config import RunConfig
from tensorflow.python.estimator.estimator import Estimator
from tensorflow.contrib.distribute import AllReduceCrossDeviceOps
# ...
# Mirror the model across FLAGS.n_gpus GPUs and all-reduce gradients with NCCL.
dist_strategy = tf.contrib.distribute.MirroredStrategy(
    num_gpus=FLAGS.n_gpus,
    cross_device_ops=AllReduceCrossDeviceOps('nccl', num_packs=FLAGS.n_gpus),
    # cross_device_ops=AllReduceCrossDeviceOps('hierarchical_copy'),
)
# A thread count of 0 lets TensorFlow choose; allow_growth allocates GPU memory on demand.
session_config = tf.ConfigProto(
    inter_op_parallelism_threads=0,
    intra_op_parallelism_threads=0,
    allow_soft_placement=True,
    gpu_options=tf.GPUOptions(allow_growth=True))

# Use the same strategy for both training and evaluation.
run_config = RunConfig(
    train_distribute=dist_strategy,
    eval_distribute=dist_strategy,
    model_dir=FLAGS.output_dir,
    session_config=session_config,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    keep_checkpoint_max=15,
)
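
A side note for anyone adapting the config above, not part of the changes made to PIE: when an Estimator runs under MirroredStrategy, the batch size produced by the input function is generally treated as the per-GPU batch size, so each step consumes that many examples on every GPU. The sketch below only does the arithmetic, under that assumption. The helper names global_batch_size and steps_for_same_examples are hypothetical and do not exist in PIE or TensorFlow.

```python
def global_batch_size(per_gpu_batch_size, n_gpus):
    """Examples consumed per training step across all GPUs.

    Assumes the input function's batch size is applied per GPU
    (hypothetical helper, not part of PIE or TensorFlow).
    """
    if per_gpu_batch_size <= 0 or n_gpus <= 0:
        raise ValueError("batch size and GPU count must be positive")
    return per_gpu_batch_size * n_gpus


def steps_for_same_examples(single_gpu_steps, n_gpus):
    """Steps needed on n_gpus to see about as many examples as
    single_gpu_steps did on one GPU, rounded up."""
    if single_gpu_steps < 0 or n_gpus <= 0:
        raise ValueError("steps must be non-negative and GPU count positive")
    # Ceiling division without importing math.
    return -(-single_gpu_steps // n_gpus)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the PIE setup.
    print(global_batch_size(32, 4))
    print(steps_for_same_examples(10000, 4))
```

If the per-GPU batch size is left unchanged, the number of training steps and the learning rate schedule may need retuning, since each step now covers more examples.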

@Serenade-J

All the documents I referred to (for training PIE with multi GPUs) can be found online.

@binhetech

> All the documents I referred to (for training PIE with multi GPUs) can be found online.

Hi, could you share the complete code you changed in word_edit_model.py? Thanks.
