This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Index out of range on multi-GPU (8 gpus ) after first epoch #201

Open
akshitac8 opened this issue Feb 26, 2019 · 0 comments

akshitac8 commented Feb 26, 2019

Expected results

Successful Training

Actual results

Detailed steps to reproduce

After running the main script, once the first epoch completes, I get an index out of range error (with drop_last = False) on this line:

mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])

I traced the error and found that, after the first epoch, it occurs for the last 3 device IDs, i.e. 5, 6, 7, which is very weird behaviour.
E.g.:

CUDA_VISIBLE_DEVICES=4,5,6,7 python tools/train_net_step.py --dataset dota_patches --cfg configs/baselines/e2e_mask_rcnn_X-101-64x4d-FPN_2x.yaml --bs 8 --nw 8
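
A minimal sketch of one possible cause, not confirmed in this report: with drop_last = False, the final batch of an epoch can hold fewer samples than there are GPUs, so each list in kwargs has fewer entries than there are device indices and `v[i]` fails for the highest ones. The helper names `scatter_per_device` and `scatter_available`, the dictionary keys, and the 5-samples-for-8-devices batch are all hypothetical and only illustrate the indexing pattern from the failing line.

```python
def scatter_per_device(kwargs, num_devices):
    """Mimic the failing line: build one kwargs dict per device index."""
    return [
        dict([(k, v[i]) for k, v in kwargs.items()])
        for i in range(num_devices)
    ]


def scatter_available(kwargs, num_devices):
    """Guarded variant (an assumption, not the repository's fix):
    only use as many devices as there are chunks in the batch."""
    num_chunks = min(len(v) for v in kwargs.values())
    return scatter_per_device(kwargs, min(num_devices, num_chunks))


# Hypothetical last batch of an epoch: 5 samples left for 8 devices.
last_batch = {"data": list(range(5)), "im_info": list(range(5))}

try:
    scatter_per_device(last_batch, 8)
    raised = False
except IndexError:
    # Indices 5, 6 and 7 have no chunk to read.
    raised = True

usable = scatter_available(last_batch, 8)
```

If this is indeed the cause, setting drop_last = True or capping the number of devices used for a short batch, as in the guarded variant, would avoid the error.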

System information

  • Operating system: Ubuntu 16.04
  • CUDA version: 9.0
  • cuDNN version: 7.0
  • GPU models (for all devices if they are not all the same): ?
  • Python version: 3.6
  • PyTorch version: 0.4.0