
Inference gives different results when using multiple gpus (distributed mode) vs just one gpu (not distributed mode) #147

ThuongTNguyen opened this issue Dec 11, 2023 · 0 comments

Hi, I want to report an issue observed when running inference for a classification task.

Description

When running inference (with either do_eval=True or do_predict=True), the results differ between distributed mode (multiple GPUs) and single-GPU mode.

When doing evaluation, data is prepared sequentially in batches using SequentialSampler, BatchSampler, and DistributedBatchSampler:

eval_sampler = SequentialSampler(len(eval_item.data))

and then sent to gpus. But once logits are computed, there is a step to gather results across devices - merge_distributed.
predicts = merge_distributed(predicts, len(eval_item.data))

After this step, in distributed mode with multiple GPUs the order of data instances no longer matches the order in the original input file (dev.tsv or test.tsv), resulting in a different accuracy.
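To illustrate the reordering (this is a sketch, not the repo's actual merge_distributed code, and it assumes round-robin assignment of batches to ranks, which is one common sharding scheme):

```python
# Illustrative sketch: with a distributed batch sampler, each rank sees
# every world_size-th batch, so naively concatenating per-rank outputs
# interleaves the batches out of their original order.
n, batch_size, world_size = 10, 2, 3

# Sequential batches over dataset indices 0..9 (SequentialSampler-like)
batches = [list(range(i, min(i + batch_size, n))) for i in range(0, n, batch_size)]
# batches = [[0,1], [2,3], [4,5], [6,7], [8,9]]

# Round-robin assignment of batches to ranks (assumed sharding scheme)
per_rank = {r: [b for i, b in enumerate(batches) if i % world_size == r]
            for r in range(world_size)}

# Naive gather: concatenate all of rank 0's outputs, then rank 1's, etc.
gathered = [idx for r in range(world_size) for b in per_rank[r] for idx in b]
print(gathered)  # [0, 1, 6, 7, 2, 3, 8, 9, 4, 5] — not the input-file order
```

Every instance is still present, but the rows no longer line up with the labels read sequentially from the input file, which is consistent with the shifted labels/predicts shown below.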

Steps to reproduce

  • Use cola.sh to evaluate --init_model deberta-v3-large
  • Set other parameters: --do_eval, --eval_batch_size 4, using 3 GPUs CUDA_VISIBLE_DEVICES=7,5,6 or 1 GPU CUDA_VISIBLE_DEVICES=7
  • Check the order of instances after logit calculation is done and collected from all GPUs. For example, print the first 10 instances in predicts and labels after the above merge_distributed
    • 1 gpu case - as expected according to the input file:

      • labels = [1 1 1 1 0 0 0 1 1 1]
      • predicts = [[-0.04428 -0.167 ]
        [-0.05667 -0.1713 ]
        [-0.0438 -0.1727 ]
        [-0.0396 -0.1794 ]
        [-0.03604 -0.1823 ]
        [-0.0433 -0.1809 ]
        [-0.01921 -0.1947 ]
        [-0.04788 -0.1741 ]
        [-0.05774 -0.1755 ]
        [-0.05173 -0.1692 ]]
      • accuracy = 0.3087248322147651
      • eval_loss = 0.7155781315204284
    • 3 gpu case:

      • labels = [1 1 1 1 1 1 0 0 0 1]
      • predicts = [[-0.04428 -0.167 ]
        [-0.05667 -0.1713 ]
        [-0.0438 -0.1727 ]
        [-0.0396 -0.1794 ]
        [-0.04428 -0.167 ]
        [-0.04428 -0.167 ]
        [-0.03604 -0.1823 ]
        [-0.0433 -0.1809 ]
        [-0.01921 -0.1947 ]
        [-0.04788 -0.1741 ]]
      • accuracy = 0.3231064237775647
      • eval_loss = 0.7120015648589737
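Note that some rows (e.g. [-0.04428 -0.167]) appear several times in the 3-GPU output, which also hints at padding/duplication when the dataset does not split evenly across ranks. One possible workaround (a hypothetical sketch, not code from this repo: the variable names and the assumption that original indices can be gathered alongside the logits are mine) is to carry each example's input-file index through evaluation and sort the gathered results before computing metrics:

```python
# Hypothetical workaround: gather (original_index, logits) pairs from all
# ranks, then restore input-file order by sorting on the indices.
import numpy as np

# Example gathered state: rows are out of order relative to the input file.
gathered_indices = np.array([0, 1, 6, 7, 2, 3, 8, 9, 4, 5])
gathered_logits = np.arange(10 * 2, dtype=float).reshape(10, 2)  # dummy logits

order = np.argsort(gathered_indices)
restored_logits = gathered_logits[order]        # rows now follow input-file order
restored_indices = gathered_indices[order]      # [0, 1, 2, ..., 9]
```

Deduplicating padded examples (keeping the first occurrence of each index) would be needed as well if the sampler pads the last batches to equal size per rank.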

Additional information

My system setup is:

  • PyTorch 1.10.0+cu113
  • 8 GPUs: NVIDIA GeForce GTX 1080 Ti