
accelerate sim_matrix process in multi-GPU #113

Open · wants to merge 2 commits into base: master
Conversation

@zsnoob commented Nov 24, 2023

This PR changes two main things:

  1. Removing the `loss.mean()` call, which does nothing here: under DDP the loss is already a scalar, and DDP synchronizes gradients across processes automatically, so no extra reduction is needed.

  2. Following the batch-sharding approach described in openai/CLIP#132 (comment) ("Batch Sharding Details"), every similarity calculation is now done locally. Each GPU uses all negative samples from the global batch but only the positive samples from its local batch, so the local sim_matrix has shape (batch_size / n_gpu, batch_size).

  • This approach raises a follow-up problem: the loss function always treats the main diagonal (the first local-batch-size columns) as the positive elements, but when the local batch is not the first shard of the global batch, the correct positive samples actually occupy the column range local_rank * local_batch_size to (local_rank + 1) * local_batch_size. I therefore pass the second parameter of torch.diag(), the diagonal offset, which points at the column of the first positive sample (see the sketch after this list).
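For reference, here is a minimal sketch of the two changes combined, assuming a CLIP-style contrastive loss trained with torch.distributed / DDP. The function and variable names (`sharded_clip_loss`, `text_feat`, `video_feat`, `logit_scale`) are hypothetical placeholders, not this repo's exact code:

```python
import torch
import torch.distributed as dist

def sharded_clip_loss(text_feat, video_feat, logit_scale):
    """text_feat, video_feat: (local_bs, dim) L2-normalized local features."""
    local_rank = dist.get_rank()
    n_gpu = dist.get_world_size()
    local_bs = text_feat.size(0)

    # Gather video features from every GPU to form the global negative pool.
    # Plain all_gather does not propagate gradients through the gathered
    # copies; gradients flow through the local features, and DDP averages
    # the per-GPU gradients, so no manual loss.mean() is needed.
    gathered = [torch.zeros_like(video_feat) for _ in range(n_gpu)]
    dist.all_gather(gathered, video_feat)
    all_video = torch.cat(gathered, dim=0)     # (local_bs * n_gpu, dim)

    # Local queries vs. global keys: shape (local_bs, local_bs * n_gpu).
    sim_matrix = logit_scale * text_feat @ all_video.t()

    # The positives for this rank occupy columns
    # [local_rank * local_bs, (local_rank + 1) * local_bs), i.e. they lie
    # on the diagonal shifted right by local_rank * local_bs -- that shift
    # is the second argument to torch.diag().
    pos = torch.diag(sim_matrix, local_rank * local_bs)

    # InfoNCE over each row: local positive against all global negatives.
    return (-pos + torch.logsumexp(sim_matrix, dim=1)).mean()
```

The same computation could also be written as `F.cross_entropy(sim_matrix, torch.arange(local_bs) + local_rank * local_bs)`, since the shifted diagonal entries are exactly the target logits for each row.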

In my experiments, the model converges as usual, and training is more efficient.

@zsnoob zsnoob closed this Nov 24, 2023
@zsnoob zsnoob reopened this Nov 24, 2023
@zsnoob (Author) commented Nov 24, 2023

This is also mentioned in issue #101 (comment).
