
Multiple GPUs #726

Open
gitkol opened this issue Mar 19, 2024 · 9 comments

Comments

@gitkol

gitkol commented Mar 19, 2024

Hi,

Can rest.py run on multiple GPUs?

Thanks,

Istvan

@ijpulidos
Contributor

Hello! What do you mean by rest.py? Can you be more specific as to what your issue is?

@gitkol
Author

gitkol commented Mar 20, 2024 via email

@xiaowei-xie2

I have the same question. I am able to run multiple solute-tempering REMD simulations in parallel with mpirun, per issue #648, but I don't know how to distribute replicas among multiple GPUs so that they all contribute to the same REMD simulation.

@gitkol
Author

gitkol commented May 20, 2024 via email

@xiaowei-xie2

Hi @gitkol, I think I figured it out. Do you have mpi4py installed correctly? What I found was that without mpi4py, mpirun launches multiple independent copies of the same REMD simulation (each GPU runs a complete REMD on its own), whereas with mpi4py installed, the GPUs all contribute to a single REMD. Here is an example of job files that worked for me, in case it's helpful. On my system, using 4 GPUs gave a 2x speedup over 1 GPU (not 4x).
test_rest_14.tar.gz
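
For anyone reading along, here is a minimal sketch of the rank-to-GPU binding idea (this is not the attached job file; the script name, GPU count, and the CUDA_VISIBLE_DEVICES approach are illustrative assumptions):

```python
# rank_gpu_binding.py -- hypothetical name. Bind each MPI rank to its
# own GPU before any OpenMM context is created, so replicas spread
# across GPUs instead of all landing on GPU 0.
import os
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

GPUS_PER_NODE = 4  # assumption: match your hardware

# OpenMM reads CUDA_VISIBLE_DEVICES when the CUDA platform initializes,
# so this must be set before the sampler builds its contexts.
os.environ["CUDA_VISIBLE_DEVICES"] = str(rank % GPUS_PER_NODE)

print(f"rank {rank}/{comm.Get_size()} -> GPU {os.environ['CUDA_VISIBLE_DEVICES']}")

# ... build and run the REST/REMD sampler here. With mpi4py importable,
# openmmtools' multistate samplers pick up the MPI environment and
# distribute replica propagation across the ranks.
```

Launched with something like `mpirun -np 4 python rank_gpu_binding.py`, each rank then propagates its share of the replicas on its own GPU.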

@ijpulidos
Contributor

@xiaowei-xie2 is correct, having mpi4py is important in this case. Thank you for providing a test script that we can use to reproduce your results.

There's always some part of the code that cannot be fully parallelized, for example the communication between the different GPUs. It would be interesting to do some profiling to check where the overhead is. Thanks!
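
One rough way to check (a toy stand-in for the sampler, not the actual code path) is to time the compute-bound part and the exchange communication separately on each rank:

```python
# profile_sketch.py -- hypothetical toy. A matrix product stands in for
# replica propagation; an allgather stands in for exchanging energies.
# Run with: mpirun -np 4 python profile_sketch.py
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

t0 = time.perf_counter()
x = np.random.rand(2000, 2000)
energy = float((x @ x.T).trace())      # compute-bound stand-in
t_compute = time.perf_counter() - t0

comm.Barrier()                          # don't blame stragglers on comm
t0 = time.perf_counter()
all_energies = comm.allgather(energy)   # communication stand-in
t_comm = time.perf_counter() - t0

print(f"rank {rank}: compute {t_compute:.3f} s, comm {t_comm:.5f} s")
```

If the communication share grows with the number of ranks, that's where the missing speedup is going.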

@xiaowei-xie2

Hi @ijpulidos, thank you for the insight. Yes, I understand that using n GPUs won't necessarily give an n-fold speedup (sometimes no speedup at all), so I am actually satisfied with the current performance. But it would be nice to see where the overhead is!

I am also curious: does the current repo support parallelizing across multiple GPUs on multiple nodes?

@ijpulidos
Contributor

@xiaowei-xie2 It does support that, since everything is handled by the MPI environment. That also means it is highly dependent on the MPI setup of your system. Depending on the interconnect of your HPC cluster and the system being simulated, it may or may not make sense to do this.

We should come up with an example of how to accomplish this that people can use, and add it to the docs.
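
Until such an example lands in the docs, here is a sketch of the usual multi-node pattern (illustrative only; the hostfile and launch line assume an Open MPI setup): derive a node-local rank so each process grabs a distinct GPU on its own node.

```python
# multinode_binding.py -- hypothetical name. A shared-memory communicator
# split yields a node-local rank, which maps cleanly onto local GPUs.
import os
from mpi4py import MPI

comm = MPI.COMM_WORLD
node_comm = comm.Split_type(MPI.COMM_TYPE_SHARED)  # ranks on the same node
local_rank = node_comm.Get_rank()

os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)
print(f"global rank {comm.Get_rank()} on {MPI.Get_processor_name()} "
      f"-> GPU {local_rank}")
```

Launched across nodes with something like `mpirun -np 8 --hostfile hosts python multinode_binding.py`; whether it pays off depends on the interconnect, as noted above.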
