Running on multiple GPUs #74

Open
V-Marco opened this issue Oct 13, 2023 · 1 comment

V-Marco commented Oct 13, 2023

Hello,

Can NEST GPU automatically utilize multiple GPUs when running with SLURM? I have two Tesla T4 GPUs on a single node, configured with gres, and I found that a single simulation uses only one of them, even when its load goes up to 100%. I was wondering whether my SLURM configuration is wrong, or whether NEST GPU is designed to run a single simulation on a single GPU only.

JoseJVS (Collaborator) commented Oct 16, 2023

Hi,

It would be very helpful if you could send us the Python script of the simulation as well as the batch script (or SLURM arguments) you are using to schedule the job.

In the meantime, here are some initial insights. To use multiple GPUs you need to run multiple MPI processes. Currently we use the CUDA_VISIBLE_DEVICES environment variable to assign a different GPU to each MPI process; if you are using OpenMPI you can achieve this with export CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK (other MPI implementations store the local rank in a different variable).
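For example, the same GPU pinning can be done at the top of the Python script itself, before NEST GPU is imported (this sketch assumes OpenMPI; only the OMPI_COMM_WORLD_LOCAL_RANK variable name would change for another MPI implementation):

```python
import os

# Pin each MPI process to its own GPU before NEST GPU initializes CUDA.
# OMPI_COMM_WORLD_LOCAL_RANK is set by OpenMPI; other MPI implementations
# expose the local rank under a different environment variable.
local_rank = os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0")
os.environ["CUDA_VISIBLE_DEVICES"] = local_rank

import nestgpu as ngpu  # import only after CUDA_VISIBLE_DEVICES is set
```

You would then launch the job with one MPI process per GPU, e.g. `mpirun -np 2 python simulation.py` (or the equivalent srun invocation) in your batch script.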

Another possible issue is that you might not be using the RemoteCreate and RemoteConnect functions to instantiate your network; these are necessary in a multi-process environment to correctly allocate nodes and connections to the different processes. You can find an example of such an instantiation in the HPC Benchmark model.
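As a rough sketch of what a two-process instantiation could look like (the model name, connection parameters, and the exact RemoteCreate/RemoteConnect signatures here are assumptions; please check them against the HPC Benchmark script):

```python
import nestgpu as ngpu

ngpu.ConnectMpiInit()  # initialize MPI support for remote connections

n = 100
# Create one population on MPI process 0 and one on MPI process 1;
# RemoteCreate is assumed to return an object with a node_seq attribute.
pop0 = ngpu.RemoteCreate(0, "aeif_cond_exp", n).node_seq
pop1 = ngpu.RemoteCreate(1, "aeif_cond_exp", n).node_seq

conn_dict = {"rule": "fixed_indegree", "indegree": 10}
syn_dict = {"weight": 0.1, "delay": 1.0}
# Connect the population living on process 0 to the one on process 1.
ngpu.RemoteConnect(0, pop0, 1, pop1, conn_dict, syn_dict)

ngpu.Simulate(1000.0)
```

Run it with as many MPI processes as hosts referenced by the Remote* calls (two in this sketch), each process pinned to its own GPU as described above.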
