
simulation environment of multi node #18

Open
jiaduxie opened this issue Sep 10, 2020 · 125 comments

@jiaduxie

Do you have a recommended configuration tutorial for the multi-node NEST simulation environment? Brief steps for setting up the environment and a list of the required installation packages would also help.

@jiaduxie jiaduxie changed the title multi node nest simulation environment simulation environment of multi node Sep 10, 2020
@jarsi
Collaborator

jarsi commented Sep 10, 2020

Hi jiaduxie,

Could you provide a bit more detail about the problems you are running into?

The Readme.md of this repository already has some pointers on how to set up the environment and run the model.

The exact steps you have to take depend on the cluster you are using. What kind of job scheduler does it use? Is Python 3 available? Does it have MPI support? Have you already installed NEST?

A rough sketch:

  1. Manually compile NEST on your cluster and make sure Python and MPI are supported.
    Do not use the conda version (it supports neither MPI nor OpenMP). Use an official release (the NEST master branch has features that are not yet supported by this repository). Depending on your cluster you may need to load some modules (e.g. Python, MPI, ...).

  2. Make sure to install all python packages listed in requirements.txt.
    Run:
    pip3 install -r requirements.txt
    If the cluster does not allow this, try:
    pip3 install --user -r requirements.txt

  3. Tell the job scheduler on your system how to run the job (see the config.py sketch after this list).
    You will need to copy the file config_template.py to config.py. Change base_path to the absolute path of the multi-area-model repository. Change data_path to the path where you want to store the output of the simulation. Adapt the jobscript_template to your system; if it is SLURM, there is already an example in place which you can uncomment and try to use. Make sure to also load all packages that a multi-node simulation requires (e.g. MPI). Change submit_cmd from None to the command your job scheduler uses; for SLURM it is sbatch.

  4. Try to run run_example_fullscale.py
    Now you should be able to run
    python run_example_fullscale.py.
    This will set up the simulation environment, do all the preprocessing and finally submit the job to the cluster. Depending on your cluster you might want to change num_processes and local_num_threads.
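A minimal config.py sketch for step 3, assuming a SLURM cluster; all paths, node counts and scheduler options below are placeholders you would adapt to your system. The {sim_dir}, {label}, {network_label}, {num_processes} and {base_path} fields are filled in automatically by the framework:

# config.py (copied from config_template.py and adapted; everything below is an example)
base_path = '/absolute/path/to/multi-area-model'    # absolute path of this repository
data_path = '/absolute/path/to/simulation_output'   # where simulation output is stored

jobscript_template = """#!/bin/bash
#SBATCH -J multi-area-model
#SBATCH -o {sim_dir}/{label}.%j.o
#SBATCH -e {sim_dir}/{label}.%j.e
#SBATCH --nodes=2
#SBATCH --ntasks={num_processes}
# load the modules a multi-node run needs here (e.g. MPI, Python)
mpirun -np {num_processes} python {base_path}/run_simulation.py {label} {network_label}"""

submit_cmd = 'sbatch'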

If all of this works, you should be ready to run your own experiments.

I hope this helps!
Best,
Jari

@jiaduxie
Author

Oh, thanks Jari. I run the model with NEST installed in a conda environment. But I installed the conda version of NEST with MPI support; is that also not usable? If I try to install NEST from source, will the MPI that I install manually conflict with the MPI already present on the server? I'll ask again if I have further questions.

@jarsi
Collaborator

jarsi commented Sep 10, 2020

I am only aware of a conda version of NEST which does not have MPI support. But maybe it exists.

To check whether your NEST version supports MPI and OpenMP, could you run the following command in your environment and post the output:

python -c "import nest; nest.Simulate(1.)"

My conda-installed NEST reports in the start_updating_ info message that neither MPI nor OpenMP is available:

Sep 10 16:07:01 SimulationManager::start_updating_ [Info]:
Number of local nodes: 0
Simulation time (ms): 1
Not using OpenMP
Not using MPI

Concerning manual compilation: how did you try to compile NEST? Could you post what steps you have tried so far?

@jiaduxie
Author

I haven't started trying to compile manually yet.
I ran the following in my conda environment, and the output is as follows:

$python -c "import nest; nest.Simulate(1.)"

Creating default RNGs
Creating new default global RNG
-- N E S T --
Copyright (C) 2004 The NEST Initiative
Version: nest-2.18.0
Built: Jan 27 2020 12:49:17

This program is provided AS IS and comes with
NO WARRANTY. See the file LICENSE for details.

Problems or suggestions?
Visit https://www.nest-simulator.org
Type 'nest.help()' to find out more about NEST.

Sep 10 22:20:26 NodeManager::prepare_nodes [Info]:
Preparing 0 nodes for simulation.

Sep 10 22:20:26 SimulationManager::start_updating_ [Info]:
Number of local nodes: 0
Simulation time (ms): 1
Number of OpenMP threads: 1
Number of MPI processes: 1

Sep 10 22:20:26 SimulationManager::run [Info]:
Simulation finished.

@jarsi
Collaborator

jarsi commented Sep 10, 2020

It seems alright. Have you installed the packages from requirements.txt? Have you tried running a simulation?

@jiaduxie
Author

Yes, I have installed the packages from requirements.txt. Can you check whether this is the right command to run a multi-node simulation?
mpirun -hostfile hostfile python run_example_downscaled.py

The hostfile is the following:
work0 slots = 2
work1 slots = 2

@jarsi
Collaborator

jarsi commented Sep 25, 2020

I have no experience with hostfiles, but it looks reasonable to me. Have you adjusted num_processes and local_num_threads in the sim_dict? Have you tried running it? Did it work?

The run_example_downscaled.py is meant to be run on a local machine, for example a laptop. If you would like to experiment on a compute cluster you should exchange M.simulation.simulate() with start_job(M.simulation.label, submit_cmd, jobscript_template) (see run_example_fullscale.py) and additionally import:

from start_jobs import start_job
from config import submit_cmd, jobscript_template

In this case you need to invoke the script serially:

python run_example.py

The parallelized part is then specified in the jobscript_template in config.py.
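For reference, a sketch of how the modified script would be structured; the network_params and sim_params dictionaries are the ones already defined in run_example_downscaled.py and are not repeated here:

from multiarea_model import MultiAreaModel
from start_jobs import start_job
from config import submit_cmd, jobscript_template

# keep the network_params and sim_params definitions from run_example_downscaled.py,
# but set num_processes and local_num_threads in sim_params to match your cluster
M = MultiAreaModel(network_params, simulation=True, sim_spec=sim_params)

# instead of M.simulation.simulate(), hand the job to the scheduler
start_job(M.simulation.label, submit_cmd, jobscript_template)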

@jiaduxie
Author

jiaduxie commented Oct 9, 2020

Hi jarsi. If I run the complete model on a cluster of two servers, roughly how much memory does each machine need?

@jarsi
Collaborator

jarsi commented Oct 9, 2020

The model consumes approximately 1 TB of memory. So with two servers each server would need to provide 500 GB.

@jiaduxie
Author

jiaduxie commented Oct 9, 2020

Okay, thank you. And when you run the entire model, how many servers do you use and how much memory does each one have?

@jiaduxie
Author

Hi jarsi. On what kind of system do you run multiple nodes in parallel? My system is Ubuntu, and I have not managed to configure SLURM properly. Do you have any guidance on configuring the environment?

@jarsi
Collaborator

jarsi commented Oct 12, 2020

Hi, we do not set up the systems ourselves. We use, for example, JURECA at the Forschungszentrum Juelich; it has everything we need already installed. What kind of system are you using?

@jiaduxie
Author

It is a Linux server; the distribution is Ubuntu. Apart from JURECA, have you ever run the model on an ordinary server of your own?

@jiaduxie
Author

Hi jarsi, I am now testing a small network on two machines, running it with the following command. It seems that the two machines each run on their own without interacting.

`mpirun.mpich -np 2 -host work0,work1 python ./multi_test.py`

In addition, have you run this multi-area-model in your own cluster environment?

@jarsi
Collaborator

jarsi commented Oct 28, 2020

This is weird. Have you adjusted the num_processes or local_num_threads variable in the sim_params dictionary? An example of how to do this is shown in the run_example_fullscale.py file. In your case you should set num_processes=2. These variables are needed in order to inform NEST about distributed computing.
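For example, in the sim_params dictionary that is passed as sim_spec to MultiAreaModel (a minimal sketch; everything except num_processes is a placeholder here):

sim_params = {'t_sim': 100.,            # simulation time in ms
              'num_processes': 2,       # one MPI process per machine in your hostfile
              'local_num_threads': 1,   # threads per MPI process
              'recording_dict': {'record_vm': False}}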

Maybe you could also post what is in your multi_test.py file?

I have run the model on a local cluster. I usually just need to modify the run_example_fullscale.py and config.py to my own needs.

@jiaduxie
Author

multi_test.py:

from nest import *
SetKernelStatus({"total_num_virtual_procs": 4})
pg = Create("poisson_generator", params={"rate": 50000.0})
n = Create("iaf_psc_alpha", 4)
sd = Create("spike_detector", params={"to_file": True})
Connect(pg, [n[0]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[0]], [n[1]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[1]], [n[2]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect([n[2]], [n[3]], syn_spec={'weight': 1000.0, 'delay': 1.0})
Connect(n, sd)
Simulate(100.0)

@jarsi
Collaborator

jarsi commented Oct 28, 2020

This is difficult for me to debug. On my machine I can run this without running into errors. It works with the conda installed nest (conda create --name nest_conda -c conda-forge 'nest-simulator=*=mpi_openmpi*' python) and with nest compiled from source. I suspect there might be a problem with the host file. Unfortunately I do not know a lot about those, usually the system administrators take care of this.

On your machine, are you using any resource manager such as SLURM, PBS/Torque, LSF, etc., or are you responsible for defining everything correctly using hostfiles? What kind of system are you using?
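One quick way to check whether the two processes actually see each other is to print the MPI rank from inside a small script. This is a minimal sketch using the standard PyNEST calls nest.Rank() and nest.NumProcesses(), nothing specific to this model:

import nest

# with a working 2-process MPI run this prints "rank 0 of 2" on one host and
# "rank 1 of 2" on the other; if both hosts print "rank 0 of 1", the two
# processes are running independently without MPI communication
print("rank", nest.Rank(), "of", nest.NumProcesses())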

@jiaduxie
Author

The cluster environment I use consists of nine ordinary server machines. The system is Linux; the distribution is Debian. You run this model on a supercomputer, right? Have you ever run it in your own environment? Is it necessary to install the SLURM resource scheduling system? I ran into a lot of problems while trying to install SLURM, so I have not installed it.

@jarsi
Collaborator

jarsi commented Oct 28, 2020

It is not necessary to install SLURM. But I have most experience with it as all clusters I have used so far had SLURM installed. Installing a resource manager is not trivial and should be the job of a system admin, not the user. Do you have a system administrator you could ask for help? How do other people run distributed jobs on this cluster?

Could you also try the following commands and report whether something changes:
mpiexec -np 2 -host work0,work1 python ./multi_test.py

mpirun -np 2 -host work0,work1 python ./multi_test.py

@jiaduxie
Author

Because my cluster here consists of ordinary servers, there is no resource scheduling system such as SLURM installed. The commands you suggested do not seem to complete the simulation either:

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpiexec -np 2 -host work0,work1 python multi_test.py
bash: orted: command not found

ORTE was unable to reliably start one or more daemons.
This usually is caused by:

  • not finding the required libraries and/or binaries on
    one or more nodes. Please check your PATH and LD_LIBRARY_PATH
    settings, or configure OMPI with --enable-orterun-prefix-by-default

  • lack of authority to execute on one or more specified nodes.
    Please verify your allocation and authorities.

  • the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
    Please check with your sys admin to determine the correct location to use.

  • compilation of the orted with dynamic libraries when static are required
    (e.g., on Cray). Please check your configure cmd line and consider using
    one of the contrib/platform definitions for your system type.

  • an inability to create a connection back to mpirun due to a
    lack of common network interfaces and/or no route found between
    them. Please check network connectivity (including firewalls
    and network routing requirements).


(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpirun -np 2 -host work0,work1 python ./multi_test.py
bash: orted: command not found

ORTE was unable to reliably start one or more daemons.
This usually is caused by:

  • not finding the required libraries and/or binaries on
    one or more nodes. Please check your PATH and LD_LIBRARY_PATH
    settings, or configure OMPI with --enable-orterun-prefix-by-default

  • lack of authority to execute on one or more specified nodes.
    Please verify your allocation and authorities.

  • the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
    Please check with your sys admin to determine the correct location to use.

  • compilation of the orted with dynamic libraries when static are required
    (e.g., on Cray). Please check your configure cmd line and consider using
    one of the contrib/platform definitions for your system type.

  • an inability to create a connection back to mpirun due to a
    lack of common network interfaces and/or no route found between
    them. Please check network connectivity (including firewalls
    and network routing requirements).


@jarsi
Collaborator

jarsi commented Oct 28, 2020

Just to make sure, you are using nest installed via conda, right?

What do the following commands give you:
conda list
which mpirun
which mpiexec
which mpirun.mpich

@jiaduxie
Author

Yes, I installed NEST under conda. It seems I have it installed:

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ conda list
llvm-meta 7.0.0 0 conda-forge
matplotlib 3.3.0 pypi_0 pypi
mpi 1.0 openmpi conda-forge
mpi4py 3.0.3 py38h246a051_1 conda-forge
ncurses 6.2 he1b5a44_1 conda-forge
nest-simulator 2.18.0 mpi_openmpi_py38h72811e1_7 conda-forge
nested-dict 1.61 pypi_0 pypi
numpy 1.19.1 py38h8854b6b_0 conda-forge
openmp 7.0.0 h2d50403_0 conda-forge
openmpi 4.0.4 hdf1f1ad_0 conda-forge
openssh 8.3p1 h5957347_0 conda-forge
openssl 1.1.1g h516909a_1 conda-forge
pandas 1.1.0 py38h950e882_0 conda-forge

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ which mpirun
/home/work/anaconda3/envs/pynest_mpi/bin/mpirun
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ which mpiexec
/home/work/anaconda3/envs/pynest_mpi/bin/mpiexec
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ which mpirun.mpich

@jarsi
Collaborator

jarsi commented Oct 28, 2020

Ok thanks, the output of the last command is missing.

Using conda list you can see that NEST is linked against Open MPI. This is one of many MPI libraries. The command mpirun.mpich, to my understanding, instructs MPI to use the MPICH implementation, which is different from the Open MPI that NEST is linked against. These two are not compatible, as we also see when you use mpirun.mpich. Both mpiexec and mpirun are installed inside your conda environment and should be compatible with NEST. I don't understand why you get the error message when using these.

@jarsi
Collaborator

jarsi commented Oct 28, 2020

Maybe you could also check the output of:
mpirun --version
mpirun.mpich --version

@jiaduxie
Author

(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ which mpirun.mpich
/usr/bin/mpirun.mpich
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpirun --version
mpirun (Open MPI) 4.0.4
Report bugs to http://www.open-mpi.org/community/help/
(pynest_mpi) work@lyjteam-server:~/xjd/nest_multi_test$ mpirun.mpich --version
HYDRA build details:
Version: 3.3a2
Release Date: Sun Nov 13 09:12:11 MST 2016
CC: gcc -Wl,-Bsymbolic-functions -Wl,-z,relro
CXX: g++ -Wl,-Bsymbolic-functions -Wl,-z,relro
F77: gfortran -Wl,-Bsymbolic-functions -Wl,-z,relro
F90: gfortran -Wl,-Bsymbolic-functions -Wl,-z,relro
Configure options: '--disable-option-checking' '--prefix=/usr' '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--with-libfabric' '--enable-shared' '--enable-fortran=all' '--disable-rpath' '--disable-wrapper-rpath' '--sysconfdir=/etc/mpich' '--libdir=/usr/lib/x86_64-linux-gnu' '--includedir=/usr/include/mpich' '--docdir=/usr/share/doc/mpich' '--with-hwloc-prefix=system' '--enable-checkpointing' '--with-hydra-ckpointlib=blcr' 'CPPFLAGS= -Wdate-time -D_FORTIFY_SOURCE=2 -I/build/mpich-O9at2o/mpich-3.3
a2/src/mpl/include -I/build/mpich-O9at2o/mpich-3.3a2/src/mpl/include -I/build/mpich-O9at2o/mpich-3.3a2/src/openpa/src -I/build/mpich-O9at2o/mpich-3.3a2/src/openpa/src -D_REENTRANT -I/build/mpich-O9at2o/mpich-3.3a2/src/mpi/romio/include' 'CFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2' 'CXXFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3a2=. -fstack-protector-strong -Wformat -Werror=format-security -O2' 'FFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3a2=. -fstack-protector-strong -O2' 'FCFLAGS= -g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3a2=. -fstack-protector-strong -O2' 'build_alias=x86_64-linux-gnu' 'MPICHLIB_CFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3a2=. -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'MPICHLIB_CXXFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3a2=. -fstack-protector-strong -Wformat -Werror=format-security' 'MPICHLIB_FFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3a2=. -fstack-protector-strong' 'MPICHLIB_FCFLAGS=-g -O2 -fdebug-prefix-map=/build/mpich-O9at2o/mpich-3.3a2=. -fstack-protector-strong' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro' 'FC=gfortran' 'F77=gfortran' 'MPILIBNAME=mpich' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'LIBS=' 'MPLLIBNAME=mpl'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc
Resource management kernels available: user slurm ll lsf sge pbs cobalt
Checkpointing libraries available: blcr
Demux engines available: poll select

@jarsi
Collaborator

jarsi commented Oct 28, 2020

I think the problem is that once the jobs start to run on a node the mpi library cannot be found. This is because the PATH and LD_LIBRARY_PATH are not exported. Could you try the following:

mpirun --prefix /home/work/anaconda3/envs/pynest_mpi/bin -np 2 -host work0,work1 python ./multi_test.py

@jarsi
Collaborator

jarsi commented Oct 30, 2020

Hi, have you made progress?

I think the problems you are seeing are related to your MPI libraries. As the conda NEST is compiled against Open MPI, you must also use Open MPI and not MPICH. This means that mpirun should be the command you use. But we are seeing that this does not work. My guess is that once NEST starts to run on the nodes it does not find the correct MPI library, gets confused, and the NEST instances run independently because they do not know how to use MPI. According to the Open MPI FAQ you can try several things.

  1. Specify which MPI library to use via --prefix (I think in my previous message there might have been an error in the prefix):
     mpirun --prefix /home/work/anaconda3/envs/pynest_mpi -np 2 -host work0,work1 python ./multi_test.py
  2. Specify which MPI library to use by giving the complete Open MPI path:
     /home/work/anaconda3/envs/pynest_mpi/bin/mpirun -np 2 -host work0,work1 python ./multi_test.py
  3. Add the following to ~/.profile:
     export PATH=/home/work/anaconda3/envs/pynest_mpi/bin:$PATH
     export LD_LIBRARY_PATH=/home/work/anaconda3/envs/pynest_mpi/:$LD_LIBRARY_PATH

Does any of these approaches work or change the error message?

@jiaduxie
Author

I've tried these and it still doesn't work. Did you install NEST with conda or compile it from source?

@jarsi
Collaborator

jarsi commented Nov 12, 2020

The total model has approximately 4 million neurons. The formula for downscaling is N_scaling * 4 million = 0.243* 4 million = 0.972 million.

I also posted a modified version of this script above. It addresses this.

@jiaduxie
Author

According to this ratio, the number of synapses is 1 billion, right?

@jiaduxie
Author

I checked the JSON file and calculated that the number of neurons is 1 million and the number of synapses is 1 billion.

@jiaduxie
Author

What problem did your revised version solve?

@jarsi
Collaborator

jarsi commented Nov 12, 2020

You asked how I would start the simulation. This is the way I think you should do it. But it is just a suggestion.

It prepares the simulation with one process and one thread, and sets things up so that multiple MPI processes can easily work on the data (e.g. every process has its own configuration file, which avoids concurrent data access problems). When everything is finished the job is submitted and all processes can start their work.

@jiaduxie
Author

So now I also need to use the job script to submit the downscaled version? Just as you described below:

# Template for job scripts
jobscript_template = '''
# Instruction for the queuing system

. /home/users/miniconda3/etc/profile.d/conda.sh
conda activate multi_area_model

mpirun -np {num_processes} python {base_path}/run_simulation.py {label} {network_label}'''

# Command to submit jobs on the local cluster
submit_cmd = 'bash'

@jarsi
Collaborator

jarsi commented Nov 12, 2020

Ideally you do not need to worry about this at all. This part: start_job(M.simulation.label, submit_cmd, jobscript_template) in the script I posted above should do the work for you.

@jiaduxie
Author

Yes. I think you modified the file multiarea_helpers.py a month ago; I don't know what the modification of this file changes.

@jarsi
Collaborator

jarsi commented Nov 12, 2020

It is explained in the corresponding pull request. The inhibitory synaptic weights were scaled wrongly.

@jiaduxie
Author

So the results I get now will differ from the previous ones? I am repeating an earlier experiment, and the number of activated neuron spikes differs from the previous run.

@jiaduxie
Author

Hi jarsi, I want to recompile and install NEST now. What should I pay attention to? Does running the multi-area-model require a specific Python version? And which MPI version should I use? Is there a guide for this?

@jarsi
Collaborator

jarsi commented Nov 13, 2020

Ideally the following commands are sufficient.

cmake -DCMAKE_INSTALL_PREFIX:PATH=</install/path> -Dwith-mpi=ON </path/to/NEST/src>
make
make install

If it compiles you can check if it works via make installcheck. I think the existing guides are quite verbose and good. Make sure you use python 3.
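After make install you can repeat the earlier check under MPI to confirm that the new build supports it, assuming the freshly installed NEST is the one found on your PYTHONPATH (e.g. after sourcing the nest_vars.sh script from the install directory, if your version provides one):

mpirun -np 2 python -c "import nest; nest.Simulate(1.)"

The start_updating_ message should then report "Number of MPI processes: 2".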

@jiaduxie
Author

I have compiled it once, and I want to delete it and install it again. Is there any good way?

@jiaduxie
Author

Hi jarsi. I still have not managed to configure a multi-node simulation environment. This work is very important to me. Can you give me contact information for the supercomputer center, such as an email address?

@jiaduxie
Author

Hi jarsi. Can you give me an email address for the administrators of the Forschungszentrum Juelich supercomputer center? I would like them to help me with the environment configuration.

@jarsi
Collaborator

jarsi commented Dec 18, 2020

These system administrators are responsible for Juelich machines. They won't have time to take care of machines they are not responsible for.

Have you extensively googled your errors? Have you asked your colleagues how they simulate on multiple nodes? Can you talk to the person who installed the cluster? If all of this does not help you can ask for help with a detailed explanation for example on stackoverflow.com

Have you tried compiling NEST without conda, only with system libraries and system Python? Have you successfully run an MPI test program (not NEST) across nodes, to prove that this works and that your problem is (or is not) NEST related? Are you affiliated with any organization (e.g. a university) or project (e.g. the Human Brain Project) which might provide compute resources?

@jiaduxie
Author

Hi jarsi. Is the input value of I_e zero when running the complete model (approx. 4.13 million neurons and 24.2 billion synapses)? And are there only a few activation spikes after running?

@jarsi
Collaborator

jarsi commented Dec 29, 2020

in multi_area_model.py add_DC_drive is set to 0. If either K_scaling or N_scaling is not equal to 1 add_DC_drive is adjusted to make up for spikes that would be there in the fullscale model. The value from add_DC_drive is then used for I_e. So in a fullscale scenario I would expect I_e to be 0.

Is it working now?

@jiaduxie
Author

No, the multi-node simulation is still not running. I want to implement this macaque brain model with my own simulator. I think your implementation uses a Poisson generator as the external input. I now use add_DC_drive as the external input (the DC is added to I_e) instead of a Poisson generator ('poisson_input': False). Is that a correct way to implement the model? So in a full-scale scenario I want to use add_DC_drive as I_e; is this correct?

@jiaduxie
Author

jiaduxie commented Jan 4, 2021

Hi jarsi. Now I use add_DC_drive as the external input (the DC is added to I_e) instead of a Poisson generator ('poisson_input': False). Is that OK?

            if not self.network.params['input_params']['poisson_input']:
                K_ext = self.external_synapses[pop]
                W_ext = self.network.W[self.name][pop]['external']['external']
                tau_syn = self.network.params['neuron_params']['single_neuron_dict']['tau_syn_ex']
                DC = K_ext * W_ext * tau_syn * 1.e-3 * \
                    self.network.params['input_params']['rate_ext']
                I_e += DC
            nest.SetStatus(gid, {'I_e': I_e})

@jiaduxie
Author

Hello jarsi. I can now run code across multiple nodes on the NEST platform, but I can't run your code. I have installed SLURM. Can you help me?

@jiaduxie
Author

Running:

(pynest) [root@ctlwork01 multi-area-model] # python run_example_fullscale.py 
[INFO] [2021.1.28 10:26:40 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:217 @ Network::create_rngs_] : Creating default RNGs
[INFO] [2021.1.28 10:26:40 /home/conda/feedstock_root/build_artifacts/nest-simulator_1580129123254/work/nestkernel/rng_manager.cpp:260 @ Network::create_grng_] : Creating new default global RNG

              -- N E S T --
  Copyright (C) 2004 The NEST Initiative

 Version: nest-2.18.0
 Built: Jan 27 2020 12:49:17

 This program is provided AS IS and comes with
 NO WARRANTY. See the file LICENSE for details.

 Problems or suggestions?
   Visit https://www.nest-simulator.org

 Type 'nest.help()' to find out more about NEST.

Initializing network from dictionary.
RAND_DATA_LABEL 3446
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning:Mean of empty slice.
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning:invalid value encountered in double_scalars
No R installation, taking hard-coded SLN fit parameters.


========================================
Customized parameters
--------------------
{'K_scaling': 1.0,
 'N_scaling': 1.0,
 'connection_params': {'K_stable': '/home/work/multi-area-model_9nodes/multi-area-model/K_stable.npy',
                       'av_indegree_V1': 3950.0,
                       'fac_nu_ext_5E': 1.125,
                       'fac_nu_ext_6E': 1.41666667,
                       'fac_nu_ext_TH': 1.2,
                       'g': -11.0},
 'input_params': {'rate_ext': 10.0},
 'neuron_params': {'V0_mean': -150.0, 'V0_sd': 50.0}}
========================================
/home/work/anaconda3/envs/pynest/lib/python3.8/site-packages/dicthash/dicthash.py:47: UserWarning:Float too small for safe conversion tointeger. Rounding down to zero.
Simulation label: a28ad18ff0c6b060b8e47187d3592b21
Copied files.
Initialized simulation class.

Jan 28 10:26:47 ModelManager::clear_models_ [Info]: 
    Models will be cleared and parameters reset.

Jan 28 10:26:47 Network::create_rngs_ [Info]: 
    Deleting existing random number generators

Jan 28 10:26:47 Network::create_rngs_ [Info]: 
    Creating default RNGs

Jan 28 10:26:47 Network::create_grng_ [Info]: 
    Creating new default global RNG
Iteration: 0
Mean-field theory predicts an average rate of 3.729 spikes/s across all populations.
Submitted batch job 225

sacct job -> output

(pynest) [root@ctlwork01 multi-area-model] # sacct -j 225
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
225          multi-are+ computerP+       root        176     FAILED      1:0 
225.batch         batch                  root        176     FAILED      1:0 

@jarsi
Collaborator

jarsi commented Jan 28, 2021

Normally SLURM jobs produce a stdout and a stderr file. See this and the following line. There you will find more information about why your job failed. Could you please post the output?

@jiaduxie
Author

There is no output. The two labels below are printed by my job script. There is no error output in the .o file, so I can't find the cause of the error.

echo {label} 
echo {network_label}
(pynest) [root@cmpwork002 e1d019b3cdbbaef181f9679c7ecc984b] # cat e1d019b3cdbbaef181f9679c7ecc984b.217.o 
e1d019b3cdbbaef181f9679c7ecc984b
da4e0764b4a3d0c8a3d3687dfa9c5ae4

@jiaduxie
Author

jiaduxie commented Feb 1, 2021

Hi jarsi. Are my file configuration and the way I run it correct?
The run command is: python run_example_fullscale.py
My config.py is:

# Absolute path of repository
base_path = '/home/work/multi-area-model_9nodes/multi-area-model'
# Place to store simulations
data_path = '/home/work/multi-area-model_9nodes/multi-area-model/simulations'
# Template for jobscripts #SBATCH --ntasks={num_processes}    #SBATCH --ntasks-per-node=9  ,cmpwork003,cmpwork004,cmpwork005,cmpwork006,cmpwork007,cmpwork008,cmpwork009,cmpwork010
jobscript_template = """#!/bin/bash 
#SBATCH -J multi
#SBATCH -o {sim_dir}/{label}.%j.o
#SBATCH -e {sim_dir}/{label}.%j.e
#SBATCH --partition=computerPartiton
#SBATCH --exclusive
#SBATCH -N 9
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=80
#SBATCH -w cmpwork002

echo {label} 
echo {network_label}

/home/work/anaconda3/envs/pynest/bin/mpirun -n 9 -host cmpwork002,cmpwork003,cmpwork004,cmpwork005,cmpwork006,cmpwork007, cmpwork008,cmpwork009,cmpwork010 -mca btl_tcp_if_include enp39s0f0 python run_simulation.py {label} {network_label}"""

# Command to submit jobs on the local cluster  -mca btl_tcp_if_include enp39s0f0   #SBATCH --ntasks=1
submit_cmd = 'sbatch'

run_example_fullscale.py is:

import numpy as np
import os

from multiarea_model import MultiAreaModel
from start_jobs import start_job
from config import submit_cmd, jobscript_template
from config import base_path

"""
Example script showing how to simulate the multi-area model
on a cluster.

We choose the same configuration as in
Fig. 3 of Schmidt et al. (2018).

"""

"""
Full model. Needs to be simulated with sufficient
resources, for instance on a compute cluster.
"""
d = {}
conn_params = {'g': -11.,
               'K_stable': os.path.join(base_path, 'K_stable.npy'),
               'fac_nu_ext_TH': 1.2,
               'fac_nu_ext_5E': 1.125,
               'fac_nu_ext_6E': 1.41666667,
               'av_indegree_V1': 3950.}
input_params = {'rate_ext': 10.}
neuron_params = {'V0_mean': -150.,
                 'V0_sd': 50.}
network_params = {'N_scaling': 1.,
                  'K_scaling': 1.,
                  'connection_params': conn_params,
                  'input_params': input_params,
                  'neuron_params': neuron_params}

sim_params = {'t_sim': 1000.,
              'num_processes': 9,
              'local_num_threads': 80,
              'recording_dict': {'record_vm': False}}

theory_params = {'dt': 0.1}

M = MultiAreaModel(network_params, simulation=True,
                   sim_spec=sim_params,
                   theory=True,
                   theory_spec=theory_params)
p, r = M.theory.integrate_siegert()
print("Mean-field theory predicts an average "
      "rate of {0:.3f} spikes/s across all populations.".format(np.mean(r[:, -1])))
start_job(M.simulation.label, submit_cmd, jobscript_template)

@jiaduxie
Author

jiaduxie commented Feb 1, 2021

e1d019b3cdbbaef181f9679c7ecc984b.238.e is :

(pynest) [root@cmpwork002 e1d019b3cdbbaef181f9679c7ecc984b] # cat e1d019b3cdbbaef181f9679c7ecc984b.238.e
+ /home/work/anaconda3/envs/pynest/bin/mpirun -n 9 -host cmpwork002,cmpwork003,cmpwork004,cmpwork005,cmpwork006,cmpwork007,cmpwork008,cmpwork009,cmpwork010 -mca btl_tcp_if_include enp39s0f0 python -u /home/work/multi-area-model_9nodes/multi-area-model/run_simulation.py e1d019b3cdbbaef181f9679c7ecc984b da4e0764b4a3d0c8a3d3687dfa9c5ae4

@atiye-nejad
Hello,
I have a similar problem with MPI. I am using NEST through conda, and when I run my code I get this error:
You seem to be using NEST via an MPI launcher like mpirun, mpiexec or srun although NEST was not compiled with MPI support. Please see the NEST documentation about parallel and distributed computing. Exiting.
I ran which mpirun, which mpiexec, and which mpirun.mpich, and got this result:
/home/atine/miniconda3/envs/nest_env/bin/mpirun
/home/atine/miniconda3/envs/nest_env/bin/mpiexec
/usr/bin/mpirun.mpich
mpirun --version gives me mpirun (Open MPI) 4.1.5
mpirun.mpich --version gives me
HYDRA build details: Version: 4.0 Release Date: Fri Jan 21 10:42:29 CST 2022 CC: gcc -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro Configure options: '--with-hwloc-prefix=/usr' '--with-device=ch4:ofi' 'FFLAGS=-O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -fallow-invalid-boz -fallow-argument-mismatch' '--prefix=/usr' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' Process Manager: pmi Launchers available: ssh rsh fork slurm ll lsf sge manual persist Topology libraries available: hwloc Resource management kernels available: user slurm ll lsf sge pbs cobalt Demux engines available: poll select

Can you help me solve the problem? Thanks.

@steffengraber
Collaborator

Hi @atiye-nejad, the conda package is not built with MPI support. To get MPI support, I recommend installing NEST from source. Please take a look at the documentation here:
https://nest-simulator.readthedocs.io/en/stable/installation/developer.html#dev-install

@atiye-nejad

But because of some problems, especially with NESTML, the NEST experts recommended that I install NEST through conda.

@heplesser
Collaborator

@atiye-nejad It is generally not a good idea to raise new issues at the end of an existing (and different issue). That makes it difficult to follow up properly, and experts might not even notice your post. I saw that you asked a very closely related question on the NEST User mailing list and I suggest we continue the discussion there.
