Add SLURM batch script #157

Open
Babalion opened this issue Dec 14, 2022 · 2 comments

@Babalion

Is your feature request related to a problem?

The idea was already mentioned in #52, but what was discussed there was a complete redesign of the MPI launcher.
My feature request is narrower and more goal-oriented.

I would suggest adding a bash script that simply launches an MPI job in syncro mode in a multi-node environment.
The code in quimb-mpi-slurm may need to be changed slightly.
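For concreteness, the kind of one-liner I have in mind inside a SLURM batch script (a sketch only — `quimb-mpi-python` ships with quimb, but the `srun` options and the script name are guesses for my cluster and will differ elsewhere):

```shell
# Hypothetical invocation: launch in syncro mode, delegating process
# placement to srun. "--mpi=pmix_v3" is cluster-specific.
quimb-mpi-python --syncro -l "srun --mpi=pmix_v3" --np ${SLURM_NTASKS} myscript.py
```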

Describe the solution you'd like

I've already experimented a lot but have had no success so far.

What does work is running the job in non-MPI mode on a single node with 48 threads.
The SLURM batch script then looks like this:

```bash
#!/bin/bash
# Number of nodes to allocate
#SBATCH --nodes=1
# Number of MPI instances (ranks) to be executed per node
#SBATCH --ntasks-per-node=1
# Number of threads per MPI instance
#SBATCH --cpus-per-task=48
# Allocate 8 GB memory per node
#SBATCH --mem=8gb
# Maximum run time of job
#SBATCH --time=24:00:00
# Give job a reasonable name
#SBATCH --job-name=mps_for_plots
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=mps_for_plots-%j.out
# File name for error output
#SBATCH --error=mps_for_plots-%j.err


#export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export HOME=~

module load compiler/intel/19.1.2
module load mpi/impi
module load devel/valgrind
module load numlib/mkl/2020.2
srun $(ws_find conda)/conda/envs/quimbPet/bin/python ~/MasterThesis/012-facilitationWithPhonons/mpsPhonons.py
```

But instead of running the job on a single node, I want to distribute it across several nodes (CPUs).
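The multi-node case would only change the resource directives; a hypothetical variant of the header above (the numbers are placeholders) would be:

```shell
# Sketch: 4 nodes, one MPI rank per node, 12 threads per rank.
# SLURM exposes these as SLURM_NNODES, SLURM_NTASKS, SLURM_CPUS_PER_TASK.
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=12
```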

Describe alternatives you've considered

My script currently looks as follows.
The commented-out lines at the end, where I already tried switching the launch command to quimb-mpi-slurm, were not successful either.

```bash
#!/bin/bash
# Number of nodes to allocate
#SBATCH --nodes=1
# Number of MPI instances (ranks) to be executed per node
# #SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
# Number of threads per MPI instance
#SBATCH --cpus-per-task=2
# Allocate ... memory per node
# #SBATCH --mem-per-cpu=8gb
#SBATCH --mem=1gb

# Maximum run time of job
#SBATCH --time=02:00:00
# Give job a reasonable name
#SBATCH --job-name=phonon_mps
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=phonon_mps-%j.out
# File name for error output
#SBATCH --error=phonon_mps-%j.err
# Send status mails to user
### SBATCH --mail-type=ALL
### SBATCH --mail-user=chris.nill@student.uni-tuebingen.de

#export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# clean up all modules
module purge

module load compiler/intel
module load mpi/impi
#module load compiler/gnu
#module load mpi/openmpi
module load devel/valgrind
module load numlib/mkl/2020.2

# activate conda
source $( ws_find conda )/conda/etc/profile.d/conda.sh
conda activate quimbPet


srun $(ws_find conda)/conda/envs/quimbPet/bin/python ~/mpsPhonons.py

# or:
# $(ws_find conda)/conda/envs/quimbPet/bin/quimb-mpi-slurm  -l "srun --mpi=pmix_v3" --np ${SLURM_NPROCS} --syncro ~/mpsPhonons.py
# $(ws_find conda)/conda/envs/quimbPet/bin/quimb-mpi-python  -l "mpiexec" --syncro ~/mpsPhonons.py
```

Code of quimb-mpi-slurm:

```bash
#!/bin/bash


POSITIONAL=()
while [[ $# -gt 0 ]]
do
key="$1"

case $key in
    -h|--help)
    echo "Run a python script that uses quimb, eagerly launching
with mpi, rather than dynamically spawning MPI processes.

    Usage:
        quimb-mpi-python [OPTIONS]... [SCRIPT]...

    Options:
        -n, --np <NUM_PROCS>
            How many mpi processes to use, defaults to
            letting the MPI launcher decide.
        -l, --launcher <MPI_LAUNCHER>
            How to launch the python process, defaults
            to 'mpiexec'. Can add mpi options here.
        -s, --syncro
            Launch in syncro mode, where all processes
            run the script, splitting work up only when
            a MPIPool is encountered.
        -h, --help
            Show this help.

Note that in syncro mode, *all* functions called outside
of the mpi pool must be pure to ensure syncronization.
    "
    exit 0
    ;;
    -n|--np)
    num_procs="$2"
    shift
    shift
    ;;
    -s|--syncro)
    export QUIMB_SYNCRO_MPI=YES
    shift
    ;;
    -l|--launcher)
    mpi_launcher="$2"
    shift
    shift
    ;;
    "-")
    shift
    break
    ;;
    *)    # unknown option
    POSITIONAL+=("$1") # save it in an array for later
    shift # past argument
    ;;
esac
done
set -- "${POSITIONAL[@]}" # restore positional parameters

mpi_launcher=${mpi_launcher:-"mpiexec"}

# set up environment
export OMP_NUM_THREADS=1
export _QUIMB_MPI_LAUNCHED="MANUAL"

if [ $QUIMB_SYNCRO_MPI ]; then  # use simplistic syncronized pool
    if [ $num_procs ]; then
        echo "Launching quimb in Syncro mode with ${mpi_launcher} and ${num_procs} processes."
        srun --mpi=pmix_v3 python "$@"
        #srun -n ${SLURM_NTASKS} python "$@"
    else
        echo "Launching quimb in Syncro mode with ${mpi_launcher}".
        ${mpi_launcher} python "$@"
    fi
else  # run script with mpi through mpi4py module
    if [ $num_procs ]; then
        echo "Launching quimb in mpi4py.futures mode with ${mpi_launcher} and ${num_procs} processes."
        ${mpi_launcher} --np "${num_procs}" python -m mpi4py.futures "$@"
    else
        echo "Launching quimb in mpi4py.futures mode with ${mpi_launcher}."
        ${mpi_launcher} python -m mpi4py.futures "$@"
    fi
fi
```

Additional context

No response

@jcmgray (Owner) commented Jan 19, 2023

Hi @Babalion, sorry to be slow getting to this. I do now think it would make sense for quimb to basically just have two modes:

  1. threaded
  2. 'synchro' style / i.e. usual MPI.

Getting rid of the other modes would hopefully negate the need for any quimb-mpi-python launcher at all. Then it would just be a matter of documenting how to fit it into usual MPI scripts/workflows, and taking some care with environment variables that control threading level.
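On the threading environment variables, the usual convention (an illustration of the general practice, not a quimb-specific requirement) is to pin each MPI rank to a single thread, since the BLAS backends read these variables when the interpreter imports them:

```shell
# Pin each MPI rank to one thread so ranks don't oversubscribe cores.
# These must be exported before python is launched, because numpy's
# BLAS backend (MKL/OpenBLAS/OpenMP) reads them at import time.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
echo "threads per rank: ${OMP_NUM_THREADS}"
```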

What was not working, and what did you change in the launcher above?

@Babalion (Author)

Yes indeed, the documentation for launching a script in MPI mode could definitely be improved.
Your idea sounds good and furthermore may also solve #52.

I still haven't managed to run the script on the SLURM cluster in MPI mode.
Maybe we could reduce quimb-mpi-python to a minimal working script that only supports the --syncro mode?
Then testing and debugging would be way easier for me.
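Such a minimal syncro-only launcher might look like the following (a hypothetical sketch, stripped down from the option parsing above and not tested on a cluster; the launcher string, e.g. "srun --mpi=pmix_v3", is cluster-specific):

```shell
#!/bin/bash
# Sketch of a syncro-only launcher: no option parsing beyond the
# launcher string, syncro mode always on, one thread per rank.
if [ $# -lt 2 ]; then
    echo "usage: $0 '<launcher>' <script.py> [args...]"
    exit 0
fi
launcher="$1"
shift
export OMP_NUM_THREADS=1            # one thread per MPI rank
export QUIMB_SYNCRO_MPI=YES         # all ranks run the script
export _QUIMB_MPI_LAUNCHED="MANUAL" # don't spawn processes dynamically
exec ${launcher} python "$@"
```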

Moreover, is it possible in general to utilize MPI for the TEBD algorithm?
