Parallel Analysis Using PBS Job Scheduler

This tutorial is specific to the PBS job scheduler, but it can serve as a template to be adapted to other job scheduling systems.

First, dynamically generate from Python an example PBS script named parallel_analysis_using_PBS_example.pbs:

# Number of nodes in the network (one job per target)
network_size = 10

# Define PBS script
bash_lines = '\n'.join([
    '#!/bin/bash',
    # set project name
    '#PBS -P ProjectName',
    # set job name
    '#PBS -N JobName',
    # choose number of cores and memory
    '#PBS -l select=1:ncpus=1:mem=1GB',
    # set walltime hh:mm:ss
    '#PBS -l walltime=01:00:00',
    # set job array indices 0..network_size-1, one per target
    '#PBS -J 0-{}'.format(network_size - 1),
    # load Python
    'module load python/3.7.3',
    # if necessary, activate local environment where IDTxl is installed
    'source /ProjectName/idtxl_env/bin/activate',
    # run analysis on a single target, passing the array index as target ID
    'python analyse_single_target.py $PBS_ARRAY_INDEX'
    ])

# Generate and save PBS script file
bash_script_name = 'parallel_analysis_using_PBS_example.pbs'
with open(bash_script_name, 'w', newline='\n') as bash_file:
    bash_file.write(bash_lines)
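For network_size = 10, the generated parallel_analysis_using_PBS_example.pbs contains:

#!/bin/bash
#PBS -P ProjectName
#PBS -N JobName
#PBS -l select=1:ncpus=1:mem=1GB
#PBS -l walltime=01:00:00
#PBS -J 0-9
module load python/3.7.3
source /ProjectName/idtxl_env/bin/activate
python analyse_single_target.py $PBS_ARRAY_INDEX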

The job array can be submitted directly on the cluster from the command line using the command qsub parallel_analysis_using_PBS_example.pbs. It is also possible to submit jobs dynamically from Python:

from subprocess import call
call('qsub {}'.format(bash_script_name), shell=True)
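If you want Python to raise an error when submission fails, subprocess.run (Python 3.5+) with check=True is an alternative:

from subprocess import run

# Submit the job array; raises CalledProcessError if qsub exits non-zero
run('qsub {}'.format(bash_script_name), shell=True, check=True)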

The PBS script calls the Python script analyse_single_target.py once per target, passing the target number as a command-line argument on each call. This is a template for analyse_single_target.py:

# analyse_single_target.py

import sys
from idtxl.multivariate_te import MultivariateTE
from idtxl.data import Data
import pickle

# Read parameters from shell call
target_id = int(sys.argv[1])

# Load time series
time_series = ...

# Initialise Data object and set dim_order to reflect your data
dat = Data(time_series, dim_order='psr')

# Initialise analysis object and define settings
network_analysis = MultivariateTE()
settings = ...

# Run analysis
res = network_analysis.analyse_single_target(settings, dat, target_id)

# Save results dictionary using pickle
path = 'my_directory/res.{}.pkl'.format(target_id)
with open(path, 'wb') as res_file:
    pickle.dump(res, res_file)

The single-target results can then be combined as shown in the Combine Single Target tutorial.
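For example, once all jobs have finished, the pickled results can be loaded and merged. This is a minimal sketch, assuming the files were saved as in the template above and that the results objects provide the combine_results method described in the Combine Single Target tutorial:

import glob
import pickle

# Load all pickled single-target results saved by analyse_single_target.py
res_list = []
for path in sorted(glob.glob('my_directory/res.*.pkl')):
    with open(path, 'rb') as res_file:
        res_list.append(pickle.load(res_file))

# Merge the remaining results into the first one
res_network = res_list[0]
res_network.combine_results(*res_list[1:])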