sartiran edited this page May 6, 2014 · 8 revisions

Pre-Setup (to be done on all the machines)

OS: Ubuntu 12.04.4 LTS

install the generally needed packages:

sudo apt-get install g++ gpp kcc libssl-dev libxml2-dev libtool openssh-server munge make libmunge-dev libpam0g-dev

Installing/Configuring the head node

get the packages:

I have some precompiled torque packages that you can get here (if you want to compile and package it yourself, see the section "Compile and package torque" at the end of this page) and a maui tarball that you can get here.

create a base home directory for users

sudo mkdir <users homes>

configure the users

This depends on what you have (LDAP, NIS, ...). The important thing is to have the same users with the same uid/gid on all the machines. The home directory of each user should be under <users homes> (e.g. <users homes>/<users>)

install and configure nfs server

to export the directory <users homes> read-write to all the nodes

sudo apt-get install nfs-kernel-server

add the rw export of <users homes> to all the worker nodes

sudo vi /etc/exports
sudo exportfs -a
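
The export line itself is not shown above, so here is a minimal sketch of how it could be generated. The directory /users_homes, the node names and the function name exports_line are placeholders for this example. The options rw, sync and no_subtree_check are an assumption about what suits your site, not something prescribed by this page.

```shell
# Hypothetical value: replace with your real <users homes> directory.
HOMES_DIR="/users_homes"

# Print one /etc/exports line that grants rw access to every node passed as argument.
exports_line() {
    dir="$1"; shift
    line="$dir"
    for node in "$@"; do
        line="$line $node(rw,sync,no_subtree_check)"
    done
    printf '%s\n' "$line"
}

# Hypothetical worker node names.
exports_line "$HOMES_DIR" wn1.example.org wn2.example.org
# prints: /users_homes wn1.example.org(rw,sync,no_subtree_check) wn2.example.org(rw,sync,no_subtree_check)
```

Append the printed line to /etc/exports and then run sudo exportfs -a as shown above.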

install the torque server

get the tarball

mkdir torque
cd torque/
tar -xzf ../torque-ubuntu-12.04.4.tar.gz

run the installation scripts

sudo ./torque-package-server-linux-x86_64.sh --install
sudo ./torque-package-pam-linux-x86_64.sh --install
sudo ./torque-package-devel-linux-x86_64.sh --install
sudo ./torque-package-clients-linux-x86_64.sh --install
sudo ./torque-package-drmaa-linux-x86_64.sh --install
sudo ./torque-package-doc-linux-x86_64.sh --install

copy the init script

sudo cp contrib/init.d/debian.pbs_server /etc/init.d/pbs_server

correct the DAEMON and PBS_HOME paths in the init script, then add the torque library paths to the ld config

sudo vi /etc/init.d/pbs_server
sudo vi /etc/ld.so.conf.d/torque.conf
sudo cat /etc/ld.so.conf.d/torque.conf
/usr/lib
/usr/lib64
sudo ldconfig
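
Instead of editing the init script by hand, the two paths can be rewritten with sed. This is only a sketch: the function name fix_init_paths is made up for this example, and it assumes that the script sets the variables as plain DAEMON=... and PBS_HOME=... lines at the start of a line, which you should check in your copy first. The paths in the usage comment assume a build with --prefix=/usr and --with-server-home=/var/lib/torque.

```shell
# Rewrite the DAEMON= and PBS_HOME= assignments of an init script.
# Assumption: both are simple VAR=value lines starting in column one.
fix_init_paths() {
    script="$1"; daemon="$2"; pbs_home="$3"
    sed -e "s|^DAEMON=.*|DAEMON=$daemon|" \
        -e "s|^PBS_HOME=.*|PBS_HOME=$pbs_home|" \
        "$script" > "$script.new" &&
        cat "$script.new" > "$script" &&   # overwrite in place, keeps the file mode
        rm -f "$script.new"
}

# Usage (as root; paths are assumptions, see the text above):
# fix_init_paths /etc/init.d/pbs_server /usr/sbin/pbs_server /var/lib/torque
```

The file is overwritten with cat rather than replaced with mv so that the executable bit of the init script is preserved.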

sudo /usr/sbin/create-munge-key
sudo service munge start

some base configuration

Put here the name of the server

sudo vi /var/lib/torque/server_name 

put here the list of the nodes, one node per line, in the format <full hostname> np=<number of slots>

sudo vi /var/lib/torque/server_priv/nodes 
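
For more than a handful of nodes it can be easier to generate the nodes file than to type it. A minimal sketch, where the function name nodes_lines, the host names and the slot counts are all made up for this example:

```shell
# Print one "<full hostname> np=<number of slots>" line per "host:slots" argument.
nodes_lines() {
    for spec in "$@"; do
        printf '%s np=%s\n' "${spec%%:*}" "${spec##*:}"
    done
}

# Hypothetical nodes with 8 and 16 slots.
nodes_lines wn1.example.org:8 wn2.example.org:16
# prints:
# wn1.example.org np=8
# wn2.example.org np=16
```

The output can then be written to /var/lib/torque/server_priv/nodes as root.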

launch the setup script and start the daemons

sudo ./torque.setup torque-admin
sudo service pbs_server start

the setup script will create a very basic configuration (that you can customize afterward). To verify that it works:

qstat -q

server: <your server>
Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
batch              --      --       --      --    0   0 --   E R
                                               ----- -----
                                                  0     0

pbsnodes

<node name>
     state = down
     np = <number of procs>
     ntype = cluster
     gpus = 0
....
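
With many nodes the pbsnodes output gets long. The following sketch reduces it to one "node state" line per node. The function name node_states is made up for this example, and it assumes the layout shown above: the node name starts in the first column and the state = ... line is indented.

```shell
# Read pbsnodes output on stdin and print "<node name> <state>" for each node.
node_states() {
    awk '
        /^[^ \t]/         { node = $1 }        # unindented line: a node name
        /^[ \t]+state = / { print node, $3 }   # indented "state = <value>" line
    '
}

# Usage on the head node:
# pbsnodes | node_states
```

At this stage every node is expected to show up as down, since no pbs_mom is running yet.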

install the maui server

sudo service pbs_server stop

tar -xzf maui-3.3.1.tar.gz
cd maui-3.3.1
./configure --prefix=/usr --with-pbs=/usr --with-spooldir=/var/spool/maui
make
sudo make install

Get the init script here and put it in /etc/init.d/maui. Change the paths in the init script (to point to the executable) and make it executable.

Start the pbs_server and then the maui server.

Installing and configuring the slave node

mount the <users homes>

I use autofs as a matter of personal taste, but one can also put the mount directly in /etc/fstab

sudo apt-get install nfs-common autofs
sudo vi /etc/auto.master
cat /etc/auto.master
#
# Sample auto.master file
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# For details of the format look at autofs(5).
#
/misc   /etc/auto.misc
#
# NOTE: mounts done from a hosts map will be mounted with the
#       "nosuid" and "nodev" options unless the "suid" and "dev"
#       options are explicitly given.
#
/net    -hosts
#
# Include central master map if it can be found using
# nsswitch sources.
#
# Note that if there are entries for /net or /misc (as
# above) in the included master map any keys that are the
# same will not be seen as the first read key seen takes
# precedence.
#
+auto.master
/nfs       /etc/auto.nfs  -rw,noatime,hard

sudo vi /etc/auto.nfs
cat /etc/auto.nfs
<users homes>   -rw,soft,vers=3,rsize=32768,wsize=32768 <head node>:/<users homes>

sudo service autofs restart

sudo ln -s /nfs/users_homes /users_homes
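
As an alternative to the autofs setup above, the mount can go directly in /etc/fstab. The sketch below prints such a line, reusing the mount options from the auto.nfs entry. The function name fstab_line, the head node name and the directory are placeholders for this example.

```shell
# Print an /etc/fstab line that mounts <dir> from the head node on the same path.
fstab_line() {
    head_node="$1"; dir="$2"
    printf '%s:%s %s nfs rw,soft,vers=3,rsize=32768,wsize=32768 0 0\n' \
        "$head_node" "$dir" "$dir"
}

# Hypothetical head node and home directory.
fstab_line head.example.org /users_homes
# prints: head.example.org:/users_homes /users_homes nfs rw,soft,vers=3,rsize=32768,wsize=32768 0 0
```

With this variant the /nfs symlink is not needed, because the directory is mounted on its final path.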

create the same users as on the head node

with the same uid/gid
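
If the accounts are local (no LDAP or NIS), one way to keep uid/gid identical is to derive the useradd commands from the passwd entries of the head node. This is a sketch only: the function name passwd_to_useradd and the sample user are made up for this example. It prints the commands rather than running them, and it assumes that a group with the same gid already exists on the node (create it with groupadd -g first if not).

```shell
# Read passwd-format lines (name:password:uid:gid:gecos:home:shell) on stdin
# and print a useradd command that reproduces the same uid, gid, home and shell.
passwd_to_useradd() {
    while IFS=: read -r name pw uid gid gecos home shell; do
        printf 'useradd -u %s -g %s -d %s -s %s %s\n' \
            "$uid" "$gid" "$home" "$shell" "$name"
    done
}

# Hypothetical entry, as "getent passwd alice" would print it on the head node.
echo 'alice:x:2001:2001:Alice:/users_homes/alice:/bin/bash' | passwd_to_useradd
# prints: useradd -u 2001 -g 2001 -d /users_homes/alice -s /bin/bash alice
```

Review the printed commands and run them as root on each node.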

install the pbs_mom

mkdir torque
cd torque/
tar -xzf ../torque-ubuntu-12.04.4.tar.gz

sudo ./torque-package-mom-linux-x86_64.sh --install
sudo ./torque-package-clients-linux-x86_64.sh --install
sudo ./torque-package-pam-linux-x86_64.sh --install
sudo ./torque-package-drmaa-linux-x86_64.sh --install
sudo ./torque-package-devel-linux-x86_64.sh --install
sudo ./torque-package-doc-linux-x86_64.sh --install

Add the same stuff as before to the ld config file

sudo  vi /etc/ld.so.conf.d/torque.conf
sudo ldconfig

Copy the init script and change the DAEMON and PBS_HOME paths

sudo cp contrib/init.d/debian.pbs_mom /etc/init.d/pbs_mom
sudo chmod +w /etc/init.d/pbs_mom
sudo vi /etc/init.d/pbs_mom

put here the name of the head node

sudo vi /var/lib/torque/server_name

sudo service pbs_mom start

Now one can check that the node is up by running the following command on the head node

pbsnodes
<node name>
     state = free
     np = <number of procs>
     ntype = cluster
     status =       
....

copy munge key from the headnode

sudo cp munge.key /etc/munge/
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo service munge start

verify that you can run qstat on the node

test submission and execution

to allow submission from a node you have to run on the headnode

qmgr -c 'set server authorized_users = *@'

then log in as a user on the node. Create a simple script script.sh

pwd
ls -l
id
uname -a

then run

chmod +x script.sh
qsub -keo ./script.sh
.llrtest02.in2p3.fr

you should see files script.sh.(o|e)<job id> in your home with the stdout and stderr of your job
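
To know which files to look for, the names can be derived from the job id that qsub prints. This sketch assumes that the job id has the form <number>.<server name> and that the output files carry only the numeric part. The function name job_output_files and the job number 123 are made up for this example.

```shell
# Print the expected stdout and stderr file names for a job.
#   $1: script (job) name, $2: job id as printed by qsub
job_output_files() {
    script_name="$1"; job_id="$2"
    seq="${job_id%%.*}"    # keep only the part before the first dot
    printf '%s.o%s\n%s.e%s\n' "$script_name" "$seq" "$script_name" "$seq"
}

# Hypothetical job number 123 on the server used above.
job_output_files script.sh 123.llrtest02.in2p3.fr
# prints:
# script.sh.o123
# script.sh.e123
```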

Compile and package torque

download the tarball (you need to be registered)

here

untar in the home of the head node machine (or one of the computing nodes if the OS is the same)

tar -xzf torque-2.5.13.tar.gz
cd torque-2.5.13

run configure/make

./configure --prefix=/usr --mandir=/usr/share/man --libdir=/usr/lib64 --includedir=/usr/include --with-server-home=/var/lib/torque --with-default-server=localhost --enable-munge-auth --enable-munge-library --with-tcp-retry-limit=2 --disable-gui --with-tcl=no --with-tk=no --with-pam=yes --enable-syslog --enable-rpp --with-rcp=scp --enable-drmaa
make
make package

create a tar with everything you need

tar -czf torque-ubuntu-12.04.4.tar.gz torque-package-* torque.setup contrib

and put it somewhere it can be downloaded from on all the machines.