Setup Procedure
OS: Ubuntu 12.04.4 LTS
Install the generally needed packages:
sudo apt-get install g++ gpp kcc libssl-dev libxml2-dev libtool openssh-server munge make libmunge-dev libpam0g-dev
I have some precompiled Torque packages that you can get here (if you want to compile and package them yourself, see the build steps at the end of this page) and a Maui tarball that you can get here.
sudo mkdir <users homes>
How you create the users depends on what you have (LDAP, NIS, ...). The important thing is to have the same users with the same uid/gid on all the machines.
The home directory of each user should be under <users homes> (e.g. <users homes>/<users>).
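One way to check that the accounts really match across machines is to dump name, uid and gid on each host and compare the dumps. The sketch below assumes that regular accounts start at uid 1000 (the Ubuntu default) and that `getent passwd` sees your LDAP/NIS users; `list_ids` is a hypothetical helper name, not part of any package.

```shell
# list_ids: read passwd-format lines on stdin and print "name:uid:gid",
# sorted, for uids in [min_uid, 65534). min_uid defaults to 1000 (assumption).
list_ids() {
    awk -F: -v min="${1:-1000}" \
        '$3 >= min && $3 < 65534 { print $1 ":" $3 ":" $4 }' | sort
}

# Usage (run on every machine, then diff the resulting files):
#   getent passwd | list_ids > "/tmp/ids.$(hostname)"
```

Any difference between the files from two machines points at a user whose uid or gid does not match.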
To export the directory <users homes> read-write to all the nodes:
sudo apt-get install nfs-kernel-server
Add the read-write export of <users homes> for all the worker nodes:
sudo vi /etc/exports
sudo exportfs -a
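As an illustration of what the line in /etc/exports can look like, the sketch below builds one export entry for a list of worker nodes. Only `rw` comes from the text above; the extra options `sync,no_subtree_check` are an assumption (a common choice), and `exports_line` and the host names are hypothetical.

```shell
# exports_line DIR HOST...: print one /etc/exports entry that exports DIR
# read-write to every HOST. The option set is an assumption, adjust as needed.
exports_line() {
    dir="$1"; shift
    line="$dir"
    for host in "$@"; do
        line="$line $host(rw,sync,no_subtree_check)"
    done
    printf '%s\n' "$line"
}

# Usage:
#   exports_line /users_homes node01 node02 | sudo tee -a /etc/exports
#   sudo exportfs -a
```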
get the tarball
mkdir torque
cd torque/
tar -xzf ../torque-ubuntu-12.04.4.tar.gz
run the installation scripts
sudo ./torque-package-server-linux-x86_64.sh --install
sudo ./torque-package-pam-linux-x86_64.sh --install
sudo ./torque-package-devel-linux-x86_64.sh --install
sudo ./torque-package-clients-linux-x86_64.sh --install
sudo ./torque-package-drmaa-linux-x86_64.sh --install
sudo ./torque-package-doc-linux-x86_64.sh --install
copy the init script
sudo cp contrib/init.d/debian.pbs_server /etc/init.d/pbs_server
The DAEMON and PBS_HOME paths in the init script need to be corrected:
sudo vi /etc/init.d/pbs_server
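If you prefer not to edit the file by hand, the same change can be sketched with sed. This assumes the init script sets both values with plain `DAEMON=...` and `PBS_HOME=...` assignments at the start of a line; `rewrite_init_paths` is a hypothetical helper, and the paths in the usage line are the ones implied by the `--prefix=/usr` and `--with-server-home=/var/lib/torque` build options used for these packages.

```shell
# rewrite_init_paths FILE DAEMON_PATH PBS_HOME_PATH
# Print FILE to stdout with the DAEMON= and PBS_HOME= assignments replaced.
# Nothing is modified in place; redirect the output to a new file.
rewrite_init_paths() {
    sed -e "s|^DAEMON=.*|DAEMON=$2|" \
        -e "s|^PBS_HOME=.*|PBS_HOME=$3|" "$1"
}

# Usage:
#   rewrite_init_paths contrib/init.d/debian.pbs_server \
#       /usr/sbin/pbs_server /var/lib/torque > /tmp/pbs_server
#   sudo install -m 755 /tmp/pbs_server /etc/init.d/pbs_server
```

Check the result before installing it, since other variables in the script may be derived from these two.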
sudo vi /etc/ld.so.conf.d/torque.conf
sudo cat /etc/ld.so.conf.d/torque.conf
/usr/lib
/usr/lib64
sudo ldconfig
sudo /usr/sbin/create-munge-key
sudo service munge start
Put the name of the server here:
sudo vi /var/lib/torque/server_name
Put the list of the nodes here, one node per line, in the format <full hostname> np=<number of slots>:
sudo vi /var/lib/torque/server_priv/nodes
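For a cluster of identical machines, the nodes file can be generated instead of typed. A minimal sketch, assuming every node has the same number of slots; `nodes_entries` and the host names are hypothetical.

```shell
# nodes_entries NP HOST...: print one "<full hostname> np=<slots>" line per HOST.
nodes_entries() {
    np="$1"; shift
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np"
    done
}

# Usage:
#   nodes_entries 8 node01.example.org node02.example.org \
#       | sudo tee /var/lib/torque/server_priv/nodes
```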
sudo ./torque.setup torque-admin
sudo service pbs_server start
the setup script will create a very basic configuration (that you can customize afterward). To verify that it works:
qstat -q
server: <your server>

Queue            Memory CPU Time Walltime Node  Run Que Lm  State
---------------- ------ -------- -------- ----  --- --- --  -----
batch              --      --       --      --    0   0 --   E R
                                               ----- -----
                                                   0     0
pbsnodes
<node name>
state = down
np = <number of procs>
ntype = cluster
gpus = 0
....
sudo service pbs_server stop
tar -xzf maui-3.3.1.tar.gz
cd maui-3.3.1
./configure --prefix=/usr --with-pbs=/usr --with-spooldir=/var/spool/maui
make
sudo make install
Get the init script here and put it in /etc/init.d/maui. Change the paths in the init script so that they point to the executable, and make the script executable.
Start the pbs_server and then the maui server.
On each worker node, mount the users' home directories from the head node. I use autofs out of personal preference, but one can also put the mount directly in /etc/fstab.
sudo apt-get install nfs-common autofs
sudo vi /etc/auto.master
cat /etc/auto.master
#
# Sample auto.master file
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# For details of the format look at autofs(5).
#
/misc /etc/auto.misc
#
# NOTE: mounts done from a hosts map will be mounted with the
# "nosuid" and "nodev" options unless the "suid" and "dev"
# options are explicitly given.
#
/net -hosts
#
# Include central master map if it can be found using
# nsswitch sources.
#
# Note that if there are entries for /net or /misc (as
# above) in the included master map any keys that are the
# same will not be seen as the first read key seen takes
# precedence.
#
+auto.master
/nfs /etc/auto.nfs -rw,noatime,hard
sudo vi /etc/auto.nfs
cat /etc/auto.nfs
<users homes> -rw,soft,vers=3,rsize=32768,wsize=32768 <head node>:/<users homes>
sudo service autofs restart
sudo ln -s /nfs/users_homes /users_homes
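For those who prefer the fstab route mentioned above, the sketch below prints an equivalent /etc/fstab entry. The mount options are copied from the auto.nfs line; `fstab_line`, the head node name and the mount point in the usage line are hypothetical.

```shell
# fstab_line HEAD_NODE EXPORTED_DIR MOUNT_POINT
# Print an NFS entry for /etc/fstab using the same options as the autofs map.
fstab_line() {
    printf '%s:%s %s nfs rw,soft,vers=3,rsize=32768,wsize=32768 0 0\n' \
        "$1" "$2" "$3"
}

# Usage:
#   fstab_line head01 /users_homes /users_homes | sudo tee -a /etc/fstab
#   sudo mkdir -p /users_homes && sudo mount /users_homes
```

With a direct fstab mount, the autofs package and the symbolic link are not needed.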
Create the users with the same uid/gid as on the head node.
mkdir torque
cd torque/
tar -xzf ../torque-ubuntu-12.04.4.tar.gz
sudo ./torque-package-mom-linux-x86_64.sh --install;
sudo ./torque-package-clients-linux-x86_64.sh --install;
sudo ./torque-package-pam-linux-x86_64.sh --install;
sudo ./torque-package-drmaa-linux-x86_64.sh --install;
sudo ./torque-package-devel-linux-x86_64.sh --install;
sudo ./torque-package-doc-linux-x86_64.sh --install
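The head node and the worker nodes install almost the same set of packages, the difference being `server` versus `mom`. The sketch below lists the installer scripts for each role, in the order used in this page, so that they can be run in a loop; `package_scripts` and the role names `server` and `node` are hypothetical.

```shell
# package_scripts ROLE: print the installer scripts for ROLE (server|node),
# one per line, in the order used in this guide.
package_scripts() {
    case "$1" in
        server) pkgs="server pam devel clients drmaa doc" ;;
        node)   pkgs="mom clients pam drmaa devel doc" ;;
        *) echo "usage: package_scripts server|node" >&2; return 1 ;;
    esac
    for p in $pkgs; do
        printf './torque-package-%s-linux-x86_64.sh\n' "$p"
    done
}

# Usage, from inside the unpacked torque/ directory:
#   for s in $(package_scripts node); do sudo "$s" --install || break; done
```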
Add the same lines as before (/usr/lib and /usr/lib64) to the ld config file:
sudo vi /etc/ld.so.conf.d/torque.conf
sudo ldconfig
Copy the init script and change the DAEMON and PBS_HOME paths
sudo cp contrib/init.d/debian.pbs_mom /etc/init.d/pbs_mom
sudo chmod +w /etc/init.d/pbs_mom
sudo vi /etc/init.d/pbs_mom
Put the name of the head node here:
sudo vi /var/lib/torque/server_name
sudo service pbs_mom start
One can now check that the node is healthy by running the following command on the head node:
pbsnodes
<node name>
state = free
np = <number of procs>
ntype = cluster
status =
....
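With more than a few nodes, the full pbsnodes listing gets long. The sketch below condenses it to one "node state" pair per line. It assumes the layout shown above, where a node name stands alone on its line and attributes have the form `name = value`; `node_states` is a hypothetical helper.

```shell
# node_states: read pbsnodes output on stdin, print "<node name> <state>".
node_states() {
    awk 'NF == 1 { node = $1; next }            # a lone word is a node name
         $1 == "state" && $2 == "=" { print node, $3 }'
}

# Usage on the head node:
#   pbsnodes | node_states
#   pbsnodes | node_states | grep -v ' free$'   # nodes that need attention
```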
Copy the munge key from the head node:
sudo cp munge.key /etc/munge/
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo service munge start
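munged is strict about the permissions of its key, so a quick check after copying can save some debugging. A minimal sketch, assuming GNU `stat` (as shipped with Ubuntu); `key_mode_ok` is a hypothetical helper and it only checks the mode, not the owner.

```shell
# key_mode_ok FILE: succeed only if FILE has exactly mode 400.
key_mode_ok() {
    mode=$(stat -c '%a' "$1") || return 1
    [ "$mode" = "400" ]
}

# Usage on a worker node (run as root, since the key is not world-readable):
#   key_mode_ok /etc/munge/munge.key && echo "key mode ok"
# Once munge is running, a credential round trip tests the local daemon:
#   munge -n | unmunge
```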
Verify that you can run qstat on the node.
To allow submission from a node, run the following on the head node:
qmgr -c 'set server authorized_users = *@'
Then log in as a user on the node and create a simple script, script.sh:
pwd
ls -l
id
uname -a
Then run:
chmod +x script.sh
qsub -keo ./script.sh
.llrtest02.in2p3.fr
You should see files script.sh.(o|e)<job id> in your home directory, containing the stdout and stderr of your job.
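To script this check, the expected file names can be derived from what qsub prints. The sketch below assumes that the job name defaults to the script's base name and that the files carry the numeric part of the job identifier (the part before the first dot); `job_output_files` is a hypothetical helper.

```shell
# job_output_files SCRIPT JOB_ID
# Print the expected stdout and stderr file names for a job submitted
# with "qsub -keo SCRIPT", given the identifier printed by qsub.
job_output_files() {
    script=$(basename "$1")
    seq=${2%%.*}                # keep only the part before the first dot
    printf '%s.o%s\n%s.e%s\n' "$script" "$seq" "$script" "$seq"
}

# Usage:
#   jobid=$(qsub -keo ./script.sh)
#   job_output_files ./script.sh "$jobid"    # then look for these in $HOME
```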
To compile and package Torque yourself, start from the source tarball:
tar -xzf torque-2.5.13.tar.gz
cd torque-2.5.13
./configure --prefix=/usr --mandir=/usr/share/man --libdir=/usr/lib64 \
    --includedir=/usr/include --with-server-home=/var/lib/torque \
    --with-default-server=localhost --enable-munge-auth --enable-munge-library \
    --with-tcp-retry-limit=2 --disable-gui --with-tcl=no --with-tk=no \
    --with-pam=yes --enable-syslog --enable-rpp --with-rcp=scp --enable-drmaa
make
make package
tar -czf torque-ubuntu-12.04.4.tar.gz torque-package-* torque.setup contrib
Then put the tarball somewhere it can be downloaded from all the machines.