====== How to use the PACE cluster ======
The Georgia Tech Public Access Cluster Environment (PACE) is a cluster of
number-crunching Linux boxes. The Center for Nonlinear Science (CNS) owns a few
of the machines in this cluster, each with 4 to 8 CPUs per node. PACE is useful
for running lots of long, CPU-intensive calculations and for running parallel
programs with MPI.
The work process is roughly this: you log on to PACE, submit a process to the
CNS job queue, wait until it's finished, and then look at the results. This usually
involves transferring files back and forth between your local computer and the PACE
cluster using scp (the Unix secure copy utility). It ain't always pretty, but it
keeps your laptop from melting.
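For example, to move files with scp (run these on your local machine; ''jg356'' and the file names are placeholders, so substitute your own):
<code>
# copy an input file from your machine to PACE
scp ubest.ff jg356@pacemaker.gatech.edu:~/simulations/
# copy a result file from PACE back to the current directory
scp jg356@pacemaker.gatech.edu:~/simulations/result.ff .
</code>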
====== PACE documentation ======
[[http://www.pace.gatech.edu/facilities/pacecc/starting.php|The official PACE documentation page from OIT.]]
{{:gtspring2009:pc.png|}} The temptation to Balkanization is irresistible, but there is a
[[http://www.cns.gatech.edu/CNS-only/PACE.html|CNS PACE homepage]] on the svn wwwcns repository,
so please enter general (not channelflow specific) PACE documentation there. --- //[[predrag.cvitanovic@physics.gatech.edu|Predrag Cvitanovic]] 2009-02-22 16:41//
{{:gtspring2009:gibson.png?24}} You might be the only person on the planet with enough goodwill,
patience, and CNS permissions to document things via svn checkout, html editing, check-in, and re-checkout
as www on zero. Maybe we can get some contributions by lowering the overhead to dokuwiki editing.
====== PACE login ======
{{:gtspring2009:pc.png|}} Your PACE login ID is your GTID, but I have to create an account for each user, because CNS gets charged for PACE usage. --- //[[predrag.cvitanovic@physics.gatech.edu|Predrag Cvitanovic]] 2009-02-22 16:46//
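Once your account exists, logging in from on campus is a plain ssh to the head node; substitute your own GTID for ''jg356'':
<code>
ssh jg356@pacemaker.gatech.edu
</code>
From off campus, use the ssh tunneling scripts described below.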
====== ssh tunneling ======
The denizens of CNS have written shell scripts that simplify the process of logging in to the CNS machines and the PACE cluster from outside Georgia
Tech, and using the svn repositories. You will probably want to install these: [[http://www.cns.gatech.edu/CNS-only/fetchTunnel.html|"Fetch CNS ssh tunneling scripts"]].
Ask old-timers if you do not know the CNS webpages password.
If you run into a problem and then manage to solve it, please edit [[http://www.cns.gatech.edu/CNS-only/fetchTunnel.html|"Fetch CNS ssh tunneling scripts"]] in svn wwwcns repository accordingly, so the next person has less trouble figuring this out...
Channelflow is hosted at [[http://svn.channelflow.org]], so you can access it directly with svn commands from anywhere; no ssh tunneling is required. See the [[docs:install|Installation instructions]].
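For example, a checkout along these lines should work from any machine; the exact repository path is a guess here, so see the installation instructions for the real one:
<code>
# the trunk path below is an assumption -- check the installation docs
svn checkout http://svn.channelflow.org/channelflow/trunk channelflow
</code>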
====== install channelflow ======
Install channelflow on the PACE cluster. This should be straightforward. Follow
the [[docs:install]] directions.
====== submit a job to the queue ======
The PACE cluster has PBS job-queue software that automatically distributes processes among the nodes (computers)
in the cluster (e.g. pace1, pace2, ...). You //must// use this system to start long-running jobs (anything that
runs for more than a few minutes), instead of logging onto a particular node, starting a job on the command-line,
and backgrounding it. If you do the latter, it'll interfere with the queuing system and annoy other users and
the PACE administrators.
Below are two sets of instructions for submitting jobs to the queue. The first describes direct use of the
PBS ''qsub'' command. The second uses a shell script named ''qsubmit'' that I wrote to enable one-line
queue submissions.
For further information see the ''qsub'' and ''qstat'' man pages on pacemaker.gatech.edu (type ''man qsub'' at the command-line)
and [[http://www.pace.gatech.edu/facilities/pacecc/starting.php?section=developing&topic=Submitting+Your+Jobs|The official
PACE documentation]].
===== using the PBS qsub command =====
To submit a job using the PBS queue software,
1. Log on to pacemaker.gatech.edu.
2. Create a PBS job description file with a text editor (e.g. xemacs, vi)
and with a ''.pbs'' filename extension. I'll use the filename ''foo.pbs''.
Use the following as a template.
<code>
#PBS -N arnoldi-EQ9-Re380
#PBS -l nodes=1:ppn=1,walltime=72:00:00
#PBS -q pace-cns
#PBS -k oe
#PBS -m abe
cd /nv/hp1/jg356/simulations/narrowbox/arnoldi-EQ9/Re380
arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff
</code>
The lines in the above file specify:
* -N a name for the job
* -l how many nodes, processors per node, and wallclock time the job is allowed (an MPI variant is sketched after this list)
* -q the name of the queue the job should be submitted to
* -k save output and error files to ''~/arnoldi-EQ9-Re380.oXXXXX'' and ''.eXXXXX''
* -m send email when the job starts, stops, or is terminated
* ''cd /nv/hp1/jg356/....'' The Unix commands to execute to start the job
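For an MPI run (see the ''-l'' line above), the same template applies with a larger resource request. Here is a minimal sketch, assuming a standard ''mpirun'' is available on PACE; ''my_mpi_program'' is a placeholder:
<code>
#PBS -N mpi-example
#PBS -l nodes=2:ppn=4,walltime=24:00:00
#PBS -q pace-cns
#PBS -k oe
#PBS -m abe
# PBS_O_WORKDIR is the directory from which qsub was invoked
cd $PBS_O_WORKDIR
# 2 nodes x 4 processors per node = 8 MPI processes; the exact
# mpirun flags depend on how MPI is installed on the cluster
mpirun -np 8 -machinefile $PBS_NODEFILE ./my_mpi_program
</code>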
3. Submit the job using ''qsub'':
<code>
jg356@pacemaker$ qsub foo.pbs
</code>
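''qsub'' prints an ID for the new job. You can monitor or kill the job with the standard PBS commands (''12345'' below stands in for the real job ID):
<code>
jg356@pacemaker$ qstat -u jg356   # list your jobs: Q = queued, R = running
jg356@pacemaker$ qstat -f 12345   # full details for one job
jg356@pacemaker$ qdel 12345       # remove a job from the queue
</code>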
===== using the qsubmit shell script =====
1. Put the following bash function definition in your ''~/.bash_aliases'' or ''~/.bashrc'' file
on PACE.
<code>
function qsubmit() {
    # first argument is the job name; the rest is the command to run
    tag=$1
    shift
    # write a PBS job file with default resource requests
    echo "#PBS -N $tag"                             > tmp.pbs
    echo "#PBS -l nodes=1:ppn=1,walltime=72:00:00" >> tmp.pbs
    echo "#PBS -q pace-cns"                        >> tmp.pbs
    echo "#PBS -k oe"                              >> tmp.pbs
    echo "#PBS -m abe"                             >> tmp.pbs
    # run the job from the directory where qsubmit was invoked
    echo "cd $(pwd)"                               >> tmp.pbs
    echo "$*"                                      >> tmp.pbs
    cat tmp.pbs
    qsub tmp.pbs
}
</code>
2. Load the new definition into your current shell (this will happen automatically the next time
you log in).
<code>
jg356@pacemaker$ source ~/.bash_aliases
</code>
3. Submit a job using the script
<code>
jg356@pacemaker$ qsubmit arnoldi-EQ9-Re380 arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff
</code>
The first argument after ''qsubmit'' is the job name and the remaining arguments specify what is to be run,
i.e. the Unix command you would type if you were just running the process directly in the shell.
The ''qsubmit'' script works by creating a ''tmp.pbs'' file with some default values for wallclock time
etc., and then submitting that file to ''qsub''. Modify it to your liking.
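For example, here is the same function modified to read the wallclock limit from an environment variable; the variable name ''QWALLTIME'' is just an invention for this sketch:
<code>
function qsubmit() {
    tag=$1
    shift
    # use QWALLTIME if it is set, otherwise fall back to 72 hours
    walltime=${QWALLTIME:-72:00:00}
    echo "#PBS -N $tag"                              > tmp.pbs
    echo "#PBS -l nodes=1:ppn=1,walltime=$walltime" >> tmp.pbs
    echo "#PBS -q pace-cns"                         >> tmp.pbs
    echo "#PBS -k oe"                               >> tmp.pbs
    echo "#PBS -m abe"                              >> tmp.pbs
    echo "cd $(pwd)"                                >> tmp.pbs
    echo "$*"                                       >> tmp.pbs
    cat tmp.pbs
    qsub tmp.pbs
}
</code>
With this version ''QWALLTIME=12:00:00 qsubmit quick-test ...'' submits a 12-hour job, and plain ''qsubmit'' behaves exactly as before.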
====== view the results ======
The standard output and standard error streams of your process (what is normally printed in
a terminal) will be saved in the files ''~/jobname.oXXXX'' and ''~/jobname.eXXXX'', where XXXX
is a job ID number set by the PBS queueing system. Any data saved to disk will be placed in the
directory where the job was started.
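Because of the ''-k oe'' option, the output file appears in your home directory while the job runs, so if home directories are shared across the nodes you can watch a job's progress live (the job ID suffix here is made up):
<code>
jg356@pacemaker$ tail -f ~/arnoldi-EQ9-Re380.o12345
</code>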