The Georgia Tech Public Access Cluster Environment (PACE) is a cluster of number-crunching Linux boxes. The Center for Nonlinear Science owns a few of the machines in this cluster with 4 to 8 CPUs per node. PACE is useful for running lots of long, CPU-intensive calculations and for running parallel programs with MPI.
The work process is roughly this: you log on to PACE, submit a process to the CNS job queue, wait until it's finished, and then look at the results. This usually involves transferring files back and forth between your local computer and the PACE cluster using scp (the Unix secure copy utility). It ain't always pretty but it keeps your laptop from melting.
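The file-transfer half of that loop can be wrapped in two small shell functions for your local machine. This is only a sketch: PACE_USER, PACE_HOST, and the function names pace_put and pace_get are placeholders invented for illustration, and the default host name is assumed to be the login node mentioned further down this page.

```shell
# Sketch only: PACE_USER and PACE_HOST are placeholders, set them to your own values.
PACE_USER="${PACE_USER:-your_gtid}"
PACE_HOST="${PACE_HOST:-pacemaker.gatech.edu}"   # assumed login node

# Copy a local file or directory to a directory on PACE.
pace_put() {
    scp -r "$1" "${PACE_USER}@${PACE_HOST}:$2"
}

# Copy a file or directory from PACE to a local directory (default: current directory).
pace_get() {
    scp -r "${PACE_USER}@${PACE_HOST}:$1" "${2:-.}"
}
```

With these loaded, something like pace_put ubest.ff simulations/ would push an input file and pace_get simulations/results would pull a results directory back into the current directory. The paths here are hypothetical.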
The official PACE documentation page from OIT.
temptation to Balkanization is irresistible, but there is a CNS PACE homepage on the svn wwwcns repository, so please enter general (not channelflow specific) PACE documentation there. — Predrag Cvitanovic 2009-02-22 16:41
You might be the only person on the planet with enough goodwill, patience, and CNS permissions to document via svn checkout, editing html, checkin, and recheckout as www on zero. Maybe we can get some contributions by lowering the overhead with dokuwiki editing.
Your PACE login ID is your GTID, but I have to create an account for each user, because CNS gets charged for PACE usage. — Predrag Cvitanovic 2009-02-22 16:46
The denizens of CNS have written shell scripts that simplify the process of logging in to the CNS machines and the PACE cluster from outside Georgia Tech, and using the svn repositories. You will probably want to install these: "Fetch CNS ssh tunneling scripts".
Ask old-timers if you do not know the CNS webpages password.
If you run into a problem and then manage to solve it, please edit "Fetch CNS ssh tunneling scripts" in svn wwwcns repository accordingly, so the next person has less trouble figuring this out…
Channelflow is hosted at http://svn.channelflow.org, so you can access it directly with svn commands from anywhere, with no ssh tunneling required. See the Installation instructions.
Install channelflow on the PACE cluster. This should be straightforward. Follow the install directions.
The PACE cluster has PBS job-queue software that automatically distributes processes among the nodes (computers) in the cluster (e.g. pace1, pace2, …). You must use this system to start long-running jobs (anything that runs for more than a few minutes), instead of logging onto a particular node, starting a job on the command-line, and backgrounding it. If you do the latter it'll interfere with the queuing system and annoy other users and the PACE administrators.
Below are two sets of instructions for submitting jobs to the queue. The first describes direct use of the PBS qsub command. The second uses a shell script named qsubmit that I wrote to enable one-line queue submissions. For further information see the qsub and qstat man pages on pacemaker.gatech.edu (type man qsub at the command line) and the official PACE documentation.
To submit a job using the PBS queue software,
1. Log on to pacemaker.pace.edu.
2. Create a PBS job description file with a text editor (e.g. xemacs, vi) and with a .pbs filename extension. I'll use the filename foo.pbs. Use the following as a template.
    #PBS -N arnoldi-EQ9-Re380
    #PBS -l nodes=1:ppn=1,walltime=72:00:00
    #PBS -q pace-cns
    #PBS -k oe
    #PBS -m abe
    cd /nv/hp1/jg356/simulations/narrowbox/arnoldi-EQ9/Re380
    arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff
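The lines in the above file specify
    -N: the name of the job
    -l: the resources requested (number of nodes, processors per node, and wall-clock time limit)
    -q: the queue to submit the job to
    -k oe: keep the standard output and standard error streams, in ~/arnoldi-EQ9-Re380.oXXXXX and .eXXXXX
    -m abe: send mail when the job aborts, begins, and ends
    cd /nv/hp1/jg356/…: the directory in which to run the job
    last line: the Unix commands to execute to start the job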
4. Submit the job using qsub
jg356@pacemaker$ qsub foo.pbs
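Once the job is in the queue you can watch or cancel it from the same login shell. The sketch below wraps the standard PBS qstat and qdel commands; the function names myjobs, jobinfo, and killjob are made up for this example, and the exact output of qstat depends on the PBS version installed on PACE.

```shell
# List your own jobs in the queue (qstat -u restricts the listing to one user).
myjobs() {
    qstat -u "${USER}"
}

# Show the full status record of one job; pass the job ID that qsub printed.
jobinfo() {
    qstat -f "$1"
}

# Remove a queued or running job.
killjob() {
    qdel "$1"
}
```

These are thin wrappers, so you can equally well type the underlying commands directly.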
1. Put the following bash function definition in your ~/.bash_aliases or ~/.bashrc file on PACE.
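    function qsubmit() {
        tag=$1    # first argument: the job name
        shift     # remaining arguments: the command to run
        echo "#PBS -N $tag" > tmp.pbs
        echo "#PBS -l nodes=1:ppn=1,walltime=72:00:00" >> tmp.pbs
        echo "#PBS -q pace-cns" >> tmp.pbs
        echo "#PBS -k oe" >> tmp.pbs
        echo "#PBS -m abe" >> tmp.pbs
        echo "cd $(pwd)" >> tmp.pbs    # run the job in the directory it was submitted from
        echo "$*" >> tmp.pbs           # quoted so wildcards are not expanded at submission time
        cat tmp.pbs                    # show the job file before submitting it
        qsub tmp.pbs
    }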
2. Load the new definition into your current shell (this will happen automatically the next time you log in).
jg356@pacemaker$ source ~/.bash_aliases
3. Submit a job using the script
jg356@pacemaker$ qsubmit arnoldi-EQ9-Re380 arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff
The first argument after qsubmit is the job name and the remaining arguments specify what is to be run, i.e. the Unix commands you would type if you were just running the process directly in the shell.
The qsubmit script works by creating a tmp.pbs file with some default values for wallclock time etc. and then submitting that file to qsub. Modify it to your liking.
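One possible modification, offered as a sketch rather than a tested replacement: let the walltime and processor count be overridden through environment variables, and print the job file to standard output so it can be inspected before submission. The names pbs_script, QS_WALLTIME, and QS_PPN are invented for this example; the PBS directives are the same ones used in the template above.

```shell
# Print a PBS job file for the given job name and command line.
# QS_WALLTIME and QS_PPN are hypothetical knobs; their defaults match this page.
pbs_script() {
    tag=$1    # first argument: the job name
    shift     # remaining arguments: the command to run
    echo "#PBS -N ${tag}"
    echo "#PBS -l nodes=1:ppn=${QS_PPN:-1},walltime=${QS_WALLTIME:-72:00:00}"
    echo "#PBS -q pace-cns"
    echo "#PBS -k oe"
    echo "#PBS -m abe"
    echo "cd $(pwd)"
    printf '%s\n' "$*"    # the command line, written verbatim
}
```

A short test run could then be prepared with QS_WALLTIME=01:00:00 pbs_script test-run arnoldi --flow -Na 100 -R 380 ubest.ff > test-run.pbs, checked by eye, and submitted with qsub test-run.pbs. Writing to standard output instead of a fixed tmp.pbs also avoids two submissions in the same directory overwriting each other's job file.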
The standard output and standard error streams of your process (what is normally printed in a terminal) will be saved in the files ~/jobname.oXXXX and ~/jobname.eXXXX, where XXXX is a job ID number set by the PBS queueing system. Any data saved to disk will be placed in the directory where the job was started.
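As a small convenience, the output file names described above can be computed from the job name and the job ID. This sketch assumes that qsub prints an ID of the form NUMBER.servername and that the numeric part is what appears in the file names; both should be checked against what PACE actually produces. The function name job_output_files is invented for this example.

```shell
# Print the expected stdout and stderr file paths for a job.
# Usage: job_output_files JOBNAME JOBID   (JOBID may be "12345" or "12345.server")
job_output_files() {
    name=$1
    id=${2%%.*}    # drop everything from the first dot onward, if there is one
    echo "${HOME}/${name}.o${id}"
    echo "${HOME}/${name}.e${id}"
}
```

For example, job_output_files arnoldi-EQ9-Re380 12345 prints the two paths to look at once the job has started.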