How to use the PACE cluster

The Georgia Tech Public Access Cluster Environment (PACE) is a cluster of number-crunching Linux boxes. The Center for Nonlinear Science owns a few of the machines in this cluster with 4 to 8 CPUs per node. PACE is useful for running lots of long, CPU-intensive calculations and for running parallel programs with MPI.

The work process is roughly this: you log on to PACE, submit a process to the CNS job queue, wait until it's finished, and then look at the results. This usually involves transferring files back and forth between your local computer and the PACE cluster using scp (the Unix secure copy utility). It ain't always pretty but it keeps your laptop from melting.
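As a rough sketch of the file-transfer step, the helper functions below only print the scp commands, so you can check them before running anything. The user name gtid123, the remote directory simulations/run1, and the file name results.asc are placeholders, and the host name is the login node mentioned later on this page; substitute your own values.

```shell
# Placeholders: replace with your own GTID, login host, and remote directory.
PACE_USER="gtid123"                # hypothetical user name
PACE_HOST="pacemaker.gatech.edu"   # login node named later on this page
REMOTE_DIR="simulations/run1"      # hypothetical directory under your PACE home

# Print the scp command that copies a local file to the cluster.
pace_push_cmd() {
  printf 'scp %s %s@%s:%s/\n' "$1" "$PACE_USER" "$PACE_HOST" "$REMOTE_DIR"
}

# Print the scp command that copies a result file back to the current directory.
pace_pull_cmd() {
  printf 'scp %s@%s:%s/%s .\n' "$PACE_USER" "$PACE_HOST" "$REMOTE_DIR" "$1"
}

pace_push_cmd ubest.ff
pace_pull_cmd results.asc
```

Once the printed commands look right, paste them into your local shell to perform the actual copy.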

PACE documentation

The official PACE documentation page from OIT.

temptation to Balkanization is irresistible, but there is a CNS PACE homepage on the svn wwwcns repository, so please enter general (not channelflow specific) PACE documentation there. — Predrag Cvitanovic 2009-02-22 16:41

You might be the only person on the planet with enough goodwill, patience, and CNS permissions to document via svn checkout, editing html, checkin, and recheckout as www on zero. Maybe we can get some contributions by lowering the overhead with dokuwiki editing.

PACE login

Your PACE login ID is your GTID, but I have to create an account for each user, because CNS gets charged for PACE usage. — Predrag Cvitanovic 2009-02-22 16:46

ssh tunneling

The denizens of CNS have written shell scripts that simplify the process of logging in to the CNS machines and the PACE cluster from outside Georgia Tech, and using the svn repositories. You will probably want to install these: "Fetch CNS ssh tunneling scripts".

Ask old-timers if you do not know the CNS webpages password.

If you run into a problem and then manage to solve it, please edit "Fetch CNS ssh tunneling scripts" in svn wwwcns repository accordingly, so the next person has less trouble figuring this out…

Channelflow is hosted at http://svn.channelflow.org, so you can access it directly with svn commands from anywhere; no ssh tunneling is required. See the Installation instructions.

install channelflow

Install channelflow on the PACE cluster. This should be straightforward. Follow the install directions.

submit a job to the queue

The PACE cluster has PBS job-queue software that automatically distributes processes among the nodes (computers) in the cluster (e.g. pace1, pace2, …). You must use this system to start long-running jobs (anything that runs for more than a few minutes), instead of logging onto a particular node, starting a job on the command-line, and backgrounding it. If you do the latter it'll interfere with the queuing system and annoy other users and the PACE administrators.

Below are two sets of instructions for submitting jobs to the queue. The first describes direct use of the PBS qsub command. The second uses a shell script named qsubmit I wrote to enable one-line queue submissions.

For further information see the qsub and qstat man pages on pacemaker.gatech.edu (type man qsub at the command-line) and The official PACE documentation.

using the PBS qsub command

To submit a job using the PBS queue software,

1. Log on to pacemaker.gatech.edu.

2. Create a PBS job description file with a text editor (e.g. xemacs, vi) and with a .pbs filename extension. I'll use the filename foo.pbs. Use the following as a template.

#PBS -N arnoldi-EQ9-Re380
#PBS -l nodes=1:ppn=1,walltime=72:00:00
#PBS -q pace-cns
#PBS -k oe
#PBS -m abe
cd /nv/hp1/jg356/simulations/narrowbox/arnoldi-EQ9/Re380
arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff

The lines in the above file specify

-N a name for the job
-l how many nodes, processors per node, and how much wallclock time the job is allowed
-q the name of the queue the job should be submitted to
-k save output and error files to ~/arnoldi-EQ9-Re380.oXXXXX and .eXXXXX
-m send email when the job starts, stops, or is terminated
cd /nv/hp1/jg356/…. The Unix commands to execute to start the job

3. Submit the job using qsub.

jg356@pacemaker$ qsub foo.pbs
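If you prefer to script these steps instead of opening an editor, something like the following might work. It writes the same template to foo.pbs with a heredoc and submits it only if qsub is available. The qstat -u call, which lists your own jobs, is an addition not covered above, and the job settings are simply the template values, so adjust them to your run.

```shell
# Write the job description file. The quoted 'EOF' keeps the shell from
# expanding anything inside the heredoc.
cat > foo.pbs <<'EOF'
#PBS -N arnoldi-EQ9-Re380
#PBS -l nodes=1:ppn=1,walltime=72:00:00
#PBS -q pace-cns
#PBS -k oe
#PBS -m abe
cd /nv/hp1/jg356/simulations/narrowbox/arnoldi-EQ9/Re380
arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff
EOF

# Submit only where the PBS tools are installed (i.e. on the PACE login node).
if command -v qsub >/dev/null 2>&1; then
  qsub foo.pbs
  # List your queued and running jobs.
  qstat -u "${USER:-$(id -un)}"
else
  echo "qsub not found: run this on the PACE login node" >&2
fi
```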

using the qsubmit shell script

1. Put the following bash function definition in your ~/.bash_aliases or ~/.bashrc file on PACE.

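function qsubmit() {
  # First argument is the job name; the rest is the command to run.
  local tag=$1
  shift
  echo "#PBS -N $tag" > tmp.pbs
  echo "#PBS -l nodes=1:ppn=1,walltime=72:00:00" >> tmp.pbs
  echo "#PBS -q pace-cns" >> tmp.pbs
  echo "#PBS -k oe"   >> tmp.pbs
  echo "#PBS -m abe"  >> tmp.pbs
  # Run the job from the directory where qsubmit was called.
  echo "cd $(pwd)"    >> tmp.pbs
  # Quote the arguments so the shell does not glob-expand them here.
  printf '%s\n' "$*"  >> tmp.pbs
  cat tmp.pbs
  qsub tmp.pbs
}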

2. Load the new definition into your current shell (this will happen automatically the next time you log in).

jg356@pacemaker$ source ~/.bash_aliases

3. Submit a job using the script

jg356@pacemaker$ qsubmit arnoldi-EQ9-Re380 arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff

The first argument after qsubmit is the job name and the remaining arguments specify what is to be run, i.e. the Unix commands you would type if you were just running the process directly in the shell.

The qsubmit script works by creating a tmp.pbs file with default values for wallclock time, etc., and then submitting that file with qsub. Modify it to your liking.
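One possible modification, sketched here under assumptions, is to make the wallclock time and processor count adjustable without editing the function each time. The names make_pbs, qsubmit2, QSUB_WALLTIME, and QSUB_PPN are made up for this example; they are not part of PBS or of the original script. The job file is also named after the job rather than tmp.pbs, so that successive submissions do not overwrite each other.

```shell
# Print a PBS job description to standard output.
# QSUB_WALLTIME and QSUB_PPN are hypothetical environment variables that
# override the defaults (72 hours, one processor per node).
make_pbs() {
  local tag=$1
  shift
  printf '#PBS -N %s\n' "$tag"
  printf '#PBS -l nodes=1:ppn=%s,walltime=%s\n' "${QSUB_PPN:-1}" "${QSUB_WALLTIME:-72:00:00}"
  printf '#PBS -q pace-cns\n'
  printf '#PBS -k oe\n'
  printf '#PBS -m abe\n'
  printf 'cd %s\n' "$(pwd)"
  printf '%s\n' "$*"
}

# Write the description to <jobname>.pbs, show it, and submit it.
qsubmit2() {
  local pbsfile="$1.pbs"
  make_pbs "$@" > "$pbsfile"
  cat "$pbsfile"
  qsub "$pbsfile"
}
```

A twelve-hour job would then be submitted with, for example, QSUB_WALLTIME=12:00:00 qsubmit2 arnoldi-EQ9-Re380 arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff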

view the results

The standard output and standard error streams of your process (what is normally printed in a terminal) will be saved in the files ~/jobname.oXXXX and ~/jobname.eXXXX, where XXXX is a job ID number set by the PBS queueing system. Any data saved to disk will be placed in the directory where the job was started.
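Since the job ID is not known in advance, a small helper can save some typing when looking for these files. The sketch below assumes the files sit in your home directory and follow the jobname.oXXXX / jobname.eXXXX pattern described above; latest_job_output and show_job_output are hypothetical names.

```shell
# Print the most recently modified stdout file for a job name,
# i.e. the newest ~/jobname.oXXXX.
latest_job_output() {
  ls -t "$HOME/$1".o* 2>/dev/null | head -n 1
}

# Show the last lines of the stdout file and, if present, the matching
# stderr file for the newest run of a job.
show_job_output() {
  local out err
  out=$(latest_job_output "$1")
  if [ -z "$out" ]; then
    echo "no output files found for job '$1' in $HOME" >&2
    return 1
  fi
  # Turn jobname.oXXXX into jobname.eXXXX.
  err="${out%.o*}.e${out##*.o}"
  tail -n 20 "$out"
  if [ -f "$err" ]; then
    tail -n 20 "$err"
  fi
}
```

For example, show_job_output arnoldi-EQ9-Re380 prints the tail of the newest output and error files for that job name.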
