
====== How to use the PACE cluster ======

FIXME This document is a mere stub of what it ought to be.

The Georgia Tech Public Access Cluster Environment (PACE) is a cluster of number-crunching Linux boxes. The Center for Nonlinear Science (CNS) owns a few of the machines in this cluster, with 4 to 8 CPUs per node. PACE is useful for running lots of long, CPU-intensive calculations and for running parallel programs with MPI.

The work process is roughly this: you log on to PACE, submit a process to the CNS job queue, wait until it's finished, and then look at the results. This usually involves transferring files back and forth between your local computer and the PACE cluster using scp (the Unix secure copy utility). It ain't always pretty, but it keeps your laptop from melting.

====== PACE documentation ======

[[http://www.pace.gatech.edu/facilities/pacecc/starting.php|The official PACE documentation page from OIT.]]

{{:gtspring2009:pc.png|}} The temptation to Balkanization is irresistible, but there is a [[http://www.cns.gatech.edu/CNS-only/PACE.html|CNS PACE homepage]] in the svn wwwcns repository, so please enter general (not channelflow-specific) PACE documentation there.
--- //[[predrag.cvitanovic@physics.gatech.edu|Predrag Cvitanovic]] 2009-02-22 16:41//

{{:gtspring2009:gibson.png?24}} You might be the only person on the planet with enough goodwill, patience, and CNS permissions to document via svn checkout, HTML editing, checkin, and re-checkout as www on zero. Maybe we can get some contributions by lowering the overhead with dokuwiki editing.

====== PACE login ======

{{:gtspring2009:pc.png|}} Your PACE login ID is your GTID, but I have to create an account for each user, because CNS gets charged for PACE usage.
--- //[[predrag.cvitanovic@physics.gatech.edu|Predrag Cvitanovic]] 2009-02-22 16:46//

====== ssh tunneling ======

The denizens of CNS have written [[http://www.cns.gatech.edu/CNS-only/index.html|a bunch of shell scripts]] that simplify logging in to the CNS machines and the PACE cluster from outside Georgia Tech. You will probably want to install these and figure out how to use them. Ask old-timers for the CNS webpages password if you do not know it.

====== install channelflow ======

Install channelflow on the PACE cluster. This should be straightforward: follow the [[docs:install]] directions.

====== submit a job to the queue ======

The PACE cluster has PBS job-queue software that automatically distributes processes among the nodes (computers) in the cluster (e.g. pace1, pace2, ...). You //must// use this system to start long-running jobs (anything that runs for more than a few minutes), instead of logging onto a particular node, starting a job on the command line, and backgrounding it. If you do the latter, it will interfere with the queuing system and annoy other users and the PACE administrators.

Below are two sets of instructions for submitting jobs to the queue. The first describes direct use of the PBS ''qsub'' command. The second uses a shell script named ''qsubmit'' that I wrote to enable one-line queue submissions. For further information, see the ''qsub'' and ''qstat'' man pages on pacemaker.gatech.edu (type ''man qsub'' at the command line) and [[http://www.pace.gatech.edu/facilities/pacecc/starting.php?section=developing&topic=Submitting+Your+Jobs|the official PACE documentation]].

===== using the PBS qsub command =====

To submit a job using the PBS queue software:

1. Log on to pacemaker.gatech.edu.

2. Create a PBS job description file with a text editor (e.g. xemacs, vi), using a ''.pbs'' filename extension. I'll use the filename ''foo.pbs''. Use the following as a template.
<code>
#PBS -N arnoldi-EQ9-Re380
#PBS -l nodes=1:ppn=1,walltime=72:00:00
#PBS -q pace-cns
#PBS -k oe
#PBS -m abe
cd /nv/hp1/jg356/simulations/narrowbox/arnoldi-EQ9/Re380
arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff
</code>

The lines in the above file specify

  * ''-N'' a name for the job
  * ''-l'' how many nodes, how many processors per node, and how much wallclock time the job is allowed
  * ''-q'' the name of the queue the job should be submitted to
  * ''-k'' keep the output and error files as ''~/arnoldi-EQ9-Re380.oXXXXX'' and ''~/arnoldi-EQ9-Re380.eXXXXX''
  * ''-m'' send email when the job begins, ends, or is aborted
  * ''cd /nv/hp1/jg356/....'' and the line after it: the Unix commands to execute to run the job

3. Submit the job using ''qsub''.

<code>
jg356@pacemaker$ qsub foo.pbs
</code>

===== using the qsubmit shell script =====

1. Put the following bash function definition in your ''~/.bash_aliases'' or ''~/.bashrc'' file on PACE.

<code>
function qsubmit() {
  # first argument: job name; remaining arguments: the command to run
  local tag=$1
  shift
  echo "#PBS -N $tag" > tmp.pbs
  echo "#PBS -l nodes=1:ppn=1,walltime=72:00:00" >> tmp.pbs
  echo "#PBS -q pace-cns" >> tmp.pbs
  echo "#PBS -k oe" >> tmp.pbs
  echo "#PBS -m abe" >> tmp.pbs
  # run the job in the directory qsubmit was called from
  echo "cd $(pwd)" >> tmp.pbs
  # quoted so the command is written verbatim, without re-globbing
  echo "$*" >> tmp.pbs
  cat tmp.pbs
  qsub tmp.pbs
}
</code>

2. Load the new definition into your current shell (this will happen automatically the next time you log in). If you put the function in ''~/.bashrc'', source that file instead.

<code>
jg356@pacemaker$ source ~/.bash_aliases
</code>

3. Submit a job using the script:

<code>
jg356@pacemaker$ qsubmit arnoldi-EQ9-Re380 arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff
</code>

The first argument after ''qsubmit'' is the job name, and the remaining arguments specify what is to be run, i.e. the Unix command you would type if you were running the process directly in the shell. The ''qsubmit'' script works by creating a ''tmp.pbs'' file with some default values for wallclock time, etc., and then submitting that file to ''qsub''. Modify it to your liking.
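As one possible way to modify ''qsubmit'' to your liking, here is a sketch of a variant that lets you override the wallclock time and processors per node without editing the function, and writes one ''jobname.pbs'' file per job instead of reusing ''tmp.pbs'', so a copy of each submitted job description stays next to its data. The environment variable names ''QSUBMIT_WALLTIME'' and ''QSUBMIT_PPN'' are made up for this example. The PBS directives are copied from the original function, and this is untested on PACE itself. Since it is also called ''qsubmit'', use it in place of the original definition rather than alongside it.

<code bash>
# Hypothetical variant of qsubmit (not an official PACE tool).
# QSUBMIT_WALLTIME and QSUBMIT_PPN are invented names for optional overrides.
qsubmit() {
  if [ $# -lt 2 ]; then
    echo "usage: qsubmit jobname command [args...]" >&2
    return 1
  fi
  local tag=$1
  shift
  local walltime=${QSUBMIT_WALLTIME:-72:00:00}   # default: 72 hours
  local ppn=${QSUBMIT_PPN:-1}                    # default: 1 processor
  local pbsfile="$tag.pbs"                       # one file per job name

  # Same directives as the original qsubmit; the unquoted EOF lets the
  # shell fill in the job name, limits, current directory, and command.
  cat > "$pbsfile" <<EOF
#PBS -N $tag
#PBS -l nodes=1:ppn=$ppn,walltime=$walltime
#PBS -q pace-cns
#PBS -k oe
#PBS -m abe
cd $(pwd)
$*
EOF
  cat "$pbsfile"
  qsub "$pbsfile"
}

# Example: the job from above, with a 24-hour limit instead of 72.
#   QSUBMIT_WALLTIME=24:00:00 qsubmit arnoldi-EQ9-Re380 arnoldi --flow -Na 100 -R 380 ../../continue-EQ9/Re380/ubest.ff
</code>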
====== view the results ======

The standard output and standard error streams of your process (what is normally printed in a terminal) will be saved in the files ''~/jobname.oXXXX'' and ''~/jobname.eXXXX'', where XXXX is a job ID number set by the PBS queueing system. Any data your program saves to disk with relative paths will be placed in the directory named on the ''cd'' line of the PBS file (for ''qsubmit'', the directory you submitted the job from).
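Checking on a job and fetching its results can be sketched as follows. The ''joboutput'' function is a hypothetical helper written for this example, not part of PBS or PACE. It assumes the ''~/jobname.oXXXX'' and ''~/jobname.eXXXX'' naming described above. The ''qstat -u'' and ''scp -r'' lines are ordinary usage of those commands, but the host name and path are taken from the examples on this page and are assumptions about your setup. From outside Georgia Tech you may need the ssh tunneling scripts first.

<code bash>
# Hypothetical helper (not part of PACE or PBS): list the stdout (.oNNNN)
# and stderr (.eNNNN) files PBS left in your home directory for a job
# name, newest first. Returns nonzero if there are none yet.
joboutput() {
  local name=$1
  ls -t "$HOME/$name".[oe][0-9]* 2>/dev/null
}

# On pacemaker: is the job still queued or running?
#   qstat -u "$USER"
# After it finishes, look at the end of the newest of its output files:
#   tail "$(joboutput arnoldi-EQ9-Re380 | head -n 1)"
# On your own computer: copy the whole job directory back with scp.
#   scp -r jg356@pacemaker.gatech.edu:/nv/hp1/jg356/simulations/narrowbox/arnoldi-EQ9/Re380 .
</code>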

gtspring2009/howto/pace.1246298854.txt.gz · Last modified: 2009/06/29 11:07 by gibson