Using the Cluster

The recommended way to run jobs on the cluster is through the queuing system, Sun Grid Engine (SGE). Interactive usage is only allowed for short test runs on select nodes.

For interactive use or to submit jobs, log in to egs01, egs02, atlas03 or atlas04. Interactive jobs can also be run with the "qsh" command, which starts an xterm session on a node.

For batch jobs, note that the cluster is configured with six major queues (atlas.q, egs.q, egsshort.q, ottawahospital.q, theory.q, guest.q) serving our groups (atlas, egs, ottawahospital, theory, exo, sno, ilc, guest). Users in each group may only submit jobs to their own queue. Each queue contains its own group's nodes at high priority and the other groups' nodes at low priority. The details are as follows.

  1. atlas.q contains all atlas nodes with high priority, and all other groups' resources (all egs nodes, all ottawahospital nodes, the theory node) with low priority.

  2. egsshort.q is for short jobs (up to 15 minutes). It contains the 10 fast egs nodes (egs43 - egs52) with high priority.

  3. egs.q contains the 10 fast egs nodes with medium priority, the remaining egs nodes with high priority, and all other resources (all atlas nodes, all ottawahospital nodes, the theory node) with low priority.

  4. ottawahospital.q contains all ottawahospital nodes with high priority, and all other resources (all atlas nodes, all egs nodes, the theory node) with low priority.

  5. theory.q contains the theory node with high priority, and all other resources (all atlas nodes, all egs nodes, all ottawahospital nodes) with low priority.

  6. guest.q contains all resources with low priority and is for guests only.

  The exo, sno and ilc groups may submit jobs to atlas.q, since they have no dedicated queue.
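
The group-to-queue rules above can be summarized as a small shell helper. This is only a sketch: the function name queue_for_group is made up for illustration, and it assumes that egs users normally pick egs.q, with egsshort.q reserved for jobs of up to 15 minutes.

```shell
#!/bin/sh
# Hypothetical helper: print the queue a member of the given group
# should submit to, following the queue list above.
queue_for_group() {
    case "$1" in
        atlas|exo|sno|ilc) echo "atlas.q" ;;          # exo, sno, ilc share atlas.q
        egs)               echo "egs.q" ;;            # assumption: egsshort.q only for short jobs
        ottawahospital)    echo "ottawahospital.q" ;;
        theory)            echo "theory.q" ;;
        guest)             echo "guest.q" ;;
        *) echo "unknown group: $1" >&2; return 1 ;;
    esac
}

# Example: look up the queue for the exo group.
queue_for_group exo
```

The printed queue name is what goes after the -q option of qsub.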

To submit jobs, use the qsub command. You must specify a queue name (e.g. "qsub -q <queuename> <yourjob>"). If you run "qsub -q <queuename> prog1", SGE runs the program and places two files in your current directory: prog1.e# and prog1.o#, where # is the job number assigned by SGE. The prog1.e# file contains the job's standard error and the prog1.o# file contains its standard output. Use the -o and -e options of qsub to send the output and error files somewhere other than the current directory. For more detail, see "man qsub".
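
As an illustration of the options described above, the following sketch only prints a qsub command line instead of running it, so it can be tried on any machine. The function name build_qsub and the directory /home/yourusername/logs are hypothetical. The -q, -o and -e options are the ones described above.

```shell
#!/bin/sh
# Hypothetical helper: build (but do not run) a qsub command line.
# $1 = queue name, $2 = job script,
# $3 = optional path for the output and error files.
build_qsub() {
    if [ -n "$3" ]; then
        echo "qsub -q $1 -o $3 -e $3 $2"
    else
        echo "qsub -q $1 $2"
    fi
}

# Default: output and error files go where SGE normally puts them.
build_qsub egs.q prog1
# Redirected: both files go to the given (hypothetical) path.
build_qsub egs.q prog1 /home/yourusername/logs
```

Pasting one of the printed lines on a submit node performs the actual submission.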

 

Steps to Submit a Job on the Cluster

  1. Enable your account for job submission on the research cluster by adding
    these lines at the bottom of your .cshrc file:

    if ( $?prompt && -e /usr/local/sge/boson_nest/common/settings.csh) then
    source /usr/local/sge/boson_nest/common/settings.csh
    endif

  2. Compose a script containing the commands you want to execute, e.g. compile or run commands.
  3. For example, create the following file "sub1":

    #!/bin/sh
    # Compile with full paths so the executable is written where it is run from.
    f77 -o /home/yourusername/progs/prog1 /home/yourusername/progs/prog1.f
    /home/yourusername/progs/prog1

  4. Use "ssh" (secure shell) to log in to a cluster submit node (e.g. "ssh egs01").
  5. Use the command "qsub" to submit your job (e.g. "qsub -q egs.q -o <outputpath> -e <errorpath> sub1").

  6. To monitor your job, use the commands "qstat" and "qhost".

  7. Refer to the man pages for a more detailed explanation of these commands (e.g. "man qsub").

  8. For assistance, please contact the system administrator.
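
As a variation on the steps above, qsub options can also be embedded in the job script itself on lines starting with "#$", which the shell treats as ordinary comments. The sketch below assumes the egs.q queue and the placeholder path /home/yourusername/progs. The file names prog1.out and prog1.err are made up for illustration.

```shell
#!/bin/sh
# Embedded SGE options: qsub reads lines starting with "#$";
# the shell itself ignores them as comments.
#$ -S /bin/sh
#$ -N prog1
#$ -q egs.q
#$ -cwd
#$ -o prog1.out
#$ -e prog1.err

# SGE sets JOB_ID inside a running job; "local" is a fallback for test runs.
job_label="job ${JOB_ID:-local}"
echo "$job_label starting"

# Placeholder program location; adjust to your own directory.
prog_dir=/home/yourusername/progs
if [ -f "$prog_dir/prog1.f" ]; then
    # Compile, then run only if compilation succeeded.
    f77 -o "$prog_dir/prog1" "$prog_dir/prog1.f" && "$prog_dir/prog1"
else
    echo "$prog_dir/prog1.f not found" >&2
fi
```

With the options embedded, "qsub sub1" is enough to submit the script. "qstat" then lists the job, "qstat -j <jobnumber>" shows its details, and "qdel <jobnumber>" removes it from the queue.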