

Bioinformatics Computing Cluster


Overview

The Bioinformatics Department has a Linux cluster, called Spartan, available for research computing and teaching. The cluster is a valuable resource for many departments and for users both inside and outside the University, and the availability of and access to its resources are critical to all users.


The cluster was purchased with an NSF Major Research Instrumentation grant awarded in 2007 to Profs. Zhiping Weng, Simon Kasif, Charles DeLisi, Sandor Vajda, Temple Smith, Jim Collins, Robert Berwick, Gary Benson, Daniel Segre, and Yu Xia.


Spartan Cluster


The Spartan cluster consists of the following IBM systems: an x3655 management node, DS3300 storage, and two x3455 login/user nodes. There are a total of 86 compute nodes associated with the Spartan cluster: 48 are IBM x3455 models and 34 are Sun VZ20 models.


The 48 x3455 compute nodes each contain two dual-core 2.8GHz processors, for a total of 192 processor cores. Twenty of the x3455 compute nodes have 8GB RAM and the remaining 28 have 4GB RAM. The VZ20 compute nodes have dual processors with 2GB of RAM per node.


The login/user nodes provide home directories and a link within each user's home directory to storage. There is approximately 30TB of total storage available to users of the Spartan cluster. The management and user nodes are backed up nightly by IT. The storage node is configured with RAID5 and has a duplicate hardware mirror used for nightly backups.


The operating system for the Spartan cluster is 64-bit CentOS 5.2 (BU Linux 5, monde), and the computational workload is distributed across the compute nodes using the Portable Batch System (PBS) resource manager and the Maui scheduler.

Request Cluster Account


Login accounts on the cluster are separate from any other University logins. To request an account on the cluster, email mfitzpat@bu.edu with the following information:

BU username

email address

Primary Advisor

Department


Accounts are usually approved and created within 24 hours. You will be contacted via email when your account is available.

Portable Batch System and Maui


The PBS resource management system handles the management and monitoring of the computational workload, and Maui schedules and distributes the workload across the cluster's compute nodes.


Tasks or "jobs" are submitted to the cluster by creating a batch job command file, referred to as a PBS script. The jobs are queued using the PBS command qsub and then Maui will determine which jobs to run, when and where.


A PBS script is simply a shell script containing the set of commands you want run on the compute nodes. It also contains directives which specify the characteristics (attributes) of the job, and resource requirements (e.g. number of compute nodes and wall clock time) that your job needs. Once you create your PBS script, you can reuse it if you wish or modify it for subsequent runs.


Additional information about PBS (Torque) and Maui can be found at:

clusterresources.com

torque_info

maui_info


PBS Script


The PBS script is a shell script that contains PBS directives which are preceded by #PBS.


The PBS directives are defined in the table below:

      PBS Directive                         Function   

  #PBS -N <job name>             Specifies job name.

  #PBS -l nodes=1:ppn=2*          Specifies a PBS resource requirement of
                                 1 compute node and 2 processors per node.
                                 * Spartan nodes contain 4 cores/node: ppn=4

  #PBS -l mem=2000MB             Specifies a PBS resource requirement of 2000MB (2GB)
                                 per node. Maximum mem resource is 16000MB (16GB)


  #PBS -l walltime=4:00:00       Specifies a PBS resource requirement of 
                                 4 hours of wall clock time to run the job.

  #PBS -o output_filename        Specifies the name of the file where job
                                 output is to be saved. If omitted, PBS
                                 generates a filename with the job ID appended.
 
  #PBS -j oe                     Specifies that job output and error messages
                                 are to be joined in one file.

  #PBS -m bea                    Specifies that PBS send email notification
                                 when the job begins (b), ends (e), or 
                                 aborts (a). 

  #PBS -M userid@bu.edu          Specifies the email address where PBS
                                 notification is to be sent.

     

After the PBS directives in the PBS script come the shell commands, which are executed when the job runs.
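
For illustration only, a minimal sketch of a PBS script is shown below; the job name, resource requests, email address, and program name are placeholders, and real jobs should also use the data-staging convention described in the next section.

    #!/bin/bash
    #PBS -N myjob
    #PBS -l nodes=1:ppn=4
    #PBS -l mem=4000MB
    #PBS -l walltime=4:00:00
    #PBS -j oe
    #PBS -m bea
    #PBS -M userid@bu.edu

    # Shell commands run on the compute node after the directives.
    echo "Job $PBS_JOBID running on `hostname`"
    ./my_program > my_program.out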

Sample PBS Script


Continuously copying data back and forth between the compute nodes and users' home/storage directories while a job is running can cause heavy I/O traffic on the network. If the network traffic is severe enough, overall cluster performance will be negatively impacted.


To avoid network I/O bottlenecks, all input/output data required for a job MUST be copied to/from the compute node's local hard drive. This is accomplished by specifying the following line in the PBS script: cd /scr/$PBS_JOBID. PBS automatically creates a directory under /scr on the compute node for each job ($PBS_JOBID) run on that node and grants the job's owner permission to read and write to that directory.


The output data of the job should also be placed in the /scr/$PBS_JOBID directory. Once the job has finished, the output data is copied back to the user's storage directory.


The sample PBS script below illustrates how to incorporate the /scr/$PBS_JOBID directory and how to copy output data back to your storage directory.
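
Since the test.pbs attachment is not reproduced on this page, the following is only a rough sketch of such a script; the storage link name (~/storage), input/output file names, and program name are placeholders that should be adapted to your own job.

    #!/bin/bash
    #PBS -N test
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=4:00:00
    #PBS -j oe

    # copy the data files to scratch on the compute node
    cd /scr/$PBS_JOBID
    cp ~/storage/my_program ~/storage/input.dat .

    # run the job from local scratch to avoid network I/O
    ./my_program input.dat > output.dat

    # copy the output back to the storage directory when the job finishes
    cp output.dat ~/storage/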


Input data required to run test.pbs script

How to Login


To access the cluster, users ssh into one of the cluster's login/user nodes:


Spartan login/user nodes:

decima.bu.edu (internal cluster name: userA)

morta.bu.edu (internal cluster name: userB)
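
For example, to connect to decima with ssh (replace username with your own cluster account name):

    ssh username@decima.bu.edu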


The home directories and storage links on each cluster are equally accessible from the respective login/user nodes.


How to Submit a Job

The PBS command qsub is used to submit PBS scripts to the cluster; based on the resources requested, each job is then scheduled by Maui. Jobs can only be submitted from the login nodes userA or userB. The nodes within the Spartan cluster have different amounts of RAM and different processor core counts and speeds, so several PBS queues are available to accommodate the various requests and requirements. In addition, you may also specify the resources required to run the job within your PBS script.



Queue Name       Description

batch            admin
web              reserved for web applications
test             test PBS jobs
spartans         all compute nodes, ~360 processor cores, RAM varies from 2GB to 16GB per node
bigmem           subset of spartans with 4 cores and 16GB RAM per node
sunfire          subset of spartans with 8 cores and 8GB RAM per node


Spartan Queues


Queue limits have been set up to limit the number of jobs that can be run from a queue on the cluster at any given time. This prevents the cluster from being monopolized by individual users. Queue names, run-time limits, number of processor cores, and RAM per node are listed below.



Queue        Run time     # processor cores                         RAM per node

test         120 hours      4 cores
spartans      48 hours    360 cores                                 2GB - 16GB
bigmem        48 hours    192 cores (subset of spartans queue)      16GB
sunfire       48 hours    112 cores (subset of spartans queue)      8GB



Based on the type of resources your job requires, you should select the appropriate queue. For example, if your job requires many CPU cycles, submit to the spartans queue to access all of the CPU cycles available on the cluster. If your job has large memory requirements, submit to the bigmem queue.

Qsub Command


The PBS qsub command is used to submit the PBS script for scheduling and execution. For example, if a user submits a job via a PBS script called "test.4" to the bigmem queue on the Spartan cluster, the syntax would be:

      userB:~$ qsub -q bigmem test.4
      200305.nona-man
      userB:~$

Notice that upon successful submission of a job, PBS returns a job identifier of the form jobid.nona-man, where jobid is an integer assigned by PBS to that job. You'll need the job identifier for any actions involving the job, such as checking job status or deleting the job.


There are many options to the qsub command as can be seen by typing man qsub. Below are three common options:

-I  run the job "interactively"; users can access jobs on the compute nodes for debugging

-l  specifies resource requirements

-V  the user's environment variables are exported to the job



How to Delete a Job


PBS provides the qdel command for deleting jobs from the system using the job identification number. You can only delete jobs that you own.

qdel <jobid>  delete specific job.
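
For example, to delete the job submitted in the qsub example above:

    userB:~$ qdel 200305.nona-man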


The following script will delete ALL jobs owned by you. You can add this to your .bashrc or .cshrc file as an alias.


Example of .bashrc file


# User specific aliases and functions
alias killMyPBS='for x in `qstat -a -u $USER -n | cut -f1 -d" " | grep \.nona-man`; do qdel $x; echo killed $x; done'


*NOTE: For Biowulf jobs, change \.nona-man to \.man


  • Thanks to Jason Vertrees for providing the script.



How to Display Queue and Job Status


Qstat can be used to get the status of PBS queues or jobs. For additional information about the qstat command, type man qstat. Here are some common qstat commands.


qstat -Q              status of all queues

qstat -a              status information about all jobs submitted to the cluster

qstat -u <username>   status of all jobs for a particular user

qstat -f <jobid>      job-specific information

qstat -n              shows which jobs are running on which nodes


The following script allows you to query all of YOUR queued and running jobs on the cluster. You can add this function to your .bashrc file.




# qs -- query all running and queued jobs on the cluster owned by you:

function qs {
        echo Running: `qstat -u $USER | grep " R " | wc -l`
        echo Queued : `qstat -u $USER | grep " Q " | wc -l`
        }


  • Thanks to Jason Vertrees for providing the script.



Message Passing Interface (MPI)



MPICH2


MPICH2 is an implementation of the MPI standard for programming distributed-memory parallel computers. MPICH2 uses an external process manager called MPD, which consists of a ring of daemons running on the compute nodes. The cluster management node starts/stops the compute node MPD daemons. MPICH2 has been compiled for C, C++, Fortran-77, and Fortran-90, and the MPD daemons have been started on all compute nodes.


Additional MPICH2 information can be found at

MPICH2


MPICH2 is installed on both Biowulf (MPICH2-1.0.7) and Spartan (MPICH2-1.0.8) clusters under /usr/local/mpich2 and has been exported to all compute nodes.


Compiling Code

Code written in the languages listed above can be compiled with the following MPI compiler wrappers (all are in your path):

C = mpicc

C++ = mpicxx

Fortran-77 = mpif77

Fortran-90 = mpif90


To compile code written in C, use the MPI compiler mpicc to generate an MPI executable:


mpicc -o <mpi_test> <mpi.c>


mpi_test = compiled mpi file

mpi.c = input file to be compiled


An example of an MPI input file


The generated <mpi_test> file will be used in your pbs script.


Submitting Job


Once the MPI executables have been generated, you can submit them to the cluster using any queue. The PBS script used for submitting MPICH2 jobs requires some environment variables, and these have been incorporated into the sample scripts.


The MPICH2 command mpiexec is used to run mpi jobs.


Spartan MPICH2 Script


To request the number of nodes and processors per node, edit the following line in the script: "#PBS -l nodes=<# of nodes>:ppn=<# of processor cores>"


Note: if the number of nodes/processors you requested is not available, the job will sit in the queue until the resources become available.
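
The Spartan MPICH2 sample script itself is not reproduced on this page. The sketch below shows one way such a script might look, assuming mpiexec is available under /usr/local/mpich2/bin and that the compiled executable mpi_test sits in your storage directory; these names and paths are placeholders, and the linked sample script should be treated as authoritative.

    #!/bin/bash
    #PBS -N mpi_test
    #PBS -l nodes=2:ppn=4
    #PBS -l walltime=4:00:00
    #PBS -j oe

    # number of MPI processes = nodes x ppn, taken from the PBS node file
    NPROCS=`wc -l < $PBS_NODEFILE`

    # run the compiled MPI executable from local scratch
    cd /scr/$PBS_JOBID
    cp ~/storage/mpi_test .
    /usr/local/mpich2/bin/mpiexec -n $NPROCS ./mpi_test > mpi_test.out

    # copy the results back to storage
    cp mpi_test.out ~/storage/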


To submit an MPICH2 job to the cluster:


qsub -V -q <queue name> <pbs.script>


  • Thanks to Brian Pierce, Julian Mintseris, and Jason Vertrees for testing and troubleshooting MPI on the clusters.

Matlab


The Office of Information Technology (OIT) has a shared pool of 500 individual Matlab licenses, as well as a number of licenses for various toolboxes. Each instance of Matlab requires a Matlab license. If OIT finds a particular user abusing Matlab licenses, they will notify the user and Matlab privileges will be revoked. For more information on Matlab and the available toolboxes, go to

matlab_info


Individual Matlab jobs can be run on both clusters from any queue. Each job requires a Matlab license to run, and users should not submit more than five Matlab jobs to either cluster at any given time.


Matlab jobs should not be run on the login/user nodes. Graphical instances of Matlab running on the compute nodes are not allowed; only text mode is permitted. To run Matlab in text mode, use the following command:


matlab -nodisplay -nosplash -nojvm


Text mode has been incorporated into the sample matlab pbs scripts below.


Running Matlab Jobs with a License


Below is a PBS script for running Matlab jobs on the cluster that require a Matlab license. On the Spartan cluster Matlab is installed under /opt/matlab. The script contains the proper paths to the Matlab installation directory and runs the job in text mode on the compute nodes.
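
The spartan.matlab sample script is linked below rather than reproduced here. As a rough sketch only (assuming /opt/matlab/bin/matlab is the correct binary path and testcodes.m is the M-file to run; adapt the storage link and file names to your own job):

    #!/bin/bash
    #PBS -N matlab_test
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=4:00:00
    #PBS -j oe

    # run from local scratch on the compute node
    cd /scr/$PBS_JOBID
    cp ~/storage/testcodes.m .

    # run Matlab in text mode (no display, splash screen, or Java VM)
    /opt/matlab/bin/matlab -nodisplay -nosplash -nojvm < testcodes.m > testcodes.out

    # copy the results back to storage
    cp testcodes.out ~/storage/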



Simple code and matlab scripts for testing.


Testcode

testcodes.m


Spartan Matlab Script

spartan.matlab


Matlab Compiler


Some Matlab code can be converted to function M-files, compiled with the Matlab compiler (mcc), and then run on the Spartan cluster without a license. There are several toolboxes that cannot be compiled and run on the cluster. Refer to the following link for specific toolbox information:

Non-compiled_toolboxes


If the toolboxes required for your Matlab job are not on the above list, then running jobs compiled with the Matlab compiler (mcc) is an efficient way to run one or many Matlab jobs, and Matlab licenses will not be an issue.


MathWorks Documentation for information regarding the Matlab Compiler

Matlab Compiler


Only function M-files can be used with the Matlab compiler, so any script M-files will need to be converted to function M-files. A simple way to convert a script M-file to a function M-file is to add a function line to the beginning of the script M-file.

Here is a simple example

testconvert.m


For additional information on converting your script M-file, refer to

Converting Script M-File to Function M-Files


Once your M-file has been converted to a function M-file, you need to initialize the Matlab compiler environment. This only needs to be done once, on either of the Spartan user nodes (userA/userB).

Start Matlab with "matlab &". Then, in the command window, run the following to initialize the Matlab compiler:

>> mbuild -setup

   Options files control which compiler to use, the compiler and link command
   options, and the runtime libraries to link against.
   Using the 'mbuild -setup' command selects an options file that is
   placed in ~/.matlab/R2008b and used by default for 'mbuild'. An options 
   file in the current working directory or specified on the command line 
   overrides the default options file in ~/.matlab/R2008b.

   To override the default options file, use the 'mbuild -f' command
   (see 'mbuild -help' for more information).

The options files available for mbuild are:

 1: /opt/matlab/bin/mbuildopts.sh : 
     Build and link with MATLAB C-API or MATLAB Compiler-generated library via the
     system ANSI C/C++ compiler


 0: Exit with no changes

Enter the number of the compiler (0-1): 1

/opt/matlab/bin/mbuildopts.sh is being copied to /fs/userB1/mfitzpat/.matlab/R2008b/mbuildopts.sh


After the compiler environment has been defined, the next step is to compile your function M-file, either within Matlab or from the command line:


mcc -vm <file.m>


v = verbose

m = generate a C standalone application


The compiled file will have the same name as "file.m" without the .m extension.


Add the Matlab compiled file to a PBS script, just as you would any other cluster job, and submit it to the cluster. Below is a simple function M-file for testing and a sample Spartan PBS script for running a compiled Matlab function M-file on the cluster.

testconvert.m
spartan.nolic.pbs
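
The spartan.nolic.pbs script is linked above rather than reproduced here. A minimal sketch, assuming the compiled executable is named testconvert, sits in your storage directory, and that any Matlab runtime library paths it needs are already set on the compute nodes (check the linked script for those details):

    #!/bin/bash
    #PBS -N matlab_nolic
    #PBS -l nodes=1:ppn=1
    #PBS -l walltime=4:00:00
    #PBS -j oe

    # run the compiled (license-free) Matlab executable from local scratch
    # (assumes any required Matlab runtime library paths are already configured)
    cd /scr/$PBS_JOBID
    cp ~/storage/testconvert .
    ./testconvert > testconvert.out

    # copy the results back to storage
    cp testconvert.out ~/storage/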


  • Thanks to Dustin Holloway for testing and troubleshooting matlab on the clusters.

Programming tips


Maui will schedule the jobs based on requested resources.


Use the local directory /scr on each compute node for copying the input files needed by the job. Doing so reduces I/O network traffic on the cluster. See the example script test.pbs and the line under "# copy the data files to scratch".


Configure your PBS script so that all output data is sent directly to the storage link in your storage directory.

CAGT Cluster Policies (IMPORTANT! Please read.)


Fair Share


The goal is for all cluster users to have equal and timely access to all cluster resources. The cluster's workload varies daily, as does the number of active users. The Spartan cluster has 360 processor cores available to all users, although at times some processor cores may be unavailable due to various issues.


The historical usage feature in Maui has been enabled to enhance the fair-share policy. Historical usage allows queued jobs to be prioritized and scheduled based on active users, the number of jobs queued, and cluster activity over the past two days.


This gives infrequent cluster users who submit jobs to the queues the ability to have their jobs prioritized and scheduled to run before other previously queued jobs.

Running jobs on the user nodes


Running jobs on any of the user nodes is not permitted. The user nodes are shared and these resources are limited.

If processes are found running on the user nodes, they will be terminated and the user contacted.


All jobs should be submitted to the cluster via the qsub command.

Cluster abuse


Scripts running on the cluster that protect themselves from being killed by root, such as a job that spawns multiple child jobs, are not permitted. The cluster is a public resource that should be shared fairly.


BU Computing Ethics Policy


In addition, Boston University policy for computing ethics applies to all activity on the cluster. This policy is outlined at http://www.bu.edu/computing/ethics/.


Cluster Support


If you have questions or problems with your jobs, the user nodes, etc., email mfitzpat@bu.edu with the following information:

  • Problems or questions
  • System name or node number
  • Error messages
  • When the problems occurred and what was running
  • JOB ID, if applicable

FAQs



Q: How do I change my password?
A: Type "yppasswd" on the user node; you will be prompted for your old password,
   then asked to enter your new password.


Q: How much disk space do I have on the cluster?
A: Approximately 5GB in your home directory and approximately 20GB in your storage directory.


Q: My output files are quite large; how can I get more storage space?
A: Send email to mfitzpat@bu.edu
requesting additional disk space.

Q: What applications are available on the clusters?
A: All shared applications are located under /usr/local.

Q: How can I get an application installed on the cluster?
A: Send email to mfitzpat@bu.edu requesting the application
   be installed and include a web link, packages, tar balls, etc.


Q: Qsub error:  qsub: Unknown queue
A: The job was not submitted from the user nodes, or an incorrect queue name was given.
   userA/userB = Spartan login nodes; valid queues are spartans, bigmem, sunfire, and test.


Q: A user has monopolized the cluster by running the 
   maximum number of jobs allowed for a particular queue.
A: Email the user and mfitzpat@bu.edu 
   requesting that they delete some jobs.



