Bioinformatics Computing Clusters

Overview

The Bioinformatics Department has two Linux clusters, Biowulf and Spartan, available for research computing and teaching. Both clusters are a valuable resource for many departments and for users both inside and outside the University. The availability of, and access to, the clusters' resources is critical to all users.


The clusters were purchased with NSF Major Research Instrument grants awarded in 2001 and 2007, respectively, to Profs. Zhiping Weng, Simon Kasif, Charles DeLisi, Sandor Vajda, Temple Smith, Jim Collins, Robert Berwick, Gary Benson, Daniel Segre, and Yu Xia.


Biowulf Cluster


In 2001, the Center for Advanced Genomic Technology (CAGT) purchased the Biowulf cluster, an IBM eServer xSeries system consisting of a management node, two user nodes, a storage node, and 128 compute nodes (dual 1GHz PIII processors with 2GB RAM).


Home directories are located on user nodes, user1 and user2, and the storage node has directories data2 and data3 each containing 350GB. Each user is allotted 5GB of disk space on the user nodes and a link in each user’s home directory provides an additional 10GB on the storage node. Additional disk space on the storage node can be granted with approval from the system administrator.

The management and user nodes are backed up nightly by IT. The storage node is configured with RAID5 and has a duplicate hardware mirror system for nightly backups.


The operating system for Biowulf is Red Hat Linux 7.3 and the computational workload is distributed across the compute nodes using Portable Batch System (PBS) and the scheduler Maui.


Logical structure of biowulf cluster

Spartan Cluster


The Spartan cluster consists of the following IBM systems: an x3655 management node, DS3300 storage, and two x3455 login/user nodes. A total of 86 compute nodes are associated with the Spartan cluster; 48 are IBM x3455 models and 34 are Sun VZ20 models.


Each of the 48 x3455 compute nodes contains two dual-core 2.8GHz processors, for a total of 192 processor cores. Twenty of the x3455 compute nodes have 8GB RAM and the remaining 28 have 4GB RAM. The VZ20 compute nodes have dual processors with 2GB of RAM per node.


Login/user nodes provide home directories and a link within each user's home directory to storage. A total of 5TB of storage is available to users of the Spartan cluster. The management and user nodes are backed up nightly by IT. The storage node is configured with RAID5 and has a duplicate hardware mirror system for nightly backups.


The operating system for the Spartan cluster is 64-bit CentOS 5.2 (BU Linux 5, monde), and the computational workload is distributed across the compute nodes using Portable Batch System (PBS) and the Maui scheduler.

Request Cluster Account


Login accounts on the clusters are separate from any other University logins. To request an account on the clusters, email mfitzpat@bu.edu with the following information:

BU username

email address

Primary Advisor

Department


Accounts are usually approved and created within 24 hours. You will be contacted via email when your account is available.

Portable Batch System and Maui


The PBS resource management system handles the management and monitoring of the computational workload and Maui schedules and distributes the workload across the clusters.


Tasks or "jobs" are submitted to the cluster by creating a batch job command file, referred to as a PBS script. The jobs are queued using the PBS command qsub and then Maui will determine which jobs to run, when and where.


A PBS script is simply a shell script containing the set of commands you want run on the compute nodes. It also contains directives which specify the characteristics (attributes) of the job, and resource requirements (e.g. number of compute nodes and wall clock time) that your job needs. Once you create your PBS script, you can reuse it if you wish or modify it for subsequent runs.


Additional information about PBS(Torque) and Maui can be found at:

clusterresources.com

torque_info

maui_info


PBS Script


The PBS script is a shell script that contains PBS directives which are preceded by #PBS.


The PBS directives are defined in the table below:

      PBS Directive                         Function   

  #PBS -N <job name>             Specifies job name.

  #PBS -l nodes=1:ppn=2*         Specifies a PBS resource requirement of
                                 1 compute node and 2 processors per node.
                                 * Spartan nodes contain 4 cores/node: ppn=4

  #PBS -l walltime=4:00:00       Specifies a PBS resource requirement of 
                                 4 hours of wall clock time to run the job.

  #PBS -o output_filename        Specifies the name of the file where job
                                 output is to be saved. If omitted, PBS
                                 generates a file name ending in the job ID number.
 
  #PBS -j oe                     Specifies that job output and error messages
                                 are to be joined in one file.

  #PBS -m bea                    Specifies that PBS send email notification
                                 when the job begins (b), ends (e), or 
                                 aborts (a). 

  #PBS -M userid@bu.edu          Specifies the email address where PBS
                                 notification is to be sent.

     

The shell commands that follow the PBS directives in the script are executed when the job runs.


Sample PBS Script


Continuously copying data back and forth between the compute nodes and users' home/storage directories while a job is running can cause heavy I/O traffic on the network. If the network traffic is severe enough, overall cluster performance suffers.


To avoid network I/O bottlenecks, all input and output data required for a job MUST be staged on the compute node's local hard drive. This is accomplished by including the following line in the PBS script: cd /scr/$PBS_JOBID. PBS automatically creates this directory under /scr on the compute node for each job ($PBS_JOBID) run on that node and grants the job's owner permission to read and write to it.


The output data of the job is also placed in the /scr/$PBS_JOBID directory. Once the job has finished, the output data is copied back to the user's storage directory.


The sample PBS script below illustrates how to use the /scr/$PBS_JOBID directory and copy output data into your storage directory.
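
As an illustration of the staging steps described above, here is a minimal sketch rather than the site's official script. Only the #PBS directives and /scr/$PBS_JOBID come from this page; the input file name, the storage link name ($HOME/storage), and the placeholder sort command are assumptions made for the example. The SCRATCH_ROOT override exists only so the script can be tried outside of PBS.

```shell
#!/bin/sh
#PBS -N scratch_demo
#PBS -l nodes=1:ppn=2
#PBS -l walltime=4:00:00
#PBS -j oe

# Hypothetical locations (assumptions, not site defaults); override as needed.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scr}"          # PBS creates $SCRATCH_ROOT/$PBS_JOBID
STORAGE_DIR="${STORAGE_DIR:-$HOME/storage}"   # assumed name of the storage link
INPUT_FILE="${INPUT_FILE:-$HOME/input.dat}"   # assumed input file

run_job() {
    # Work in the per-job scratch directory on the compute node's local disk.
    cd "$SCRATCH_ROOT/$PBS_JOBID" || return 1
    # Stage the input onto the local disk once instead of reading it over the network.
    cp "$INPUT_FILE" ./input.dat || return 1
    # Placeholder for the real computation: sort the input into result.out.
    sort input.dat > result.out || return 1
    # Copy the results back to the storage directory once the work is done.
    cp result.out "$STORAGE_DIR/"
}

# Run only when PBS has assigned a job ID, so the file can also be sourced safely.
if [ -n "${PBS_JOBID:-}" ]; then
    run_job
fi
```

Replace the sort line with your own commands; the copy-in and copy-out steps around it stay the same.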



How to Login


To access the clusters, users ssh into one of the cluster's login/user nodes:


Biowulf login/user nodes:

lachesis.bu.edu (internal cluster name: user1)

atropus.bu.edu (internal cluster name: user2)


Spartan login/user nodes:

decima.bu.edu (internal cluster name: userA)

morta.bu.edu (internal cluster name: userB)


The home directories and storage links on each cluster are equally accessible from the respective login/user nodes.


How to Submit a job


The PBS command qsub is used to submit PBS scripts to the clusters. Maui then schedules the job based on the resources requested. Jobs submitted from user1/user2 will only run on the Biowulf cluster and jobs submitted from userA/userB will only run on the Spartan cluster.


There are five PBS queues with various run times on Biowulf and three PBS queues on Spartan. Queue limits have been set up to limit the number of jobs that can be run from a queue on the cluster at any given time. This prevents the cluster from being monopolized by individual users. Queue names, times and maximum jobs are listed below.


Cluster Queues

    Queue       Run time           Max # jobs

Biowulf

test            0.5 hour           2 jobs   For testing PBS scripts.
short           4 hours            256 jobs 
medium          12 hours           192 jobs
long            36 hours           128 jobs


Spartan*


Spartan nodes/queues have different quantities of RAM. Select the queue based on the type of
resources your job requires.

spartan queue: 2GB - 4GB
bigmem queue: 8GB

For example, if your job requires lots of CPU cycles, submit it to the spartan queue to
access all of the CPU cycles available on the cluster. If your job has large memory requirements,
submit it to the bigmem queue.


spartan(includes bigmem)         50 hours           86 individual jobs (total 260 processor cores)
bigmem                           50 hours           20 individual jobs (total 80 processor cores)
batch                            NA                 Test queue for administrator


* Note: Spartan nodes have two dual-core processors.
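
The queue choice described above can be sketched as a small shell helper. The function name pick_queue is hypothetical, and the 4GB and 8GB cutoffs are taken from the queue notes above. This is only a rough guide: the spartan queue also contains 2GB nodes, so a job that needs more than 2GB could still land on a node that is too small.

```shell
# pick_queue MEM_GB: print the Spartan queue suited to a per-node memory need.
# Cutoffs follow the notes above: spartan nodes have 2GB-4GB, bigmem nodes have 8GB.
pick_queue() {
    mem_gb=$1
    if [ "$mem_gb" -le 4 ]; then
        echo spartan
    elif [ "$mem_gb" -le 8 ]; then
        echo bigmem
    else
        echo "no Spartan queue offers ${mem_gb}GB per node" >&2
        return 1
    fi
}
```

A submission could then look like qsub -q "$(pick_queue 6)" test.4, which selects the bigmem queue.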

Qsub Command


The PBS qsub command is used to submit the PBS script for scheduling and execution. For example, to submit a PBS script called "test.4" to the bigmem queue on the Spartan cluster, the syntax would be:

      userB:~$ qsub -q bigmem test.4
      200305.nona-man
      userB:~$

Notice that upon successful submission of a job, PBS returns a job identifier of the form jobid.nona-man where jobid is an integer number assigned by PBS to that job. You'll need the job identifier for any actions involving the job, such as checking job status or deleting the job.
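
Because later commands need the job identifier, it can help to capture it when the job is submitted. The sketch below is illustrative only: the helper name job_number is hypothetical, and the commented-out lines need qsub and qstat on a login node.

```shell
# job_number FULL_ID: strip the server suffix from an identifier such as 200305.nona-man.
job_number() {
    echo "${1%%.*}"
}

# Typical use on a login node (commented out because it needs PBS):
# full_id=$(qsub -q bigmem test.4)      # e.g. 200305.nona-man
# qstat -f "$full_id"                   # job-specific information
# echo "submitted job $(job_number "$full_id")"
```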


There are many options to the qsub command as can be seen by typing man qsub. Below are three common options:

-I  job is to be run "interactively": users can access jobs on the compute nodes for debugging.

-l  lists resource requirements.

-V  user's environment variables are exported to the job.
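
As a sketch of how these options combine, the hypothetical helper below builds the value passed to -l from a node count, processors per node, and a wall clock limit in hours. The helper name and the example values are assumptions made for illustration.

```shell
# resource_list NODES PPN HOURS: print a value for qsub -l,
# for example nodes=1:ppn=4,walltime=2:00:00
resource_list() {
    echo "nodes=$1:ppn=$2,walltime=$3:00:00"
}

# Example interactive session on Spartan (commented out because it needs qsub):
# qsub -I -V -q spartan -l "$(resource_list 1 4 2)"
```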



How to Delete a Job


PBS provides the qdel command for deleting jobs from the system using the job identification number. You can only delete jobs that you own.

qdel <jobid>  delete specific job.


The following script will delete ALL jobs owned by you. You can add this to your .bashrc or .cshrc file as an alias.


Example of .bashrc file


# User specific aliases and functions
alias killMyPBS='for x in `qstat -a -u $USER -n | cut -f1 -d" " | grep "\.nona-man"`; do qdel $x; echo killed $x; done'


*NOTE: For Biowulf jobs, change \.nona-man to \.man


  • Thanks to Jason Vertrees for providing the script.



How to Display Queue and Job Status


The qstat command can be used to get the status of PBS queues or jobs. For additional information about the qstat command, type man qstat. Here are some common qstat commands.


qstat -Q  status of all queues.

qstat -a  status information about all jobs submitted to the cluster.

qstat -u <username>  status of all jobs for a particular user.

qstat -f <jobid>  job specific information.

qstat -n  shows which jobs are running on which nodes.


The following script allows you to query all of YOUR queued and running jobs on the cluster. You can add this to your .bashrc or .cshrc file as an alias.




# qs -- query all running and queued jobs on the cluster owned by you:

function qs {
        echo Running: `qstat -u $USER | grep " R " | wc -l`
        echo Queued : `qstat -u $USER | grep " Q " | wc -l`
        }


  • Thanks to Jason Vertrees for providing the script.



Message Passing Interface (MPI)



MPICH2


MPICH2 is an implementation of the Message Passing Interface (MPI), a standard for programming distributed-memory parallel computers. MPICH2 uses an external process manager called MPD, which consists of a ring of daemons running on the compute nodes. The cluster management node starts and stops the compute node MPD daemons. MPICH2 has been compiled for C, C++, Fortran-77 and Fortran-90, and the MPD daemons have been started on all compute nodes.


Additional MPICH2 information can be found at

MPICH2


MPICH2 is installed on both Biowulf (MPICH2-1.0.7) and Spartan (MPICH2-1.0.8) clusters under /usr/local/mpich2 and has been exported to all compute nodes.


Compiling Code

The languages listed above can be compiled with the following mpi compilers (all are in your path):

C = mpicc

C++ = mpicxx

Fortran-77 = mpif77

Fortran-90 = mpif90


To compile code written in C, use the MPI compiler, mpicc, to generate an MPI executable.


mpicc -o <mpi_test> <mpi.c>


mpi_test = compiled mpi file

mpi.c = input file to be compiled


An example of mpi input file


The generated <mpi_test> file will be used in your pbs script.


Submitting Job


Once the MPI executables have been generated, you can submit them to the cluster using any queue. MPICH2 jobs require some environment variables, which have already been incorporated into the PBS scripts below.


The MPICH2 command mpiexec is used to run mpi jobs.


Biowulf MPI Script


Spartan MPICH2 Script


To request the number of nodes and processors per node, edit the following line in the script: "#PBS -l nodes=<# of nodes>:ppn=<# of processor cores>"


Note: if the number of nodes/processors you requested is not available, the job will sit in the queue until the resources become available.


To submit an MPICH2 job to either cluster:


qsub -V -q <queue name> <pbs.script>
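
For orientation, a PBS script for an MPICH2 job might look roughly like the sketch below. It is not a copy of the cluster scripts linked above: the environment variables those scripts set are not reproduced here, ./mpi_test is the hypothetical executable from the compile step, and the helper name count_procs is made up for this example. PBS_NODEFILE and PBS_O_WORKDIR are variables that PBS sets inside a running job.

```shell
#!/bin/sh
#PBS -N mpi_demo
#PBS -l nodes=2:ppn=2
#PBS -l walltime=4:00:00
#PBS -j oe

# count_procs NODEFILE: PBS lists one line per allocated processor core in the
# node file, so the line count is the number of MPI processes to start.
count_procs() {
    awk 'END { print NR }' "$1"
}

# Run only inside a PBS job, where PBS_NODEFILE and PBS_O_WORKDIR are set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    # Start in the directory the job was submitted from.
    cd "$PBS_O_WORKDIR" || exit 1
    # ./mpi_test is the hypothetical executable produced by mpicc.
    mpiexec -n "$(count_procs "$PBS_NODEFILE")" ./mpi_test
fi
```

With nodes=2:ppn=2 the node file has four lines, so mpiexec starts four processes.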


  • Thanks to Brian Pierce, Julian Mintseris, and Jason Vertrees for testing and troubleshooting MPI on the clusters.

Matlab


The Office of Information Technology (OIT) has a shared pool of 500 individual matlab licenses, as well as a number of licenses for various toolboxes. Each instance of matlab requires a matlab license. If OIT finds a particular user abusing matlab licenses, they will notify the user and matlab privileges will be revoked. For more information on matlab and available toolboxes go to

matlab_info


Individual Matlab jobs can be run on both clusters from any queue. Each job requires a matlab license to run, and users should not submit more than five matlab jobs to either cluster at any given time.


Graphical instances of matlab are not allowed on the user or compute nodes; only text mode is permitted. To run matlab in text mode, use the following command:


matlab -nodisplay -nosplash -nojvm


Text mode has been incorporated into the sample matlab pbs scripts below.


Running Matlab Jobs with a License


Below are pbs scripts for running matlab jobs on the clusters which require a matlab license. On Biowulf, matlab is installed under /usr/local and on the Spartan cluster matlab is installed under /opt/matlab. The scripts contain the proper paths to matlab installation directories on the respective clusters, and will run the jobs in text mode.



Simple code and matlab scripts for testing.


Testcode

testcodes.m

Biowulf Matlab Script

biowulf.matlab

Spartan Matlab Script

spartan.matlab
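
As a rough sketch of what such a script contains (not a copy of the linked files), the example below feeds an M-file to matlab in text mode from inside a PBS job. The binary path /opt/matlab/bin/matlab is an assumption based on the Spartan install location, and the helper name run_matlab and the output file name are hypothetical.

```shell
#!/bin/sh
#PBS -N matlab_demo
#PBS -l nodes=1:ppn=1
#PBS -l walltime=4:00:00
#PBS -j oe

# Assumed binary location; adjust for Biowulf, where matlab is under /usr/local.
MATLAB_BIN="${MATLAB_BIN:-/opt/matlab/bin/matlab}"

# run_matlab SCRIPT.m OUTPUT: feed the M-file to matlab in text mode
# and capture everything it prints.
run_matlab() {
    "$MATLAB_BIN" -nodisplay -nosplash -nojvm < "$1" > "$2"
}

# Run only inside a PBS job.
if [ -n "${PBS_JOBID:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    run_matlab testcodes.m testcodes.out
fi
```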


Matlab Compiler (Spartan Cluster Only)


Some matlab code can be converted to function M-files, compiled with the matlab compiler (mcc), and then run on the Spartan cluster without a license. Several toolboxes cannot be compiled and run on the cluster. Refer to the following link for specific toolboxes and information:

Non-compiled_toolboxes


If the toolboxes required for your matlab job are not on the above list, then running jobs compiled with the matlab compiler (mcc) is an efficient way to run one or many matlab jobs. Matlab licenses will not be an issue.


MathWorks Documentation for information regarding the Matlab Compiler

Matlab Compiler


Only function M-files can be used with the matlab compiler, so any script M-file needs to be converted to a function M-file. A simple way to convert a script M-file to a function M-file is to add a function line to the beginning of the script M-file.

Here is a simple example

testconvert.m
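
The conversion can also be done from the shell. The helper below is a hypothetical sketch that prepends a function line to a copy of the script; it assumes the script takes no arguments and does not rely on variables that already exist in the workspace.

```shell
# script_to_function SCRIPT.m FUNCTION.m
# Prepend a "function <name>" line, where <name> is the output file name
# without .m, because matlab expects a function to match its file name.
script_to_function() {
    func_name=$(basename "$2" .m)
    { echo "function $func_name"; cat "$1"; } > "$2"
}
```

For example, script_to_function myscript.m testconvert.m writes a file whose first line is "function testconvert", followed by the original script.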


For additional information on converting your script M-file, refer to

Converting Script M-File to Function M-Files


Once your M-file has been converted to a function M-file, you need to initialize the matlab compiler environment. This only needs to be done once, on either of the Spartan user nodes (userA/userB).

Start matlab with the command matlab &, then run the following in the command window to initialize the matlab compiler:

>> mbuild -setup

   Options files control which compiler to use, the compiler and link command
   options, and the runtime libraries to link against.
   Using the 'mbuild -setup' command selects an options file that is
   placed in ~/.matlab/R2008b and used by default for 'mbuild'. An options 
   file in the current working directory or specified on the command line 
   overrides the default options file in ~/.matlab/R2008b.

   To override the default options file, use the 'mbuild -f' command
   (see 'mbuild -help' for more information).

The options files available for mbuild are:

 1: /opt/matlab/bin/mbuildopts.sh : 
     Build and link with MATLAB C-API or MATLAB Compiler-generated library via the system ANSI C/C++ compiler


 0: Exit with no changes

Enter the number of the compiler (0-1): 1

/opt/matlab/bin/mbuildopts.sh is being copied to /fs/userB1/mfitzpat/.matlab/R2008b/mbuildopts.sh


After the compiler environment has been defined, the next step is to compile your function M-file, either within matlab or from the command line:


mcc -vm <file.m>


v = verbose

m = generate a C stand alone application


The compiled file will have the same name as "file.m" without the .m extension.


Add the compiled matlab file to a PBS script, just as you would any other cluster job, and submit it to the cluster. Below are a simple function M-file for testing and a sample Spartan PBS script for running a compiled matlab function M-file on the cluster.

testconvert.m
spartan.nolic.pbs


  • Thanks to Dustin Holloway for testing and troubleshooting matlab on the clusters.

Programming tips


Do not submit 2 jobs for each compute node. Maui will schedule the jobs based on requested resources.


Use the local directory /scr on each compute node for copying the input files needed by the job. Doing so reduces I/O network traffic on the cluster. See example script test.pbs and the line under "# copy the date files to scratch".


Configure your PBS script so that all output data is sent directly to the storage link in your home directory.


CAGT Cluster Policies (IMPORTANT! Please read.)


Fair Share


The goal is for all cluster users to have equal and timely access to all cluster resources. The cluster's workload varies daily, as does the number of active users. Biowulf has 256 processor cores and Spartan has ~268 processor cores, which are available to all users. There are times when not all of the processor cores are available due to various issues.


The historical usage feature in Maui has been enabled to support the fair share policy. Historical usage allows queued jobs to be prioritized and scheduled based on active users, number of jobs queued, and cluster activity for the past two days.


This allows infrequent cluster users who submit jobs to the queues the ability to have their jobs prioritized and scheduled to run before other previously queued jobs.

Running jobs on the user nodes


Running jobs on any of the user nodes is not permitted. The user nodes are shared and these resources are limited.

If processes are found running on the user nodes, they will be terminated and the user contacted.


All jobs should be submitted to the cluster via the qsub command.

Cluster abuse


Scripts that protect themselves from being killed by root, such as a job that spawns multiple child jobs, are not permitted on the cluster. The cluster is a public resource that should be fairly shared.


BU Computing Ethics Policy


In addition, Boston University policy for computing ethics applies to all activity on the cluster. This policy is outlined at http://www.bu.edu/computing/ethics/.


Cluster Support


If you have questions or problems with your jobs, user nodes, etc., email mfitzpat@bu.edu with the following information:

  • Problems or questions
  • System name or node number
  • Error messages
  • When the problems occurred and what was running
  • JOB ID, if applicable

FAQs



Q: How do I change my password?
A: Type "yppasswd" on the user node and you will be prompted for your old password,
   then enter your new password.


Q: How much disk space do I have on the cluster?
A: ~5GB in your home directory and ~20GB in your storage directory.


Q: My output files are quite large. How can I get more storage space?
A: Send email to mfitzpat@bu.edu requesting additional disk space.

Q: What applications are available on the clusters?
A: All shared applications are located under /usr/local.

Q: How can I get an application installed on the cluster?
A: Send email to mfitzpat@bu.edu requesting that the application
   be installed, and include web links, packages, tar balls, etc.


Q: Qsub error    Qsub:  Unknown queue
A: Job was submitted from the wrong user node
   User1/user2 = Biowulf queues only (test, short, medium, long, mpi_q)
   UserA/UserB = Spartan queues only (spartans, bigmem, lomem)


Q: A user has monopolized the cluster by running the 
   maximum number of jobs allowed for a particular queue.
A: Email the user and mfitzpat@bu.edu 
   requesting that they delete some jobs.


Q: Why does my perl script run on Biowulf but not on Spartan?
A: Check the perl path in your script:
   Biowulf perl is located at /usr/local/bin/perl
   Spartan perl is located at /usr/bin/perl
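
One way to avoid editing the path for each cluster is to let env find perl through your PATH. The helper below is a hypothetical sketch that rewrites the first line of a script accordingly. It assumes perl is on the PATH on both clusters, and it drops any options (such as -w) that followed the interpreter path.

```shell
# use_env_perl SCRIPT: rewrite a "#!...perl" first line to "#!/usr/bin/env perl"
# so the interpreter is located through PATH on either cluster.
use_env_perl() {
    tmp_file="$1.tmp.$$"
    # Write the edited copy to a temporary file, then copy the contents back
    # so the original file keeps its permissions.
    sed '1s|^#!.*perl.*$|#!/usr/bin/env perl|' "$1" > "$tmp_file" &&
        cat "$tmp_file" > "$1" &&
        rm -f "$tmp_file"
}
```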



