Biowulf Computing Cluster

Overview

In 2001, the Center for Advanced Genomic Technology (CAGT) purchased a Beowulf cluster, referred to as Biowulf. Additional computational nodes were purchased in 2004 and configured as a separate cluster known as Zodiac.


Both clusters are a valuable resource for many departments and users, both inside and outside the University. Availability of and access to the clusters’ resources are critical to all users.


This cluster was purchased with an NSF Major Research Instrumentation grant (DBI-0116574) awarded to Profs. Zhiping Weng, Simon Kasif, Charles DeLisi, Sandor Vajda, Temple Smith, Jim Collins and Robert Berwick. Please make sure to cite this grant in all publications that involve usage of the cluster.

Biowulf Cluster

Biowulf is an IBM eServer xSeries distributed-memory multiprocessor system with 128 compute nodes. Each compute node contains dual 1GHz PIII processors, 2GB of RAM and a 36.4GB hard disk. The nodes are interconnected with Gigabit Ethernet.


There is also a management node, two user nodes, and a storage node. The management and user nodes are backed up by IT nightly. The storage node is configured with RAID5 and has a duplicate hardware mirror (snap) for nightly backups.


The operating system for Biowulf is Red Hat Linux 7.3, and the computational workload is distributed across the compute nodes using the Portable Batch System (PBS) and the Maui scheduler.


Home directories are located on the user nodes, user1 (62GB) and user2 (26GB), and the storage node has directories data2 and data3, each containing 350GB. Each user is allotted 5GB of disk space on the user nodes, and a link in each user’s home directory provides an additional 10GB on the storage node. Additional disk space on the storage node can be granted with approval from an advisor. See Request for additional disk space.


Logical structure of the Biowulf cluster

Zodiac Cluster

The Zodiac cluster consists of 32 Sun V20z nodes, each with dual 2.2GHz Opteron processors, 2GB of RAM and a 73GB hard drive. User3 is the management node for the Zodiac cluster and contains all the shared applications, PBS, and Maui software. The computational nodes and user3 run 64-bit CentOS 4 (BU Linux 4.5 Zodiac).


Home directories and storage links are equally accessible from user3.

Request Cluster Account


Login accounts on the cluster are separate from any other University logins. To request an account on the cluster, fill out and submit the Request Access form.


Accounts are usually approved and created within 24 hours. You will be contacted via email when your account is available.

Portable Batch System and Maui

The PBS resource management system handles the management and monitoring of the computational workload and Maui schedules and distributes the workload across the clusters.


Tasks, or “jobs”, are submitted to the cluster by creating a batch job command file, referred to as a PBS script. The jobs are queued using the PBS command qsub, and Maui then determines which jobs to run, when and where.


A PBS script is simply a shell script containing the set of commands you want run on the compute nodes. It also contains directives which specify the characteristics (attributes) of the job, and resource requirements (e.g. number of compute nodes and wall clock time) that your job needs. Once you create your PBS script, you can reuse it if you wish or modify it for subsequent runs.


Additional information about PBS (Torque) and Maui can be found at:

clusterresources.com

PBS Script

The PBS script is a shell script that contains PBS directives which are preceded by #PBS. The following is an example of a PBS command file to run a serial job, which would only require 1 processor on one node.

 #!/bin/bash
 
 #PBS -N <job name>
 #PBS -l nodes=1
 #PBS -o output_filename
 #PBS -j oe
 #PBS -m bea
 #PBS -M userid@bu.edu
 
 cd $PBS_O_WORKDIR
 executable commands

The PBS directives are defined in the table below:

      PBS Directive                         Function   

  #PBS -N <job name>             Specifies job name.

  #PBS -l nodes=1                Specifies a PBS resource requirement of
                                 1 compute node and 1 processor per node.

  #PBS -l walltime=4:00:00       Specifies a PBS resource requirement of 
                                 4 hours of wall clock time to run the job.

  #PBS -o output_filename        Specifies the name of the file where job
                                 output is to be saved. May be omitted to
                                 generate filename appended with jobid number.
 
  #PBS -j oe                     Specifies that job output and error messages
                                 are to be joined in one file.

  #PBS -m bea                    Specifies that PBS send email notification
                                 when the job begins (b), ends (e), or 
                                 aborts (a). 

  #PBS -M userid@bu.edu          Specifies the email address where PBS
                                 notification is to be sent.

After the PBS directives in the PBS script, the shell executes a change directory command to $PBS_O_WORKDIR, a PBS variable indicating the directory where the PBS job was submitted and nominally where the program executable is located. Other shell commands can be executed as well. In the last line, the executable itself is invoked.
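
As a minimal sketch of how these pieces fit together, the shell function below prints a serial PBS script built from the directives in the table. The function name make_pbs_script and the program name ./my_program are hypothetical and are not part of PBS; the walltime value is only an example.

```shell
#!/bin/bash
# Hypothetical helper (not part of PBS): print a serial-job PBS script.
# Usage: make_pbs_script <job name> <walltime hh:mm:ss> <command...>
make_pbs_script() {
    local name="$1" walltime="$2"
    shift 2
    # \$PBS_O_WORKDIR is escaped so it is expanded when the job runs,
    # not when the script is generated.
    cat <<EOF
#!/bin/bash

#PBS -N ${name}
#PBS -l nodes=1
#PBS -l walltime=${walltime}
#PBS -j oe

cd \$PBS_O_WORKDIR
$*
EOF
}

# Example (./my_program is a placeholder for your own executable):
#   make_pbs_script demo 4:00:00 ./my_program input.dat > demo.pbs
```

The generated file can then be inspected and edited by hand before it is submitted, so the helper only saves retyping the directives.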

Sample PBS scripts (written by Joe Szustakowski)

How to Login

To access the clusters, users can ssh to either lachesis.bu.edu (user1) or atropus.bu.edu (user2). Lachesis and atropus are the cluster’s external interfaces and user1/user2 are the internal interfaces. The Zodiac cluster does not have an external interface and can only be accessed by connecting (ssh) from user1 or user2 to the internal interface, user3.


The home directories and storage links are equally accessible from user1, user2 and user3.

How to Submit a job

The PBS command qsub is used to submit PBS scripts to the clusters. Maui then schedules the job based on the resources requested. Jobs submitted from user1 or user2 will only run on the Biowulf cluster, and jobs submitted from user3 will only run on the Zodiac cluster.


There are five PBS queues with various run times on Biowulf and two queues on Zodiac. Queue limits have been set up to limit the number of jobs that can be run from a queue on the cluster at any given time. This prevents the cluster from being monopolized by a few users. Queue names, run times and maximum jobs are listed below.


Cluster Queues

Queue           Run time           Max # jobs   Notes

Biowulf

test            0.5 hour           2 jobs       For testing PBS scripts.
short           4 hours            256 jobs
medium          12 hours           192 jobs
long            36 hours           128 jobs
mpi_q           36 hours           2 jobs       See Running MPI for number of processors allowed.

Zodiac

zodiac          50 hours           64 jobs
mpi_q           50 hours           4 jobs       See Running MPI for number of processors allowed.
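
The run-time limits in the table can be turned into a small selection rule. The sketch below is one possible way to do that for serial jobs on Biowulf; the function name pick_biowulf_queue is hypothetical, it accepts whole hours only, and it deliberately skips the test queue, which is meant for testing PBS scripts.

```shell
#!/bin/bash
# Hypothetical helper: print the shortest Biowulf queue whose run-time
# limit (taken from the Cluster Queues table) covers the requested
# number of whole hours. Fails if no queue is long enough.
pick_biowulf_queue() {
    local hours="$1"
    if   [ "$hours" -le 4 ];  then echo short
    elif [ "$hours" -le 12 ]; then echo medium
    elif [ "$hours" -le 36 ]; then echo long
    else
        echo "no Biowulf queue allows ${hours} hours" >&2
        return 1
    fi
}

# Possible use on user1 or user2:
#   qsub -q "$(pick_biowulf_queue 10)" test.pbs
```

Choosing the shortest queue that fits is only a suggestion here; the table shows that the shorter queues allow more jobs to run at once.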

Qsub Command


The PBS qsub command is used to submit the PBS script for scheduling and execution. For example, if user jimmy submits a PBS script called "test.pbs" to the short queue, the command would be

      user1: /home/jimmy $ qsub -q short test.pbs
      1354.man
      user1: /home/jimmy $

Notice that upon successful submission of a job, PBS returns a job identifier of the form jobid.man where jobid is an integer number assigned by PBS to that job. You'll need the job identifier for any actions involving the job, such as checking job status or deleting the job.
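
Because qsub prints the job identifier on standard output, a script can capture it for later use. The sketch below shows one way to split off the numeric part; the helper name job_number is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: strip the server suffix from a PBS job identifier,
# for example "1354.man" becomes "1354".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# On a user node the identifier could be captured at submission time:
#   jobid=$(qsub -q short test.pbs)      # e.g. 1354.man
#   echo "submitted job $(job_number "$jobid")"
```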


There are many options to the qsub command as can be seen by typing man qsub. Below are two common options:

-I  job is to be run "interactively": users can access jobs on the compute nodes for debugging.

-l  lists resource requirements.

How to Delete a Job

PBS provides the qdel command for deleting jobs from the system using the job identification number. You can only delete jobs that you own.

qdel <jobid>  delete specific job.

How to Display Queue and Job Status

The qstat command can be used to get the status of PBS queues or jobs. For additional information about the qstat command, type man qstat. Here are some common qstat commands.


qstat -Q  status of all queues.

qstat -a  status information about all jobs submitted to the cluster.

qstat -u <username>  status of all jobs for a particular user.

qstat -f <jobid>  job-specific information.
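
A script that has to wait for a job before continuing can poll qstat. The sketch below assumes that qstat <jobid> exits with a non-zero status once the server no longer knows the job; on some configurations a finished job may stay listed for a short time, so treat this as an illustration only. The function name wait_for_job and the 60-second default interval are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: block until qstat no longer reports the job.
# Assumption: "qstat <jobid>" returns non-zero for a job that has left
# the system. The second argument is the polling interval in seconds.
wait_for_job() {
    local jobid="$1" interval="${2:-60}"
    while qstat "$jobid" >/dev/null 2>&1; do
        sleep "$interval"
    done
}

# Possible use on a user node:
#   jobid=$(qsub -q short test.pbs)
#   wait_for_job "$jobid"
#   echo "job $jobid has left the queue"
```

Keep the polling interval long; frequent qstat calls add load to the management node.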

Running MPI

MPI allows a single job to be run in parallel on multiple nodes. Symbolic links under /usr/local/mpich on both clusters point to MPICH-1.2.5.3 on Biowulf and MPICH2-1.0.4 on Zodiac.

Older versions of MPI used the command mpirun for executing commands. The MPICH versions on Biowulf and Zodiac use mpiexec. Check the man pages for mpiexec for additional information. The sample mpi.pbs scripts have the correct syntax.


Compiling Code


Compile your code using the MPI compilers located under /usr/local/mpich/bin to generate an MPI executable. The compilers are already in your path.

To compile your code, use either the command line or a simple makefile.

An example of mpi code


Command line:


mpicc -o mpi_test mpi.c


The generated mpi_test file is the executable that will be used in your PBS script.


Using the makefile (Written by Brian Pierce).

Edit the makefile as needed and then type make to generate the outfile.


MPI Queue

A separate mpi queue has been set up for mpi jobs on each cluster.

Biowulf queue: mpi_q
Run time = 36 hours
MPI jobs allowed to run = 2 jobs
Maximum number of nodes requested/job = 40 nodes

Zodiac queue: mpi_q
Run time = 50 hours
MPI jobs allowed to run = 2 jobs
Maximum number of nodes requested/job = 16 nodes


MPI PBS Script

Once the mpi files have been generated, you can submit them to the cluster via the mpi_q queue. The PBS script used for submitting mpi jobs sets some environment variables and includes additional commands that clean up any processes left over from previous jobs and after the job has finished.

Biowulf MPI Script


Zodiac MPI Script

To request the number of nodes and processors per node, edit the following line in the script: "#PBS -l nodes=<# of nodes>:ppn=2"


Note: if the number of nodes/processors you requested is not available, the job will sit in the queue until the resources become available. You can also request one processor per node: nodes=<# of nodes>:ppn=1. This increases the probability that your resource request will be met sooner.
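
To make the arithmetic explicit: a request of nodes=N:ppn=P asks for N × P processors in total. The sketch below builds the directive line and checks it against the per-job node limits listed under MPI Queue (40 on Biowulf, 16 on Zodiac). The function name mpi_resource_line is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the "#PBS -l" line for an MPI job and the
# resulting processor count, after checking the per-job node limit.
# Usage: mpi_resource_line <biowulf|zodiac> <nodes> [ppn, default 2]
mpi_resource_line() {
    local cluster="$1" nodes="$2" ppn="${3:-2}" max
    case "$cluster" in
        biowulf) max=40 ;;
        zodiac)  max=16 ;;
        *) echo "unknown cluster: $cluster" >&2; return 1 ;;
    esac
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max" ]; then
        echo "nodes must be between 1 and $max on $cluster" >&2
        return 1
    fi
    # Every compute node has two processors, so ppn is 1 or 2.
    if [ "$ppn" -ne 1 ] && [ "$ppn" -ne 2 ]; then
        echo "ppn must be 1 or 2" >&2
        return 1
    fi
    echo "#PBS -l nodes=${nodes}:ppn=${ppn}"
    echo "# total processors requested: $((nodes * ppn))"
}

# Example: mpi_resource_line zodiac 8 1
```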


To submit an mpi job to either cluster:


qsub -q mpi_q <pbs.script>



  • Thanks to Brian Pierce and Julian Mintseris for testing and troubleshooting mpi on the cluster.

Running Matlab

The Office of Information Technology (OIT) has a shared pool of 500 individual matlab licenses, as well as a number of licenses for various toolboxes. Each instance of matlab requires a matlab license. If OIT finds a particular user abusing matlab licenses, they will notify the user and revoke the license. For more information on matlab and available toolboxes go to

matlab_info


Individual matlab jobs can be run on both clusters from any queue except mpi_q. Each job requires a matlab license to run, and users should not submit more than five matlab jobs to either cluster at any given time.


If matlab script M-files are converted to function M-files and compiled with the matlab compiler, then a matlab license is not required to run the jobs on the cluster. Currently, compiled matlab jobs can only be run on the Zodiac cluster. See the Matlab Compiler section below for details.


Graphical instances of matlab are not allowed on the user or compute nodes. The following command is used to run matlab in text mode and has been incorporated into the sample matlab pbs scripts below.

matlab -nodisplay -nosplash -nojvm


Running Matlab Jobs with a License

Below are pbs scripts for running matlab jobs on the clusters which require a matlab license. The scripts contain commands which will run the job in text mode and point to the correct matlab installation directory on each cluster. Also provided is a simple matlab script for testing.

testcodes.m
biowulf.matlab
zodiac.matlab


Matlab Compiler (Matlab License Not Required)

Zodiac Cluster Only


If matlab script M-files are converted to function M-files and compiled with the matlab compiler, then a matlab license is not required to run the job on the cluster. This is an efficient way to run one or many matlab jobs, because the availability of matlab licenses is not an issue.


MathWorks Documentation for information regarding the Matlab Compiler

Matlab Compiler


Only function M-files can be used with the matlab compiler, so any script M-files will need to be converted to function M-files. A simple way to convert your script M-file to a function M-file is to add a function line to the beginning of the script M-file.

Here is a simple example

testconvert.m


For additional information on converting your script M-file, refer to

Converting Script M-File to Function M-Files


Once your M-file has been converted to a function M-file, then you need to initialize the matlab compiler environment. This command only needs to be run once on user3.

mbuild -setup


After the compiler environment has been defined, the next step is to compile your function M-file:


mcc -vm <file.m>


v = verbose

m = generate a C stand alone application


The compiled file will have the same name as "file.m" without the .m extension.


Add the compiled matlab file to a pbs script, just as you would for any other cluster job, and submit it to the cluster. Below are a simple function M-file for testing and a sample Zodiac pbs script for running a compiled matlab function M-file.

hello.m
zodiac.compiled



  • Thanks to Dustin Holloway for testing and troubleshooting matlab on the clusters.

Programming tips

Do not submit 2 jobs for each compute node. Maui will schedule the jobs based on requested resources.


Use the local directory /scr on each compute node for copying the input files needed by the job. Doing so reduces network I/O traffic on the cluster. See example script test4.pbs and the line under "# copy the date files to scratch".
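
The sample script test4.pbs is the reference for this; the sketch below only illustrates the general idea of staging input into node-local scratch. The function name stage_to_scratch, the SCRATCH_ROOT override and the per-job directory naming are assumptions made for this example. PBS_JOBID is the job identifier that PBS sets in the job's environment.

```shell
#!/bin/bash
# Hypothetical helper: copy input files into a per-job directory under
# the node-local scratch area and print that directory's path.
# SCRATCH_ROOT defaults to /scr; overriding it is useful for testing.
stage_to_scratch() {
    local root="${SCRATCH_ROOT:-/scr}"
    local jobdir="${root}/${USER:-user}.${PBS_JOBID:-$$}"
    mkdir -p "$jobdir" || return 1
    cp -- "$@" "$jobdir"/ || return 1
    echo "$jobdir"
}

# Inside a PBS script this might be used as follows
# (./my_program and input.dat are placeholders):
#   work=$(stage_to_scratch "$PBS_O_WORKDIR"/input.dat)
#   cd "$work" && "$PBS_O_WORKDIR"/my_program input.dat
#   rm -rf "$work"        # remove the scratch copy when the job is done
```

Removing the scratch directory at the end of the job keeps /scr from filling up for later jobs on the same node.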


Configure your PBS script so that all output data is sent directly to the storage link in your home directory.

CAGT Cluster Policies (IMPORTANT! Please read.)

Fair Share

CAGT's goal for the cluster is to allow all users equal access to the cluster resources and the ability to generate results in a timely manner. The cluster’s workload varies daily, as does the number of active users. Biowulf has 256 CPUs and Zodiac has 64 CPUs, which are available to all users. There are times when not all of the CPUs are available due to various issues.


To meet CAGT’s goal, the historical usage feature has been enabled in Maui. Historical usage allows for queued jobs to be prioritized and scheduled based on active users, number of jobs queued, and cluster activity for the past two days.


This gives infrequent cluster users the ability to have their jobs prioritized and scheduled to run before other previously queued jobs.


First In First Out (FIFO) has been eliminated on the cluster to prevent queue stuffing.

Running jobs on the user nodes

Running jobs on the user nodes, user1, user2 and/or user3, is not permitted. The user nodes are shared and these resources are limited.

Use the qsub command to submit your jobs to the clusters.

Cluster abuse

Scripts running on the cluster that protect themselves from being killed by root, such as a job that spawns multiple child jobs, are not permitted. The cluster is a public resource that should be fairly shared.

BU Computing Ethics Policy

In addition, Boston University policy for computing ethics applies to all activity on the cluster. This policy is outlined at http://www.bu.edu/computing/ethics/.

Cluster Support

If you have questions or problems with your jobs, user nodes, etc., send email to cagt-rt@eng-rt.bu.edu with the following information:

  • Problems or questions
  • System name or node number
  • Error messages
  • When the problems occurred
  • JOB ID, if applicable

You may also use the Comments/Issues form.

FAQs


Q: How do I change my password?
A: Type “yppasswd” on the user node and you will be prompted for your old password,
   then enter your new password.


Q: How much disk space do I have on the cluster?
A: ~5GB in your home dir and 10GB on the storage node


Q: My output files are quite large. How can I get more storage space?
A: Fill out the Disk Space form (http://cagt.bu.edu/page/Bcluster_space)
   for requesting disk space.

Q: What applications are available on the clusters?
A: All shared applications are located under /usr/local.

Q: How can I get an application installed on the cluster?
A: Send email to cagt-rt@eng-rt.bu.edu requesting that the application
   be installed, and include web links, packages, tar balls, etc.


Q: Qsub error    Qsub:  Unknown queue
A: The job was submitted from the wrong user node.
   User1/user2 = Biowulf queues only (test, short, medium, long, mpi_q)
   User3 = Zodiac queues only (zodiac, mpi_q)
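
The rule behind this answer can be written as a small check to run before calling qsub. The sketch below uses the queue names from the Cluster Queues table; the function name queue_valid_on is hypothetical, and the node names are the internal names user1, user2 and user3.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the queue exists on the cluster that
# is served by the given user node, fail otherwise.
# Usage: queue_valid_on <user1|user2|user3> <queue>
queue_valid_on() {
    local host="$1" queue="$2"
    case "$host" in
        user1|user2)                      # Biowulf
            case "$queue" in
                test|short|medium|long|mpi_q) return 0 ;;
            esac ;;
        user3)                            # Zodiac
            case "$queue" in
                zodiac|mpi_q) return 0 ;;
            esac ;;
    esac
    return 1
}

# Example:
#   queue_valid_on user3 short || echo "short is a Biowulf queue; submit from user1 or user2"
```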


Q: A user has monopolized the cluster by running the 
   maximum number of jobs allowed for a particular queue.
A: Email the user and cagt-rt@eng-rt.bu.edu requesting that they delete some jobs.


Q: My perl script runs on Biowulf okay, but not on Zodiac?
A: Check the perl path in your script:
   Biowulf perl is located at /usr/local/bin/perl
   Zodiac perl is located at /usr/bin/perl


Q: Segmentation Fault error

A: This error is due to PBS not cleaning up properly after mpi jobs have finished running.
The example mpi pbs scripts include the commands for cleaning up segmentation
faults. If you cannot remove the faults, email cagt-rt@eng-rt.bu.edu with the
appropriate information.

