

Comet Download

Download Comet binary for Linux (Redhat 7.1)
Download Comet binary for Alpha (Compaq Tru64 UNIX V5.0A)
Download Comet binary for Sun (Solaris 8)
Download Comet binary for SGI / IRIX
Download Comet binary for Mac OS X (thanks to Eric Frangulian)

Don't forget to make the downloaded file executable by using chmod +x.

Instructions for Using Comet from the Command Line

Example usage:

comet -i myseqs.fa -m mymatrices -a 20 -o outfile
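
For scripted runs, the example command above could be assembled from Python. This is only a sketch: the function names build_comet_command and run_comet are hypothetical helpers, and it assumes the binary is named comet and is on the PATH. The flags themselves are the ones documented on this page.

```python
import subprocess

def build_comet_command(seq_file, matrix_file, avg_distance=None,
                        out_file=None, executable="comet"):
    """Build the argument list for a Comet run (hypothetical helper)."""
    cmd = [executable, "-i", seq_file, "-m", matrix_file]  # both required
    if avg_distance is not None:
        cmd += ["-a", str(avg_distance)]  # expected distance between motifs
    if out_file is not None:
        cmd += ["-o", out_file]           # otherwise output goes to the screen
    return cmd

def run_comet(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)

cmd = build_comet_command("myseqs.fa", "mymatrices",
                          avg_distance=20, out_file="outfile")
print(" ".join(cmd))
# run_comet(cmd)  # uncomment once the comet binary is installed
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.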

Options

-i [required]
Follow this option with the name of a file containing the sequences to be analyzed. This file should be in fasta format, e.g.:
>first_sequence
AGGTCGAG...
GTGGAAC...

>second_sequence
...

-m [required]
Use this option to supply the program with a file containing a list of nucleotide count matrices. Each matrix defines the DNA sequence motif of a cis-element. The file has the following format:

>first_motif
1 1
5 2 38 5
29 1 15 5
3 7 5 35
>second_motif
1 1
4 2 2 12
...
The first line of each matrix definition begins with the symbol > followed by a name for the motif. The second line, which is optional, specifies two weights for the motif: one for the + strand and the other for the - strand. These weights let you specify how often you expect each cis-element to occur on each strand in regulatory clusters. The weights are relative, so multiplying all the weights for all the motifs by a constant makes no difference. If in doubt, leave it out. Each remaining line corresponds to one position in the cis-element and contains the counts of adenine, cytosine, guanine and thymine, in that order, observed at that position in a sample of cis-elements of this type.
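
The format described above can be checked before a run with a small reader. The following is a minimal sketch, not Comet's own parser: parse_matrices is a hypothetical name, and it assumes that the optional weights line is recognised by having two fields while count lines have four. Omitted weights are stored as None, since this page does not state what Comet uses in that case.

```python
def parse_matrices(text):
    """Parse matrix-file text into a list of motif dictionaries."""
    motifs = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = {"name": line[1:].strip(), "weights": None, "counts": []}
            motifs.append(current)
            continue
        if current is None:
            raise ValueError("data line found before any '>' header")
        fields = line.split()
        # Assumption: a two-field line directly after the header is the
        # optional (+ strand, - strand) weights line.
        if len(fields) == 2 and current["weights"] is None and not current["counts"]:
            current["weights"] = (float(fields[0]), float(fields[1]))
        elif len(fields) == 4:
            # One position per line: counts of A, C, G, T.
            current["counts"].append(tuple(float(x) for x in fields))
        else:
            raise ValueError(f"unexpected line in motif {current['name']!r}: {line!r}")
    return motifs

example = """>first_motif
1 1
5 2 38 5
29 1 15 5
3 7 5 35
"""
motifs = parse_matrices(example)
print(motifs[0]["name"], motifs[0]["weights"], len(motifs[0]["counts"]))
```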

Palindromes: for matrices that are exact complementary palindromes, there is no distinction between the + and - strand. Comet automatically detects exact complementary palindromes, and assigns an overall weight for the motif that is the sum of the two numbers on the second line of the matrix description.
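
To see whether a matrix will be treated as a palindrome, one can compare it with its reverse complement. The sketch below is one interpretation of "exact complementary palindrome", not Comet's actual code: with columns in the order A, C, G, T, the reverse complement reverses both the order of the positions and the order of the columns within each position. The function names are hypothetical.

```python
def reverse_complement(counts):
    """Reverse the positions and swap A<->T, C<->G within each position."""
    return [tuple(reversed(row)) for row in reversed(counts)]

def is_exact_palindrome(counts):
    """True if the matrix equals its own reverse complement."""
    return [tuple(row) for row in counts] == reverse_complement(counts)

def overall_weight(plus_weight, minus_weight):
    """Palindromes get a single weight: the sum of the two strand weights."""
    return plus_weight + minus_weight

# A matrix whose consensus is ACGT, which is its own reverse complement.
palindrome = [(10, 0, 0, 0), (0, 10, 0, 0), (0, 0, 10, 0), (0, 0, 0, 10)]
# The first_motif example from the matrix file format above.
first_motif = [(5, 2, 38, 5), (29, 1, 15, 5), (3, 7, 5, 35)]

print(is_exact_palindrome(palindrome), is_exact_palindrome(first_motif))
```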

-a [optional]
Specifies the average distance expected between motifs in a cluster. The default is 35.
-o [optional]
Specifies the name of a file to write the output to. The default is to write output to the screen.
-e [optional]
Specifies an E-value threshold. Clusters with E-values greater than this threshold are suppressed from the output. The default is 10.
-w [optional]
Local abundances of A, C, G and T are counted in windows of size 2w+1. The default is 75.
-p [optional]
Number of pseudocounts to add to all entries in cis-element matrices. The default is 1.
-s [optional]
Specifies a file to write statistical information used as an intermediate step in calculating the E-values. This option is mainly for development purposes.
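
The -w and -p options can be illustrated with a short sketch. It is not Comet's implementation: both function names are hypothetical, the default of 75 is assumed to refer to w rather than to the window size, and truncating the window at the sequence ends is an assumption, since this page does not say how ends are handled.

```python
from collections import Counter

def local_base_counts(seq, center, w=75):
    """Count A, C, G, T in the window of size 2w+1 centred on `center`.

    Assumption: the window is simply truncated at the sequence ends.
    """
    start = max(0, center - w)
    end = min(len(seq), center + w + 1)
    counts = Counter(seq[start:end].upper())
    return {base: counts.get(base, 0) for base in "ACGT"}

def add_pseudocounts(counts, p=1):
    """Add p to every entry of a count matrix (rows are positions)."""
    return [tuple(c + p for c in row) for row in counts]

print(local_base_counts("AACCGGTT", center=3, w=1))
print(add_pseudocounts([(5, 2, 38, 5), (29, 1, 15, 5)], p=1))
```

Pseudocounts keep every entry of a matrix non-zero, so a base that was never observed at a position in the sample is not treated as impossible.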

Known Limitations

The E-values will not be accurate when using a collection of cis-element matrices including: very similar matrices, a matrix that is almost a complementary palindrome, or a matrix with a high propensity for self-overlap, e.g. consensus sequence AAAAAA. It is recommended that very similar matrices be combined into a single matrix, and near-palindromic matrices be made exactly palindromic.
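
This page does not prescribe how to make a near-palindromic matrix exactly palindromic. One possible approach, shown here purely as an assumption, is to add the matrix to its own reverse complement, which yields a matrix that equals its reverse complement by construction. The function names are hypothetical, and the columns are taken to be in the order A, C, G, T.

```python
def reverse_complement(counts):
    """Reverse the positions and swap A<->T, C<->G within each position."""
    return [tuple(reversed(row)) for row in reversed(counts)]

def make_palindromic(counts):
    """Sum a matrix with its reverse complement, position by position."""
    rc = reverse_complement(counts)
    return [tuple(a + b for a, b in zip(row, rc_row))
            for row, rc_row in zip(counts, rc)]

# Nearly palindromic: consensus ACGT with a few stray counts.
near = [(9, 1, 0, 0), (0, 10, 0, 0), (0, 0, 10, 0), (0, 1, 0, 9)]
fixed = make_palindromic(near)
print(fixed)
print(fixed == reverse_complement(fixed))
```

Note that the summed matrix has twice the total counts of the original, which changes the relative effect of the pseudocounts set with -p.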
