From CAGT
ROVER
Relative OVER-abundance of cis-elements
ROVER is a tool for determining whether one or more of a group of transcription factors is likely to regulate a group of genes. It was designed for use with promoters from groups of genes that are suspected of being co-regulated, such as those from a microarray study.
ROVER compares two groups of promoters (a suspected co-regulated group and a non-regulated group) by determining the relative over-abundance of likely binding sites for a particular Transcription Factor (TF) in one group versus the other. ROVER calculates the significance of any over-abundance of binding sites for each TF and reports the probability that it occurred by chance. The lower this probability, the more likely it is that the TF regulates the group of genes in question. Likely binding sites are found by looking for high-scoring matches to a Position-Specific Scoring Matrix (PSSM), which represents known binding sites for a transcription factor. In addition to determining the significance of each TF, ROVER also reports the subset of sequences likely to be regulated by each TF and the specific significant binding sites.
ROVER is available as a command-line Java program (download below). A web version of ROVER is also available as part of the MotifViz web site. There is also a C++ version, which is no longer maintained.
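To make the idea of "high-scoring matches to a PSSM" concrete, here is a minimal Python sketch of a conventional log-odds scan. It is an illustration only and is not taken from ROVER's source: the function names are hypothetical, the flat background mirrors what the -F option describes, and the pseudo-count of 0.375 is borrowed from the documented -C default. How ROVER converts scores into P-values is not described on this page and is not attempted here.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudo=0.375, background=None):
    """Turn a 4 x n count matrix (rows A, C, G, T) into per-column log-odds scores.

    Assumption: standard log2 odds against a background model; ROVER's own
    scoring may differ.
    """
    background = background or {base: 0.25 for base in BASES}
    n_cols = len(counts[0])
    matrix = []
    for col in range(n_cols):
        total = sum(row[col] for row in counts) + 4 * pseudo
        matrix.append({
            base: math.log2(((counts[i][col] + pseudo) / total) / background[base])
            for i, base in enumerate(BASES)
        })
    return matrix

def scan(sequence, matrix):
    """Yield (start, score) for each window of the sequence as wide as the matrix."""
    width = len(matrix)
    seq = sequence.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing N or other symbols
        yield start, sum(matrix[i][ch] for i, ch in enumerate(window))
```

A caller would typically keep only windows whose score exceeds some threshold; choosing that threshold is where a significance calculation such as ROVER's comes in.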
Input
ROVER expects three files as input:
- Promoter sequence file
- Background promoter sequence file
- PSSM file
JASPAR is an open-access database, so we can provide a version of JASPAR (downloaded 12-15-03) formatted for ROVER, either as a sample or as the complete set: Sample or Complete.
JASPAR is described in the following paper:
JASPAR: an open access database for eukaryotic transcription factor binding profiles
Nucleic Acids Res. 2004 Jan; 32(1) Database Issue
Albin Sandelin, Wynand Alkema, Pär Engström, Wyeth Wasserman and Boris Lenhard
You may need to format your promoter sequences and/or PSSMs to fit ROVER's requirements:
The first line of each sequence or matrix starts with a ">" and includes an accession and name. The following lines should contain the sequence or binding site matrix. It is important that the accession for the gene or matrix be separated from the name by a tab character.
Here is a sequence file example:
>YBL002W	HTB2
TACCCAATAGCTTGTTCAATTCATCATCATTTCTGATGGCCAATTGTAAATGTCTTGGAATAATTCTGGTTTTTTTGTTATCTCTAGCAGCATTACCAGCCAATTCTAAAATTTCAGCAGCCAAATATTCTAAGACAGCAGTTAGATAGACTGGAGCACCAGAACCAATTCTCTGGGCGTAGTTACCTCTTCTTAGCAATCTGTGCACTCTACCAACTGGGAATGTTAAACCAGCTTTAGCAGATCTAGATTGAGAAGCTTTAGCAGCTGAACCAGCTTTACCACCTTTACCACCGGACATTATATATTAAATTTGCTCTTGTTCTGTACTTTCCTAATTCTTATGTAAAAAGACAAGAATTTATGATACTATTTAATAACAAAAAACTACCTAAGAAAAGCATCATGCAGTCGAAATTGAAATCGAAAAGTAAAACTTTAACGGAACATGTTTGAAATTCTAAGAAAGCATACATCTTCATCCCTTATATATAGAGTTATGTTTGATATTAGTAGTCATGTTGTAATCTCTGGCCTAAGTATACGTAACGAAAATGGTAGCACGTCGCGTTTATGGCCCCCAGGTTAATGTGTTCTCTGAAATTCGCATCACTTTGAGAAATAATGGGAACACCTTACGCGTGAGCTGTGCCCACCGCTTCGCCTAATAAAGCGGTGTTCTCAAAATTTCTCCCCGTTTTCAGGATCACGAGCGCCATCTAGTTCTGGTAAAATCGCGCTTACAAGAACAAAGAAAAGAAACATCGCGTAATGCAACAGTGAGACACTTGCCGTCATATATAAGGTTTTGGATCAGTAACCGTTATTTGAGCATAACACAGGTTTTTAAATATATTATTATATATCATGGTATATGTGTAAAATTTTTTTGCTGACTGGTTTTGTTTATTTATTTAGCTTTTTAAAAATTTTACTTTCTTCTTGTTAATTTTTTCTGATTGCTCTATACTCAAACCAACAACAACTTACTCTACAACTA
>YDR311W	TFB1
TCTTTTATATGAAGCGGATTTGAACCAAAACCAGAGCCAACTTGTCGTTTTATATCAGAATCATCACTGACTGGTATGTCTGTGATGGATGGCAAAGCTTTAGCGTTCGCATCTGTATCTAGCTTCCTCAAACTATTAGCTTGATTTTGAGCACTGGTAAGTGCTAACGTATCTACGTCATCTTTGGGTCCAGACGGAAGTCTCTGTTCATTGGTTATGTTATCAGAAGGGGCTGTGGTGTTCTCAGACATCCCCGCAACAAACGAATTTTGTTAATTATGTATGAAACTTTTCGTTTGATCTCAATAATACCACTAGCGACTAAATTTTTATGATACTTAGCTACTTTAAACAAGTCCCTTGTGCTCTGTTTGCTGACACTTTTGATAAAATATGCCTGTGTATAATTCTTTTAGCAGTTTATTTCAAACACAAATGGTATTAAAAGGATAGATGAAAAAAAAAAAAAAAATTAAAGCCACTAGTAATGATACAATCGTGGTATCACAAGCGCTGAATGAAACAAGTGTGGCTATCTATAGCGGATGCAAGTGGAGAACTTGTGAATCCAAACTGAAATATTTTGCCATCATTTGTTGTCCTTTCCCTTTTCCATTCAGGAAAAAAAAAAAAAATTTGACGTCGCCGTCGCGTCGCAGTCATATAATTACAGCAATTTATCTTGTTGAACGACGCAAATTAATGGAAATTGTGACTTACATAGTAAGTATTAGTAAACGTAGTTAAGGCCACGTGGGAAAGATATGAAAGGAGTGTAAGTAATGGATATCGGTCTAACGAAAATGGAAACCAATCTTTAAAAATGATAGTATGATTCGACAGTAAACTAGAAAAGCCACAACCCGTGGGACATGATAAGGCTGCTCGTTTTTGACGCAATTTTTAGACAATACTGAAATTTAGCATAATAAGCTTTCCCAGTGAAAGTAATAATATTTAACCTAGGGTAGGGGTAGGGAAAAAATAAAAGTAAACCATA
and a matrix file example (from JASPAR):
>M00713	TBP
0 8 1 4 0 23 3 20 2
8 6 7 0 2 0 0 0 1
3 2 0 1 0 0 0 0 4
12 7 15 18 21 0 20 3 16
>M00728	ROX1
0 0 1 16 0 0 0 0 0
8 9 7 0 0 0 0 0 1
2 5 1 0 0 0 17 0 1
7 3 8 1 17 17 0 17 15
Matrices have four rows and n columns. The rows give the counts of A, C, G, and T, respectively, at each of the n binding-site positions.
Sequences can span multiple lines.
Take care to avoid blank lines in all input files.
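The formatting rules above (a ">" header with a tab between accession and name, no blank lines, four matrix rows) can be checked before running ROVER. The following Python sketch is one possible pre-flight check, not part of ROVER itself; the function names are hypothetical, and it assumes matrix cells are whole-number counts as in the JASPAR example.

```python
def parse_rover_records(lines):
    """Parse '>accession<TAB>name' records.

    Returns a list of (accession, name, body_lines) tuples and raises
    ValueError on the formatting problems described in the text.
    """
    records = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            raise ValueError(f"line {number}: blank lines are not allowed")
        if line.startswith(">"):
            header = line[1:]
            if "\t" not in header:
                raise ValueError(
                    f"line {number}: accession and name must be separated by a tab"
                )
            accession, name = header.split("\t", 1)
            records.append((accession.strip(), name.strip(), []))
        elif not records:
            raise ValueError(f"line {number}: data found before the first '>' header")
        else:
            records[-1][2].append(line.strip())
    return records

def to_count_matrix(body_lines):
    """Convert the body of a matrix record into four rows (A, C, G, T) of ints."""
    rows = [[int(cell) for cell in line.split()] for line in body_lines]
    if len(rows) != 4:
        raise ValueError(f"expected 4 rows (A, C, G, T), found {len(rows)}")
    if len({len(row) for row in rows}) != 1:
        raise ValueError("matrix rows have different numbers of columns")
    return rows

def join_sequence(body_lines):
    """Join a sequence that spans multiple lines into one string."""
    return "".join(body_lines)
```

For a file on disk, `parse_rover_records(open(path))` would work, since a file object iterates over its lines.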
We have had good success using 10-50 promoters in each promoter file. ROVER is quite quick, so larger promoter sets are possible, but may not be biologically relevant. Both promoter files should contain an equal number of promoters of approximately the same length.
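As a rough way to check the last recommendation, the Python sketch below compares the number of promoters and their mean lengths in the two promoter files. It is a hypothetical helper, not part of ROVER, and the 10% tolerance for "approximately the same length" is an arbitrary assumption that you should adjust to your data.

```python
def sequence_lengths(lines):
    """Return one sequence length per '>' record in a Fasta-style file."""
    lengths = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            lengths.append(0)
        elif line and lengths:
            lengths[-1] += len(line)
    return lengths

def compare_promoter_sets(test_lines, background_lines, tolerance=0.1):
    """Compare promoter counts and mean lengths of the two input sets.

    tolerance is the allowed relative difference in mean length
    (an assumed value, not a ROVER setting).
    """
    test = sequence_lengths(test_lines)
    background = sequence_lengths(background_lines)
    if not test or not background:
        raise ValueError("both files must contain at least one sequence")
    mean_test = sum(test) / len(test)
    mean_background = sum(background) / len(background)
    longest = max(mean_test, mean_background)
    return {
        "counts": (len(test), len(background)),
        "same_count": len(test) == len(background),
        "mean_lengths": (mean_test, mean_background),
        "similar_length": abs(mean_test - mean_background) <= tolerance * longest,
    }
```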
Options
Usage: java -Xmx250m -jar rover.jar [-C] [-F] [-f] [-h] [-P pvalue] [-p pvalue]
-C Pseudo-counts to add to each PWM cell. Default is 0.375.
-p Cutoff for single site P-value. Default is 0.001.
-P Cutoff for whole sequence P-value. Default is 0.01.
-B File containing Fasta formatted background sequences.
-F,--flat_base_frequencies ACGT have equal (or flat) background frequencies.
-M File containing PSSMs.
-S File containing Fasta formatted sequences.
-f,--filter Filter out lower case characters (masked repeats).
-h,--help Print help message.
The argument -Xmx250m tells Java to let ROVER use up to 250 MB of memory. You can change 250m to another value to suit your system and data set.
The default sequence significance P-value cutoff is 0.01. This option only affects the output. It determines the cutoff for the overall significance of a sequence (multiple hits or single high-scoring hits).
The default individual cis-element significance cutoff is 0.001. This works well for promoters that are each of length 1000. We recommend adjusting this cutoff to approximately 1 / promoter length.
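The rule of thumb above is simple arithmetic: promoters of length 1000 give 1/1000 = 0.001, and promoters of length 500 would give 0.002. The Python sketch below computes the cutoff and assembles a matching command line. The helper names and the input file names are hypothetical; the flags (-S, -B, -M, -p) and the -Xmx setting come from the usage text above. The command is only built here, not executed.

```python
def site_pvalue_cutoff(promoter_length):
    """Suggested single-site P-value cutoff: approximately 1 / promoter length."""
    if promoter_length <= 0:
        raise ValueError("promoter length must be positive")
    return 1.0 / promoter_length

def build_rover_command(seq_file, background_file, pssm_file,
                        promoter_length, memory_mb=250, jar="rover.jar"):
    """Assemble a ROVER command line as an argument list."""
    cutoff = site_pvalue_cutoff(promoter_length)
    return [
        "java", f"-Xmx{memory_mb}m", "-jar", jar,
        "-S", seq_file,          # promoter sequences
        "-B", background_file,   # background promoter sequences
        "-M", pssm_file,         # PSSMs
        "-p", format(cutoff, ".6f"),  # fixed notation, avoids forms like 5e-05
    ]

# Hypothetical file names, for illustration only.
command = build_rover_command("promoters.fa", "background.fa", "jaspar.pssm", 1000)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` if you want to launch ROVER from a script.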
Output
The output is written in CisML, an XML format that we have described. CisML files contain the complete findings of ROVER as well as all information necessary to replicate a ROVER run. Our CisML website provides simple methods and explanations for generating various reports from CisML.
Download ROVER Executable
Java JAR (Tested with Java 1.4.2) Last Updated 7-11-2005
Citing ROVER
ROVER was introduced as part of the CARRIE transcriptional regulatory network inference tool. Please use the following reference when citing ROVER:
Haverty, PM., Hansen, U., Weng, Z. (2004) Computational Inference of Transcriptional Regulatory Networks from Expression Profiling and Transcription Factor Binding Site Identification. Nucleic Acids Research, Vol. 32, 179-188.
Contact Us
Comments, Questions, and Suggestions
Last Modified: Thursday, 03-Nov-2005 00:29:58 EST