Nucleotide Count Matrices for a Selection of Cis-elements
Each matrix contains counts of adenines, cytosines, guanines, and thymines observed at each position in a sample of cis-elements of one type.
With current data, it is not possible to construct accurate matrices for each of the thousands of human transcription factors, or for the tens of thousands of dimers. Fortunately, transcription factors naturally belong to families whose members possess similar, though generally not identical, DNA binding properties. These matrices therefore serve as approximate representations of DNA binding specificity for selected families of transcription factors. This list will grow in the future, and it will be necessary to accommodate factors that bind to motifs of varying length or half-site organization.
TATA box
- Bound by TBP (TATA binding protein), a component of TFIID
A | C | G | T |
---|---|---|---|
61 | 145 | 152 | 31 |
16 | 46 | 18 | 309 |
352 | 0 | 2 | 35 |
3 | 10 | 2 | 374 |
354 | 0 | 5 | 30 |
268 | 0 | 0 | 121 |
360 | 3 | 20 | 6 |
222 | 2 | 44 | 121 |
155 | 44 | 157 | 33 |
56 | 135 | 150 | 48 |
83 | 147 | 128 | 31 |
82 | 127 | 128 | 52 |
82 | 118 | 128 | 61 |
68 | 107 | 139 | 75 |
77 | 101 | 140 | 71 |
Source: Bucher P (1990) J Mol Biol 212, 563-578, Table 3
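As an illustration of how a count matrix like the one above might be used, the following Python sketch converts the TATA box counts into per-position base frequencies and reads off the most frequent base at each position. The function names `to_frequencies` and `consensus` and the optional `pseudocount` parameter are assumptions made for this example; they are not part of the original data or of any tool described on this page.

```python
BASES = "ACGT"

# TATA box counts copied from the table above, one row per motif position,
# columns in the order A, C, G, T.
TATA_COUNTS = [
    [61, 145, 152, 31],
    [16, 46, 18, 309],
    [352, 0, 2, 35],
    [3, 10, 2, 374],
    [354, 0, 5, 30],
    [268, 0, 0, 121],
    [360, 3, 20, 6],
    [222, 2, 44, 121],
    [155, 44, 157, 33],
    [56, 135, 150, 48],
    [83, 147, 128, 31],
    [82, 127, 128, 52],
    [82, 118, 128, 61],
    [68, 107, 139, 75],
    [77, 101, 140, 71],
]


def to_frequencies(counts, pseudocount=0.0):
    """Turn each row of counts into base frequencies that sum to 1.

    The pseudocount (an assumption of this sketch) is added to every cell
    so that bases never observed at a position do not get frequency zero.
    """
    freqs = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        freqs.append([(c + pseudocount) / total for c in row])
    return freqs


def consensus(counts):
    """Return the most frequently observed base at each position."""
    return "".join(BASES[row.index(max(row))] for row in counts)


if __name__ == "__main__":
    print(consensus(TATA_COUNTS))
    for row in to_frequencies(TATA_COUNTS):
        print(" ".join(f"{f:.3f}" for f in row))
```

A consensus string discards most of the information in the matrix (for example, positions where G and A are almost equally common), which is one reason to keep the full counts rather than a single sequence.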
CCAAT box
- A motif often found in gene promoters
- Bound by the transcription factor NF-Y
A | C | G | T |
---|---|---|---|
52 | 47 | 41 | 22 |
22 | 51 | 38 | 52 |
16 | 67 | 40 | 41 |
80 | 19 | 60 | 5 |
68 | 7 | 78 | 11 |
0 | 164 | 0 | 0 |
2 | 161 | 1 | 0 |
160 | 0 | 1 | 3 |
164 | 0 | 0 | 0 |
0 | 1 | 0 | 163 |
20 | 88 | 50 | 6 |
96 | 10 | 55 | 3 |
21 | 28 | 98 | 17 |
58 | 57 | 33 | 16 |
47 | 11 | 72 | 34 |
34 | 53 | 40 | 35 |
Source: Mantovani R (1999) Gene 239(1), 15-27
Sp1
- A motif often found in gene promoters
- Bound by transcription factors in the Sp/KLF (Krüppel-like factor) family
- DNA binding structure: three C2H2 type zinc fingers
A | C | G | T |
---|---|---|---|
32 | 21 | 35 | 20 |
24 | 20 | 56 | 8 |
14 | 10 | 65 | 19 |
17 | 1 | 89 | 1 |
0 | 0 | 108 | 0 |
0 | 2 | 106 | 0 |
19 | 80 | 0 | 9 |
2 | 5 | 99 | 2 |
0 | 1 | 99 | 8 |
21 | 5 | 76 | 6 |
17 | 10 | 72 | 9 |
3 | 55 | 21 | 29 |
9 | 40 | 32 | 27 |
Source: TRANSFAC 5.0 accession # M00196
AP-1 (Activator Protein 1)
- This motif is also known as the TRE, the TPA (12-O-tetradecanoyl-phorbol-13-acetate) response element
- The AP-1 transcription factor family consists of Fos and Jun subfamilies
- Jun proteins in mammals: c-Jun, JunB, JunD
- Fos proteins in mammals: c-Fos, FosB, Fra1, Fra2
- DNA binding structure: bZIP (basic region leucine zipper)
- AP-1 proteins bind as dimers with one another or with other bZIP proteins
- AP-1 dimers bind with maximum affinity to the TRE, and with lower affinity to the CRE
A | C | G | T |
---|---|---|---|
14 | 12 | 23 | 7 |
18 | 17 | 18 | 3 |
0 | 0 | 0 | 56 |
0 | 0 | 56 | 0 |
55 | 1 | 0 | 0 |
3 | 44 | 5 | 4 |
4 | 5 | 7 | 40 |
13 | 30 | 5 | 8 |
38 | 10 | 5 | 3 |
11 | 13 | 18 | 14 |
11 | 15 | 10 | 20 |
Source: TRANSFAC 5.0 accession # M00174
CRE (cAMP Response Element)
- Bound by members of the ATF/CREB (activating transcription factor / CRE binding) transcription factor family
- This family is probably not monophyletic, with some members more similar to AP-1 proteins than to other ATF/CREB proteins
- DNA binding structure: bZIP (basic region leucine zipper)
- They bind as dimers with one another, or with other bZIP proteins
A | C | G | T |
---|---|---|---|
2 | 6 | 7 | 5 |
1 | 7 | 11 | 1 |
0 | 0 | 0 | 20 |
0 | 0 | 20 | 0 |
20 | 0 | 0 | 0 |
0 | 20 | 0 | 0 |
1 | 0 | 19 | 0 |
1 | 5 | 0 | 14 |
10 | 8 | 0 | 2 |
14 | 1 | 4 | 1 |
2 | 6 | 6 | 6 |
3 | 9 | 6 | 2 |
Source: TRANSFAC 5.0 accession # M00178
Ets
- Mammals have over twenty Ets genes
- DNA binding structure: winged helix-turn-helix
- They bind DNA as monomers
A | C | G | T |
---|---|---|---|
7 | 3 | 10 | 5 |
15 | 5 | 12 | 5 |
2 | 17 | 18 | 1 |
29 | 7 | 3 | 0 |
0 | 0 | 39 | 0 |
0 | 0 | 39 | 0 |
39 | 0 | 0 | 0 |
33 | 0 | 0 | 6 |
10 | 2 | 26 | 1 |
6 | 8 | 6 | 18 |
7 | 1 | 13 | 2 |
Source: Mimeault M (2000) Crit Rev Oncog 11(3-4), 227-53
ERE (Estrogen Response Element)
Matrix updated 1.29.03
- Bound by the estrogen receptor
- DNA binding structure: C4 type zinc fingers
- Humans have two estrogen receptor genes: ERα and ERβ
- They bind DNA as dimers
A | C | G | T |
---|---|---|---|
16 | 3 | 4 | 2 |
2 | 0 | 23 | 0 |
5 | 2 | 18 | 0 |
2 | 0 | 3 | 20 |
0 | 21 | 4 | 0 |
22 | 0 | 3 | 0 |
0 | 14 | 7 | 4 |
3 | 16 | 3 | 3 |
6 | 7 | 5 | 7 |
3 | 0 | 2 | 20 |
1 | 0 | 24 | 0 |
19 | 3 | 3 | 0 |
0 | 25 | 0 | 0 |
1 | 20 | 2 | 2 |
Source: O'Lone R, Frith MC, Hansen U (in preparation)
GATA
- Vertebrates have six GATA factors: GATA-1 to 6
- DNA binding structure: two C4 type zinc fingers
A | C | G | T |
---|---|---|---|
15 | 1 | 11 | 10 |
10 | 11 | 12 | 15 |
11 | 25 | 9 | 3 |
25 | 2 | 2 | 19 |
0 | 0 | 48 | 0 |
48 | 0 | 0 | 0 |
0 | 0 | 0 | 48 |
48 | 0 | 0 | 0 |
27 | 1 | 16 | 4 |
11 | 8 | 24 | 5 |
12 | 8 | 18 | 10 |
15 | 13 | 16 | 4 |
8 | 13 | 14 | 12 |
Source: TRANSFAC 5.0 accession # M00128
Myc
Matrix corrected 11.18.02
- Binding motif for Myc-Max dimers
- DNA binding structure: basic helix-loop-helix leucine zipper (bHLHzip)
- Humans have three Myc genes (c-Myc, N-Myc and L-Myc) and two Max genes (Max and Mlx)
- This motif is a specialization of the broader E-box family, which is recognized by many bHLH and bHLHzip proteins.
A | C | G | T |
---|---|---|---|
2 | 4 | 8 | 0 |
3 | 7 | 3 | 1 |
0 | 14 | 0 | 0 |
14 | 0 | 0 | 0 |
0 | 10 | 0 | 4 |
0 | 0 | 14 | 0 |
0 | 2 | 0 | 12 |
0 | 0 | 14 | 0 |
2 | 6 | 4 | 2 |
3 | 7 | 2 | 2 |
Source: Grandori C & Eisenman RN (1997) TIBS 22 177-181, Table 1
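One common way to apply a count matrix such as the Myc matrix above is to convert it to log-odds scores and slide it along a sequence. The sketch below is a minimal illustration under stated assumptions: a pseudocount of 1 per cell, a uniform background of 0.25 per base, and forward-strand scanning only. The function names `log_odds_matrix`, `scan`, and `best_hit` are hypothetical, and this is not necessarily how any particular tool scores these matrices.

```python
import math

BASES = "ACGT"

# Myc counts copied from the table above (columns A, C, G, T).
MYC_COUNTS = [
    [2, 4, 8, 0],
    [3, 7, 3, 1],
    [0, 14, 0, 0],
    [14, 0, 0, 0],
    [0, 10, 0, 4],
    [0, 0, 14, 0],
    [0, 2, 0, 12],
    [0, 0, 14, 0],
    [2, 6, 4, 2],
    [3, 7, 2, 2],
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2(frequency / background) scores.

    The pseudocount and the uniform background are assumptions of this
    sketch; a real analysis might use genome-specific base frequencies.
    """
    matrix = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        matrix.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return matrix


def scan(sequence, matrix):
    """Score every window on the forward strand; return (start, score) pairs."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][BASES.index(base)] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, matrix):
    """Return the highest-scoring (start, score) pair, or None if no window fits."""
    hits = scan(sequence, matrix)
    return max(hits, key=lambda hit: hit[1]) if hits else None


if __name__ == "__main__":
    scores = log_odds_matrix(MYC_COUNTS)
    print(best_hit("TTTTGCCACGTGCCTTTT", scores))
```

Scanning the reverse complement as well would be a natural extension, since a factor can bind a site in either orientation; it is left out here to keep the sketch short.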
Myf (Myogenin / MyoD family)
- The family of myogenic regulatory factors has four members in mammals: MyoD, myogenin, Myf5, and MRF4
- DNA binding structure: basic helix-loop-helix
- They bind DNA as heterodimers with E12 or E47 proteins (splice variants of the E2A gene).
- This motif is a specialization of the broader E-box family, which is recognized by many bHLH and bHLHzip proteins.
A | C | G | T |
---|---|---|---|
3.5 | 4 | 0.5 | 0 |
4.5 | 0 | 3.5 | 0 |
2 | 1 | 5 | 0 |
0 | 7.5 | 0.5 | 0 |
8 | 0 | 0 | 0 |
3.5 | 0 | 4.5 | 0 |
0 | 7.5 | 0.5 | 0 |
3 | 0 | 0 | 5 |
0 | 0 | 8 | 0 |
0 | 5 | 3 | 0 |
3 | 0 | 0 | 5 |
0 | 0 | 8 | 0 |
Source: Wasserman WW, Fickett JW (1998) J Mol Biol 278, 167-81
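The matrices on this page share a simple pipe-delimited layout, and some of them, such as the Myf matrix above, contain half counts. A reader who wants to load a matrix programmatically might use something like the following Python sketch, which parses the text into rows of floats and skips the header and separator lines. The function name `parse_count_table` is an assumption made for this example.

```python
MYF_TABLE = """
A | C | G | T |
---|---|---|---|
3.5 | 4 | 0.5 | 0 |
4.5 | 0 | 3.5 | 0 |
2 | 1 | 5 | 0 |
0 | 7.5 | 0.5 | 0 |
8 | 0 | 0 | 0 |
3.5 | 0 | 4.5 | 0 |
0 | 7.5 | 0.5 | 0 |
3 | 0 | 0 | 5 |
0 | 0 | 8 | 0 |
0 | 5 | 3 | 0 |
3 | 0 | 0 | 5 |
0 | 0 | 8 | 0 |
"""


def parse_count_table(text):
    """Parse a pipe-delimited A/C/G/T table into a list of [A, C, G, T] floats.

    Lines that do not contain exactly four numeric cells (the header row
    and the '---' separator row) are ignored.
    """
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        cells = [cell for cell in cells if cell]  # drop the empty trailing cell
        if len(cells) != 4:
            continue
        try:
            rows.append([float(cell) for cell in cells])
        except ValueError:
            continue  # header or separator line
    return rows


if __name__ == "__main__":
    counts = parse_count_table(MYF_TABLE)
    for position, row in enumerate(counts, start=1):
        print(position, row, "total:", sum(row))
```

Parsing to floats rather than integers keeps the half counts intact, and checking that every row has the same total is a quick sanity check after loading a matrix.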
E2F
- DNA binding structure: winged helix
- E2F proteins bind DNA as heterodimers with DP proteins
- Mammals have six E2F genes: E2F1 to 6
- Mammals have two DP genes: DP1 and DP2
A | C | G | T |
---|---|---|---|
4 | 4 | 4 | 33 |
4 | 2 | 3 | 36 |
0 | 2 | 4 | 39 |
1 | 23 | 21 | 0 |
0 | 4 | 41 | 0 |
0 | 45 | 0 | 0 |
0 | 0 | 45 | 0 |
0 | 32 | 13 | 0 |
1 | 13 | 26 | 5 |
24 | 11 | 5 | 5 |
26 | 1 | 8 | 10 |
24 | 5 | 12 | 4 |
Source: Kel AE, Kel-Margoulis OV et al. (2001) J Mol Biol 309(1), 99-120
NF1
- Vertebrates have four NF1 transcription factors: NF1-A, NF1-B, NF1-C, NF1-X
- They bind DNA as dimers
A | C | G | T |
---|---|---|---|
17 | 18 | 14 | 26 |
17 | 24 | 14 | 20 |
5 | 11 | 1 | 58 |
0 | 0 | 1 | 74 |
0 | 0 | 75 | 0 |
1 | 0 | 74 | 0 |
5 | 67 | 2 | 1 |
30 | 13 | 8 | 19 |
23 | 20 | 20 | 12 |
16 | 16 | 32 | 11 |
20 | 19 | 18 | 18 |
34 | 7 | 19 | 15 |
10 | 17 | 27 | 21 |
10 | 41 | 9 | 15 |
11 | 40 | 12 | 12 |
30 | 18 | 9 | 18 |
27 | 12 | 21 | 15 |
22 | 14 | 20 | 19 |
Source: TRANSFAC 5.0 accession # M00193
LSF
Matrix corrected 12.09.02 - thanks to Vivek Ramaswamy. (The web tools used the correct matrix all along.)
- Mammals have three LSF genes
- They bind DNA as tetramers
A | C | G | T |
---|---|---|---|
5 | 0 | 11 | 3 |
0 | 17 | 2 | 0 |
2 | 0 | 0 | 17 |
0 | 0 | 17 | 2 |
0 | 1 | 18 | 0 |
2 | 3 | 6 | 8 |
1 | 2 | 1 | 15 |
4 | 6 | 2 | 7 |
2 | 3 | 10 | 4 |
6 | 6 | 4 | 3 |
5 | 1 | 9 | 4 |
0 | 17 | 2 | 0 |
1 | 6 | 2 | 10 |
5 | 1 | 8 | 5 |
0 | 5 | 14 | 0 |
Source: Frith MC, Hansen U, Weng Z (2001) Bioinformatics 17(10), 878-889
Mef-2 (Myocyte Enhancer Factor 2)
- Vertebrates have four MEF2 genes: MEF2A, MEF2B, MEF2C, MEF2D
- DNA binding structure: MADS box
- They bind DNA as dimers
- They possess the MEF domain, which causes them to dimerize only with other MEF2 proteins
A | C | G | T |
---|---|---|---|
5 | 0 | 4.5 | 1.5 |
0 | 1 | 10 | 0 |
0 | 6 | 1 | 4 |
0 | 0 | 0 | 11 |
11 | 0 | 0 | 0 |
0 | 0 | 0 | 11 |
3 | 0 | 0 | 8 |
1 | 0 | 0 | 10 |
1.5 | 0 | 0 | 9.5 |
2 | 0 | 0 | 9 |
11 | 0 | 0 | 0 |
5 | 0 | 5 | 1 |
Source: Wasserman WW, Fickett JW (1998) J Mol Biol 278, 167-81
SRF (Serum Response Factor)
- This motif is also known as the CArG box
- Mammals have one SRF gene, with several splice variants
- DNA binding structure: MADS box
- SRF binds DNA as a homodimer
A | C | G | T |
---|---|---|---|
3.5 | 2.5 | 3.5 | 1 |
4.5 | 1 | 2 | 3 |
0 | 10.5 | 0 | 0 |
0 | 8.5 | 0 | 2 |
9 | 0 | 0 | 1.5 |
4.5 | 0 | 0 | 6 |
7.5 | 0.5 | 2 | 0.5 |
4 | 0 | 1 | 5.5 |
10.5 | 0 | 0 | 0 |
7 | 1 | 0 | 2.5 |
0 | 0 | 10.5 | 0 |
0 | 0 | 10.5 | 0 |
3.5 | 4 | 3 | 0 |
Source: Wasserman WW, Fickett JW (1998) J Mol Biol 278, 167-81
Tef (Transcription Enhancer Factor)
- Mammals have four TEF genes: TEF-1, TEF-3, TEF-4, TEF-5
- DNA binding structure: TEA/ATTS domain
A | C | G | T |
---|---|---|---|
0.5 | 3 | 0.5 | 2 |
4.5 | 0 | 1.5 | 0 |
0 | 6 | 0 | 0 |
6 | 0 | 0 | 0 |
0 | 0 | 0 | 6 |
0 | 0 | 0 | 6 |
0 | 6 | 0 | 0 |
0 | 5.5 | 0 | 0.5 |
2.5 | 0 | 0 | 3.5 |
0.5 | 3.5 | 2 | 0 |
1 | 2 | 1.5 | 1.5 |
0 | 1 | 4 | 1 |
Source: Wasserman WW, Fickett JW (1998) J Mol Biol 278, 167-81