From CAGT

Nucleotide Count Matrices for a Selection of Cis-elements

Each matrix contains counts of adenines, cytosines, guanines, and thymines observed at each position in a sample of cis-elements of one type.
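As a minimal sketch of how such a matrix can be handled in code, the example below parses rows of four whitespace-separated counts in A, C, G, T order, converts them to relative frequencies, and reads off the most frequent base at each position. The function names and the four-position matrix are invented for illustration and are not taken from this page or from any CAGT tool.

```python
BASES = "ACGT"

def parse_count_matrix(text):
    """Parse rows of four whitespace-separated counts (A, C, G, T order)."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if fields == list(BASES):  # skip the "A C G T" header row
            continue
        if len(fields) != 4:
            raise ValueError(f"expected 4 counts, got: {line!r}")
        # float() accepts fractional counts such as 3.5 as well as integers
        rows.append([float(x) for x in fields])
    return rows

def to_frequencies(rows):
    """Convert each row of counts to relative frequencies."""
    return [[c / sum(row) for c in row] for row in rows]

def consensus(rows):
    """Most frequent base at each position (first base wins on ties)."""
    return "".join(BASES[row.index(max(row))] for row in rows)

# Hypothetical 4-position matrix, invented for illustration only.
example = """
A C G T
0 10 0 0
10 0 0 0
2 6 1 1
0 0 10 0
"""

rows = parse_count_matrix(example)
freqs = to_frequencies(rows)
motif = consensus(rows)
print(motif)
```

Rows need not share the same total: flanking positions are sometimes observed in fewer sites than the core, so each row is normalized by its own sum.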

With current data, it is not possible to construct accurate matrices for each of the thousands of human transcription factors, or the tens of thousands of dimers. Fortunately, transcription factors naturally belong to families that possess similar, though generally not identical, DNA binding properties. These matrices therefore represent approximate DNA binding representations for selected families of transcription factors. This list will grow in the future, and it will be necessary to accommodate factors that bind to motifs of varying length or half-site organization.
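One common way to use a count matrix as an approximate binding model is to turn it into log-odds scores and slide it along a sequence. The sketch below assumes a pseudocount of 1 per base, a uniform background of 0.25, and an uppercase A/C/G/T-only sequence; these choices, the function names, and the toy matrix are illustrative assumptions, not a description of how the CAGT web tools score sites.

```python
import math

BASES = "ACGT"

def log_odds(rows, pseudocount=1.0, background=0.25):
    """Per-position log2(frequency / background) with a pseudocount."""
    scores = []
    for row in rows:
        total = sum(row) + 4 * pseudocount
        scores.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return scores

def best_hit(seq, scores):
    """Return (start, score) of the best-scoring window, or None if seq is too short."""
    width = len(scores)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # BASES.index raises ValueError on anything other than A, C, G, T
        s = sum(scores[i][BASES.index(b)] for i, b in enumerate(window))
        if best is None or s > best[1]:
            best = (start, s)
    return best

# Hypothetical count matrix (rows = positions, columns = A, C, G, T).
toy = [[0, 10, 0, 0], [10, 0, 0, 0], [2, 6, 1, 1], [0, 0, 10, 0]]
pssm = log_odds(toy)
hit = best_hit("TTCACGTT", pssm)
print(hit)
```

Only one strand is scanned here; a fuller scan would also score the reverse complement, since a factor can bind a site in either orientation.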

TATA box

  • Bound by TBP (TATA binding protein), a component of TFIID
    A    C    G    T
   61  145  152   31
   16   46   18  309
  352    0    2   35
    3   10    2  374
  354    0    5   30
  268    0    0  121
  360    3   20    6
  222    2   44  121
  155   44  157   33
   56  135  150   48
   83  147  128   31
   82  127  128   52
   82  118  128   61
   68  107  139   75
   77  101  140   71

Source: Bucher P (1990) J Mol Biol 212, 563-578, Table 3

CCAAT box

  • A motif often found in gene promoters
  • Bound by the transcription factor NF-Y
    A    C    G    T
   52   47   41   22
   22   51   38   52
   16   67   40   41
   80   19   60    5
   68    7   78   11
    0  164    0    0
    2  161    1    0
  160    0    1    3
  164    0    0    0
    0    1    0  163
   20   88   50    6
   96   10   55    3
   21   28   98   17
   58   57   33   16
   47   11   72   34
   34   53   40   35

Source: Mantovani R (1999) Gene 239(1), 15-27

Sp1

  • A motif often found in gene promoters
  • Bound by transcription factors in the Sp/KLF (Krüppel-like factor) family
  • DNA binding structure: three C2H2 type zinc fingers
    A    C    G    T
   32   21   35   20
   24   20   56    8
   14   10   65   19
   17    1   89    1
    0    0  108    0
    0    2  106    0
   19   80    0    9
    2    5   99    2
    0    1   99    8
   21    5   76    6
   17   10   72    9
    3   55   21   29
    9   40   32   27

Source: TRANSFAC 5.0 accession # M00196

AP-1 (Activator Protein 1)

  • This motif is also known as the TRE (TPA response element; TPA is 12-O-tetradecanoyl-phorbol-13-acetate)
  • The AP-1 transcription factor family consists of Fos and Jun subfamilies
  • Jun proteins in mammals: c-Jun, JunB, JunD
  • Fos proteins in mammals: c-Fos, FosB, Fra1, Fra2
  • DNA binding structure: bZIP (basic region leucine zipper)
  • AP-1 proteins bind as dimers with one another or with other bZIP proteins
  • AP-1 dimers bind with maximum affinity to the TRE, and with lower affinity to the CRE
    A    C    G    T
   14   12   23    7
   18   17   18    3
    0    0    0   56
    0    0   56    0
   55    1    0    0
    3   44    5    4
    4    5    7   40
   13   30    5    8
   38   10    5    3
   11   13   18   14
   11   15   10   20

Source: TRANSFAC 5.0 accession # M00174

CRE (cAMP Response Element)

  • Bound by members of the ATF/CREB (activating transcription factor / CRE binding) transcription factor family
  • This family is probably not monophyletic, with some members more similar to AP-1 proteins than to other ATF/CREB proteins
  • DNA binding structure: bZIP (basic region leucine zipper)
  • They bind as dimers with one another, or with other bZIP proteins
    A    C    G    T
    2    6    7    5
    1    7   11    1
    0    0    0   20
    0    0   20    0
   20    0    0    0
    0   20    0    0
    1    0   19    0
    1    5    0   14
   10    8    0    2
   14    1    4    1
    2    6    6    6
    3    9    6    2

Source: TRANSFAC 5.0 accession # M00178

Ets

  • Mammals have over twenty Ets genes
  • DNA binding structure: winged helix turn helix
  • They bind DNA as monomers
ACGT
73105
155125
217181
29730
00390
00390
39000
33006
102261
68618
71132

Source: Mimeault M (2000) Crit Rev Oncog 11(3-4), 227-53

ERE (Estrogen Response Element)

Matrix updated 1.29.03

  • Bound by the estrogen receptor
  • DNA binding structure: C4 type zinc fingers
  • Humans have two estrogen receptor genes: ERα and ERβ
  • They bind DNA as dimers
    A    C    G    T
   16    3    4    2
    2    0   23    0
    5    2   18    0
    2    0    3   20
    0   21    4    0
   22    0    3    0
    0   14    7    4
    3   16    3    3
    6    7    5    7
    3    0    2   20
    1    0   24    0
   19    3    3    0
    0   25    0    0
    1   20    2    2

Source: O'Lone R, Frith MC, Hansen U (in preparation)

GATA

  • Vertebrates have six GATA factors: GATA-1 to 6
  • DNA binding structure: two C4 type zinc fingers
ACGT
1511110
10111215
112593
252219
00480
48000
00048
48000
271164
118245
1281810
1513164
8131412

Source: TRANSFAC 5.0 accession # M00128

Myc

Matrix corrected 11.18.02

  • Binding motif for Myc-Max dimers
  • DNA binding structure: basic helix-loop-helix leucine-zipper
  • Humans have 3 Myc genes: c-Myc, N-Myc and L-Myc, and 2 Max genes: Max and Mlx
  • This motif is a specialization of the broader E-box family, which is recognized by many bHLH and bHLHzip proteins.
    A    C    G    T
    2    4    8    0
    3    7    3    1
    0   14    0    0
   14    0    0    0
    0   10    0    4
    0    0   14    0
    0    2    0   12
    0    0   14    0
    2    6    4    2
    3    7    2    2

Source: Grandori C & Eisenman RN (1997) TIBS 22 177-181, Table 1

Myf (Myogenin / MyoD family)

  • The family of myogenic regulatory factors has 4 members in mammals: MyoD, myogenin, myf5, and MRF4
  • DNA binding structure: basic helix-loop-helix
  • They bind DNA as heterodimers with E12 or E47 proteins (splice variants of the E2A gene).
  • This motif is a specialization of the broader E-box family, which is recognized by many bHLH and bHLHzip proteins.
    A    C    G    T
  3.5    4  0.5    0
  4.5    0  3.5    0
    2    1    5    0
    0  7.5  0.5    0
    8    0    0    0
  3.5    0  4.5    0
    0  7.5  0.5    0
    3    0    0    5
    0    0    8    0
    0    5    3    0
    3    0    0    5
    0    0    8    0

Source: Wasserman WW, Fickett JW (1998) J Mol Biol 278, 167-81

E2F

  • DNA binding structure: winged helix
  • E2F proteins bind DNA as heterodimers with DP proteins
  • Mammals have six E2F genes: E2F1 to 6
  • Mammals have two DP genes: DP1 and DP2
ACGT
44433
42336
02439
123210
04410
04500
00450
032130
113265
241155
261810
245124

Source: Kel AE, Kel-Margoulis OV et al. (2001) J Mol Biol 309(1), 99-120

NF1

  • Vertebrates have four NF1 transcription factors: NF1-A, NF1-B, NF1-C, NF1-X
  • They bind DNA as dimers
ACGT
17181426
17241420
511158
00174
00750
10740
56721
3013819
23202012
16163211
20191818
3471915
10172721
1041915
11401212
3018918
27122115
22142019

Source: TRANSFAC 5.0 accession # M00193

LSF

Matrix corrected 12.09.02 - thanks to Vivek Ramaswamy. (The web tools used the correct matrix all along.)

  • Mammals have three LSF genes
  • They bind DNA as tetramers
ACGT
50113
01720
20017
00172
01180
2368
12115
4627
23104
6643
5194
01720
16210
5185
05140

Source: Frith MC, Hansen U, Weng Z (2001) Bioinformatics 17(10), 878-889

Mef-2 (Myocyte Enhancer Factor 2)

  • Vertebrates have four MEF2 genes: MEF2A, MEF2B, MEF2C, MEF2D
  • DNA binding structure: MADS box
  • They bind DNA as dimers
  • They possess the MEF domain, which causes them to dimerize only with other MEF2 proteins
ACGT
504.51.5
01100
0614
00011
11000
00011
3008
10010
1.5009.5
2009
11000
5051

Source: Wasserman WW, Fickett JW (1998) J Mol Biol 278, 167-81

SRF (Serum Response Factor)

  • This motif is also known as the CArG box
  • Mammals have one SRF gene, with several splice variants
  • DNA binding structure: MADS box
  • SRF binds DNA as a homodimer
    A     C     G     T
  3.5   2.5   3.5     1
  4.5     1     2     3
    0  10.5     0     0
    0   8.5     0     2
    9     0     0   1.5
  4.5     0     0     6
  7.5   0.5     2   0.5
    4     0     1   5.5
 10.5     0     0     0
    7     1     0   2.5
    0     0  10.5     0
    0     0  10.5     0
  3.5     4     3     0

Source: Wasserman WW, Fickett JW (1998) J Mol Biol 278, 167-81

Tef (Transcription Enhancer Factor)

  • Mammals have four TEF genes: TEF-1, TEF-3, TEF-4, TEF-5
  • DNA binding structure: TEA/ATTS domain
    A    C    G    T
  0.5    3  0.5    2
  4.5    0  1.5    0
    0    6    0    0
    6    0    0    0
    0    0    0    6
    0    0    0    6
    0    6    0    0
    0  5.5    0  0.5
  2.5    0    0  3.5
  0.5  3.5    2    0
    1    2  1.5  1.5
    0    1    4    1

Source: Wasserman WW, Fickett JW (1998) J Mol Biol 278, 167-81
